remove html tags from text

Jim Ault JimAultWins at yahoo.com
Sun Sep 10 14:48:59 EDT 2006


By the way, which tags are used makes a big difference.

Table tags are used frequently for layout.  Notice the difference below.

Outside of <pre>, html treats a run of spaces as one space, so a parser has
to accommodate the difference between tags.  This means that the source can
have extra whitespace and still look the same to the viewer.
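
As a rough illustration of that collapsing rule, here is a minimal sketch in
Python (not Rev).  The function name collapse_whitespace is hypothetical, and
the sketch assumes the text is not inside <pre>, where runs of spaces are kept.

```python
import re

def collapse_whitespace(text: str) -> str:
    """Collapse each run of spaces, tabs, and line breaks into one space.

    Assumption: the text comes from outside a <pre> block, so the runs
    are not significant to the viewer.
    """
    return re.sub(r"[ \t\r\n]+", " ", text).strip()

# The three table-cell strings from the sample page in this message.
samples = ["put 2< 3", "second 2 <      3", "third    2  <           3"]
collapsed = [collapse_whitespace(s) for s in samples]
```

With this rule the second and third samples differ from the first only by the
single space kept before the "<".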

----- paste into a plain-text editor, save as something.html, then
----- open in a browser

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
    <title>Untitled</title>
    <meta name="generator" content="BBEdit 8.2" />
</head>
<body>
<pre>
 put 2< 3
 put 2<      3
 put 2<           3
 </pre>
 <table>
    <tr>
        <td>
             put 2< 3
        </td>
            <td>
                  second 2 <      3
        </td>
        <td>
                 third    2  <           3
        </td>
    </tr>
</table>
</body>
</html>
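
For comparison with the regex approach discussed in this thread, here is a
hedged sketch in Python (not Rev) of a tag stripper that only treats "<" as
the start of a tag when a letter, "/" or "!" follows it.  The pattern and the
name strip_tags are my own assumptions, not anyone's posted solution.  It
leaves "2< 3" and "2<3" alone, but it does not handle comments, scripts, or
entities, and it makes no attempt to collapse whitespace.

```python
import re

# A "<" counts as a tag opener only if followed by an optional "/" or "!"
# and then a letter.  [^<>]* stops the match from running across a later "<".
TAG = re.compile(r"<[/!]?[A-Za-z][^<>]*>")

def strip_tags(html: str) -> str:
    """Remove anything that looks like a tag; leave a bare "<" in place."""
    return TAG.sub("", html)

pre_block = "<pre>\nput 2<3\n</pre>"
table_cell = "<td> put 2< 3 </td>"

stripped_pre = strip_tags(pre_block)
stripped_cell = strip_tags(table_cell)
```

A comparison such as "a<b" followed later by a real tag still survives here,
because the match is not allowed to span a second "<".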


Ahh, the wonderful world of browsers and html.

Jim Ault
Las Vegas

On 9/10/06 10:55 AM, "Richard Gaskin" <ambassador at fourthworld.com> wrote:

> Mark Smith wrote:
>> On 10 Sep 2006, at 16:26, Richard Gaskin wrote:
>> 
>>> So until someone can demonstrate otherwise, I'm sticking with using
>>> fields to strip tags from text.....
>> 
>> Though doesn't this approach fail with legitimate "<" characters in
>> <code>  (or other) tags?
>> 
>> Of course, that may not be important in your usage.
> 
> It's very important for most of us, since we're looking for the most
> robust solution.
> 
> I just tried it and found that this:
> 
> <pre>
> put 2<3
> </pre>
> 
> ...produces an incomplete rendering like this:
> 
>    put 2
> 
> ...but this:
> 
> <pre>
> put 2< 3
> </pre>
> 
> ...is rendered as expected like this:
> 
> put 2< 3
> 
> 
> I would imagine similar results if we special-case the regex solution to
> also handle non-white space after a "<".
> 
> If both methods are equally robust then the one to use would be the
> fastest.  But if one is more fault-tolerant than the other, then if the
> speed of both is at least acceptable I'd go with the more robust one.
> 
> --
>   Richard Gaskin
>   Managing Editor, revJournal
>   _______________________________________________________
>   Rev tips, tutorials and more: http://www.revJournal.com




