remove html tags from text

Jim Ault JimAultWins at yahoo.com
Fri Sep 8 20:35:02 EDT 2006


Cubist is correct.  Any well-formed page will have balanced tags and will
use the < and > chars only as tag markers.

If the data is critical to you, then beware of exceptions.  Rev could
receive an incomplete download, or less-than-professional HTML that contains
confusing stray "<" or ">" characters.  This is a good reason to use the
'one-tag-per-line' approach and view the result to locate any weirdness
before trusting the data.
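
As a rough sketch of that check (the handler name suspectLines is made up
for illustration, and this is untested against real-world pages): split the
text the same way striptags does, then report any line that holds an angle
bracket but is not a complete "<...>" tag.  Unlike striptags, it replaces
each cr with a space rather than empty, so words on adjacent lines do not
run together.

```livecode
function suspectLines tHtml
   -- hypothetical helper: same splitting idea as striptags,
   -- but it reports odd lines instead of deleting anything
   replace cr with space in tHtml -- in case of multi-line tags
   replace "<" with cr & "<" in tHtml
   replace ">" with ">" & cr in tHtml
   put empty into tSuspects
   repeat for each line tLine in tHtml
      if (tLine contains "<") or (tLine contains ">") then
         -- after the split, a real tag starts with "<" and ends with ">"
         if not ((char 1 of tLine is "<") and (char -1 of tLine is ">")) then
            put tLine & cr after tSuspects
         end if
      end if
   end repeat
   delete char -1 of tSuspects -- drop the trailing cr, if any
   return tSuspects
end suspectLines
```

If the result is empty, every bracket belonged to a complete tag.  Anything
it returns (a stray ">" from something like "if xyz > 4096 then", or a tag
that was cut off before its closing ">") is worth a look by eye.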

The correct way to tell HTML to show the viewer one of these characters is
to use an entity:
a    <  is  &lt;
a    >  is  &gt;
a    &  is  &amp;
and there are many more for various language characters and high ASCII.
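
Once the tags are stripped, those entities are still sitting in the text.
A minimal sketch of turning the three above back into plain characters (the
handler name decodeBasicEntities is hypothetical, and it deliberately
ignores every other entity):

```livecode
function decodeBasicEntities tText
   -- hypothetical helper: handles only the three entities listed above
   replace "&lt;" with "<" in tText
   replace "&gt;" with ">" in tText
   -- do the ampersand last, so that "&amp;lt;" ends up as the
   -- literal text "&lt;" rather than being decoded twice into "<"
   replace "&amp;" with "&" in tText
   return tText
end decodeBasicEntities
```

Call it after striptags, not before, or the decoded "<" and ">" would be
mistaken for tag markers.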

You could test for an incomplete download by looking for the closing tag
"</html>" at the end of the page.  If it is missing, you should retry.

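Something along these lines might do for the retry, as a sketch only.  The
handler name fetchPage and the limit of three attempts are assumptions, and
it simply checks that "</html>" appears somewhere in what came back:

```livecode
function fetchPage tUrl
   -- hypothetical helper: try the download up to three times
   repeat 3 times
      put url tUrl into tHtml
      -- "contains" ignores case by default, so "</HTML>" also counts
      if tHtml contains "</html>" then return tHtml
   end repeat
   return empty -- never got a complete page
end fetchPage
```

An empty result means none of the attempts produced a page with a closing
tag, so the caller can decide whether to give up or report the problem.
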
Jim Ault
Las Vegas

On 9/8/06 5:09 PM, "Cubist at aol.com" <Cubist at aol.com> wrote:

> In a message dated 9/8/06 11:40:31 AM, Mark Wieder <mwieder at ahsoftware.net>
> writes:
>> Friday, September 8, 2006, 6:10:53 AM, you wrote:
>>> barely tested, but maybe a starting point:
>>> function striptags tHtml
>>>    replace cr with empty in tHtml -- in case of multi-line tags
>>>    replace "<" with cr & "<" in tHtml
>>>    replace ">" with ">" & cr in tHtml
>>>    filter tHtml without "*<*"
>>>    filter tHtml without "*>*"
>>>    return tHtml
>>> end striptags
>> Clever... but it'll fail on
>> 
>> if xyz > 4096 then
>    No, it won't; not if you're working with an honest-to-God HTML document,
> at least. Greater-than and less-than signs are *only* found *in the HTML
> source*; if you want either of those symbols to show up when someone views
> your page in a browser window, both of them will be HTML entities that start
> with an ampersand and end with a semicolon.
> 
>> maybe replace the two filter lines with
>> 
>>   filter tHtml without "<*>"
>    I don't think there's any need to go that route. Under what circumstances
> will you ever encounter a document which includes angle-bracketed HTML tags
> *and* leaves honest-to-God angle brackets in their natural, un-Entity-ized
> state?
