remove html tags from text
Jim Ault
JimAultWins at yahoo.com
Fri Sep 8 20:35:02 EDT 2006
Cubist is correct. Any well-formed page will have balanced tags and will use
the < and > characters only as tag markers.
If the data is critical to you, beware of exceptions. Rev could receive
incomplete downloads, or less-than-professional HTML, containing stray "<" or
">" characters that confuse the stripping. This is a good reason to use the
one-tag-per-line approach and inspect the result for any weirdness before
trusting the data.
The correct way to tell HTML to show one of these characters to the viewer is
to use an entity:
a < is &lt;
a > is &gt;
a & is &amp;
and there are many more for characters from various languages and for high
ASCII.
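A sketch of converting those three characters in each direction, for example
to turn the text left over after stripping tags back into what the viewer
sees. The handler names are hypothetical, and only these three entities are
handled; numeric entities and the other named ones are left untouched:

```livecode
-- Escape the three characters. "&" must go first, or the "&" inside
-- the new "&lt;" and "&gt;" would be escaped a second time.
function escapeBasicEntities pText
   replace "&" with "&amp;" in pText
   replace "<" with "&lt;" in pText
   replace ">" with "&gt;" in pText
   return pText
end escapeBasicEntities

-- Reverse it. "&amp;" goes last so that "&amp;lt;" decodes to the
-- literal text "&lt;" rather than to "<".
function unescapeBasicEntities pText
   replace "&lt;" with "<" in pText
   replace "&gt;" with ">" in pText
   replace "&amp;" with "&" in pText
   return pText
end unescapeBasicEntities
```

The order of the replace statements is the whole design: doing them in the
other order corrupts text that already contains entities.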
You could test for an incomplete download by checking that the closing
"</html>" tag is present. If it is missing, you should retry.
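A minimal sketch of that check with a retry loop. The names looksComplete and
fetchHtml and the pMaxTries parameter are hypothetical. "is in" ignores case
unless caseSensitive is true, so "</HTML>" also counts. One caveat: HTML
allows the </html> end tag to be omitted, so a valid page can still fail this
test.

```livecode
-- Hypothetical helper: true if the closing </html> tag appears anywhere.
function looksComplete pHtml
   return "</html>" is in pHtml
end looksComplete

-- Hypothetical helper: download up to pMaxTries times until the page
-- looks complete; return empty if it never does.
function fetchHtml pUrl, pMaxTries
   local tHtml
   repeat pMaxTries times
      put url pUrl into tHtml
      if looksComplete(tHtml) then return tHtml
   end repeat
   return empty
end fetchHtml
```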
Jim Ault
Las Vegas
On 9/8/06 5:09 PM, "Cubist at aol.com" <Cubist at aol.com> wrote:
> In a message dated 9/8/06 11:40:31 AM, Mark Wieder <mwieder at ahsoftware.net>
> writes:
>> Friday, September 8, 2006, 6:10:53 AM, you wrote:
>>> barely tested, but maybe a starting point:
>>> function striptags tHtml
>>> replace cr with empty in tHtml -- in case of multi-line tags
>>> replace "<" with cr & "<" in tHtml
>>> replace ">" with ">" & cr in tHtml
>>> filter tHtml without "*<*"
>>> filter tHtml without "*>*"
>>> return tHtml
>>> end striptags
>> Clever... but it'll fail on
>>
>> if xyz > 4096 then
> No, it won't; not if you're working with an honest-to-God HTML document,
> at least. Greater-than and less-than signs are *only* found *in the HTML
> source*; if you want either of those symbols to show up when someone views
> your page in a browser window, both of them will be HTML entities that start
> with an ampersand and end with a semicolon.
>
>> maybe replace the two filter lines with
>>
>> filter tHtml without "<*>"
> I don't think there's any need to go that route. Under what circumstances
> will you ever encounter a document which includes angle-bracketed HTML tags
> *and* leaves honest-to-God angle brackets in their natural, un-Entity-ized
> state?