remove html tags from text
ambassador at fourthworld.com
Sun Sep 10 11:26:31 EDT 2006
Ken Ray wrote:
>> So I have two questions about this sort of parsing as opposed to using a
>> field object to do the same:
>> 1. Which is more fault-tolerant?
> Good question - one problem with the field object that was identified by
> Sivakatirswami back in August with this was that if you have an html tag
> with <title> in it (like: <title>Chapter 1: Great Revolution
> Recipes</title>), when you set the htmlText of the field to the html that
> contains the <title>, everything that is in the <title> tag doesn't show up
> in the field, and can't be retrieved ever again.
As a HEAD element and not a BODY element, should <title> be considered
data or metadata? After all, <title> is only used by the browser to set
the window name, and the contents of that tag are not rendered in the
page. In this regard Rev does the same: it doesn't render <title>
content in the field, but if you want to process the HEAD data before
passing the BODY into the field for rendering you can handle <title>
just like a browser does.
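
To illustrate the idea of reading the HEAD data before the BODY is rendered, here is a minimal sketch. It is written in Python rather than Rev's own scripting language, purely as a language-neutral illustration, and the names TitleGrabber and extract_title are made up for this example. It assumes the page is reasonably well formed.

```python
from html.parser import HTMLParser


class TitleGrabber(HTMLParser):
    """Collects the text found between <title> and </title>."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so accumulate it.
        if self.in_title:
            self.title_parts.append(data)


def extract_title(html):
    """Return the <title> text so it can be kept before rendering the body."""
    grabber = TitleGrabber()
    grabber.feed(html)
    grabber.close()
    return "".join(grabber.title_parts).strip()


page = ("<html><head><title>Chapter 1: Great Revolution Recipes</title></head>"
        "<body><p>Text</p></body></html>")
print(extract_title(page))
```

The title is captured separately, the way a browser uses it for the window name, so nothing is lost when the remaining markup is handed to whatever renders the body.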
> Granted, I'm sure there are only a few situations like this, and are not
> likely to affect 99% of us, but I think the replaceText solution is at about
> the same level of efficiency.
How does the regex solution hold up to things like "<" and ">" within
"<code>" tags as Jacque noted, or other legitimate inclusions of those or
other characters which are also used as control characters in SGML?
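
The concern can be made concrete with a small sketch. It is in Python, not Rev, and the pattern `<[^>]*>` is only an assumed stand-in for the kind of regex under discussion, not the exact expression from the thread. The helper names are hypothetical.

```python
import re
from html.parser import HTMLParser

# A naive "anything between < and >" pattern (an assumption for illustration).
TAG_RE = re.compile(r"<[^>]*>")


def strip_with_regex(html):
    """Delete every <...> run, with no knowledge of HTML structure."""
    return TAG_RE.sub("", html)


class TextCollector(HTMLParser):
    """Keeps only character data; entities are decoded by the parser."""

    def __init__(self):
        super().__init__()  # convert_charrefs defaults to True
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_with_parser(html):
    collector = TextCollector()
    collector.feed(html)
    collector.close()
    return "".join(collector.parts)


literal = "<code>if a < b and c > d</code>"
escaped = "<code>if a &lt; b and c &gt; d</code>"

print(repr(strip_with_regex(literal)))   # the "< b and c >" run is eaten as if it were a tag
print(repr(strip_with_regex(escaped)))   # the entities are left undecoded
print(repr(strip_with_parser(escaped)))  # the entities come back as < and >
```

With the literal characters, the naive pattern treats "< b and c >" as a tag and removes it. With properly escaped input it leaves the entities in place, so a separate decoding pass would still be needed. A real parser (or, in Rev, the field's own HTML import) does both jobs at once.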
>> 2. Which is faster?
> The field approach, hands down. This is because any regex that needs to run
> needs to be handled by the PCRE library so there's more "hand off" time
Yes, the generality of regex makes it convenient, but I've never seen
any case where a faster solution couldn't be crafted from the offset
function and the like.
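
As a rough sketch of what an offset-style approach looks like, here is a tag stripper that scans for "<" and ">" positions directly. It is in Python, with `str.find` standing in for Rev's offset function, and the name strip_tags_by_offset is hypothetical. It makes no speed claim of its own; it only shows the shape of the technique, and it shares the regex approach's blindness to unescaped "<" inside content.

```python
def strip_tags_by_offset(text):
    """Remove <...> runs by scanning for delimiter positions, no regex."""
    pieces = []
    pos = 0
    while True:
        start = text.find("<", pos)
        if start == -1:
            # No more tags: keep the remainder and stop.
            pieces.append(text[pos:])
            break
        # Keep the text that precedes the tag.
        pieces.append(text[pos:start])
        end = text.find(">", start + 1)
        if end == -1:
            # Unterminated "<": keep the rest verbatim rather than drop it.
            pieces.append(text[start:])
            break
        pos = end + 1
    return "".join(pieces)


print(strip_tags_by_offset("<p>Hello <b>world</b></p>"))
```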
So until someone can demonstrate otherwise, I'm sticking with using
fields to strip tags from text.
Managing Editor, revJournal
Rev tips, tutorials and more: http://www.revJournal.com