remove html tags from text

Richard Gaskin ambassador at
Sun Sep 10 11:26:31 EDT 2006

Ken Ray wrote:

>> So I have two questions about this sort of parsing as opposed to using a
>> field object to do the same:
>> 1. Which is more fault-tolerant?
> Good question - one problem with the field object that was identified by
> Sivakatirswami back in August with this was that if you have an html tag
> with <title> in it (like: <title>Chapter 1: Great Revolution
> Recipes</title>), when you set the htmlText of the field to the html that
> contains the <title>, everything that is in the <title> tag doesn't show up
> in the field, and can't be retrieved ever again.

As a HEAD element and not a BODY element, should <title> be considered 
data or metadata?  After all, <title> is only used by the browser to set 
the window name, and the contents of that tag are not rendered in the 
page.  In this regard Rev does the same:  it doesn't render <title> 
content in the field, but if you want to process the HEAD data before 
passing the BODY into the field for rendering you can handle <title> 
just like a browser does.
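
To make the "process the HEAD first" idea concrete, here is a minimal sketch. It is written in Python rather than Rev script purely for illustration, and the function name extract_title is hypothetical. It assumes a single <title> element with no attributes, and pulls the title out before the remaining markup would be handed to whatever does the rendering.

```python
def extract_title(html):
    """Return (title, html_without_title); title is None if no <title> pair is found.

    Assumes a plain "<title>" open tag with no attributes, and markup whose
    length is unchanged by lowercasing (true for ASCII text).
    """
    lower = html.lower()  # case-insensitive search; offsets are reused on the original
    open_tag, close_tag = "<title>", "</title>"
    start = lower.find(open_tag)
    if start == -1:
        return None, html
    end = lower.find(close_tag, start)
    if end == -1:
        return None, html
    title = html[start + len(open_tag):end].strip()
    remainder = html[:start] + html[end + len(close_tag):]
    return title, remainder


page = ("<html><head><title>Chapter 1: Great Revolution Recipes</title></head>"
        "<body><p>Hi</p></body></html>")
title, body_markup = extract_title(page)
```

With the title saved in its own variable, nothing is lost when the rest of the markup is rendered without it.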

> Granted, I'm sure there are only a few situations like this, and are not
> likely to affect 99% of us, but I think the replaceText solution is at about
> the same level of efficiency.

How does the regex solution hold up to things like "<" and ">" within 
"<code>" tags as Jacque noted, or other legitimate inclusions of those or 
other characters which are also used as control characters in SGML?
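
A small experiment shows the concern. The sketch below is Python, not Rev, and the pattern is an assumption: a typical naive "strip anything between angle brackets" expression, not necessarily the one proposed earlier in the thread. It is run against markup that contains a bare "<" and ">" inside a "<code>" element.

```python
import re

# Naive pattern (an assumption for illustration): "<", any run of non-">" chars, ">"
TAG_PATTERN = re.compile(r"<[^>]*>")


def strip_tags_regex(html):
    """Remove every substring that looks like a tag to the naive pattern."""
    return TAG_PATTERN.sub("", html)


sample = "<p>Use <code>if a < b and c > d</code> here.</p>"
result = strip_tags_regex(sample)
# The comparison "< b and c >" is treated as a tag and silently dropped.
```

When the same characters are written as entities (&lt; and &gt;), the pattern leaves them alone, so the failure only shows up on input that uses the raw characters.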

>> 2. Which is faster?
> The field approach, hands down. This is because any regex that needs to run
> needs to be handled by the PCRE library so there's more "hand off" time
> involved.

Yes, the generality of regex makes it convenient, but I've never seen 
any case where a faster solution couldn't be crafted from the offset 
function and the like.
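
For anyone who wants to try the offset-style approach, here is a rough sketch of the idea. It is in Python, with str.find standing in for Rev's offset function, and the name strip_tags_offset is hypothetical. It makes no claim about speed in either language; timing it against the alternatives is left to the reader.

```python
def strip_tags_offset(html):
    """Strip <...> spans by scanning for delimiters, with no regex involved."""
    pieces = []
    pos = 0
    while True:
        start = html.find("<", pos)
        if start == -1:
            pieces.append(html[pos:])   # no more tags: keep the tail
            break
        end = html.find(">", start + 1)
        if end == -1:
            pieces.append(html[pos:])   # unclosed "<": keep the rest as literal text
            break
        pieces.append(html[pos:start])  # keep the text before the tag
        pos = end + 1                   # resume scanning after the ">"
    return "".join(pieces)


plain = strip_tags_offset("<p>Hello <b>world</b>!</p>")
```

Note that this simple scan has the same blind spot as a naive regex: a bare "<" followed later by a ">" is treated as a tag.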

So until someone can demonstrate otherwise, I'm sticking with using 
fields to strip tags from text.....

  Richard Gaskin
  Managing Editor, revJournal
  Rev tips, tutorials and more:
