remove html tags from text

Richard Gaskin ambassador at fourthworld.com
Sun Sep 10 11:26:31 EDT 2006


Ken Ray wrote:

>> So I have two questions about this sort of parsing as opposed to using a
>> field object to do the same:
>> 
>> 1. Which is more fault-tolerant?
> 
> Good question - one problem with the field object, identified by
> Sivakatirswami back in August, is that if your html contains a <title>
> element (like: <title>Chapter 1: Great Revolution Recipes</title>), then
> when you set the htmlText of the field to that html, everything inside
> the <title> element doesn't show up in the field and can't be retrieved
> from it again.

As a HEAD element and not a BODY element, should <title> be considered 
data or metadata?  After all, <title> is only used by the browser to set 
the window name, and the contents of that tag are not rendered in the 
page.  In this regard Rev does the same:  it doesn't render <title> 
content in the field, but if you want to process the HEAD data before 
passing the BODY into the field for rendering you can handle <title> 
just like a browser does.
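
To make the "process the HEAD data first" idea concrete, here is a rough sketch of pulling the title out before the rest of the markup goes on to a renderer. It is written in Python rather than Rev purely for illustration, the function name split_title is hypothetical, and it assumes a plain <title> tag with no attributes.

```python
def split_title(html):
    """Return (title, rest), where rest is html with the <title> element removed.

    Assumption: the tag is written as a bare <title> with no attributes, and
    lowercasing does not change the string length (true for ASCII markup).
    """
    lower = html.lower()
    start = lower.find("<title>")
    end = lower.find("</title>", start + 1) if start != -1 else -1
    if start == -1 or end == -1:
        # No complete title element: hand the markup back untouched.
        return "", html
    title = html[start + len("<title>"):end].strip()
    rest = html[:start] + html[end + len("</title>"):]
    return title, rest


page = ("<html><head><title>Chapter 1: Great Revolution Recipes</title></head>"
        "<body><p>Hello</p></body></html>")
title, rest = split_title(page)
```

The title is kept as metadata for the caller to use however it likes, and only the remainder would be handed to whatever does the rendering.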


> Granted, I'm sure there are only a few situations like this, and are not
> likely to affect 99% of us, but I think the replaceText solution is at about
> the same level of efficiency.

How does the regex solution hold up to things like "<" and ">" within 
"<code>" tags as Jacque noted, or other legitimate inclusions of those or 
other characters which are also used as control characters in SGML?
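
A small sketch of the concern, in Python rather than Rev for illustration only. The pattern below is a generic "anything between angle brackets" expression, which is an assumption and not necessarily the one proposed earlier in the thread; the function name and sample strings are hypothetical too.

```python
import re
from html import unescape


def strip_tags_naive(html):
    # Remove every run that starts with "<" and ends at the next ">".
    return re.sub(r"<[^>]*>", "", html)


# Properly escaped markup: the tags go, but the entity is left undecoded.
escaped = "<p>Use <code>if a &lt; b</code> here</p>"
escaped_result = strip_tags_naive(escaped)
decoded_result = unescape(escaped_result)

# Literal "<" and ">" inside the code element: real text gets eaten,
# because "< b and b >" looks like a tag to the pattern.
literal = "<p>Use <code>if a < b and b > c</code> here</p>"
literal_result = strip_tags_naive(literal)
```

In the escaped case a separate entity-decoding pass is still needed after stripping; in the literal case the comparison itself is silently lost, which is the kind of fault tolerance question raised above.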


>> 2. Which is faster?
> 
> The field approach, hands down. This is because any regex that needs to run
> needs to be handled by the PCRE library so there's more "hand off" time
> involved.

Yes, the generalization of regex makes it convenient but I've never seen 
any case where a faster solution couldn't be crafted from the offset 
function and the like.
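
For comparison, a minimal sketch of the offset style of scanning, again in Python for illustration, with str.find standing in for Rev's offset function. The function name is hypothetical, no timing claim is made here, and it shares the same blind spots about literal "<" and ">" as any bracket-matching approach.

```python
def strip_tags_by_offset(html):
    """Drop everything between each "<" and the next ">" using plain searches."""
    parts = []
    pos = 0
    while True:
        lt = html.find("<", pos)
        if lt == -1:
            # No more tags: keep the rest of the text.
            parts.append(html[pos:])
            break
        gt = html.find(">", lt + 1)
        if gt == -1:
            # Unterminated "<": treat the remainder as literal text.
            parts.append(html[pos:])
            break
        parts.append(html[pos:lt])  # text before the tag
        pos = gt + 1                # resume just after the tag
    return "".join(parts)


plain = strip_tags_by_offset("<p>Hello <b>world</b></p>")
```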

So until someone can demonstrate otherwise, I'm sticking with using 
fields to strip tags from text.

--
  Richard Gaskin
  Managing Editor, revJournal
  _______________________________________________________
  Rev tips, tutorials and more: http://www.revJournal.com


