Getting the text content of a HTML page

viktoras didziulis viktoras at ekoinf.net
Mon Aug 4 14:41:45 EDT 2008


Whoops, sorry, I had only tested this with basic tags like <b>jsajka</b>. The
following seems to work OK (the text is in the fText field):

put replaceText(fld "fText","</?[A-Za-z1-9 ='" & quote & "]+>","") into fld "fText"

A small explanation:
/? means zero or one occurrence of /, because tags may be either opening
(without /) or closing (with /).
[A-Za-z1-9 ='" & quote & "]+ means one or more occurrences of any character
from A to Z, a to z or 1 to 9, plus space, equals sign, single quote and
double quote. Sorry, I used & quote & for the double quote because I could
not figure out how to escape quotes in Revolution...
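A quick way to see what this pattern does and does not catch is to run the same regular expression outside Revolution. Below is a minimal Python sketch using the standard re module, assuming replaceText treats these basic constructs (a character class, ? and +) the same way Python does. Note that the class as written contains 1-9 but not 0, and no punctuation besides space, = and the quotes, so tags such as <td width="10"> or <a href="http://..."> are left in place. The looser pattern <[^>]+> is shown only for comparison; it is a common alternative, not part of the original post. The helper name strip_tags is hypothetical.

```python
import re

# The pattern from the post, after LiveCode has expanded `& quote &`
# into a literal double quote character.
POST_PATTERN = r"""</?[A-Za-z1-9 ='"]+>"""

# A common, looser alternative (NOT from the post): "<", then any run
# of characters other than ">", then ">".
LOOSE_PATTERN = r"<[^>]+>"


def strip_tags(html: str, pattern: str) -> str:
    """Hypothetical helper: delete every match of `pattern`,
    roughly what replaceText(html, pattern, "") does."""
    return re.sub(pattern, "", html)


samples = [
    '<b>jsajka</b>',
    '<p class="x">hello</p>',
    '<td width="10">cell</td>',               # contains the digit 0
    '<a href="http://example.com/">link</a>',  # contains ':' '/' '.'
]

for s in samples:
    print(repr(strip_tags(s, POST_PATTERN)), repr(strip_tags(s, LOOSE_PATTERN)))
```

Neither pattern is a real HTML parser: a > inside an attribute value or a comment will still confuse both of them.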

Viktoras




Richard Gaskin wrote:
> viktoras didziulis wrote:
>> one more way to do things using regular expressions:
>>
>> put the replaceText(myText,"</?[A-Za-z]+>","") into myText
>>
>> will simply replace all tags with empty string. Where myText is the 
>> text where replacements have to be made. </?[A-Za-z]+> is a regular 
>> expression matching most html tags and "" is empty replacement string.
>
> Always looking for potential optimizations, I was going to benchmark 
> that here but couldn't get it to work, even after removing "the". :(
>



