Getting the text content of a HTML page
viktoras didziulis
viktoras at ekoinf.net
Mon Aug 4 14:41:45 EDT 2008
Whoops, sorry, I tested this only with basic tags like <b>jsajka</b>. The
following seems to work OK (the text is in the field "fText"):
put replaceText(fld "fText","</?[A-Za-z0-9 ='" & quote & "]+>","") into fld "fText"
A small explanation:
/? means zero or one occurrence of / - because tags may be either opening
(without /) or closing (with /)
[A-Za-z0-9 ='" & quote & "]+ - one or more occurrences of any character
from A to Z, a to z, or 0 to 9, plus space, equals sign, single quote and
double quote. Sorry, I used & quote & for the double quote because I could
not figure out how to escape quotes in Revolution...
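For reuse, the same replacement could be wrapped in a small function. This is only a sketch: the handler name stripTags and the variable names are made up for illustration, and the digit range is written as 0-9 so that attribute values such as size="10" are covered.

```livecode
-- Hypothetical helper: returns pText with simple HTML tags removed
function stripTags pText
   local tPattern
   -- The quote constant supplies the double-quote character inside the class
   put "</?[A-Za-z0-9 ='" & quote & "]+>" into tPattern
   -- replaceText replaces every match of the regular expression
   return replaceText(pText, tPattern, empty)
end stripTags
```

It would be called as: put stripTags(fld "fText") into fld "fText". A tag is removed only if everything between < and > is in the character class. An attribute value containing a period, slash, colon or hyphen (most URLs, for example) leaves its tag in place.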
Viktoras
Richard Gaskin wrote:
> viktoras didziulis wrote:
>> one more way to do things using regular expressions:
>>
>> put the replaceText(myText,"</?[A-Za-z]+>","") into myText
>>
>> will simply replace all tags with empty string. Where myText is the
>> text where replacements have to be made. </?[A-Za-z]+> is a regular
>> expression matching most html tags and "" is empty replacement string.
>
> Always looking for potential optimizations, I was going to benchmark
> that here but couldn't get it to work, even after removing "the". :(
>