HTML named characters
Eric Chatonet
eric.chatonet at sosmartsoftware.com
Sun Jan 8 01:49:30 EST 2006
Hi David,
I think there are two ways to do this:
1. By scripting:
Something like the function below should work.
If you want to handle all the special characters (about one
hundred), the second way may be better.
function StripTags pHtml -- returns the meaningful text from a web page
local tRegex,tPrevText
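-- entity list restored from kConvertedHtml; &bull; and &#39; are assumed
constant kHtml = "&eacute;,&agrave;,&ccedil;,&gt;,&lt;,&ecirc;,&egrave;,&copy;,&bull;,&#39;,&middot;,&amp;"
constant kConvertedHtml = "é,à,ç,>,<,ê,è,©,•,',·,&"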
-----
replace return with space in pHtml
replace numtochar(13) with empty in pHtml
replace tab with empty in pHtml
-----
put replacetext(pHtml,"(?Usi)<SCRIPT.*</SCRIPT>","") into pHtml
put replacetext(pHtml,"(?Usi)<STYLE.*</STYLE>","") into pHtml
put replacetext(pHtml,"(?Usi)<\?.*\?>","") into pHtml
-----
replace "&nbsp;" with space in pHtml
replace "<BR>" with return in pHtml
replace "<p>" with return in pHtml
-----
put "<[^><]*>" into tRegex
put replacetext(pHtml,tRegex,"") into pHtml
put replacetext(pHtml,tRegex,"") into pHtml
-----
repeat until tPrevText is pHtml
put pHtml into tPrevText
put replacetext(pHtml," +",space) into pHtml
put replacetext(pHtml,"^ ","") into pHtml
end repeat
-----
replace (space & return) with return in pHtml
replace (return & space) with return in pHtml
filter pHtml without empty
-----
replace "&quot;" with quote in pHtml
repeat with i = 1 to the number of items of kHtml
replace item i of kHtml with item i of kConvertedHtml in pHtml
end repeat
-----
return pHtml
end StripTags
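If you would rather keep the entities in an array, as you suggest, a minimal sketch could look like the one below. The handler names EntityArray and DecodeEntities are only placeholders of mine, and the table is deliberately short: extend it with whatever entities your pages use. It sets the caseSensitive to true because entity names can differ only by case (&Eacute; versus &eacute;), and it converts &amp; last so that text such as &amp;lt; is not decoded twice.

```livecode
function EntityArray -- hypothetical helper: builds the lookup table
  local tMap
  set the caseSensitive to true -- keep &Eacute; and &eacute; apart
  put "é" into tMap["&eacute;"]
  put "É" into tMap["&Eacute;"]
  put "à" into tMap["&agrave;"]
  put "ç" into tMap["&ccedil;"]
  put "<" into tMap["&lt;"]
  put ">" into tMap["&gt;"]
  put quote into tMap["&quot;"]
  return tMap
end EntityArray

function DecodeEntities pText -- hypothetical name
  local tMap, tEntity
  set the caseSensitive to true -- replace ignores case by default
  put EntityArray() into tMap
  repeat for each line tEntity in the keys of tMap
    replace tEntity with tMap[tEntity] in pText
  end repeat
  replace "&amp;" with "&" in pText -- last, to avoid double decoding
  return pText
end DecodeEntities
```

For example, DecodeEntities("caf&eacute; &amp; th&eacute;") should give "café & thé". In a parser that runs often, you would probably build the array once and keep it in a script local rather than rebuilding it on every call.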
2. By placing the text into a field:
We discussed this approach some months ago, and it appeared (I
think it was Richard who pointed this out) that the fastest way
is to use a field in a substack without opening it (if I remember
correctly :-)
function StripTags pHtml
-- let the field engine convert the HTML, then read back the plain text
set the htmlText of fld "HtmlTemplate" of stack "HtmlConverter" to pHtml
return the text of fld "HtmlTemplate" of stack "HtmlConverter"
end StripTags
Best Regards from Paris,
Eric Chatonet
On 7 Jan 06, at 01:10, David Bovill wrote:
> On 7 Jan 2006, at 23:30, Eric Chatonet wrote:
>
>> Hi David,
>>
>> From the docs:
>>
>> &Aacute; Á
>> &aacute; á
>> &Acirc; Â
>> &acirc; â
>> &acute; ´
>> &AElig; Æ
>> &aelig; æ
>> &Agrave; À
>> &agrave; à
>> &Aring; Å
>> &aring; å
>> &Atilde; Ã
>> &atilde; ã
>> &Auml; Ä
>> &auml; ä
>>
>> And many others.
>>
>
> This is from the htmlText property - yes? But that requires me to
> set the htmlText of a field... which is not such fun for a
> parser :) Guess I will have to manually stick them all in an array?
------------------------------------------------------------------------
http://www.sosmartsoftware.com/ eric.chatonet at sosmartsoftware.com/