HTML named characters

Eric Chatonet eric.chatonet at sosmartsoftware.com
Sun Jan 8 01:49:30 EST 2006


Hi David,

I think you can do this in two different ways:

1. By scripting:
Something like the following should work.
If you want to handle all the special characters (about a hundred
of them), the second way may be better.

function StripTags pHtml -- returns the meaningful text from a web page
   local tRegex,tPrevText
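   -- entity names and the characters they stand for, in matching item order
   -- (&amp; comes last so that "&amp;lt;" decodes to "&lt;", not to "<")
   constant kHtml = "&eacute;,&agrave;,&ccedil;,&gt;,&lt;,&ecirc;,&egrave;,&copy;,&bull;,&#39;,&middot;,&amp;"
   constant kConvertedHtml = "é,à,ç,>,<,ê,è,©,•,',·,&"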
   -----
   replace return with space in pHtml
   replace numtochar(13) with empty in pHtml
   replace tab with empty in pHtml
   -----
   put replacetext(pHtml,"(?Usi)<SCRIPT.*</SCRIPT>","") into pHtml
   put replacetext(pHtml,"(?Usi)<STYLE.*</STYLE>","") into pHtml -- also matches <style type=...>
   put replacetext(pHtml,"(?Usi)<\?.*\?>","") into pHtml
   -----
   replace "&nbsp;" with space in pHtml
   replace "<BR>" with return in pHtml
   replace "<p>" with return in pHtml
   -----
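   -- strip the remaining tags; the second pass removes tags that enclosed
   -- a tag removed by the first pass (e.g. "<a <b>>")
   put "<[^><]*>" into tRegex
   put replacetext(pHtml,tRegex,"") into pHtml
   put replacetext(pHtml,tRegex,"") into pHtml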
   -----
   repeat until tPrevText is pHtml
     put pHtml into tPrevText
     put replacetext(pHtml," +",space) into pHtml
     put replacetext(pHtml,"^ ","") into pHtml
   end repeat
   -----
   replace (space & return) with return in pHtml
   replace (return & space) with return in pHtml
   filter pHtml without empty
   -----
   replace "&quot;" with quote in pHtml
   repeat with i = 1 to the number of items of kHtml
     replace item i of kHtml with item i of kConvertedHtml in pHtml
   end repeat
   -----
   return pHtml
end StripTags
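The kHtml list above only covers one numeric reference (&#39;). A minimal sketch for decoding any decimal reference such as &#169; could look like this; decodeNumericRefs is a hypothetical name, and the sketch assumes a Unicode-aware engine (LiveCode 7 or later) that provides numToCodepoint. Hexadecimal references (&#xA9;) are deliberately left alone.

```livecode
-- decodeNumericRefs is a hypothetical helper, not a built-in.
-- Decodes decimal references such as "&#39;" or "&#169;";
-- hex references ("&#xA9;") are left untouched, and values are not range-checked.
function decodeNumericRefs pText
   local tResult, tStart, tEnd
   -- tStart/tEnd receive the bounds of the captured digits
   repeat while matchChunk(pText, "&#([0-9]+);", tStart, tEnd)
      -- keep the text before "&#" unchanged
      put char 1 to (tStart - 3) of pText after tResult
      -- assumption: numToCodepoint is available (LiveCode 7 or later)
      put numToCodepoint(char tStart to tEnd of pText) after tResult
      -- resume scanning after the closing ";" so nothing is decoded twice
      delete char 1 to (tEnd + 1) of pText
   end repeat
   return tResult & pText
end decodeNumericRefs
```

For example, decodeNumericRefs("It&#39;s &#169; 2006") gives "It's © 2006". Applied to the output of StripTags it would also decode any "&amp;#NNN;" that StripTags has already turned into "&#NNN;", so for strict decoding it would have to run inside StripTags, before the &amp; replacement.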

2. By placing the text into a field:
We discussed this approach some months ago, and it turned out (I
think it was Richard who pointed it out) that the fastest way
seemed to be using a field in a substack without opening it (if I
remember correctly :-)

function StripTags pHtml
   -- let the engine parse the HTML, then read back the plain text
   set the htmlText of fld "HtmlTemplate" of stack "HtmlConverter" to pHtml
   return the text of fld "HtmlTemplate" of stack "HtmlConverter"
end StripTags

Best Regards from Paris,
Eric Chatonet

On 7 Jan 06, at 01:10, David Bovill wrote:

> On 7 Jan 2006, at 23:30, Eric Chatonet wrote:
>
>> Hi David,
>>
>> From the docs:
>>
>>     &Aacute;    Á
>>     &aacute;    á
>>     &Acirc;    Â
>>     &acirc;    â
>>     &acute;    ´
>>     &AElig;    Æ
>>     &aelig;    æ
>>     &Agrave;    À
>>     &agrave;    à
>>     &Aring;    Å
>>     &aring;    å
>>     &Atilde;    Ã
>>     &atilde;    ã
>>     &Auml;    Ä
>>     &auml;    ä
>>
>> And many others.
>>
>
> This is from the htmlText property - yes? But that requires me to  
> set the htmlText of a field... which is not such fun for a  
> parser :) Guess I will have to manually stick them all in an array?
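David's array idea could look something like the sketch below. decodeNamedEntities and kEntities are hypothetical names, and only the entities from the table quoted above (plus amp, lt, gt and quot) are filled in; the others would be added as further "name=codepoint" items. It assumes LiveCode 7 or later for numToCodepoint. The caseSensitive setting matters because array keys are otherwise compared case-insensitively, so "Aacute" and "aacute" would collide.

```livecode
-- hypothetical lookup list: entity name = Unicode code point
constant kEntities = "Aacute=193,aacute=225,Acirc=194,acirc=226,acute=180,AElig=198,aelig=230,Agrave=192,agrave=224,Aring=197,aring=229,Atilde=195,atilde=227,Auml=196,auml=228,amp=38,lt=60,gt=62,quot=34"

-- decodeNamedEntities is a hypothetical helper, not a built-in
function decodeNamedEntities pText
   local tEntity, tPair, tEq, tName, tResult, tStart, tEnd
   -- keep "Aacute" and "aacute" as distinct array keys
   set the caseSensitive to true
   -- build the array: entity name -> code point
   repeat for each item tPair in kEntities
      put offset("=", tPair) into tEq
      put char (tEq + 1) to -1 of tPair into tEntity[char 1 to (tEq - 1) of tPair]
   end repeat
   -- tStart/tEnd receive the bounds of the captured entity name
   repeat while matchChunk(pText, "&([A-Za-z][A-Za-z0-9]*);", tStart, tEnd)
      put char tStart to tEnd of pText into tName
      -- keep the text before the "&" unchanged
      put char 1 to (tStart - 2) of pText after tResult
      if tEntity[tName] is not empty then
         put numToCodepoint(tEntity[tName]) after tResult
      else
         -- unknown entity: leave it as it was
         put "&" & tName & ";" after tResult
      end if
      -- resume after the ";" so "&amp;lt;" becomes "&lt;", not "<"
      delete char 1 to (tEnd + 1) of pText
   end repeat
   return tResult & pText
end decodeNamedEntities
```

Scanning once from left to right, instead of running one replace per entity, avoids decoding the same text twice, at the cost of building the array on each call; caching it in a script-local variable would be the obvious next step.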

----------------------------------------------------------------------------------------------
http://www.sosmartsoftware.com/    eric.chatonet at sosmartsoftware.com/