Getting the text content of a HTML page

H Baric hbaric at
Sat Aug 2 11:19:57 EDT 2008

LOL Eric sorry :D

Well, you know I could have thought of that!
So simple and obvious really isn't it!
I mean, I could have just asked my two year old instead!


Well, I was going to just take myself to bed when I saw all that code, but 
at least I could understand it, and so decided to just try it out...

And it works except - all the CSS remains! (Anyone ever heard of linked 
stylesheets sheesh!)
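
A likely reason the CSS survives, assuming the page opens its style block with attributes (for example <style type="text/css">): the pattern "(?Usi)<STYLE>.*</STYLE>" in the StripTags function quoted below only matches a bare <STYLE> tag. A minimal sketch of a more tolerant version follows. StripStyleBlocks is a hypothetical helper name, not part of Eric's script, and it relies only on replaceText, which the original already uses.

```livecode
-- Hypothetical helper (not from the original script): removes <style ...>
-- blocks whether or not the opening tag carries attributes.
function StripStyleBlocks pHtml
   -- (?Usi): ungreedy matching, dot also matches line breaks, case-insensitive
   -- [^>]* accepts any attributes between "<STYLE" and the closing ">"
   return replaceText(pHtml, "(?Usi)<STYLE[^>]*>.*</STYLE>", "")
end StripStyleBlocks
```

The same pattern could stand in for the <STYLE> line inside StripTags. Linked stylesheets should not be a problem here, since a <link> tag carries no text content and is already removed by the generic tag pattern.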

So rather than add a million more lines to the script (would it ever be 
complete!), I'm thinking I shall give up for now, at least until tomorrow 
when I am well slept, and can think up nice little uncomplicated things to 
create for the purpose of keeping the old brain cells alive.

Thanks for your help again Eric.

Heather, who is determined to be a programmer when she grows up.
At 36yrs though, she is wondering if she should just stick to knitting.
on knitOne ; select chunk of wool ; tie it in a knot ; create noose ; end 

----- Original Message ----- 
From: "Eric Chatonet" <eric.chatonet at>
To: "How to use Revolution" <use-revolution at>
Sent: Sunday, August 03, 2008 12:33 AM
Subject: Re: Getting the text content of a HTML page


On 2 Aug 08 at 16:31, H Baric wrote:

> * Get the text only from a web page - no html tags, no formatting etc.

This case needs an additional code snippet, as I said in a
previous email :-)

put StripTags(thePage) into field "The Page"
function StripTags pHtml -- returns the meaningful text from a web page
   local tRegex,tPrevText
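   -- kHtml was lost in the archive; rebuilt item by item from kConvertedHtml.
   -- The apostrophe entity (&#39;) is an assumption.
   constant kHtml = "&eacute;,&agrave;,&ccedil;,&gt;,&lt;,&ecirc;,&egrave;,&copy;,&bull;,&#39;,&middot;,&amp;"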
   constant kConvertedHtml = "é,à,ç,>,<,ê,è,©,•,',·,&"
   replace return with space in pHtml
   replace numtochar(13) with empty in pHtml
   replace tab with empty in pHtml
   put replacetext(pHtml,"(?Usi)<SCRIPT.*</SCRIPT>","") into pHtml
   put replacetext(pHtml,"(?Usi)<STYLE>.*</STYLE>","") into pHtml
   put replacetext(pHtml,"(?Usi)<\?.*\?>","") into pHtml
   replace "&nbsp;" with space in pHtml
   replace "<BR>" with return in pHtml
   replace "<p>" with return in pHtml
   put  "<[^><]*>" into tRegex
   put replacetext(pHtml,tRegex,"") into pHtml
   put replacetext(pHtml,tRegex,"") into pHtml
   repeat until tPrevText is pHtml
     put pHtml into tPrevText
     put replacetext(pHtml," +",space) into pHtml
     put replacetext(pHtml,"^ ","") into pHtml
   end repeat
   replace (space & return) with return in pHtml
   replace (return & space) with return in pHtml
   filter pHtml without empty
   replace "&quot;" with quote in pHtml
   repeat with i = 1 to the number of items of kHtml
     replace item i of kHtml with item i of kConvertedHtml in pHtml
   end repeat
   return pHtml
end StripTags

Best regards from Paris,
Eric Chatonet.
Plugins and tutorials for Revolution:
Email: eric.chatonet at
