Getting the text content of a HTML page

H Baric hbaric at gmail.com
Sat Aug 2 11:19:57 EDT 2008


LOL Eric sorry :D

Well, you know I could have thought of that!
So simple and obvious really isn't it!
I mean, I could have just asked my two year old instead!

:-o
:-|

Well, I was going to just take myself to bed when I saw all that code, but 
at least I could understand it, and so decided to just try it out...

And it works except - all the CSS remains! (Anyone ever heard of linked 
stylesheets sheesh!)

So rather than add a million more lines to the script (would it ever be 
complete!), I'm thinking I shall give up for now, at least until tomorrow 
when I am well slept, and can think up nice little uncomplicated things to 
create for the purpose of keeping the old brain cells alive.

Thanks for your help again Eric.

Heather, who is determined to be a programmer when she grows up.
At 36yrs though, she is wondering if she should just stick to knitting.
on knitOne ; select chunk of wool ; tie it in a knot ; create noose ; end 
knitOne

----- Original Message ----- 
From: "Eric Chatonet" <eric.chatonet at sosmartsoftware.com>
To: "How to use Revolution" <use-revolution at lists.runrev.com>
Sent: Sunday, August 03, 2008 12:33 AM
Subject: Re: Getting the text content of a HTML page


Hello again,

On 2 Aug 08, at 16:31, H Baric wrote:

> * Get the text only from a web page - no html tags, no formatting etc.

LOL
This is a case that needs some additional code snippet as I said in a
previous email :-)

put StripTags(thePage) into field "The Page"
---------------------------------------------------------
function StripTags pHtml -- returns the meaningful text from a web page
   local tRegex,tPrevText
   constant kHtml = "&eacute;,&agrave;,&ccedil;,&gt;,&lt;,&ecirc;,&egrave;,&copy;,&bull;,&#39;,&middot;,&amp;"
   constant kConvertedHtml = "é,à,ç,>,<,ê,è,©,•,',·,&"
   -----
   replace return with space in pHtml
   replace numtochar(13) with empty in pHtml
   replace tab with empty in pHtml
   -----
   put replacetext(pHtml,"(?Usi)<SCRIPT.*</SCRIPT>","") into pHtml
   put replacetext(pHtml,"(?Usi)<STYLE>.*</STYLE>","") into pHtml
   put replacetext(pHtml,"(?Usi)<\?.*\?>","") into pHtml
   -----
   replace "&nbsp;" with space in pHtml
   replace "<BR>" with return in pHtml
   replace "<p>" with return in pHtml
   -----
   put  "<[^><]*>" into tRegex
   put replacetext(pHtml,tRegex,"") into pHtml
   put replacetext(pHtml,tRegex,"") into pHtml
   -----
   repeat until tPrevText is pHtml
     put pHtml into tPrevText
     put replacetext(pHtml," +",space) into pHtml
     put replacetext(pHtml,"^ ","") into pHtml
   end repeat
   -----
   replace (space & return) with return in pHtml
   replace (return & space) with return in pHtml
   filter pHtml without empty
   -----
   replace "&quot;" with quote in pHtml
   repeat with i = 1 to the number of items of kHtml
     replace item i of kHtml with item i of kConvertedHtml in pHtml
   end repeat
   -----
   return pHtml
end StripTags
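
A possible explanation for CSS text surviving the function above: the
pattern "(?Usi)<STYLE>.*</STYLE>" only matches a bare <style> tag, so a
block opened with attributes, such as <style type="text/css">, would slip
through and only its tags would be removed by the later tag-stripping step.
What follows is a minimal sketch of a more tolerant pattern. StripStyleBlocks
is a hypothetical helper name, not part of the original code, and it has not
been tested against the page discussed in this thread.

```livecode
-- Hypothetical helper (an assumption, not from the original post):
-- removes whole <style ...> ... </style> blocks, with or without attributes.
function StripStyleBlocks pHtml
   -- [^>]* allows attributes between "<STYLE" and ">"
   -- (?Usi): ungreedy, dot matches line breaks, case-insensitive
   return replaceText(pHtml,"(?Usi)<STYLE[^>]*>.*</STYLE>","")
end StripStyleBlocks
```

It could be applied before the main function, for example:
put StripTags(StripStyleBlocks(thePage)) into field "The Page"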

Best regards from Paris,
Eric Chatonet.
----------------------------------------------------------------
Plugins and tutorials for Revolution: http://www.sosmartsoftware.com/
Email: eric.chatonet at sosmartsoftware.com
----------------------------------------------------------------