Getting the text content of a HTML page
H Baric
hbaric at gmail.com
Sat Aug 2 11:19:57 EDT 2008
LOL Eric sorry :D
Well, you know I could have thought of that!
So simple and obvious really isn't it!
I mean, I could have just asked my two year old instead!
:-o
:-|
Well, I was going to just take myself to bed when I saw all that code, but
at least I could understand it, and so decided to just try it out...
And it works except - all the CSS remains! (Anyone ever heard of linked
stylesheets? Sheesh!)
So rather than add a million more lines to the script (would it ever be
complete!), I'm thinking I shall give up for now, at least until tomorrow
when I am well rested, and can think up nice little uncomplicated things to
create for the purpose of keeping the old brain cells alive.
Thanks for your help again Eric.
Heather, who is determined to be a programmer when she grows up.
At 36yrs though, she is wondering if she should just stick to knitting.
on knitOne ; select chunk of wool ; tie it in a knot ; create noose ; end
knitOne
----- Original Message -----
From: "Eric Chatonet" <eric.chatonet at sosmartsoftware.com>
To: "How to use Revolution" <use-revolution at lists.runrev.com>
Sent: Sunday, August 03, 2008 12:33 AM
Subject: Re: Getting the text content of a HTML page
Re,
On 2 Aug 08 at 16:31, H Baric wrote:
> * Get the text only from a web page - no html tags, no formatting etc.
LOL
This is a case that needs some additional code snippet as I said in a
previous email :-)
put StripTags(thePage) into field "The Page"
---------------------------------------------------------
function StripTags pHtml -- returns the meaningful text from a web page
local tRegex,tPrevText
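-- note: "&bull;" and "&#39;" are assumed; those two entries did not survive archiving
constant kHtml = "&eacute;,&agrave;,&ccedil;,&gt;,&lt;,&ecirc;,&egrave;,&copy;,&bull;,&#39;,&middot;,&amp;"
constant kConvertedHtml = "é,à,ç,>,<,ê,è,©,•,',·,&"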
-----
replace return with space in pHtml
replace numtochar(13) with empty in pHtml
replace tab with empty in pHtml
-----
put replacetext(pHtml,"(?Usi)<SCRIPT.*</SCRIPT>","") into pHtml
put replacetext(pHtml,"(?Usi)<STYLE>.*</STYLE>","") into pHtml
put replacetext(pHtml,"(?Usi)<\?.*\?>","") into pHtml
-----
replace "&nbsp;" with space in pHtml
replace "<BR>" with return in pHtml
replace "<p>" with return in pHtml
-----
put "<[^><]*>" into tRegex
put replacetext(pHtml,tRegex,"") into pHtml
put replacetext(pHtml,tRegex,"") into pHtml
-----
repeat until tPrevText is pHtml
put pHtml into tPrevText
put replacetext(pHtml," +",space) into pHtml
put replacetext(pHtml,"^ ","") into pHtml
end repeat
-----
replace (space & return) with return in pHtml
replace (return & space) with return in pHtml
filter pHtml without empty
-----
replace "&quot;" with quote in pHtml
repeat with i = 1 to the number of items of kHtml
replace item i of kHtml with item i of kConvertedHtml in pHtml
end repeat
-----
return pHtml
end StripTags
Best regards from Paris,
Eric Chatonet.
----------------------------------------------------------------
Plugins and tutorials for Revolution: http://www.sosmartsoftware.com/
Email: eric.chatonet at sosmartsoftware.com
----------------------------------------------------------------
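A note on the CSS that survives StripTags, as reported at the top of this thread: one likely cause (an assumption, not confirmed anywhere in the thread) is that the STYLE pattern only matches a bare <STYLE> tag, so an opening tag with attributes such as <style type="text/css"> slips through and its rules are left behind as plain text. A minimal sketch of a wider pattern follows; StripStyleBlocks is a hypothetical helper name, not part of Eric's script.

```livecode
-- Hypothetical helper, not part of the original StripTags script.
function StripStyleBlocks pHtml
   -- (?Usi) are the same flags StripTags uses:
   -- ungreedy, dot matches line breaks, case-insensitive.
   -- [^>]* allows attributes inside the opening tag.
   return replaceText(pHtml, "(?Usi)<STYLE[^>]*>.*</STYLE>", "")
end StripStyleBlocks
```

Running the page through StripStyleBlocks before StripTags, or widening the pattern on the STYLE line of StripTags in the same way, should remove embedded style blocks whether or not the tag carries attributes. Linked stylesheets are plain <link> tags, which the general tag-stripping pattern already removes.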