HTML Tables
Jim Ault
JimAultWins at yahoo.com
Wed Apr 19 10:08:19 EDT 2006
Here is a function that uses some RegEx. I sent the original to Eric Chatonet,
and he built it up into a real function for removing tags. Note that a couple
of steps are commented out, since the exact page you are trying to parse may
or may not need different treatment for spaces and returns.
Try it this way first, then tweak the "--replace" lines to see if that gives a
better result.
-------------------- start copy here
function StripTags pHtml
local tRegex,tPrevText
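-- item 9 is empty because that entity did not survive in the list archive;
-- item 10 is assumed to be &#39; (it pairs with numtochar(39) below)
constant kHtml = "&eacute;,&agrave;,&ccedil;,&gt;,&lt;,&ecirc;,&egrave;,&copy;,,&#39;,&middot;,&amp;"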
constant kConvertedHtml = "é,à,ç,>,<,ê,è,©"
put kConvertedHtml into tempp
put "," & numtochar(165) & "," & numtochar(39) & "," & numtochar(225) & \
"," & numtochar(38) after tempp
-----
--replace return with space in pHtml
--replace return with "MMMM" in pHtml
replace numtochar(13) with empty in pHtml
replace tab with empty in pHtml
-----
put replacetext(pHtml,"(?Usi)<SCRIPT.*</SCRIPT>","") into pHtml
put replacetext(pHtml,"(?Usi)<STYLE.*</STYLE>","") into pHtml
put replacetext(pHtml,"(?Usi)<\?.*\?>","") into pHtml
-----
replace "&nbsp;" with space in pHtml
replace "<BR>" with return in pHtml
replace "<p>" with return in pHtml
-----
put "<[^><]*>" into tRegex
put replacetext(pHtml,tRegex,"") into pHtml
put replacetext(pHtml,tRegex,"") into pHtml
-----
repeat until tPrevText is pHtml
if keepRunning is "false" then exit StripTags
put pHtml into tPrevText
put replacetext(pHtml," +",space) into pHtml
put replacetext(pHtml,"^ ","") into pHtml
end repeat
-----
replace (space & return) with return in pHtml
replace (return & space) with return in pHtml
filter pHtml without empty
-----
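replace "&quot;" with quote in pHtml
repeat with i = 1 to the number of items of kHtml
-- tempp holds all twelve replacements; kConvertedHtml alone has only the first eight
if item i of kHtml is empty then next repeat
replace item i of kHtml with item i of tempp in pHtml
end repeat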
-----
return pHtml
end StripTags
----------------------- end copy
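
StripTags removes tabs and does not know about rows or cells, so it will not
give you a table by itself. For the table question, the same regex idea can be
turned around: mark the row and cell boundaries first, then strip whatever tags
are left. What follows is only a rough sketch, not something from Eric's
library. The name HtmlTableToTabText is made up for illustration, it assumes
pHtml holds a single simple data table (no nested tables, colspan is ignored),
and it does not convert entities (the replace loop at the end of StripTags
could be reused for that).

```livecode
function HtmlTableToTabText pHtml
local tLine, tRow, tResult
-- flatten existing whitespace so only table tags produce tabs and returns
replace numtochar(13) with space in pHtml
replace return with space in pHtml
replace tab with space in pHtml
-- row ends become returns, cell ends (td or th) become tabs
put replacetext(pHtml,"(?i)</tr\s*>",return) into pHtml
put replacetext(pHtml,"(?i)</t[dh]\s*>",tab) into pHtml
-- strip every remaining tag, attributes and all
put replacetext(pHtml,"<[^><]*>","") into pHtml
-- collapse runs of spaces and trim spaces next to the delimiters
put replacetext(pHtml," +",space) into pHtml
replace (space & tab) with tab in pHtml
replace (tab & space) with tab in pHtml
replace (space & return) with return in pHtml
replace (return & space) with return in pHtml
-- drop the tab left after the last cell of each row and skip blank rows
repeat for each line tLine in pHtml
put tLine into tRow
if char 1 of tRow is space then delete char 1 of tRow
if the last char of tRow is tab then delete the last char of tRow
if tRow is not empty then put tRow & return after tResult
end repeat
delete the last char of tResult
return tResult
end HtmlTableToTabText
```

Something like
put HtmlTableToTabText(tHtml) into field "Results"
(the variable and field names are placeholders) should then give one line per
<tr> with the cells separated by tabs, ready for a field with tabStops set.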
Jim Ault
Las Vegas
On 4/19/06 12:29 AM, "Bill Marriott" <wjm at wjm.org> wrote:
> Forgive me if this has been "asked and answered" on the list before, but I
> think it's of general enough interest for me to post, in case someone has
> already invented this mousetrap.
>
> I am wondering what the most efficient way might be to convert an HTML table
> into a Rev table.
>
> The ideal solution would
>
> - convert <tr>'s into rows and <td>'s into columns
> - correctly handle (i.e., ignore) all of the various attributes that might
> be embedded within the table tags.
> - designed for data tables (not formatting tables)
> - work very fast
>
> Ideas? Suggestions? Pointers?
>
> - Bill