HTML Tables

Jim Ault JimAultWins at yahoo.com
Wed Apr 19 10:08:19 EDT 2006


Here is a tag-stripping function that started as some RegEx I sent to Eric
Chatonet, which he built up into a real function.  Note that a couple of
steps are commented out, since the page you are trying to parse may or may
not need different treatment for spaces and returns.

Try it this way first, then tweak the "--replace" lines to see whether that
gives a better result.
-------------------- start copy here
function StripTags pHtml
  local tRegex,tPrevText,tempp
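  -- entity names, in the same order as the characters they stand for below
  -- (item 10, the apostrophe, is assumed to be &#39;)
  constant kHtml = "&eacute;,&agrave;,&ccedil;,&gt;,&lt;,&ecirc;,&egrave;,&copy;,&bull;,&#39;,&middot;,&amp;"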
  constant kConvertedHtml = "é,à,ç,>,<,ê,è,©"
  put kConvertedHtml into tempp
  -- append bullet, apostrophe, middle dot and ampersand (Mac character codes)
  put "," & numtochar(165) & "," & numtochar(39) & "," & numtochar(225) & "," & numtochar(38) after tempp
  -----
  --replace return with space in pHtml
  --replace return with "MMMM" in pHtml
  
  replace numtochar(13) with empty in pHtml
  replace tab with empty in pHtml
  -----
  put replacetext(pHtml,"(?Usi)<SCRIPT.*</SCRIPT>","") into pHtml
  put replacetext(pHtml,"(?Usi)<STYLE.*</STYLE>","") into pHtml
  put replacetext(pHtml,"(?Usi)<\?.*\?>","") into pHtml
  -----
  replace "&nbsp;" with space in pHtml
  replace "<BR>" with return in pHtml
  replace "<p>" with return in pHtml
  -----
  put  "<[^><]*>" into tRegex
  put replacetext(pHtml,tRegex,"") into pHtml
  put replacetext(pHtml,tRegex,"") into pHtml
  -----
  repeat until tPrevText is pHtml
    -- escape hatch: keepRunning is assumed to be a global declared elsewhere in the script
    if keepRunning is "false" then exit StripTags
    put pHtml into tPrevText
    put replacetext(pHtml," +",space) into pHtml
    put replacetext(pHtml,"^ ","") into pHtml
  end repeat
  -----
  replace (space & return) with return in pHtml
  replace (return & space) with return in pHtml
  filter pHtml without empty
  -----
  replace "&quot;" with quote in pHtml
  repeat with i = 1 to the number of items of kHtml
    -- tempp holds all 12 replacement characters; kConvertedHtml only has the first 8
    replace item i of kHtml with item i of tempp in pHtml
  end repeat
  -----
  return pHtml
end StripTags
----------------------- end copy
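
For the table question itself, the same idea can be bent a little: turn the
row and cell closing tags into returns and tabs before stripping everything
else.  What follows is only a rough sketch, not tested against real-world
pages.  The name HtmlTableToTabbed is made up for illustration, and it
assumes a single, un-nested data table with properly closed </tr>, </td>
and </th> tags.

```livecode
-- HtmlTableToTabbed is a hypothetical helper name, not a built-in
function HtmlTableToTabbed pHtml
  -- line breaks and tabs in the source would be mistaken for row and
  -- column delimiters later, so flatten them first
  replace numToChar(13) with empty in pHtml
  replace return with space in pHtml
  replace tab with space in pHtml
  -- each closing row tag ends a line; each closing cell tag ends a column
  put replaceText(pHtml, "(?i)</tr\s*>", return) into pHtml
  put replaceText(pHtml, "(?i)</t[dh]\s*>", tab) into pHtml
  -- everything else in angle brackets goes, attributes and all
  put replaceText(pHtml, "<[^>]*>", empty) into pHtml
  -- tidy up: collapse runs of spaces and trim them around the delimiters
  put replaceText(pHtml, " +", space) into pHtml
  replace (space & tab) with tab in pHtml
  replace (tab & space) with tab in pHtml
  -- the tab left behind by the last cell of each row is not needed
  replace (tab & return) with return in pHtml
  replace (return & space) with return in pHtml
  if char 1 of pHtml is space then delete char 1 of pHtml
  filter pHtml without empty
  return pHtml
end HtmlTableToTabbed
```

Something like  put HtmlTableToTabbed(field "Source") into field "Result"
(both field names are placeholders) should then give tab-delimited rows.
Entities such as &amp; are left alone here, so the entity loop at the end
of StripTags would still have to be run over the result.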

Jim Ault
Las Vegas

On 4/19/06 12:29 AM, "Bill Marriott" <wjm at wjm.org> wrote:

> Forgive me if this has been "asked and answered" on the list before, but I
> think it's of general enough interest for me to post, in case someone has
> already invented this mousetrap.
> 
> I am wondering what the most efficient way might be to convert an HTML table
> into a Rev table.
> 
> The ideal solution would
> 
> - convert <tr>'s into rows and <td>'s into columns
> - correctly handle (i.e., ignore) all of the various attributes that might
> be embedded within the table tags.
> - designed for data tables (not formatting tables)
> - work very fast
> 
> Ideas? Suggestions? Pointers?
> 
> - Bill 