HTML to text in field
David V Glasgow
dvglasgow at gmail.com
Thu Aug 9 08:00:40 EDT 2018
Hello folks,
I am having an interesting time (macOS 10.13.5, LC 8.1.9) trying to load some HTML files (up to about 5 MB each). Most of them will be lists or tables, generated by various users on various systems.
I don’t want to retain any of the formatting except line endings, so I would be happy for tables to appear as lists. I found a little 2013 nugget from the estimable Jacqueline Landman Gay:
set the htmltext of the templatefield to htmlVar -- variable contains the html string
put the text of the templatefield into tPlainText
In some cases that works fine, but in others, it seems that HTML tables consisting of maybe 20-30 thousand rows are rendered onto a single line of the field. A sort of black-letters-overwritten splodge appears in the first row and LC cranks up to 100% of the processor and BBoD ensues.
Sometimes it never seems to recover, but other times it hands back control after maybe 20 minutes or so, and in those cases I can see the text if I set dontwrap to false. It contains no line endings from the original table, and a shedload of tabs.
I have tried to operate on the HTML string in a variable before putting it into the field, but frankly I don’t know what property of some HTML tables might cause the line endings to be lost. The only row delimiter I can see when I examine the files in an editor is </tr>.
I tried a different approach, replacing a row end with a cr, and then stripping out tags:
put URL ("file:" & theFilePath) into ttemp
replace "</tr>" with cr in ttemp
replaceText (ttemp, "<*>", "|")
filter lines of ttemp without empty
set the text of field "import" to ttemp
The replaceText line generates an error: “button "Import HTML": execution error at line 7 (Handler: can't find handler) near "replaceText", char 1”.
Firstly, I don’t understand the error, and secondly, I am worried I may be overcomplicating something which should be simple.
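One possible reading of the error, offered as an unconfirmed sketch rather than a fix: replaceText is a function, not a command, so calling it as a bare statement makes the engine look for a command handler of that name. Its second parameter is also a regular expression rather than a wildcard, so "<*>" would probably not match tags. Under those assumptions the steps above might look like this (the handler name importHtmlAsText and the variable names are made up for illustration; field "import" is the one from the snippet above):

```livecode
-- Hypothetical handler: name and parameter are illustrative only.
on importHtmlAsText pFilePath
   local tText
   put URL ("file:" & pFilePath) into tText
   -- Turn each table row end into a line ending.
   replace "</tr>" with cr in tText
   -- replaceText returns the changed string, so the result must be put somewhere.
   -- "<[^>]*>" is a regular expression: "<", any characters other than ">", then ">".
   put replaceText(tText, "<[^>]*>", "|") into tText
   -- Drop lines that are completely empty.
   filter lines of tText without empty
   set the text of field "import" to tText
end importHtmlAsText
```

Note that a line made up only of tags becomes a run of "|" characters rather than an empty line, so the filter step would not remove it; replacing tags with empty instead of "|" might suit lines like that better.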
Advice please!
Best wishes,
David Glasgow