Spurious characters from html files - text encoding issues?
keith.clarke at me.com
Mon May 17 04:28:30 EDT 2021
I’m using LiveCode to summarise text from HTML documents into CSV summary files. I’ve noticed that when I extract strings from HTML documents stored on disk, rather than visiting the sites via the browser widget and grabbing the HTML text, weird characters are inserted in place of what appear to be ‘regular’ characters.
The number of characters inserted can run into the thousands per instance, making my CSV ‘summary’ file run into gigabytes! Has anyone seen the following type of string before? If so, do you know what might be causing it, and can you offer a fix?
I’ve tried deliberately setting UTF-8 on the extracted strings, with put textEncode(tString, "UTF-8") into tString. Currently I’m not attempting to force any text format on the local HTML documents.
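For reference, my understanding is that textEncode converts text into encoded bytes, whereas textDecode converts bytes into text, so the read side may need the opposite call to the one I tried. Below is a minimal sketch of that idea, not something I have confirmed fixes the problem. The handler names readHtmlAsText and writeCsvAsUtf8 are hypothetical, and it assumes the files on disk really are UTF-8, which I have not verified.

```livecode
-- Hypothetical helper: read an HTML file as raw bytes, then decode to text.
-- Assumption: the file on disk is UTF-8 encoded (not verified).
function readHtmlAsText pPath
   local tRaw
   -- "binfile:" reads the bytes untouched, with no text conversion applied
   put URL ("binfile:" & pPath) into tRaw
   if the result is not empty then return empty
   -- textDecode turns the raw bytes into LiveCode's internal text
   return textDecode(tRaw, "UTF-8")
end function

-- Hypothetical helper: encode only once, at the point of writing to disk.
command writeCsvAsUtf8 pPath, pCsvText
   put textEncode(pCsvText, "UTF-8") into URL ("binfile:" & pPath)
end writeCsvAsUtf8
```

The intent is to decode once on the way in and encode once on the way out, so that strings held in variables are never encoded a second time.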
Thanks & regards,