Unicode in variables

Richmond richmondmathewson at gmail.com
Mon Aug 19 15:18:48 EDT 2013


On 08/19/2013 10:03 PM, J. Landman Gay wrote:
> I need to read and process a tab-delimited text file that is in UTF8 
> format containing unicode. The final goal is to get it into an array 
> with the first tabbed item as the keys, preserving all unicode. There 
> are some HTML format tags in it as well.
>
> If I read the file as binfile, carriage returns are all lost.
>
> Reading it as "file:" and trying to simply convert the entire text 
> block has failed (and crashed) more ways than I can record here.
>
> How do I get hundreds of lines of unicode text into an array?
>

At the risk of being way off base and totally goofy . . .

How about running through the UTF8 text and doing a search and replace
on the line-ending characters?

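The Happy Internet tells me that the END OF LINE char (line feed, LF) is 
code point U+000A, which UTF-8 stores as the single byte 0x0A, or Decimal 10.

CR (carriage return) is code point U+000D, which UTF-8 stores as the single 
byte 0x0D, or Decimal 13.

Merry ASCII is also 10 and/or 13 (a pain in the 'btm'), so the line-ending 
bytes in a UTF-8 file are the same ones you would find in a plain ASCII file.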

http://en.wikipedia.org/wiki/Newline
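
To make the search-and-replace idea concrete, here is a rough sketch in Python rather than LiveCode, so treat it as an illustration of the byte-level logic only and not as LiveCode syntax. The function names and the sample data are made up for the example. It relies on one fact about UTF-8: the bytes 0x0A and 0x0D never appear inside a multi-byte character, so replacing them in the raw bytes cannot damage the unicode text.

```python
def normalize_newlines(raw: bytes) -> bytes:
    """Turn CRLF and lone CR into LF, working on the raw UTF-8 bytes."""
    # Replace CRLF first so it does not end up as two line feeds.
    return raw.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def parse_tab_delimited(raw: bytes) -> dict:
    """Build a dict keyed by the first tab-delimited item of each line."""
    text = normalize_newlines(raw).decode("utf-8")
    table = {}
    for line in text.split("\n"):
        if not line:
            continue  # skip empty lines, e.g. after a trailing newline
        # Split only at the first tab; the rest of the line is kept as-is,
        # including any HTML tags and further tabs.
        key, _, rest = line.partition("\t")
        table[key] = rest
    return table


# Hypothetical sample with mixed line endings (CRLF, CR and LF).
sample = "α\t<b>alpha</b>\r\nβ\tbeta\rγ\tgamma\n".encode("utf-8")
result = parse_tab_delimited(sample)
```

The same order of operations should carry over to other environments: read the file as binary, fix the line endings on the bytes, then decode from UTF-8, and only after that split into lines and items.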

Richmond.



