Unicode in variables

Devin Asay devin_asay at byu.edu
Mon Aug 19 15:43:18 EDT 2013


On Aug 19, 2013, at 1:29 PM, J. Landman Gay wrote:

> On 8/19/13 2:15 PM, Devin Asay wrote:
>> 
>> On Aug 19, 2013, at 1:03 PM, J. Landman Gay wrote:
>> 
>>> I need to read and process a tab-delimited text file that is in
>>> UTF8 format containing unicode. The final goal is to get it into an
>>> array with the first tabbed item as the keys, preserving all
>>> unicode. There are some HTML format tags in it as well.
>>> 
>>> If I read the file as binfile, carriage returns are all lost.
>> 
>> Jacque,
>> 
>> Where are the files coming from? Maybe they're using ASCII 13 as a
>> line terminator, or ASCII 10 + 13. Can't you replace whatever the
>> native line delimiter is with numToChar(10)?
> 
> I forgot about that. They're ascii 13, and replacing them does keep the line breaks. Thanks.
> 
> When I run uniEncode(tData,"UTF8") on it, the high-ascii characters are in the variable watcher as "+" and an unprintable box. Can I assume the real character is in there? Will it work for text chunking, etc? When I split it into an array, will the keys be intact?

I would do all of the chunking and splitting before you call uniEncode. Think of UTF8 as a reliable storage format, and convert the text only when you are ready to display it.
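
A rough sketch of that order of operations, assuming the pre-7.0 engine discussed in this thread. Tab (ASCII 9) and return (ASCII 10) are single bytes that never occur inside a multi-byte UTF8 character, so chunking the raw data by line and item should leave the characters intact. The handler names loadUTF8Table and showEntry, the parameter names, and the field name "Display" are all made up for illustration:

```livecode
-- Sketch only: handler, parameter, and field names are hypothetical.
function loadUTF8Table pPath
   local tData, tArray, tLine
   -- read the raw bytes so the UTF8 sequences are not altered
   put URL ("binfile:" & pPath) into tData
   -- the file uses ASCII 13 line endings; convert them to return (ASCII 10)
   replace numToChar(13) with numToChar(10) in tData
   set the itemDelimiter to tab
   repeat for each line tLine in tData
      if tLine is empty then next repeat
      -- key = first tabbed item, value = rest of the line, both still UTF8
      put item 2 to -1 of tLine into tArray[item 1 of tLine]
   end repeat
   return tArray
end loadUTF8Table

command showEntry pKey, pArray
   -- convert to UTF16 only at display time
   set the unicodeText of field "Display" to uniEncode(pArray[pKey], "UTF8")
end showEntry
```

The array keys here are the UTF8 bytes of the first item, so any later lookup needs a key in the same encoding. Any HTML format tags in the values are left untouched by this sketch.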

Devin

Devin Asay
Learn to code with LiveCode University
http://university.livecode.com
