Getting Kanji from a .csv file

Dar Scott dsc at swcp.com
Fri Jun 7 10:15:38 EDT 2013


Now, you have me worried, Richard.  Maybe I missed something in what the engine does with text files...
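
The byte-level claim in the quoted message below can at least be checked independently of the engine. Here is a minimal sketch in Python (not LiveCode); the sample text and the helper names `ascii_bytes_in_non_ascii` and `split_utf8_csv` are made up for illustration. It only confirms the property of the UTF-8 encoding itself and says nothing about what "file:" does with line endings on a given platform.

```python
# Sketch: check that no byte inside a multi-byte UTF-8 character is an
# ASCII byte (so CR, LF, tab, and comma cannot hide inside Kanji).

# Hypothetical CSV content mixing Kanji, kana, ASCII, CRLF and LF endings.
SAMPLE = "漢字,かんじ,kanji\r\n日本語,にほんご,Japanese\n"


def ascii_bytes_in_non_ascii(text):
    """Return any bytes below 0x80 found in the encodings of non-ASCII characters."""
    stray = []
    for ch in text:
        if ord(ch) > 0x7F:
            # Every byte of a multi-byte sequence should have the high bit set.
            stray.extend(b for b in ch.encode("utf-8") if b < 0x80)
    return stray


def split_utf8_csv(data):
    """Split raw UTF-8 bytes on CRLF/LF and commas, then decode each field."""
    rows = []
    for line in data.replace(b"\r\n", b"\n").split(b"\n"):
        if line:  # skip the empty piece after the final line ending
            rows.append([field.decode("utf-8") for field in line.split(b",")])
    return rows


encoded = SAMPLE.encode("utf-8")
stray = ascii_bytes_in_non_ascii(SAMPLE)
rows = split_utf8_csv(encoded)
print(stray)
print(rows)
```

Splitting happens on the raw bytes before any decoding, which is the situation Richard was worried about. If a comma or line-ending byte could appear inside a character, the per-field decode would fail or the fields would come out wrong.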


On Jun 7, 2013, at 8:11 AM, Dar Scott wrote:

> Yeah, there is no need to use binfile, but it is OK to do so. You can process the line endings before or after converting to Unicode, if you convert at all.
> 
> You were not too cautious, given what you did not know. It is normal and right to be aware of potential problems and to make code robust against them, but now you know.
> 
> Assuming a valid UTF-8 file...
> 
> Only the ASCII characters in UTF-8 have the high bit zero. They are represented as single bytes. (ASCII files are UTF-8 files.) All other characters are represented with multiple bytes, and every one of those bytes has the high bit set, the first byte as well as the following ones. (In binary, the first byte starts with 11, as in 110xxxxx, 1110xxxx, or 11110xxx, and each continuing byte is 10xxxxxx.)
> 
> This means there are no CR, LF, tab, or comma bytes hidden inside the encodings of non-ASCII characters, because ASCII bytes never have the high bit set. You can use line and item chunks with UTF-8. You can use offset (with care) and replace.
> 
> Now, here is where I'm ignorant.  I am cautious, perhaps overly cautious.  I don't use word or token with UTF-8.  I can never remember how word works, much less token.  Maybe the above is enough for somebody to comment.
> 
> Dar
> 
> 
> On Jun 7, 2013, at 7:39 AM, Richard Gaskin wrote:
> 
>> Dar Scott wrote:
>> 
>>> You can use "file:" with UTF-8.  No ghost ASCII CR or LF will show
>>> up in the representation of any characters other than CR and LF.
>> 
>> Maybe I'm just superstitious, but I've always used "binfile" with Unicode because I didn't expect the engine to understand the difference between bytes used as line endings and those same bytes that may appear as part of a character byte sequence.
>> 
>> Have I been too cautious?
>> 
>> --
>> Richard Gaskin
>> Fourth World
>> LiveCode training and consulting: http://www.fourthworld.com
>> Webzine for LiveCode developers: http://www.LiveCodeJournal.com
>> Follow me on Twitter:  http://twitter.com/FourthWorldSys
>> 
>> _______________________________________________
>> use-livecode mailing list
>> use-livecode at lists.runrev.com
>> Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
>> http://lists.runrev.com/mailman/listinfo/use-livecode
> 
> 




