Getting Kanji from a .csv file

Richard Gaskin ambassador at fourthworld.com
Fri Jun 7 10:18:08 EDT 2013


Dar Scott wrote:

> Yeah, there is no need to use binfile, but it is OK. If you do, you can process the line ends before or after converting to Unicode.
>
> That was not too cautious, given what you didn't know. It is normal and right to be aware of potential problems and make code robust against them, but now you know.
>
> Assuming a valid UTF-8 file...
>
> Only the ASCII characters in UTF-8 have the high bit zero. They are represented as single bytes. (ASCII files are UTF-8 files.) All other characters are represented with multiple bytes, every one of which has the high bit set: not just the first byte but all the following ones too. (In binary the first byte is 11xxxxxx and the continuing bytes are 10xxxxxx.)
>
> This means there are no CR, LF, tab, or comma hidden in the non-ASCII characters.  ASCII never has the high bit set.  You can use line and item chunks with UTF-8.  You can use offset (with care) and replace.
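
To illustrate the point above outside LiveCode, here is a minimal Python sketch. The sample text and the helper name ascii_bytes_in_non_ascii_chars are made up for the illustration, not taken from the thread. It checks that no byte of a multi-byte character falls in the ASCII range, then splits the raw UTF-8 bytes on LF and comma before decoding each field.

```python
# Illustrative sketch only; the sample data and helper name are assumptions.

def ascii_bytes_in_non_ascii_chars(text: str) -> int:
    """Count bytes below 0x80 that occur inside multi-byte UTF-8 sequences."""
    count = 0
    for ch in text:
        encoded = ch.encode("utf-8")
        if len(encoded) > 1:
            # In valid UTF-8 every byte here has the high bit set.
            count += sum(1 for b in encoded if b < 0x80)
    return count

sample = "漢字,kanji\n日本語,Japanese\n"  # hypothetical CSV content
raw = sample.encode("utf-8")

# No LF, CR, tab, or comma byte can hide inside a non-ASCII character.
assert ascii_bytes_in_non_ascii_chars(sample) == 0

# So the raw bytes can be split on LF and comma first, and decoded afterwards.
rows = [
    [field.decode("utf-8") for field in line.split(b",")]
    for line in raw.split(b"\n")
    if line
]
print(rows)
```

Splitting first and decoding afterwards gives the same fields as decoding first and splitting afterwards, which is why line and item chunks can be applied to the UTF-8 data directly.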

Thanks for that background, Dar.  I had suspected there might be
something that makes such distinctions identifiable, but didn't know the
details.  Now I can use "file" with confidence (and less work handling
line endings).
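
For anyone who does read the raw bytes instead, the line-ending work mentioned in this thread can be sketched as follows. This is a Python illustration rather than LiveCode, and the function name normalize_line_endings and the sample data are hypothetical. It shows that converting CRLF and lone CR to LF gives the same result whether it is done on the UTF-8 bytes before decoding or on the text after decoding.

```python
# Illustrative sketch only; the function name and sample data are assumptions.

def normalize_line_endings(data: bytes) -> bytes:
    """Convert CRLF and lone CR to LF in raw UTF-8 bytes."""
    # Replace CRLF first so it does not turn into two line feeds.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")

raw = "漢字,kanji\r\n日本語,Japanese\r".encode("utf-8")

# Option 1: normalize the bytes, then decode.
before = normalize_line_endings(raw).decode("utf-8")

# Option 2: decode, then normalize the text.
decoded = raw.decode("utf-8")
after = decoded.replace("\r\n", "\n").replace("\r", "\n")

# Both orders agree because CR and LF bytes never occur inside
# a multi-byte UTF-8 character.
assert before == after
```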

Really nice to have you back on this list.

--
  Richard Gaskin
  Fourth World
  LiveCode training and consulting: http://www.fourthworld.com
  Webzine for LiveCode developers: http://www.LiveCodeJournal.com
  Follow me on Twitter:  http://twitter.com/FourthWorldSys



