Distinguishing between ASCII and UTF8

Richard Gaskin ambassador at fourthworld.com
Wed Oct 6 16:23:24 EDT 2010


I have an app that needs to auto-detect Unicode and plain text, and 
render them correctly based on that auto-detection.

I have the UTF16 stuff working, but with UTF8 I have a problem:  there 
is often no BOM to let me know it's Unicode, and some plain text files 
will occasionally have bytes above 127 in them (like the dagger symbol).

What patterns should I be looking for in the binary data of a file to 
distinguish UTF8 from plain text?
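
For reference, one common heuristic is a strict UTF-8 validity check. 
In well-formed UTF-8 every byte above 127 belongs to a multi-byte 
sequence: a lead byte (110xxxxx, 1110xxxx or 11110xxx) followed by the 
right number of continuation bytes (10xxxxxx). A stray high byte from a 
single-byte encoding almost never fits that pattern. The sketch below is 
only an illustration in Python rather than LiveCode; the function name 
and the three labels it returns are hypothetical, and it assumes the 
whole file is available as bytes.

```python
def classify_text_bytes(data: bytes) -> str:
    """Guess whether data is ASCII, UTF-8, or a legacy 8-bit encoding.

    Hypothetical helper for illustration; the labels are not a standard.
    """
    # An explicit UTF-8 BOM (EF BB BF) settles the question when present.
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8"

    # No byte above 127: pure ASCII, which is also valid UTF-8.
    if all(b < 0x80 for b in data):
        return "ascii"

    # High bytes are present. A strict decode succeeds only if every one
    # of them forms a well-formed lead + continuation sequence.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "legacy-8bit"
    return "utf-8"


if __name__ == "__main__":
    print(classify_text_bytes("dagger \u2020".encode("utf-8")))
    print(classify_text_bytes(b"dagger \xa0"))  # lone high byte
```

This is a heuristic, not a proof: a short legacy file could contain 
high bytes that happen to form valid UTF-8, though that becomes less 
likely as the number of high bytes grows.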

--
  Richard Gaskin
  Fourth World
  LiveCode training and consulting: http://www.fourthworld.com
  Webzine for LiveCode developers: http://www.LiveCodeJournal.com
  LiveCode Journal blog: http://LiveCodejournal.com/blog.irv


