UTF8 to unicode

Mark Smith mark at maseurope.net
Wed Feb 21 13:05:11 EST 2007


Dear all, in a current project, I have to deal with many strings,
some of which are ISO 8859-1, and some of which are various flavours
of Unicode. I've taken the good advice of the list and I store all of
these strings as UTF-8 for internal use, but now I have another problem.

The spec for what I'm doing (an ID3 tagging library) requires that
some of the strings written out into a tag be ISO 8859-1, while
others may be either ISO 8859-1 or UTF-16. So my question is:

Given any UTF-8 string, can it be determined whether the string can be
properly represented as ISO 8859-1 (single-byte chars) or whether
UTF-16 (double-byte chars) is needed?

I could simply save all strings that the spec allows as UTF-16, but
this is likely to produce considerably larger tags, and would be
rather against the spirit of the spec, which explicitly aims to be
'byte-efficient'.
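
To make the question concrete, here is a minimal sketch of the kind of check being asked about, written in Python rather than LiveCode and offered only as an illustration. It assumes the text arrives as valid UTF-8 bytes, and relies on the fact that ISO 8859-1 maps one-to-one onto Unicode code points U+0000 to U+00FF, so a string fits exactly when no decoded code point exceeds U+00FF. The function names fits_latin1 and encode_for_tag are hypothetical, not part of any ID3 library.

```python
def fits_latin1(utf8_bytes):
    """Return True if the UTF-8 encoded text can be written as ISO 8859-1."""
    text = utf8_bytes.decode("utf-8")
    # ISO 8859-1 covers exactly the code points U+0000..U+00FF.
    return all(ord(ch) <= 0xFF for ch in text)


def encode_for_tag(utf8_bytes):
    """Pick the smaller encoding: ISO 8859-1 when possible, else UTF-16.

    Returns a (encoding_name, encoded_bytes) pair. Python's "utf-16"
    codec prepends a byte order mark to the output.
    """
    text = utf8_bytes.decode("utf-8")
    if fits_latin1(utf8_bytes):
        return "iso-8859-1", text.encode("latin-1")
    return "utf-16", text.encode("utf-16")
```

Under these assumptions only strings that contain a character outside ISO 8859-1 would pay the UTF-16 size cost; everything else stays at one byte per character.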

Any thoughts on this gratefully received.

Best,

Mark


