UTF8 to unicode
Mark Smith
mark at maseurope.net
Wed Feb 21 13:05:11 EST 2007
Dear all, in a current project, I have to deal with many strings,
some of which are ISO-8859-1, and some of which are various flavours
of unicode. I've taken the good advice of the list and I store all of
these strings as UTF8 for internal use, but now I have another problem.
The spec for what I'm doing (an ID3 tagging library) requires that
some of the strings written out into a tag be ISO-8859-1, while
others may be either ISO-8859-1 or UTF16...so my question is:
Given any UTF8 string, can it be determined whether the string can be
properly represented as ISO-8859-1 (single byte chars) or whether
UTF16 (double byte chars) is needed?
I could simply save all strings that the spec allows as UTF16, but
this is likely to produce considerably larger tags, and would be
rather against the spirit of the spec, which explicitly aims to be
'byte-efficient'.
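To make the question concrete, here is a rough sketch of the kind of check being asked about. It is written in Python purely for illustration, and the function names (fits_latin1, encode_for_tag) are hypothetical, not taken from any ID3 library. It assumes the input is valid UTF8 and treats "representable as ISO-8859-1" as "every character has a code point of U+00FF or below", which is how Python's latin-1 codec maps bytes to characters.

```python
def fits_latin1(utf8_bytes):
    """Return True if every character in the UTF-8 data is at or below U+00FF.

    Assumption: the input is well-formed UTF-8; decode() raises
    UnicodeDecodeError otherwise.
    """
    text = utf8_bytes.decode("utf-8")
    return all(ord(ch) <= 0xFF for ch in text)


def encode_for_tag(utf8_bytes):
    """Pick the smaller encoding where possible (hypothetical helper).

    Returns a (encoding_name, encoded_bytes) pair. Falls back to UTF-16
    with a byte-order mark when any character is outside ISO-8859-1.
    """
    text = utf8_bytes.decode("utf-8")
    if fits_latin1(utf8_bytes):
        return "iso-8859-1", text.encode("latin-1")
    # Python's "utf-16" codec prepends a BOM and uses the native byte order.
    return "utf-16", text.encode("utf-16")
```

With a check along these lines, a string such as "café" would be written as four single bytes, and only strings containing characters beyond U+00FF would take the UTF16 path.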
Any thoughts on this gratefully received.
Best,
Mark
More information about the use-livecode
mailing list