Detecting UTF-8 Encoded Files
Klaus Major
klaus at major.on-rev.com
Fri Aug 7 10:01:13 EDT 2009
Hi Ken,
I do not see any problem (and wouldn't even if there were one ;-)
but Mark Waddingham once helped me out with a working function for
exactly this: determining how a VCARD is encoded!
Here it is, including Mark's (very helpful) comments:
# vCards are stored as a text file, however, the text encoding used
# varies depending on the program that exported them.
# We use the following heuristic to detect encoding:
# 1) If there is the byte order mark 0xFEFF then we assume UTF-16BE
# 2) If there is the byte order mark 0xFFFE then we assume UTF-16LE
# 3) If the first byte is 0x00 then we assume UTF-16BE (compatibility
# with Tiger Address Book)
# 4) Otherwise we assume UTF-8
function vcf_convert3format tBinaryVCard
# First load the vCard as binary data - at this stage we don't know
# the text encoding of the file and loading
# as text would cause inappropriate line ending conversion.
# This variable will hold the vCard encoded in MacRoman (the
# default text encoding Revolution uses on Mac OS X)
local tNativeVCard
# We now do our checks to detect text encoding
switch
case charToNum(char 1 of tBinaryVCard) = 0
put "UTF16BE" into tTextEncoding
break
case charToNum(char 1 of tBinaryVCard) = 0xFE and charToNum(char 2 of tBinaryVCard) = 0xFF
delete char 1 to 2 of tBinaryVCard
put "UTF16BE" into tTextEncoding
break
case charToNum(char 1 of tBinaryVCard) = 0xFF and charToNum(char 2 of tBinaryVCard) = 0xFE
delete char 1 to 2 of tBinaryVCard
put "UTF16LE" into tTextEncoding
break
default
put "UTF8" into tTextEncoding
break
end switch
if tTextEncoding begins with "UTF16" then
# Work out the processor's byte order
local tHostByteOrder
if the processor is "x86" then
put "LE" into tHostByteOrder
else
put "BE" into tHostByteOrder
end if
# If the byte orders don't match, switch the order of pairs of
# bytes
if char -2 to -1 of tTextEncoding <> tHostByteOrder then
put swapbytes(tBinaryVCard) into tBinaryVCard
end if
# Decode the UTF-16 to native
put uniDecode(tBinaryVCard) into tNativeVCard
else
# Use the standard uniDecode/uniEncode pair to decode the UTF-8
# encoding
put uniDecode(uniEncode(tBinaryVCard, "UTF8")) into tNativeVCard
end if
# We now need to normalize line endings to make sure all lines
# terminate in 'return' (numToChar(10)).
put tNativeVCard into tTextVCard
# First replace Windows CR-LF style endings
replace numToChar(13) & numToChar(10) with return in tTextVCard
# Now replace Mac OS CR style endings
replace numToChar(13) with return in tTextVCard
return mac2win(tTextVCard)
end vcf_convert3format
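
Note that "swapbytes" is not defined in this post, so it is presumably a helper from the same project. Below is a minimal sketch of what such a helper might look like, assuming one char is one byte (as in the Revolution engines of this era). The handler body, the mouseUp handler and the field name "vCard" are illustrative assumptions, not Mark's original code:

```livecode
# Hypothetical helper (not from the original post): swaps each pair of
# bytes, turning UTF-16BE data into UTF-16LE and vice versa.
# Assumes one char is one byte; a trailing odd byte is dropped.
function swapbytes pData
   local tResult, tIndex
   repeat with tIndex = 1 to length(pData) - 1 step 2
      put char (tIndex + 1) of pData & char tIndex of pData after tResult
   end repeat
   return tResult
end swapbytes

# Usage sketch: the caller reads the file as binary ("binfile:") so that
# no line ending conversion happens before the encoding is detected.
on mouseUp
   local tBinaryVCard
   answer file "Select a vCard file:"
   if it is empty then exit mouseUp
   put URL ("binfile:" & it) into tBinaryVCard
   # field "vCard" is a placeholder name for wherever the text should go
   put vcf_convert3format(tBinaryVCard) into field "vCard"
end mouseUp
```

Reading via "binfile:" rather than "file:" matters here because the function expects the raw bytes, including any byte order mark.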
***
Here is my function "mac2win" that we use in our cross-platform project,
where we store EVERYTHING in ISO format!
function mac2win was
if the platform = "MacOS" then
return mactoiso(was)
else
return was
end if
end mac2win
Hope that helps!
Best
Klaus
--
Klaus Major
http://www.major-k.de
klaus at major.on-rev.com