Detecting UTF-8 Encoded Files
Ken Ray
kray at sonsothunder.com
Fri Aug 7 09:26:28 EDT 2009
I recently needed to detect whether a vCard was UTF-8 encoded so that I
could run the proper decoding on it. After a healthy web search, I found an
article on Instructables that explains how to walk through the bytes of a
file and determine this:
http://www.instructables.com/id/SYGL47RFDYPTCVC/
I wrote a function based on it and so far it's worked for me, but if anyone
sees any problems with it, let me know:
function GetFileData
  answer file "Select a file:"
  put it into tFile
  if tFile is not "" then
    -- Read the file first so that both branches have data to return
    put url ("file:" & tFile) into tData
    if isUTF8Encoded(tFile) then
      -- Convert UTF-8 to the native encoding by way of UTF-16
      return unidecode(uniencode(tData,"utf8"))
    else
      return tData
    end if
  end if
end GetFileData
function isUTF8Encoded pPath
  put url ("file:" & pPath) into tData
  -- Look for patterns of:
  --   "110xxxxx, 10yyyyyy" (2 bytes)
  --   "1110xxxx, 10yyyyyy, 10zzzzzz" (3 bytes)
  --   "11110xxx, 10yyyyyy, 10zzzzzz, 10wwwwww" (4 bytes)
  -- tMatchHolder is empty outside a sequence; inside one, char 1 is the
  -- sequence length and char 2 counts the continuation bytes already seen
  put "" into tMatchHolder
  repeat for each char tChar in tData
    -- tVal is the byte as an 8-digit binary string, e.g. "11000011"
    put format("%08d",baseConvert(charToNum(tChar),10,2)) into tVal
    if tMatchHolder = "" then
      switch
        case (char 1 to 3 of tVal = "110")
          put "20" into tMatchHolder
          break
        case (char 1 to 4 of tVal = "1110")
          put "30" into tMatchHolder
          break
        case (char 1 to 5 of tVal = "11110")
          put "40" into tMatchHolder
          break
        default
          next repeat
      end switch
    else
      if (char 1 to 2 of tVal = "10") then
        if char 2 of tMatchHolder = (char 1 of tMatchHolder - 2) then
          -- This was the last continuation byte of the sequence
          return "true"
        else
          add 1 to char 2 of tMatchHolder
        end if
      else
        -- Broken sequence: go back to looking for a lead byte
        put "" into tMatchHolder
        next repeat
      end if
    end if
  end repeat
  return "false"
end isUTF8Encoded
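
As a worked example of the patterns above: "é" (U+00E9) is stored in UTF-8
as the two bytes 195 and 169, which are 11000011 and 10101001 in binary, so
the first byte matches "110xxxxx" and the second matches "10yyyyyy".

isUTF8Encoded returns true as soon as it sees one complete sequence. If that
turns out to be too lenient, a stricter check might look something like the
sketch below. This is only a sketch: looksLikeUTF8 is a hypothetical name, it
takes the data rather than a path, it tests the bits with bitAnd instead of
binary strings, and it assumes (like the function above) that each char of
the data is one byte. It does not try to catch overlong encodings or other
finer points of the UTF-8 spec.

```livecode
-- Hypothetical helper, not part of the original function.
-- Returns true only if every multi-byte sequence in pData is well
-- formed AND at least one such sequence is present.
function looksLikeUTF8 pData
  put 0 into tPending            -- continuation bytes still expected
  put false into tSawMultiByte
  repeat for each char tChar in pData
    put charToNum(tChar) into tByte
    if tPending > 0 then
      -- Inside a sequence: this byte must match 10xxxxxx
      if (tByte bitAnd 192) <> 128 then
        return false
      end if
      subtract 1 from tPending
      if tPending = 0 then
        put true into tSawMultiByte
      end if
    else if tByte >= 128 then
      -- Outside a sequence: a high byte must be a valid lead byte
      if (tByte bitAnd 224) = 192 then
        put 1 into tPending      -- 110xxxxx
      else if (tByte bitAnd 240) = 224 then
        put 2 into tPending      -- 1110xxxx
      else if (tByte bitAnd 248) = 240 then
        put 3 into tPending      -- 11110xxx
      else
        return false             -- stray continuation or invalid lead byte
      end if
    end if
  end repeat
  -- A sequence cut off at the end of the data also counts as invalid
  return (tPending = 0) and tSawMultiByte
end looksLikeUTF8
```

Under those assumptions, looksLikeUTF8(numToChar(195) & numToChar(169))
should come back true, while a lone numToChar(233) (a Latin-1 "é") should
come back false. Plain ASCII data also comes back false, since it contains
no multi-byte sequences and needs no decoding.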
HTH,
Ken Ray
Sons of Thunder Software, Inc.
Email: kray at sonsothunder.com
Web Site: http://www.sonsothunder.com/