Detecting UTF-8 Encoded Files

Ken Ray kray at sonsothunder.com
Fri Aug 7 09:26:28 EDT 2009


I recently needed to detect whether a vCard was UTF-8 encoded so that I
could run the proper decoding on it. After a healthy web search, I found an
article on Instructables that explains how to walk through the bytes of a
file to determine this:

    http://www.instructables.com/id/SYGL47RFDYPTCVC/

I wrote a function based on it and so far it's worked for me, but if anyone
sees any problems with it, let me know:


function GetFileData
    answer file "Select a file:"
    put it into tFile
    if tFile is not "" then
        -- Read the file up front so both branches have the data
        put url ("file:" & tFile) into tData
        if isUTF8Encoded(tFile) then
            -- UTF-8 -> UTF-16 -> native text
            return unidecode(uniencode(tData,"utf8"))
        else
            return tData
        end if
    end if
end GetFileData

function isUTF8Encoded pPath
  put url ("file:" & pPath) into tData
  
  -- Look for patterns of:
  --     "110xxxxx, 10yyyyyy" (2 bytes)
  --     "1110xxxx, 10yyyyyy, 10zzzzzz" (3 bytes)
  --     "11110xxx, 10yyyyyy, 10zzzzzz, 10wwwwww" (4 bytes)
  put "" into tMatchHolder
  repeat for each char tChar in tData
    -- 8-digit binary string for this byte, e.g. 195 -> "11000011"
    put format("%08d",baseConvert(charToNum(tChar),10,2)) into tVal
    if tMatchHolder is not "" then
      if (char 1 to 2 of tVal = "10") then
        -- Continuation byte: char 1 of tMatchHolder is the sequence
        -- length, char 2 counts the continuation bytes seen before this one
        if char 2 of tMatchHolder = (char 1 of tMatchHolder - 2) then
          return "true"
        end if
        add 1 to char 2 of tMatchHolder
        next repeat
      end if
      -- Sequence broken: reset, then re-check this byte as a lead byte
      put "" into tMatchHolder
    end if
    switch
    case (char 1 to 3 of tVal = "110")
      put "20" into tMatchHolder
      break
    case (char 1 to 4 of tVal = "1110")
      put "30" into tMatchHolder
      break
    case (char 1 to 5 of tVal = "11110")
      put "40" into tMatchHolder
      break
    end switch
  end repeat
  return "false"
end isUTF8Encoded
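
Note that isUTF8Encoded returns true as soon as it finds one well-formed
multi-byte sequence, so a file that merely contains such a byte run by
chance would also pass. A stricter check would walk all of the data and
reject anything that breaks the bit patterns listed above. The following is
only a rough sketch of that idea, not part of the original function: the
name isStrictUTF8 and its pData parameter are made up for illustration, and
it assumes (as the code above does) that each char of the data is one byte.
It checks bit patterns only; it does not reject overlong encodings or code
points outside the Unicode range.

```livecode
-- Hypothetical helper: true only if pData contains at least one
-- multi-byte sequence and every byte fits the UTF-8 bit patterns.
function isStrictUTF8 pData
  local tExpected, tByte, tSawMultiByte
  put 0 into tExpected           -- continuation bytes still owed
  put false into tSawMultiByte
  repeat for each char tChar in pData
    put charToNum(tChar) into tByte
    if tExpected > 0 then
      -- Inside a sequence: the byte must look like 10xxxxxx
      if (tByte bitAnd 192) is not 128 then
        return false
      end if
      subtract 1 from tExpected
      if tExpected = 0 then
        put true into tSawMultiByte
      end if
    else if tByte < 128 then
      next repeat                -- 0xxxxxxx: plain ASCII
    else if (tByte bitAnd 224) = 192 then
      put 1 into tExpected       -- 110xxxxx: 2-byte sequence
    else if (tByte bitAnd 240) = 224 then
      put 2 into tExpected       -- 1110xxxx: 3-byte sequence
    else if (tByte bitAnd 248) = 240 then
      put 3 into tExpected       -- 11110xxx: 4-byte sequence
    else
      return false               -- stray continuation or invalid lead byte
    end if
  end repeat
  -- A sequence cut off at the end of the data also counts as invalid
  return (tExpected = 0) and tSawMultiByte
end isStrictUTF8
```

It could be called with the file contents rather than the path, for
example isStrictUTF8(url ("file:" & tFile)). Like the original, it reports
false for pure ASCII data, which needs no decoding anyway. The parentheses
around the bitAnd expressions matter, because bitAnd binds less tightly
than the comparison operators.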

HTH,

Ken Ray
Sons of Thunder Software, Inc.
Email: kray at sonsothunder.com
Web Site: http://www.sonsothunder.com/




