opening txt files

Klaus on-rev klaus at major.on-rev.com
Thu Jan 17 09:32:20 EST 2013


Hi friends,

On 16.01.2013 at 18:15, Nishok Love <nishok.love at virgin.net> wrote:

> ...
> So I'm still looking for a way for LiveCode to spot whether it's opening a file in UTF-8 or UTF-16 (or something else - aaarrgh!). Can I access the file header? read from file just gives me the data...
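
Reading the file in binary mode gives you the raw bytes, including any byte order mark at the start of the file. A minimal sketch of peeking at those bytes, written from memory and not tested here; the function name fileHeaderHex is made up for illustration:

```livecode
-- Returns the first pCount bytes of a file as a string of hex digits,
-- or empty if the file could not be opened.
-- fileHeaderHex is a hypothetical helper name, not a built-in.
function fileHeaderHex pFilename, pCount
  local tData, tHex
  open file pFilename for binary read
  if the result is not empty then return empty
  read from file pFilename for pCount
  put it into tData
  close file pFilename
  -- "H*" converts all of the data to hex digits, high nibble first
  get binaryDecode("H*", tData, tHex)
  return tHex
end fileHeaderHex
```

Calling fileHeaderHex(tPath, 4) on a file that starts with one of the byte order marks checked in the script below should give a string starting with the hex digits FE FF or FF FE.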

I found an old script that Mark Waddingham supplied a while ago, when I had some problems
reading vCards in 3.0 format (Unicode). I think it can be used to open ANY txt file.

I do not fully understand it, so I will leave it without further comment ;-)
In any case, it should convert a given text file (UTF-8 or UTF-16) to LiveCode-readable plain text.

Comments are from Mark W.

> I could read the file, count the number of characters and how many of them are spaces and from that I could infer which format is being used. Probably this would be reliable for my purposes - just not very elegant!
> 
> Nishok

###############################################################################
-- vCards are stored as a text file, however, the text encoding used varies
-- depending on the program that exported them.
--
-- We use the following heuristic to detect encoding:
--   1) If there is the byte order mark 0xFEFF then we assume UTF-16BE
--   2) If there is the byte order mark 0xFFFE then we assume UTF-16LE
--   3) If the first byte is 0x00 then we assume UTF-16BE (compatibility
--      with Tiger Address Book)
--   4) Otherwise we assume UTF-8
--
function importVCard pFilename
  -- First load the vCard as binary data - at this stage we don't know
  -- the text encoding of the file and loading as text would cause
  -- inappropriate line ending conversion.
  local tBinaryVCard
  put url ("binfile:" & pFilename) into tBinaryVCard
  
  -- This variable will hold the vCard encoded in MacRoman (the default
  -- text encoding Revolution uses on Mac OS X)
  local tNativeVCard
  
  -- We now do our checks to detect text encoding
  local tTextEncoding
  if charToNum(char 1 of tBinaryVCard) is 0 then
    put "UTF16BE" into tTextEncoding
  else if charToNum(char 1 of tBinaryVCard) is 0xFE and charToNum(char 2 of tBinaryVCard) is 0xFF then
    delete char 1 to 2 of tBinaryVCard
    put "UTF16BE" into tTextEncoding
  else if charToNum(char 1 of tBinaryVCard) is 0xFF and charToNum(char 2 of tBinaryVCard) is 0xFE then
    delete char 1 to 2 of tBinaryVCard
    put "UTF16LE" into tTextEncoding
  else
    put "UTF8" into tTextEncoding
  end if
  
  if tTextEncoding begins with "UTF16" then
    -- Work out the processor's byte order
    local tHostByteOrder
    if the processor is "x86" then
      put "LE" into tHostByteOrder
    else
      put "BE" into tHostByteOrder
    end if
    
    -- If the byte orders don't match, switch the order of pairs of bytes
    if char -2 to -1 of tTextEncoding is not tHostByteOrder then
      repeat with x = 1 to the length of tBinaryVCard step 2
        get char x of tBinaryVCard
        put char x + 1 of tBinaryVCard into char x of tBinaryVCard
        put it into char x + 1 of tBinaryVCard
      end repeat
    end if
    
    -- Decode the UTF-16 to native
    put uniDecode(tBinaryVCard) into tNativeVCard
  else
    -- Use the standard uniDecode/uniEncode pair to decode the UTF-8 encoding
    put uniDecode(uniEncode(tBinaryVCard, "UTF8")) into tNativeVCard
  end if
  
  -- We now need to normalize line endings to make sure all lines terminate
  -- in 'return' (numToChar(10)).
  local tTextVCard
  put tNativeVCard into tTextVCard
  
  -- First replace Windows CR-LF style endings
  replace numToChar(13) & numToChar(10) with return in tTextVCard
  
  -- Now replace Mac OS CR style endings
  replace numToChar(13) with return in tTextVCard
  
  return tTextVCard
end importVCard
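
-- Usage sketch (not part of Mark's original script): a hypothetical
-- mouseUp handler that lets the user pick a text file and shows the
-- converted text. The field name "Output" is an assumption; use any
-- field on your card.
on mouseUp
  answer file "Choose a text file:"
  -- 'it' is empty if the user cancelled the dialog
  if it is empty then exit mouseUp
  put importVCard(it) into field "Output"
end mouseUp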

-- The Tiger version of Apple Address Book (4.0.4) exports vCard files
-- as UTF-16 big endian without a BOM if the record contains any non-ASCII
-- characters.
-- If there are no non-ASCII characters, the record is just left as
-- ASCII with no conversion to UTF-16.
-- On Leopard, it seems that Apple Address Book exports vCard files
-- in UTF-8 regardless.
function importAppleAddressVCard pFilename
  -- First load the vCard as binary data - at this stage we don't know
  -- the text encoding of the file and loading as text would cause
  -- inappropriate line ending conversion.
  local tBinaryVCard
  put url ("binfile:" & pFilename) into tBinaryVCard
  
  -- This variable will hold the vCard encoded in MacRoman (the default
  -- text encoding Revolution uses on Mac OS X)
  local tNativeVCard
  
  -- Okay so now we have the binary data, we need to decide if it is
  -- UTF-16BE or ASCII/UTF-8. This is easy to do since the first character of
  -- a vCard has to be an ASCII character. If the record has been encoded
  -- as UTF-16BE, then this means this will translate as the first byte
  -- being the NUL (0) character.
  if charToNum(char 1 of tBinaryVCard) is 0 then
    -- We are UTF-16BE
    
    -- We now know that tBinaryVCard is big-endian UTF-16. Since Revolution
    -- only handles host-byte-order UTF-16 at the moment, we must byte-swap
    -- on little-endian platforms.
    if the processor is "x86" then
      repeat with x = 1 to the length of tBinaryVCard step 2
        get char x of tBinaryVCard
        put char x + 1 of tBinaryVCard into char x of tBinaryVCard
        put it into char x + 1 of tBinaryVCard
      end repeat
    end if
    
    -- We have UTF-16 in host byte order now, so use uniDecode to convert
    -- it to MacRoman
    put uniDecode(tBinaryVCard) into tNativeVCard
    
    -- We now have MacRoman text, but it may still have CR or CR-LF line
    -- endings; these are normalized after the 'end if' below.
  else
    -- We are ASCII or UTF-8. Fortunately, as ASCII is a proper subset of
    -- UTF-8 we can just assume we have UTF-8 and convert this to native
    -- encoding
    put uniDecode(uniEncode(tBinaryVCard, "UTF8")) into tNativeVCard
  end if
  
  -- We now need to normalize line endings to make sure all lines terminate
  -- in 'return' (numToChar(10)).
  local tTextVCard
  put tNativeVCard into tTextVCard
  
  -- First replace Windows CR-LF style endings
  replace numToChar(13) & numToChar(10) with return in tTextVCard
  
  -- Now replace Mac OS CR style endings
  replace numToChar(13) with return in tTextVCard
  
  return tTextVCard
end importAppleAddressVCard
###############################################################################
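
On newer engines, LiveCode 7 and later provide textDecode, which takes the encoding name directly, so the manual byte swapping should not be needed there. A rough, untested sketch of the same heuristic under that assumption; importTextFile is a made-up name and not part of Mark's script:

```livecode
-- Assumes LiveCode 7 or later (textDecode, byteToNum and the byte chunk).
function importTextFile pFilename
  local tData, tB1, tB2, tText
  put url ("binfile:" & pFilename) into tData

  -- Use -1 as "no such byte" so that very short files fall through to UTF-8
  put -1 into tB1
  put -1 into tB2
  if the number of bytes in tData >= 2 then
    put byteToNum(byte 1 of tData) into tB1
    put byteToNum(byte 2 of tData) into tB2
  end if

  if tB1 is 254 and tB2 is 255 then
    -- BOM FE FF: UTF-16 big endian, skip the BOM
    put textDecode(byte 3 to -1 of tData, "UTF-16BE") into tText
  else if tB1 is 255 and tB2 is 254 then
    -- BOM FF FE: UTF-16 little endian, skip the BOM
    put textDecode(byte 3 to -1 of tData, "UTF-16LE") into tText
  else if tB1 is 0 then
    -- No BOM but a leading NUL byte: assume UTF-16 big endian
    put textDecode(tData, "UTF-16BE") into tText
  else
    put textDecode(tData, "UTF-8") into tText
  end if

  -- Normalize CR-LF and CR line endings to 'return'
  replace numToChar(13) & numToChar(10) with return in tText
  replace numToChar(13) with return in tText
  return tText
end importTextFile
```

Like the original, this does not look for a UTF-8 byte order mark (EF BB BF), so such a file is simply decoded as UTF-8.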

Best

Klaus
--
Klaus Major
http://www.major-k.de
klaus at major.on-rev.com




