opening txt files
Klaus on-rev
klaus at major.on-rev.com
Thu Jan 17 09:32:20 EST 2013
Hi friends,
On 16.01.2013 at 18:15, Nishok Love <nishok.love at virgin.net> wrote:
> ...
> So I'm still looking for a way for LiveCode to spot whether it's opening a file in UTF-8 or UTF-16 (or something else - aaarrgh!). Can I access the file header? read from file just gives me the data...
I found an old script that Mark Waddingham supplied some time ago, when I had problems
reading vCards in 3.0 format (Unicode). I think it can be used to open ANY text file.
I do not fully understand it, so I have not added any comments of my own ;-)
In any case, it should convert a UTF-8 or UTF-16 text file to plain text that LiveCode can read.
All comments in the script are from Mark W.
> I could read the file, count the number of characters and how many of them are spaces and from that I could infer which format is being used. Probably this would be reliable for my purposes - just not very elegant!
>
> Nishok
###############################################################################
-- vCards are stored as a text file, however, the text encoding used varies
-- depending on the program that exported them.
--
-- We use the following heuristic to detect encoding:
-- 1) If the file starts with the bytes 0xFE 0xFF (byte order mark) then
--    we assume UTF-16BE
-- 2) If the file starts with the bytes 0xFF 0xFE (byte order mark) then
--    we assume UTF-16LE
-- 3) If the first byte is 0x00 then we assume UTF-16BE (compatibility
--    with Tiger Address Book)
-- 4) Otherwise we assume UTF-8
--
function importVCard pFilename
   -- First load the vCard as binary data - at this stage we don't know
   -- the text encoding of the file and loading as text would cause
   -- inappropriate line ending conversion.
   local tBinaryVCard
   put url ("binfile:" & pFilename) into tBinaryVCard

   -- This variable will hold the vCard encoded in MacRoman (the default
   -- text encoding Revolution uses on Mac OS X)
   local tNativeVCard

   -- We now do our checks to detect text encoding
   local tTextEncoding
   if charToNum(char 1 of tBinaryVCard) is 0 then
      put "UTF16BE" into tTextEncoding
   else if charToNum(char 1 of tBinaryVCard) is 0xFE and charToNum(char 2 of tBinaryVCard) is 0xFF then
      delete char 1 to 2 of tBinaryVCard
      put "UTF16BE" into tTextEncoding
   else if charToNum(char 1 of tBinaryVCard) is 0xFF and charToNum(char 2 of tBinaryVCard) is 0xFE then
      delete char 1 to 2 of tBinaryVCard
      put "UTF16LE" into tTextEncoding
   else
      put "UTF8" into tTextEncoding
   end if

   if tTextEncoding begins with "UTF16" then
      -- Work out the processor's byte order
      local tHostByteOrder
      if the processor is "x86" then
         put "LE" into tHostByteOrder
      else
         put "BE" into tHostByteOrder
      end if

      -- If the byte orders don't match, switch the order of pairs of bytes
      if char -2 to -1 of tTextEncoding is not tHostByteOrder then
         repeat with x = 1 to the length of tBinaryVCard step 2
            get char x of tBinaryVCard
            put char x + 1 of tBinaryVCard into char x of tBinaryVCard
            put it into char x + 1 of tBinaryVCard
         end repeat
      end if

      -- Decode the UTF-16 to native
      put uniDecode(tBinaryVCard) into tNativeVCard
   else
      -- Use the standard uniDecode/uniEncode pair to decode the UTF-8 encoding
      put uniDecode(uniEncode(tBinaryVCard, "UTF8")) into tNativeVCard
   end if

   -- We now need to normalize line endings to make sure all lines terminate
   -- in 'return' (numToChar(10)).
   local tTextVCard
   put tNativeVCard into tTextVCard

   -- First replace Windows CR-LF style endings
   replace numToChar(13) & numToChar(10) with return in tTextVCard

   -- Now replace Mac OS CR style endings
   replace numToChar(13) with return in tTextVCard

   return tTextVCard
end importVCard
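
-- A minimal sketch (not part of Mark's original script): the detection
-- heuristic used in importVCard, pulled out into its own function so it
-- can be applied to any text file. The name detectTextEncoding is
-- hypothetical. Like the original, it only tells UTF-16 apart from
-- everything else, which is then assumed to be UTF-8. It does not strip
-- a byte order mark; the caller still has to delete char 1 to 2 when the
-- data starts with one.
function detectTextEncoding pBinaryData
   local tByte1, tByte2
   put charToNum(char 1 of pBinaryData) into tByte1
   put charToNum(char 2 of pBinaryData) into tByte2
   if tByte1 is 0 then
      -- No BOM, but a leading NUL byte: assume UTF-16 big endian
      return "UTF16BE"
   else if tByte1 is 0xFE and tByte2 is 0xFF then
      return "UTF16BE"
   else if tByte1 is 0xFF and tByte2 is 0xFE then
      return "UTF16LE"
   end if
   return "UTF8"
end detectTextEncoding
-- Example use (tPath is a hypothetical variable holding a file path):
--    put detectTextEncoding(url ("binfile:" & tPath)) into tEncoding
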
-- The Tiger version of Apple Address Book (4.0.4) exports vCard files
-- as UTF-16 big endian without a BOM if the record contains any non-ASCII
-- characters.
-- If there are no non-ASCII characters, the record is just left as
-- ASCII with no conversion to UTF-16.
-- On Leopard, it seems that Apple Address Book exports vCard files
-- in UTF-8 regardless.
function importAppleAddressVCard pFilename
   -- First load the vCard as binary data - at this stage we don't know
   -- the text encoding of the file and loading as text would cause
   -- inappropriate line ending conversion.
   local tBinaryVCard
   put url ("binfile:" & pFilename) into tBinaryVCard

   -- This variable will hold the vCard encoded in MacRoman (the default
   -- text encoding Revolution uses on Mac OS X)
   local tNativeVCard

   -- Okay so now we have the binary data, we need to decide if it is
   -- UTF-16BE or ASCII/UTF-8. This is easy to do since the first character of
   -- a vCard has to be an ASCII character. If the record has been encoded
   -- as UTF-16BE, then this means this will translate as the first byte
   -- being the NUL (0) character.
   if charToNum(char 1 of tBinaryVCard) is 0 then
      -- We are UTF-16BE
      -- We now know that tBinaryVCard is big endian UTF-16. Since Revolution
      -- only handles host byte order UTF-16 at the moment, we must byte-swap
      -- on Little Endian platforms.
      if the processor is "x86" then
         repeat with x = 1 to the length of tBinaryVCard step 2
            get char x of tBinaryVCard
            put char x + 1 of tBinaryVCard into char x of tBinaryVCard
            put it into char x + 1 of tBinaryVCard
         end repeat
      end if

      -- We have UTF-16 in host byte order now, so use uniDecode to convert
      -- it to MacRoman
      put uniDecode(tBinaryVCard) into tNativeVCard

      -- We now have MacRoman text; its line endings are normalized below.
   else
      -- We are ASCII or UTF-8. Fortunately, as ASCII is a proper subset of
      -- UTF-8 we can just assume we have UTF-8 and convert this to native
      -- encoding
      put uniDecode(uniEncode(tBinaryVCard, "UTF8")) into tNativeVCard
   end if

   -- We now need to normalize line endings to make sure all lines terminate
   -- in 'return' (numToChar(10)).
   local tTextVCard
   put tNativeVCard into tTextVCard

   -- First replace Windows CR-LF style endings
   replace numToChar(13) & numToChar(10) with return in tTextVCard

   -- Now replace Mac OS CR style endings
   replace numToChar(13) with return in tTextVCard

   return tTextVCard
end importAppleAddressVCard
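
-- A minimal sketch (not part of Mark's original script): both functions
-- above contain the same byte-swapping loop, which could be shared as a
-- helper. The name swapBytePairs is hypothetical. Unlike the original
-- loop, it builds a new string instead of changing the data in place,
-- and it drops a trailing odd byte, which cannot be part of valid UTF-16.
function swapBytePairs pData
   local tSwapped
   repeat with x = 1 to length(pData) - 1 step 2
      put (char (x + 1) of pData) & (char x of pData) after tSwapped
   end repeat
   return tSwapped
end swapBytePairs
-- Example use, in place of one of the repeat loops above:
--    put swapBytePairs(tBinaryVCard) into tBinaryVCard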
###############################################################################
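For a plain text file rather than a vCard, calling the function could look like the
sketch below. It assumes the script above is in the message path (for example in the
stack script) and that the card has a field named "Output"; the field name and the
button handler are assumptions for illustration only, not part of Mark's script.
###############################################################################
on mouseUp
   -- Let the user pick a file; "answer file" puts the chosen path into "it"
   answer file "Choose a text file:"
   if it is empty then exit mouseUp
   -- Decode the file (UTF-8 or UTF-16) and show the resulting native text
   put importVCard(it) into field "Output"
end mouseUp
###############################################################################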
Best
Klaus
--
Klaus Major
http://www.major-k.de
klaus at major.on-rev.com