Detecting Unicode text

Sarah Reichelt sarah.reichelt at gmail.com
Thu Nov 17 20:37:11 EST 2005


Hi All,

I have a utility that takes a text file saved by Apple's Mail, reads
it and processes it. The problem is that if there are any weird
characters in the original email, Mail saves it as UTF-16 (I think).
This opens fine in TextEdit, but in Rev, I get the characters all
spaced apart. e.g. instead of the email containing "Subject:", it
contains " S u b j e c t :  ", which completely messes up all
subsequent processing.
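Here is a minimal Python sketch of why the text looks spaced out. It assumes the file is UTF-16 big-endian, which the leading 254, 255 bytes would suggest. In that encoding every ASCII character takes two bytes: a NUL (0x00) followed by the usual code. Reading one byte per character therefore puts a NUL, often displayed as a blank, before each letter. spaced_view is a hypothetical helper name.

```python
def spaced_view(text: str) -> str:
    """Show how UTF-16BE text looks when read one byte per character."""
    # Each ASCII char becomes two bytes: 0x00 then the ASCII code.
    raw = text.encode("utf-16-be")
    # Latin-1 maps every byte to one character, like a naive 8-bit read.
    single_bytes = raw.decode("latin-1")
    # Render the NUL bytes as blanks, the way many displays do.
    return single_bytes.replace("\x00", " ")


print(spaced_view("Subject:"))  # " S u b j e c t :"
```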

If I uniDecode the text, it comes good except for a weird character at
the start which I can handle, but is there a neat way to detect the
encoding of text before I start? I suppose I can just look for the
word "Subject" and if it isn't there, uniDecode and try again, but it
seems there should be a way to detect the encoding of the text itself.
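The "look for Subject, otherwise decode and try again" fallback could be sketched in Python as follows. decode_mail_export is a hypothetical name. Latin-1 is an assumption standing in for whatever 8-bit encoding the plain files use. Python's "utf-16" codec stands in for uniDecode.

```python
def decode_mail_export(data: bytes) -> str:
    """Hypothetical helper: try an 8-bit reading, fall back to UTF-16."""
    # Assumption: plain exports are single-byte text. Latin-1 never fails,
    # since every byte value maps to a character.
    text = data.decode("latin-1")
    if "Subject" in text:
        return text
    # Otherwise treat the data as UTF-16. Python's "utf-16" codec reads a
    # leading byte order mark to pick the byte order and drops it from the
    # result; with no mark it assumes the machine's native byte order.
    return data.decode("utf-16")
```

The check works because in UTF-16 the letters of "Subject" are separated by NUL bytes, so the 8-bit reading never contains the plain word.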

Does the weird stuff at the start give me any clues? Checking the
character codes, the text starts with 254, 255, a space and then
the first character of my text. Perhaps that's my answer, but will
they always be 254 & 255 or does that vary with the encoding?
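The leading bytes do look like a clue. 254 and 255 (0xFE 0xFF) form the Unicode byte order mark (BOM) for UTF-16 big-endian. The "space" after them is most likely the NUL high byte of the first character. The mark varies with the encoding:

- Little-endian UTF-16 starts 0xFF 0xFE.
- UTF-8 files sometimes start 0xEF 0xBB 0xBF.

Below is a minimal Python sketch of sniffing these marks. The helper names are hypothetical, and the Latin-1 fallback for files without a mark is an assumption.

```python
import codecs

# Check longer marks first: the UTF-32LE mark begins with the UTF-16LE one.
# (Edge case: UTF-16LE text whose first character is U+0000 would be
# mistaken for UTF-32LE.)
_BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),  # 00 00 FE FF
    (codecs.BOM_UTF32_LE, "utf-32-le"),  # FF FE 00 00
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
]


def sniff_bom(data: bytes):
    """Return (encoding name, BOM length), or (None, 0) if no mark is found."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


def decode_with_bom(data: bytes, fallback: str = "latin-1") -> str:
    """Decode using the BOM if present (and strip it), else use fallback."""
    name, skip = sniff_bom(data)
    if name is None:
        return data.decode(fallback)
    return data[skip:].decode(name)
```

A BOM is optional, so a file that lacks one still needs a heuristic such as the "Subject" check.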

Any ideas?

TIA,
Sarah



More information about the use-livecode mailing list