Guessing the encoding of a text file...

Paul Dupuis paul at researchware.com
Thu Mar 19 16:31:36 EDT 2020


This has come up many times before, but I'll ask once again in case 
something has changed or someone new sees this.


Does anyone have a routine that will take a filespec to a text file and 
return the guessed encoding of the text file?


First, please don't respond with "you should know the encoding" or "the 
users should know the encoding of their files." That is not possible in 
the largely uncontrolled real world.

I do already have a routine to guess file encodings. It was written by 
someone else. There are instances where it should work and does not. I 
fear there may be errors in the algorithm and I do not have the original 
algorithm to check it against. Hence, I am looking for an alternative 
that is either free to use or to be licensed for a modest fee.

My current routine attempts to return the encoding as one of the 
following strings, which can be passed directly to 
textDecode(binaryData,encoding):

"ASCII"
"UTF-16"
"UTF-16BE"
"UTF-16LE"
"UTF-32"
"UTF-32BE"
"UTF-32LE"
"UTF-8"
"CP1252" *
"MacRoman" *

* For these last two: if the file is MacRoman on a Windows system, you 
actually have to call textDecode(macToISO(data),"CP1252"), and if you 
have a CP1252 file on the Mac, you need to call 
textDecode(isoToMac(data),"MacRoman"). There is an enhancement request 
to support MacRoman decoding under Windows and vice versa at 
https://quality.livecode.com/show_bug.cgi?id=22391 if you want to CC 
yourself to show interest.
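
For anyone comparing approaches, here is a minimal sketch of one common 
guessing heuristic. It is written in Python rather than LiveCode, and it 
is not the routine described above or its original algorithm. The 
function name guess_encoding, the order of the checks, the NUL-byte test 
for UTF-16 without a BOM, and the CP1252 fallback are all assumptions 
made for illustration. It returns the same names as the list above, 
always reports an explicit byte order instead of plain "UTF-16" or 
"UTF-32", and never returns "MacRoman", because bytes alone do not 
reliably separate MacRoman from CP1252.

```python
import codecs

# Byte order marks, longest first: the UTF-32LE BOM (FF FE 00 00) begins
# with the UTF-16LE BOM (FF FE), so UTF-32 has to be tested before UTF-16.
_BOMS = [
    (codecs.BOM_UTF32_BE, "UTF-32BE"),
    (codecs.BOM_UTF32_LE, "UTF-32LE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
]


def guess_encoding(data: bytes) -> str:
    """Return a best-guess encoding name for raw file bytes (hypothetical helper)."""
    # 1. A BOM, when present, is the strongest evidence available.
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name

    # 2. Assumption: NUL bytes in a text file suggest UTF-16 without a BOM.
    #    Mostly-Latin text has its NULs at even offsets when big-endian and
    #    at odd offsets when little-endian. UTF-32 without a BOM is not handled.
    if b"\x00" in data:
        even_nuls = data[0::2].count(0)
        odd_nuls = data[1::2].count(0)
        return "UTF-16BE" if even_nuls > odd_nuls else "UTF-16LE"

    # 3. Only 7-bit bytes: plain ASCII.
    if all(b < 0x80 for b in data):
        return "ASCII"

    # 4. Strict UTF-8 validation; legacy 8-bit text rarely passes by accident.
    try:
        data.decode("utf-8")
        return "UTF-8"
    except UnicodeDecodeError:
        pass

    # 5. Assumed fallback: some 8-bit legacy encoding, reported as CP1252.
    return "CP1252"
```

Telling CP1252 from MacRoman would need something beyond these checks, 
such as statistics on which high bytes occur or a hint about which 
platform produced the file, so step 5 is only a default.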




