Guessing the encoding of a test file...
Paul Dupuis
paul at researchware.com
Fri Mar 20 14:06:32 EDT 2020
On 3/20/2020 1:44 PM, Richard Gaskin via use-livecode wrote:
> I would be interested to learn more about the details of the
> subsequent refinements over the decade since, but also the ROI
> proposition for today:
I'll try to remember to share the current code after this current
review. I'm happy to put it out there for others who may need something.
It adds a few more statistical samplings for MacRoman vs CP1252/Latin 1
to your excellent original routine, which lets it guess correctly in a
few more cases.
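
To give a rough idea of what a statistical sampling of this kind can look
like, here is a minimal Python sketch. It is not the routine discussed
here; the function name, the character set, and the weights are all
assumptions chosen for illustration. It decodes each high byte both ways
and counts which encoding yields characters that are plausible in
Western European prose. Bytes that CP1252 leaves undefined (0x81, 0x8D,
0x8F, 0x90, 0x9D) count as extra evidence for MacRoman.

```python
# Illustrative set of characters assumed to be common in Western European
# prose. This set is an assumption of the sketch, not taken from any real
# detection routine.
PLAUSIBLE = set("àáâäçèéêëîïñóôöùúûü“”‘’")


def guess_legacy_encoding(data: bytes) -> str:
    """Guess 'mac_roman' or 'cp1252' for 8-bit text (hypothetical heuristic)."""
    mac_score = 0
    win_score = 0
    for byte in data:
        if byte < 0x80:
            continue  # ASCII is identical in both encodings
        raw = bytes([byte])
        # Every byte is defined in MacRoman, so strict decoding is safe here.
        if raw.decode("mac_roman") in PLAUSIBLE:
            mac_score += 1
        win_char = raw.decode("cp1252", errors="replace")
        if win_char == "\ufffd":
            # Byte is undefined in CP1252; the weight of 2 is an arbitrary choice.
            mac_score += 2
        elif win_char in PLAUSIBLE:
            win_score += 1
    # Ties (including pure ASCII) fall back to CP1252; this default is an assumption.
    return "mac_roman" if mac_score > win_score else "cp1252"
```

A sketch like this will misjudge short samples and text that is mostly
typographic punctuation, which is the kind of edge case where extra
samplings earn their keep.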
As for the diminishing returns and ROI for today, I am not sure there is
any sort of general ROI for further enhancing the current routine. It
follows just about every best practice for detection there is (to the best
of my knowledge). That said, the current case is of a researcher with an
edge variant who happens to be a long-time customer AND has a *LOT* of
text files that should come up as MacRoman but did not. With one more
tweak (fixing a tiny bug, a mistyped variable name) they now detect correctly.
If the customer weren't a long-time customer with lots of data affected
by this problem, I probably would not invest this level of effort.