determining a plain text file
Scott Morrow
scott at elementarysoftware.com
Thu Jul 13 16:21:45 EDT 2006
My initial post was about trying to guess whether the file the user
pointed at was likely to be formatted correctly, and I was only
looking for plain ASCII. I learned even more than expected!
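
For the narrow "is it plain ASCII?" case, a minimal sketch might look like the following. It is written in Python rather than Revolution, and the function names, the set of allowed control characters, and the 4096-byte sample size are all assumptions for illustration, not anything given in the thread.

```python
# Hypothetical helper names; the thread does not give an implementation.
ALLOWED_CONTROL = {0x09, 0x0A, 0x0D}  # tab, line feed, carriage return


def looks_like_plain_ascii(data: bytes) -> bool:
    """Return True if every byte is printable ASCII or common whitespace."""
    return all(0x20 <= b <= 0x7E or b in ALLOWED_CONTROL for b in data)


def file_looks_like_plain_ascii(path: str, sample_size: int = 4096) -> bool:
    """Check only the first sample_size bytes so the cost stays bounded."""
    with open(path, "rb") as f:
        return looks_like_plain_ascii(f.read(sample_size))
```

Note that an empty file passes this check, and a file that turns binary after the sampled bytes would too, so this is a guess rather than a guarantee.
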
On Jul 10, 2006, at 11:20 AM, Dar Scott wrote:
>
> On Jul 9, 2006, at 12:59 AM, Scott Morrow wrote:
>
>> Does anyone have a method for determining whether a file is plain
>> text that they would be willing to share?
>
> I don't think plain text or not is the right question. How sure do
> you want to be? This can take a lot of processing.
>
> Do you mean plain text vs binary? Plain text vs RTF? Plain text
> ASCII vs plain text UTF-8?
>
> For example: I have a function I use that tries to "guess" the
> Unicode encoding form of a file. My approach is not to ask "is
> this this format?" but "is this more likely this one than the
> others under consideration?". (That gets hard under some perverse
> cases of UTF-16BE vs UTF-16LE. Brag: My Unicode recognizer code
> beats my Microsoft programs in encoding guessing.) I have a few
> hard rules to handle the easy cases, but for the most part I build
> up evidence points and then compare.
>
> Also, I don't look at the whole file (except in some special
> cases). I look at only the characters near the end and near the
> front. That puts an upper bound on determination time.
>
>
> Is the question "Should I dump this into a field or should I
> convert to hex first?" ?
>
> Dar Scott
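
The evidence-points idea Dar describes could be sketched roughly as below. This is not Dar's recognizer, which is not shown in the thread. It is Python rather than Revolution, and the function names, the candidate list, the 1024-byte sample size, and the scoring weights are all assumptions chosen for illustration. It uses a hard rule for byte order marks, then samples only the front and the end of the data and compares scores.

```python
import codecs

# Hard rules: a byte order mark settles the easy cases.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sample_ends(data, size=1024):
    """Return chunks from the front and the end so the work is bounded."""
    if len(data) <= 2 * size:
        return [data]
    tail_start = len(data) - size
    tail_start -= tail_start % 2  # keep 16-bit code units aligned
    return [data[:size], data[tail_start:]]


def is_valid_utf8(chunk):
    """Check UTF-8 validity, tolerating characters cut at either chunk edge."""
    start = 0
    while start < min(3, len(chunk)) and 0x80 <= chunk[start] <= 0xBF:
        start += 1  # skip continuation bytes of a character cut at the start
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        decoder.decode(chunk[start:], final=False)  # a cut at the end is fine
        return True
    except UnicodeDecodeError:
        return False


def guess_encoding(data):
    """Guess which candidate is most likely; return 'unknown' if none scores."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    scores = {"ascii": 0, "utf-8": 0, "utf-16-le": 0, "utf-16-be": 0}
    for chunk in sample_ends(data):
        even_zeros = chunk[0::2].count(0)
        odd_zeros = chunk[1::2].count(0)
        # Mostly-Latin UTF-16LE has zero bytes at odd offsets, BE at even ones.
        scores["utf-16-le"] += odd_zeros - even_zeros
        scores["utf-16-be"] += even_zeros - odd_zeros
        if even_zeros == 0 and odd_zeros == 0:
            if all(b < 0x80 for b in chunk):
                scores["ascii"] += 2
            elif is_valid_utf8(chunk):
                scores["utf-8"] += 2
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"
```

The zero-byte heuristic only helps for text that is mostly in the Latin range, which is one reason the UTF-16BE vs UTF-16LE cases mentioned above can get hard.
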