determining a plain text file

Scott Morrow scott at elementarysoftware.com
Thu Jul 13 16:21:45 EDT 2006


My initial post was about guessing whether the file the user pointed
at was likely to be formatted correctly, and I was only looking for
plain ASCII.  I learned even more than I expected!
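For the plain-ASCII case, the check can be very small. Here is a minimal Python sketch, not code from this thread. It assumes that "plain ASCII" means the printable bytes 0x20 through 0x7E plus tab, line feed, and carriage return. The function names are hypothetical.

```python
from __future__ import annotations

from pathlib import Path

# Assumption: "plain ASCII" = printable 0x20..0x7E plus tab, LF, and CR.
PLAIN_ASCII_BYTES = frozenset(range(0x20, 0x7F)) | {0x09, 0x0A, 0x0D}


def is_plain_ascii(data: bytes) -> bool:
    """True if every byte is printable ASCII or tab/LF/CR (empty data passes)."""
    return all(b in PLAIN_ASCII_BYTES for b in data)


def file_is_plain_ascii(path: str | Path) -> bool:
    """Hypothetical helper: read the whole file and test its bytes."""
    return is_plain_ascii(Path(path).read_bytes())


print(is_plain_ascii(b"Hello, world\r\n"))     # True
print(is_plain_ascii("café".encode("utf-8")))  # False: é is 0xC3 0xA9
```

This reads the whole file. For large files, you could test only a slice of the bytes, as Dar suggests below.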

On Jul 10, 2006, at 11:20 AM, Dar Scott wrote:
>
> On Jul 9, 2006, at 12:59 AM, Scott Morrow wrote:
>
>> Does anyone have a method for determining whether a file is plain  
>> text that they would be willing to share?
>
> I don't think plain text or not is the right question.  How sure do  
> you want to be?  This can take a lot of processing.
>
> Do you mean plain text vs binary?  Plain text vs RTF?  Plain text  
> ASCII vs plain text UTF-8?
>
> For example:  I have a function I use that tries to "guess" the  
> Unicode encoding form of a file.  My approach is not to ask "is
> this file in this format?" but "is this format more likely than the
> others under consideration?"  (That gets hard in some perverse
> cases of UTF-16BE vs UTF-16LE.  Brag:  My Unicode recognizer code  
> beats my Microsoft programs in encoding guessing.)  I have a few  
> hard rules to handle the easy cases, but for the most part I build  
> up evidence points and then compare.
>
> Also, I don't look at the whole file (except in some special  
> cases).  I look at only the characters near the end and near the  
> front.  That puts an upper bound on determination time.
>
>
> Is the question "Should I dump this into a field or should I
> convert to hex first?"
>
> Dar Scott
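Dar's own code isn't shown in the thread. The Python sketch below only illustrates the general approach he describes. A few hard rules handle the easy cases: a byte-order mark settles the question, and 7-bit bytes with no NULs are plain ASCII. After that, the sketch builds up evidence points per candidate encoding and compares them, looking only at a sample from the front and end of the file. Several details are assumptions made for illustration: the sample size (4096 bytes per end), the scoring weights, the candidate list (UTF-8, UTF-16LE, UTF-16BE), and the function names.

```python
from __future__ import annotations

import codecs
import os

SAMPLE_BYTES = 4096  # assumption: bytes examined at each end of the file


def read_head_and_tail(path: str, sample: int = SAMPLE_BYTES) -> list[bytes]:
    """Return [whole file] for small files, else [first bytes, last bytes]."""
    with open(path, "rb") as f:
        size = f.seek(0, os.SEEK_END)
        f.seek(0)
        if size <= 2 * sample:
            return [f.read()]
        head = f.read(sample)
        # An even offset keeps UTF-16 code units aligned with the file start.
        f.seek((size - sample) & ~1)
        return [head, f.read()]


def guess_encoding(chunks: list[bytes]) -> str:
    """Guess a Python codec name for the sampled bytes, or "unknown"."""
    head = chunks[0] if chunks else b""
    # Hard rule 1: a byte-order mark settles it (UTF-32 is not considered).
    # Both returned codecs strip the BOM when decoding.
    if head.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"

    # Hard rule 2: 7-bit bytes with no NULs are plain ASCII.
    data = b"".join(chunks)
    if all(b < 0x80 for b in data) and 0 not in data:
        return "ascii"

    # Otherwise, accumulate evidence points for each candidate.
    scores = {"utf-8": 0, "utf-16-le": 0, "utf-16-be": 0}
    for index, chunk in enumerate(chunks):
        # UTF-16: characters U+0001..U+00FF encode as one zero byte plus one
        # nonzero byte; which half of the pair is zero suggests byte order.
        for first, second in zip(chunk[0::2], chunk[1::2]):
            if second == 0 and first != 0:
                scores["utf-16-le"] += 1
            elif first == 0 and second != 0:
                scores["utf-16-be"] += 1

        # UTF-8: a tail sample may start mid-character, so drop leading
        # continuation bytes (0x80-0xBF) before decoding.
        if index > 0:
            chunk = chunk.lstrip(bytes(range(0x80, 0xC0)))
        decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
        # final=False buffers a sequence cut off at the end of the sample
        # instead of reporting it as an error.
        text = decoder.decode(chunk, final=False)
        bad = text.count("\ufffd")
        multibyte = sum(1 for ch in text if ord(ch) > 0x7F) - bad
        scores["utf-8"] += 2 * multibyte - 10 * bad
        # NUL is legal UTF-8 but rare in text, so it counts against UTF-8.
        scores["utf-8"] -= chunk.count(0)

    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


def guess_file_encoding(path: str) -> str:
    """Hypothetical convenience wrapper: sample the file, then guess."""
    return guess_encoding(read_head_and_tail(path))
```

For example, guess_encoding(["Hello".encode("utf-16-le")]) returns "utf-16-le", because every byte pair has its zero in the second position.

Reading only the two ends bounds the running time regardless of file size, at the cost of missing anything unusual in the middle. The byte-order evidence gets weak for text with few characters below U+0100. Legacy 8-bit encodings come back as "unknown" because they are not among the candidates.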


