determining a plain text file

Scott Morrow scott at elementarysoftware.com
Sun Jul 9 21:12:17 EDT 2006


Thanks, all. That is a nice list of options.
I couldn't think of a "simple" way, and you've confirmed my
suspicions. :)
-Scott
On Jul 9, 2006, at 5:45 PM, Cubist at aol.com wrote:

> In a message dated 7/9/06 11:38:23 AM,  
> <scott at elementarysoftware.com> writes:
>> Does anyone have a method for determining whether a file is plain
>> text that they would be willing to share?
>    This is not a simple question to answer. Consider that a *web page* is
> plain text -- what makes it a "web page" is what a browser does with/to it
> when you run across it in the course of your websurfing. So perhaps it might
> be appropriate for you to explain what *you* mean when you say "plain text"?
> Depending on your definition of "plain text", the method of detecting it may
> well vary...
>    That said, here's a couple of possible methods which, even if they don't
> do what you want, may help set you on the right road to finding your
> answer...
>
> # possible answer 1: what's the file extension?
> function IsItText1 TheFilename
>   # all we care about here is the *name* of the file
>
>   set the itemDelimiter to "."
>   put item -1 of TheFilename into Fred
>   # "text" and "txt" are the most common extensions denoting
>   # text files; if you know of any others, you can add them in, too
>   put "text,txt" into TextExtensions
>   repeat for each item ThisExt in TextExtensions
>     if Fred = ThisExt then return true
>   end repeat
>   return false
> end IsItText1
>
> # possible answer 2: does the file contain weird characters?
> function IsItText2 TheText
>   # assumes that you've already read the file from disc,
>   # and are fiddling with the file's content
>
>   put the length of TheText into OldLength
>   # garden-variety ASCII text only has characters in it whose
>   # ASCII code numbers are 127 *or less*. thus, if there's
>   # anything in there with an ASCII code number *greater than 127*,
>   # it's prolly not "plain text"
>   repeat with K1 = 128 to 255
>     put numToChar (K1) into BadChar
>     replace BadChar with "" in TheText
>   end repeat
>   put the length of TheText into NewLength
>   return (OldLength = NewLength)
>   # if OldLength is the same as NewLength, this will return "true";
>   # otherwise, it returns "false". since the only way NewLength *can*
>   # be different from OldLength is if some characters got nuked
>   # in the loop, you'll get The Right Answer here
> end IsItText2
>
>    Neither of these functions is perfect; both of them can be fooled, whether
> by intent or by accident. Suppose some joker slapped the name
> "Budget2006.txt" onto an Excel spreadsheet file, for instance; the IsItText1
> function above would say "Yes, it's a text file, alright", but IsItText2
> would *not* be so fooled. As for IsItText2, *that* function will turn up its
> nose at any file which contains curly-quotes rather than straight-quotes,
> which means that yes, there are genuine, honest-to-God *text files* which
> IsItText2 will *wrongly* deem "not plain text".
>    Again, once you know what *you* consider a "plain text file" to be, it'll
> be easier to come up with a solution.
>
>    Hope this helps...
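One possible middle ground between the two quoted functions, offered only as an untested sketch and not as anything from the original post: instead of rejecting every character above 127, reject only ASCII control characters that ordinary text files rarely contain. Curly quotes then pass, while a renamed binary file such as a spreadsheet, which is usually full of NUL bytes, should still fail. The name IsItText3 and the choice of which control codes to allow are assumptions made for illustration.

```livecode
# hypothetical third variant: are there control characters in the file?
function IsItText3 TheText
  # assumes, like IsItText2, that the file's content has already
  # been read from disc into TheText
  repeat with K1 = 0 to 31
    # tab (9), linefeed (10), formfeed (12), and return (13) turn up
    # in ordinary text files, so those codes are allowed
    if K1 is among the items of "9,10,12,13" then next repeat
    # any other control character suggests binary data
    if numToChar(K1) is in TheText then return false
  end repeat
  return true
end IsItText3
```

It might be called along these lines, where ThePath is a hypothetical variable holding the file's path and "binfile:" is used so that line endings are not translated on the way in:

```livecode
put URL ("binfile:" & ThePath) into TheText
put IsItText3(TheText) into LooksLikeText
```

Like the other two, this can be fooled: text saved as UTF-16 contains NUL bytes and would be rejected, and a binary file that happens to contain no control characters would be accepted.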



