determining a plain text file
Scott Morrow
scott at elementarysoftware.com
Sun Jul 9 21:12:17 EDT 2006
Thanks all, that is a nice list of options.
I couldn't think of a "simple" way and you've confirmed my
suspicions. : )
-Scott
On Jul 9, 2006, at 5:45 PM, Cubist at aol.com wrote:
> In a message dated 7/9/06 11:38:23 AM,
> <scott at elementarysoftware.com> writes:
>> Does anyone have a method for determining whether a file is plain
>> text that they would be willing to share?
> This is not a simple question to answer. Consider that a *web page* is
> plain text -- what makes it a "web page" is what a browser does with/to
> it when you run across it in the course of your websurfing. So perhaps
> it might be appropriate for you to explain what *you* mean when you say
> "plain text"? Depending on your definition of "plain text", the method
> of detecting it may well vary...
> That said, here's a couple of possible methods which, even if they
> don't do what you want, may help set you on the right road to finding
> your answer...
>
> # possible answer 1: what's the file extension?
> function IsItText1 TheFilename
>   # all we care about here is the *name* of the file
>
>   set the itemDelimiter to "."
>   put item -1 of TheFilename into Fred
>   # switch back to comma, or the extension list below
>   # would be treated as one single item
>   set the itemDelimiter to comma
>   # "text" and "txt" are the most common extensions denoting
>   # text files; if you know of any others, you can add them in, too
>   put "text,txt" into TextExtensions
>   repeat for each item ThisExt in TextExtensions
>     if Fred = ThisExt then return true
>   end repeat
>   return false
> end IsItText1
>
> # possible answer 2: does the file contain weird characters?
> function IsItText2 TheText
>   # assumes that you've already read the file from disc,
>   # and are fiddling with the file's content
>
>   put the length of TheText into OldLength
>   # garden-variety ASCII text only has characters in it whose
>   # ASCII code numbers are 127 *or less*. thus, if there's
>   # anything in there with an ASCII code number *greater than 127*,
>   # it's prolly not "plain text"
>   repeat with K1 = 128 to 255
>     put numToChar (K1) into BadChar
>     replace BadChar with "" in TheText
>   end repeat
>   put the length of TheText into NewLength
>   return (OldLength = NewLength)
>   # if OldLength is the same as NewLength, this will return "true";
>   # otherwise, it returns "false". since the only way NewLength *can*
>   # be different from OldLength is if some characters got nuked
>   # in the loop, you'll get The Right Answer here
> end IsItText2
>
> Neither of these functions is perfect; both of them can be fooled,
> whether by intent or by accident. Suppose some joker slapped the name
> "Budget2006.txt" onto an Excel spreadsheet file, for instance; the
> IsItText1 function above would say "Yes, it's a text file, alright",
> but IsItText2 would *not* be so fooled. As for IsItText2, *that*
> function will turn up its nose at any file which contains curly-quotes
> rather than straight-quotes, which means that yes, there are genuine,
> honest-to-God *text files* which IsItText2 will *wrongly* deem "not
> plain text".
> Again, once you know what *you* consider a "plain text file" to be,
> it'll be easier to come up with a solution.
>
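> One further possibility, offered only as a sketch: since the
> curly-quote problem comes from rejecting everything above 127, a check
> could instead allow those characters and reject only control
> characters, which binary formats tend to contain and ordinary text
> files generally don't. The name IsItText3 and the choice of which
> control characters to tolerate (tab, linefeed, return) are assumptions
> made for illustration, not a tested rule.
>
> # possible answer 3 (hypothetical): does the file contain control characters?
> function IsItText3 TheText
>   # assumes TheText holds the raw file content
>
>   repeat with K1 = 0 to 31
>     # tab (9), linefeed (10) and return (13) are legitimate in text
>     if K1 is among the items of "9,10,13" then next repeat
>     # any other control character (null included) counts as "not text"
>     if numToChar (K1) is in TheText then return false
>   end repeat
>   return true
> end IsItText3
>
> If you try it, read the file through the "binfile:" URL form, e.g.
> put URL ("binfile:" & ThePath) into TheText (ThePath being a
> hypothetical variable holding the file's path), so that the bytes reach
> the function untranslated. It can still be fooled: a UTF-16 text file
> is full of null bytes and would be deemed "not plain text".
>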
> Hope this helps...