Counting Chars By ASCII Part 2

Mark Smith mark at maseurope.net
Wed Mar 1 11:57:28 EST 2006


Exactly why I wondered...

From the docs:

<If the useUnicode property is set to true, the numToChar function  
returns a double-byte character. If the useUnicode is false and you  
specify an ASCIIValue greater than 255, the numToChar function  
returns the character corresponding to the ASCIIValue mod 256.>

So if we're dealing with Unicode text, and really need to count the
instances of each character outside the 0 - 255 range, then we've
got a great many tests to do, since we have to consider values up to
65535...
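
That said, an array keyed by character code only ever holds the codes
that actually occur, so one pass over the data may be enough. A rough,
untested sketch, assuming the text is two-byte Unicode in host byte
order (for example the unicodeText of a field) and that the engine
honours useUnicode as the docs above describe; countUnicodeOutliers
and its variable names are made up for illustration:

```livecode
-- Sketch only: assumes pText is 2-byte Unicode data in host byte order
-- and an engine in which the useUnicode property affects charToNum.
function countUnicodeOutliers pText
   -- useUnicode is a local property, so it resets when the handler exits
   set the useUnicode to true
   repeat with i = 1 to length(pText) - 1 step 2
      -- with useUnicode true, each character is passed as a pair of bytes
      put charToNum(char i to i + 1 of pText) into tCode
      if tCode > 255 then add 1 to tCounts[tCode]
   end repeat
   -- only codes that were actually seen become keys of the array,
   -- so there is no loop over all 65536 possible values
   put the keys of tCounts into tCodes
   sort lines of tCodes ascending numeric
   repeat for each line tCode in tCodes
      put tCode & tab & tCounts[tCode] & return after tReport
   end repeat
   return tReport
end countUnicodeOutliers
```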

My 0 - 255 test took a couple of seconds on about 3 megabytes of
data, testing for only 27 characters... Todd may be taking some
longish coffee breaks!
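
For plain single-byte text, a one-pass version along the lines of
Todd's script could look something like this. It is only a sketch:
countSuspectChars is a made-up name, the allow-list is copied from
Todd's post, and it assumes charToNum never returns more than 255 when
useUnicode is false, so only the low range is tested:

```livecode
-- Sketch only: counts control characters in single-byte text,
-- with useUnicode left at its default of false.
function countSuspectChars pText
   -- allow-list taken from Todd's post, stored as array keys so the
   -- inner test is a direct lookup rather than a search through a word list
   repeat for each word tCode in "10 11 12 29"
      put true into tAllowed[tCode]
   end repeat
   repeat for each char tChar in pText
      put charToNum(tChar) into tCode
      if tCode < 32 and tAllowed[tCode] is not true then
         add 1 to tCounts[tCode]
      end if
   end repeat
   -- build a "code <tab> count" report, sorted by character code
   put the keys of tCounts into tCodes
   sort lines of tCodes ascending numeric
   repeat for each line tCode in tCodes
      put tCode & tab & tCounts[tCode] & return after tReport
   end repeat
   return tReport
end countSuspectChars
```

With Todd's field names it would be called as
put countSuspectChars(field 1) into field "Chars"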

Mark

On 1 Mar 2006, at 15:34, Jim Ault wrote:

>
> On 3/1/06 7:20 AM, "Todd Geist" <tg.lists at geistinteractive.com> wrote:
>
> Question 1:
>>          IF (tASCII < 32 OR tASCII > 255) THEN
> Why would you test for > 255 since no ASCII would be higher than this?
>
> Question 2:
> Are you trying to strip the characters, or just count them and report
> the result, like a histogram?
>
> Could you show exactly what you are starting with and what you want to
> end up with?
>
> Thanks.
>
> Jim Ault
> Las Vegas
>
> On 3/1/06 7:20 AM, "Todd Geist" <tg.lists at geistinteractive.com> wrote:
>
>> Hello Again,
>>
>> After trying several of the excellent suggestions from all you
>> revolutionaries, I realized I hadn't quite explained myself... go
>> figure.  So here is another attempt to explain what I am after.
>>
>> I am actually after "low" ASCII and "high" ASCII characters that may
>> have snuck into a text file. So I need to look at every character,
>> but I don't need to count every character.  I just want the ones that
>> have ASCII values below 32 or above 255 and that are not in a small
>> set of allowed control characters.
>>
>> Based on the suggestions I got on the other thread, I came up with
>> the following, which produces the results I am after.  SPEED is
>> critical here, since the files I am scanning may be many MB. I am
>> wondering if any of you can improve on the design.  I feel the need,
>> the need for SPEED.  :>)
>>
>> put field 1 into tString
>> put "10 11 12 29" into charsToIgnore
>>
>>      REPEAT for each char tChar in tString
>>          put charToNum(tChar) into tASCII
>>          IF (tASCII < 32 OR tASCII > 255) THEN
>>              IF tASCII is not among the words of charsToIgnore THEN
>>                  add 1 to tCounts[tASCII]
>>              END IF
>>          END IF
>>      END REPEAT
>>      put the keys of tCounts into tChars
>>      sort lines of tChars numeric
>>
>>      REPEAT for each line thisLine in tChars
>>          put thisLine & TAB & tCounts[thisLine] & Return after newList
>>      END REPEAT
>>
>> put newList into field "Chars"
>>
>> Thanks in Advance
>>
>> Todd