convert to lower ascii 128?
Ben Rubinstein
benr at cogapp.com
Thu Feb 5 14:45:55 EST 2009
From the dept of "Please don't let my wife or boss know that I've been
wasting time on this"....
Chipp Walters wrote:
> Interesting note: I found the following results:
> function altAsciiScrub2 pText
>    put replacetext(pText,"[" & numToChar(129) & "-" & numToChar(255) & "]","") into pText
>    put replacetext(pText,"[" & numToChar(1) & "-" & numToChar(31) & "]","") into pText
>    return pText
> end altAsciiScrub2
>
> executed in 19 ticks on my QuadCore Vista 64 machine
>
> function altAsciiScrub1 pText
>    repeat for each char c in pText
>       get charToNum(c)
>       if it > 128 or it < 32 then
>          next repeat
>       end if
>       put c after t
>    end repeat
>    return t
> end altAsciiScrub1
>
> executed in 17 ticks on my QuadCore Vista 64 machine
>
> repeat for each is really fast.
Looking at this, I couldn't help wondering "does it make a difference if there
are a lot of high-code characters to delete?" and "regex is a bit slower, but
does the setup of regex pay off if the string is long enough?" (Chipp didn't
specify what kind of input he wanted to work on, or tested with.)
And I also wondered, "what other ways could we trade off setup against a long
repeat?"
(BTW strictly speaking ASCII is 0-127; both the routines above are allowing
128. I only mention because it took me a while to figure out why my routines
sometimes returned different results to the above two; the difference depended
on whether there was a character with code 128 in the test string.)
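For anyone who wants the strict 0-127 reading, here is a sketch of Chipp's
loop with the upper bound tightened. The name altAsciiScrubStrict is made up
for illustration; the only changes from altAsciiScrub1 are the test against
127 instead of 128 and an explicit initialisation of t.

function altAsciiScrubStrict pText
   -- hypothetical variant of altAsciiScrub1, not one of the timed routines
   put empty into t
   repeat for each char c in pText
      get charToNum(c)
      -- drop control characters (below 32) and anything above ASCII (over 127)
      if it > 127 or it < 32 then next repeat
      put c after t
   end repeat
   return t
end altAsciiScrubStrict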
So I implemented another couple of options: one taking Devin's suggestion of
calling replace with the characters you don't want: thus only calling
numtochar/chartonum a fixed number of times, regardless of the length of the
text.
function asciiScrub3 pText
   set the caseSensitive to true
   -- delete any characters below space
   repeat with i = 0 to 31
      replace numtochar(i) with empty in pText
   end repeat
   -- delete any characters above ASCII
   repeat with i = 128 to 255
      replace numtochar(i) with empty in pText
   end repeat
   return pText
end asciiScrub3
As you'd expect, this is much slower than the above approaches for a short
string; but it is slightly faster than either of them for a large string -
more so if there are a lot of high-code characters in the string.
In my work, I'm often dealing with non-ASCII characters; but it rarely
suffices to delete them, I generally need to convert them to something else
(either the relevant ASCII character, or to another character set). The
built-in functions in Rev are often frustratingly just off the mark for this,
so I tend to just run the characters through an array, blessing each time I do
so how fast repeat for each is, and how fast arrays are. So I naturally
wondered how that approach would work even when I just wanted to delete the
characters above 127:
function asciiScrub4 pText
   set the caseSensitive to true
   -- set up array to map characters we want to retain to themselves
   put empty into a
   repeat with i = 32 to 127
      get numtochar(i)
      put it into a[it]
   end repeat
   -- filter the string through the array set up above
   put empty into t
   repeat for each char c in pText
      put a[c] after t
   end repeat
   return t
end asciiScrub4
Again, as you'd expect, the setup time costs on a short string. But on large
blocks of text, this turned out to be the fastest method - not quite twice as
fast as the original two, but getting there.
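The same array can convert characters instead of just dropping them, which is
the case mentioned above. What follows is only a sketch: the name asciiFold
and the three mappings are invented for illustration, and the character codes
assume ISO 8859-1 / Windows-1252 text (on a Mac the native codes differ).

function asciiFold pText
   set the caseSensitive to true
   -- map printable ASCII to itself, exactly as in asciiScrub4
   put empty into a
   repeat with i = 32 to 127
      get numtochar(i)
      put it into a[it]
   end repeat
   -- illustrative replacements only; codes assume ISO 8859-1 / Windows-1252
   put "e" into a[numtochar(233)]   -- e acute
   put "u" into a[numtochar(252)]   -- u umlaut
   put "ss" into a[numtochar(223)]  -- sharp s
   -- characters with no entry in the array come back empty, so are dropped
   put empty into t
   repeat for each char c in pText
      put a[c] after t
   end repeat
   return t
end asciiFold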
As for composition, it makes less of a difference than I thought; all
functions are slightly faster if the source string contains more high-code
characters (i.e. if the output string is shorter); there doesn't seem to be a
very significant difference in how the routines are affected by this.
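For anyone wanting to repeat this kind of comparison, a harness along the
following lines would do. It is a sketch under assumptions, not the script
behind the timings reported here: the names makeTestText and timeScrub, the
100000-character length and the 10 percent proportion are all made up for
illustration.

function makeTestText pLength, pHighPercent
   -- build pLength random chars, roughly pHighPercent of them above ASCII
   put empty into tText
   repeat pLength times
      if random(100) <= pHighPercent then
         put numtochar(127 + random(128)) after tText  -- codes 128 to 255
      else
         put numtochar(31 + random(95)) after tText    -- codes 32 to 126
      end if
   end repeat
   return tText
end makeTestText

on timeScrub
   -- build the text first so only the scrub itself is timed
   put makeTestText(100000, 10) into tText
   put the milliseconds into tStart
   get asciiScrub4(tText)
   -- with no destination, the result goes to the message box
   put "asciiScrub4:" && (the milliseconds - tStart) && "ms"
end timeScrub

Swapping in the other function names, and varying the two parameters, gives
the short/long and low/high comparisons.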
Mostly what I demonstrated was that all the approaches are so fast that
especially on a short string it's hard to get any significance in the relative
timings - you'd have to be doing a vast amount of processing to justify
spending any time doing better than whatever the first approach you came up
with was. And that sometimes I'll do almost anything to avoid the work I
should be getting on with...
- Ben