AW: AW: How trim: Bug in RegExp engine
Wouter
wouter.abraham at scarlet.be
Tue Oct 25 20:33:40 EDT 2005
On 25 Oct 2005, at 23:39, Thomas Fischer wrote:
-snip-
> 3. It seems that regular expressions are to be avoided in time
> sensitive parts of the script anyway. Playing around a little bit I
> found that the RegExp solution I suggested took far more time than
> any other solution (by about a factor of 10 compared with the
> fastest solution). Probably this should be optimized in an updated
> version. It seems that regular expressions in Perl are faster by a
> factor of 6 (and again 8 times faster on my PC laptop).
> On the other hand, this shows that those cumbersome repeat loops
> are surprisingly fast.
> The fastest is:
>
> repeat while char 1 of testString is space
>   delete char 1 of testString
> end repeat
>
> taking about 1.8 microseconds per round (11 ticks for 100000
> repeats), and with a string whiteSpace = tab && return a loop with
> repeat while char 1 of testString is in whiteSpace
> takes about twice as long.
>
> word 1 to -1 of testString
> removing whitespace at the front and the end simultaneously is only
> a little slower, while using token
> token 1 to -1 of testString
> takes surprisingly three times as long as using "word".
These timing tests are not completely fair, because the three
approaches do different things:

repeat while char 1 of testString is space
  delete char 1 of testString
end repeat
-> removes only spaces, and only from the front of the string, if any

word 1 to -1 of testString
-> removes tabs, spaces and returns from front and back of the string,
if any

token 1 to -1 of testString
-> removes tabs, spaces, hard spaces, and returns from front and back
of the string, if any

You are comparing the time it takes to remove spaces from the front
only with the time it takes to remove tabs and spaces (word), or tabs,
spaces and hard spaces, plus semicolons and returns (token), from both
front and back.
To make the comparison fairer, every timing handler should do the same
job: remove tabs, spaces and hard spaces from both the front and the
back of a string.
On the other hand, the results do give an indication of which approach
to use in which case.
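
An equalized test along the lines suggested above might look roughly
like the sketch below. The handler name trimLoop, the sample string and
the variable names are only made up for illustration. The whitespace
set is assumed to be tab, space and return, so that the loop strips the
same characters that word is described as stripping; hard space is left
out to keep that comparison like for like.

```livecode
-- Hypothetical helper: strip tab, space and return from BOTH ends
-- using plain repeat loops.
function trimLoop pString
   put space & tab & return into tWhite
   -- the "is not empty" guard stops the loop on an all-whitespace string
   repeat while pString is not empty and char 1 of pString is in tWhite
      delete char 1 of pString
   end repeat
   repeat while pString is not empty and char -1 of pString is in tWhite
      delete char -1 of pString
   end repeat
   return pString
end trimLoop

on mouseUp
   -- sample with whitespace at the front and at the back (an assumption)
   put tab & "  some text  " & return into tSample

   put the milliseconds into tStart
   repeat 100000 times
      put trimLoop(tSample) into tResult
   end repeat
   put the milliseconds - tStart into tLoopTime

   put the milliseconds into tStart
   repeat 100000 times
      put word 1 to -1 of tSample into tResult
   end repeat
   put the milliseconds - tStart into tWordTime

   -- with no destination, the result goes to the message box
   put "loop:" && tLoopTime && "ms, word:" && tWordTime && "ms"
end mouseUp
```

Note that the loop version is timed through a function call here, so
its figure includes the handler-call overhead; for a stricter
comparison the loops could be written inline inside the repeat.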
Greetings,
Wouter
PS
With token 1 to -1 of testString, watch out for semicolons: a semicolon
is treated as whitespace, or as an itemdelimiter for token.
The docs state that (semicolon), space, return, and tab are the
itemdelimiters for token.
Since hard spaces are also removed, that list is incomplete and hard
space should be added to it.
To me it seems rather odd to consider (semicolon), space, (hardspace),
return, and tab as "itemdelimiters" for token, because they are removed
as whitespace and do not really act as item delimiters.
On the other hand, the tokens themselves act more like a special kind
of itemdelimiter.