AW: AW: How trim: Bug in RegExp engine

Wouter wouter.abraham at scarlet.be
Tue Oct 25 20:33:40 EDT 2005


On 25 Oct 2005, at 23:39, Thomas Fischer wrote:

-snip-

> 3. It seems that regular expressions are to be avoided in time  
> sensitive parts of the script anyway. Playing around a little bit I  
> found that the RegExp solution I suggested took far more time than  
> any other solution (by about a factor of 10 compared with the  
> fastest solution). Probably this should be optimized in an updated  
> version. It seems that regular expressions in Perl are by a factor  
> 6 faster (and again 8 times faster on my PC laptop).
> On the other hand, this shows that those cumbersome repeat loops  
> are surprisingly fast.
> The fastest is:
>
>   repeat while char 1 of testString is space
>     delete char 1 of testString
>   end repeat
>
> taking about 1.8 microseconds per round (11 ticks for 100000  
> repeats), and with a string whiteSpace = tab && return a loop with
>   repeat while char 1 of testString is in whiteSpace
> takes about twice as long.
>
>   word 1 to -1 of testString
> removing whitespace at the front and the end simultaneously is only  
> a little slower, while using token
>   token 1 to -1 of testString
> takes surprisingly three times as long as using "word".

These timing tests are not completely fair, because the three variants
do different amounts of work:

   repeat while char 1 of testString is space
     delete char 1 of testString
   end repeat
   -> removes only spaces, and only from the front, if any

   word 1 to -1 of testString
   -> removes tabs, spaces and returns from the front and back of the
      string, if any

   token 1 to -1 of testString
   -> removes tabs, spaces, hard spaces and returns from the front and
      back of the string, if any

You are comparing the time it takes to remove spaces from the front
only with the time it takes to remove tabs, spaces and returns (word),
or tabs, spaces, hard spaces, semicolons and returns (token), from both
the front and the back.
To make the comparison fairer, the timing handlers should all remove
the same characters (tabs, spaces and hard spaces) from both the front
and the back of a string.

On the other hand, the results do give an indication of which approach
to use in which case.
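
A fairer comparison could look roughly like the sketch below. It is
only an illustration, not a measured benchmark: the handler, the
variable names (tWhiteSpace, tSample, tLoopTicks, tWordTicks) and the
sample string are made up here, and hard spaces are left out so that
the repeat loops strip the same characters as "word".

```livecode
on mouseUp
   local tWhiteSpace, tSample, testString, tStart
   local tLoopTicks, tWordTicks

   -- assumption: same whitespace set that "word" strips
   put space & tab & return into tWhiteSpace
   -- hypothetical sample with whitespace on both ends
   put tab & space & "some text" & space & return into tSample

   -- Variant 1: repeat loops stripping the front AND the back
   put the ticks into tStart
   repeat 100000 times
      put tSample into testString
      repeat while testString is not empty and char 1 of testString is in tWhiteSpace
         delete char 1 of testString
      end repeat
      repeat while testString is not empty and char -1 of testString is in tWhiteSpace
         delete char -1 of testString
      end repeat
   end repeat
   put the ticks - tStart into tLoopTicks

   -- Variant 2: the word chunk expression on the same input
   put the ticks into tStart
   repeat 100000 times
      put tSample into testString
      put word 1 to -1 of testString into testString
   end repeat
   put the ticks - tStart into tWordTicks

   answer "repeat loops:" && tLoopTicks && "ticks" & return & "word 1 to -1:" && tWordTicks && "ticks"
end mouseUp
```

Both variants pay for the same "put tSample into testString" in every
round, so that cost cancels out when the two tick counts are compared.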

Greetings,
Wouter

PS
With token 1 to -1 of testString, watch out for semicolons: a semicolon
is treated as whitespace, i.e. as an itemdelimiter for token.
The docs state that (semicolon), space, return, and tab are the
itemdelimiters for token.
As hard spaces are also removed, this list is not complete and hard
space should be added to it.

To me it seems kind of weird to consider (semicolon), space,
(hardspace), return, and tab as "itemdelimiters" for token, because
they are removed as whitespace and do not really act as an
itemdelimiter.
On the other hand, the tokens themselves act more like a special kind
of itemdelimiter.
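
To check the semicolon behaviour on your own version, something like
the following sketch could be used. The handler and the sample string
are hypothetical; it simply shows both results side by side, in
brackets so that any remaining whitespace is visible.

```livecode
on mouseUp
   local testString, tReport

   -- hypothetical sample: a semicolon and a space on each end
   put ";" & space & "some text" & space & ";" into testString

   -- show what each chunk expression leaves of the string
   put "word:  [" & word 1 to -1 of testString & "]" & return into tReport
   put "token: [" & token 1 to -1 of testString & "]" after tReport
   answer tReport
end mouseUp
```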


