Another empirical speed test

Cubist at aol.com
Sat Mar 12 02:30:36 EST 2005


   What I've got to say here isn't new, but I think it's worth bringing it up 
again for the benefit of those who joined the Revolution *after* the last 
time it was mentioned here...

   In the project I'm working on now, I want to take a generic CSV file and 
split it up into its component columns -- for instance, a CSV file whose lines 
fit the pattern "a,b,c,d,e" would have five columns. Sadly, there isn't any 
built-in function that does this, so I'll have to roll my own. And as you can well 
imagine, speed is an issue. So I worked up three different methods, 
clocked them, and I'm going to present the results (which, as I stated above, 
will not surprise any of the 'old hands').
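Here's a rough sketch of what the whole split might look like, using the "repeat for each" loop that comes out on top in the tests below. SplitColumns is a made-up name, and the sketch assumes plain comma-separated fields (no quoted values containing commas or returns) and the same number of items on every line. It's written against current LiveCode syntax; older Revolution builds may differ in how they return arrays.

```livecode
-- Hypothetical handler: split simple CSV text into an array of columns,
-- where tColumns[n] holds column n, one value per line.
-- Assumes no quoted fields and the same item count on every line.
function SplitColumns pText
  local tColumns, tCol
  set the itemDelimiter to comma
  repeat for each line tLine in pText
    put 0 into tCol
    repeat for each item tItem in tLine
      add 1 to tCol
      -- append this value to the end of its column
      put tItem & return after tColumns[tCol]
    end repeat
  end repeat
  -- strip the trailing return from each column
  repeat for each key tKey in tColumns
    delete the last char of tColumns[tKey]
  end repeat
  return tColumns
end SplitColumns
```

Usage would be something like `put SplitColumns(DerData) into tCols`, after which `tCols[3]` holds the third column, one value per line. Each line and each item is visited exactly once, so the work should grow roughly in step with the size of the file.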

   Details of the setup: The test data was 5,175 lines of Apple stock-price 
data, from 1984 to the present. All tests conducted on a 400-MHz G3 'pismo' 
PowerBook running Mac OS 9.1, with 320 MB of RAM. Faster machines will of course 
yield faster times, but your relative rankings (i.e., "*this* is X% faster than 
*that*") should be about the same as mine.
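If you want to clock these on your own machine, something along these lines should do. It's only a sketch: TimeColumnExtract is a made-up name, and because "the milliseconds" only counts whole milliseconds, it averages several runs instead of trusting a single one.

```livecode
-- Hypothetical timing harness: returns the average time, in milliseconds,
-- to pull item pItemNum out of every line of pData, over pRuns runs.
function TimeColumnExtract pData, pItemNum, pRuns
  local tStart, tResult
  put the milliseconds into tStart
  repeat pRuns times
    put empty into tResult
    repeat for each line LL in pData
      put (item pItemNum of LL) & return after tResult
    end repeat
  end repeat
  return (the milliseconds - tStart) / pRuns
end TimeColumnExtract
```

For example, comparing `TimeColumnExtract(DerData, 7, 20)` against `TimeColumnExtract(DerData, -1, 20)` would let you check the item-number differences reported further down on your own data.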

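  # Test 1 code
  # "line K1 of DerData" rescans DerData from the top on every pass,
  # so the total work grows roughly with the square of the line count
  repeat with K1 = 1 to the number of lines in DerData
    put item 3 of line K1 of DerData into line K1 of DerData
  end repeat
  # Time: 18 seconds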

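  # Test 2 code
  # Always reading line 1 avoids the rescan, but note that "before"
  # prepends: Rezult comes out in reverse line order
  put "" into Rezult
  repeat (the number of lines in DerData)
    put (item 3 of line 1 of DerData) & return before Rezult
    delete line 1 of DerData
  end repeat
  # Time: 6.7 seconds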

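  # Test 3 code
  # "repeat for each" walks DerData once from start to finish;
  # Rezult starts with an extra return ("delete char 1 of Rezult" removes it)
  put "" into Rezult
  repeat for each line LL in DerData
    put return & (item 3 of LL) after Rezult
  end repeat
  # !!! !!! !!! -- Time: < .03 seconds -- !!! !!! !!!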

   Yes, Virginia, "repeat for each" is about THREE BLEEDING ORDERS OF 
MAGNITUDE faster than "repeat with K1" (18 seconds versus under .03 seconds, 
a factor of at least 600)! It's true that "repeat for each" doesn't give 
you a counter, as "repeat with K1" does... but with this kind of speed 
difference, you can afford to roll your own and slip an "add 1 to MyCounter" into 
your loop, right?
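A minimal sketch of that, assuming you want each extracted value tagged with its line number. NumberedColumn and tLineNum are made-up names, not anything built in.

```livecode
-- Hypothetical handler: "repeat for each" plus a hand-rolled counter.
-- Returns "lineNumber<tab>value" for item pItemNum of each line of pData.
function NumberedColumn pData, pItemNum
  local tLineNum, tResult
  put 0 into tLineNum
  repeat for each line LL in pData
    add 1 to tLineNum  -- the counter "repeat for each" doesn't supply
    put tLineNum & tab & (item pItemNum of LL) & return after tResult
  end repeat
  delete the last char of tResult  -- drop the trailing return
  return tResult
end NumberedColumn
```

The counter costs one "add" per line, which is trivial next to the rescanning that "line K1 of" does on every pass.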

   One more thing: I quoted the "repeat for each" time as "< .03" because I 
got slightly different timings when I tried it with various item numbers. For 
the record:

  item 2 of LL, .024 seconds
  item 5 of LL, .027 seconds
  item 7 of LL, .029 seconds

   My test data had only 7 items per line, so I wondered whether it would make any 
difference if I used "item 7" or "item -1". It did, like so:

  item -1 of LL, .036 seconds

   In other words, "item -1" took about 24% longer than "item 7" (.036 versus 
.029 seconds). It would appear that counting backwards carries a nontrivial 
overhead.

   You may now return to your normal programming...


More information about the use-livecode mailing list