Binary search : was Re: shilling for my feature request [1926]

Alex Tweedly alex at tweedly.net
Mon Aug 23 11:02:48 EDT 2004


At 09:44 23/08/2004 +0100, Alex Tweedly wrote:
>Now that I'm interested (fatal for me), I'm going to try it myself - so 
>look for a post from me later today with a simple binarysearch function, 
>and some timing results.

Mark, there's good news and bad news. :-)

The good news is that Transcript again amazes me with its great performance 
at chunk expression manipulation; it can iterate through the lines (or 
items) of a variable VERY quickly.

The bad news is that it is (relatively) slow at accessing individual lines 
(or items).

So the benefits of a binary search are counterbalanced by the slow access 
to the individual lines.

Note - for this purpose, you NEED to have your offsets stored as lines, not 
items, because if you try to put 1 million items into a variable you'll run 
into the limit on the number of characters in a line.
(took me a while to spot that one - couldn't figure out why I was creating 
1 million items but there were only about 9500 of them when I was done !)

Here's the test I did :
  - each offset is a value between 1 and 100 million.
  - create a list of 1 million offsets (random integers in that range)
              (this took 9 seconds)
  - sort that list
               (took 40 seconds !!)
  - linear search for the offset immediately below a random offset
              10 times, average of 1.3 seconds
  - binary search for same
               10 times, average of 1.25 seconds

I would put the test stack up on RevOnline - but I haven't succeeded in 
activating my account there yet - in the meantime, it's at 
http://www.tweedly.net/RunRev/Binary%20Search.rev if you want to look at it.

The binary search actually does better in smaller searches (400 ms versus 
750 ms for an average search in 100,000 offsets) - but not really enough 
benefit to be worth the extra complexity.

Not surprisingly, putting the offset list into an array is much slower.

So - unless one of the experienced Transcipters has some tricks up their 
sleeve, I think you're stuck with a linear search.

The good news is that the linear search isn't too slow, and you can do any 
number of "SoWhat"s in a single scan - so you're looking at maybe 2 seconds 
to extract the relevant thousands of messages from million+ total messages.

However, to do this you'll need to put the logic into a single loop, you 
won't be able to put the linear search into a separate function.

btw - just for comparison, and to make sure I had the logic right, I tried 
it in another popular scripting language I am familiar with;  the linear 
search is a bit faster than the Transcript version - but the binary search 
runs approx 1000 times as fast as the linear one - it does the binary 
search of 1 million items in < 1 millisecond.

Are you all *sure* we don't need numeric-indexed arrays or numeric-indexed 
lists ?

-- Alex.


More information about the use-livecode mailing list