Binary search : was Re: shilling for my feature request [1926]

Alex Tweedly alex at
Mon Aug 23 11:02:48 EDT 2004

At 09:44 23/08/2004 +0100, Alex Tweedly wrote:
>Now that I'm interested (fatal for me), I'm going to try it myself - so 
>look for a post from me later today with a simple binarysearch function, 
>and some timing results.

Mark, there's good news and bad news. :-)

The good news is that Transcript again amazes me with its great performance 
at chunk expression manipulation; it can iterate through the lines (or 
items) of a variable VERY quickly.

The bad news is that it is (relatively) slow at accessing individual lines 
(or items).

So the benefits of a binary search are counterbalanced by the slow access 
to the individual lines.

Note - for this purpose, you NEED to have your offsets stored as lines, not 
items, because if you try to put 1 million items into a variable you'll run 
into the limit on the number of characters in a line.
(took me a while to spot that one - couldn't figure out why I was creating 
1 million items but there were only about 9500 of them when I was done !)

Here's the test I did :
  - each offset is a value between 1 and 100 million.
  - create a list of 1 million offsets (random integers in that range)
              (this took 9 seconds)
  - sort that list
               (took 40 seconds !!)
  - linear search for the offset immediately below a random offset
              10 times, average of 1.3 seconds
  - binary search for same
               10 times, average of 1.25 seconds

I would put the test stack up on RevOnline - but I haven't succeeded in 
activating my account there yet - in the meantime, it's at if you want to look at it.

The binary search actually does better in smaller searches (400 ms versus 
750 ms for an average search in 100,000 offsets) - but not really enough 
benefit to be worth the extra complexity.

Not surprisingly, putting the offset list into an array is much slower.

So - unless one of the experienced Transcipters has some tricks up their 
sleeve, I think you're stuck with a linear search.

The good news is that the linear search isn't too slow, and you can do any 
number of "SoWhat"s in a single scan - so you're looking at maybe 2 seconds 
to extract the relevant thousands of messages from million+ total messages.

However, to do this you'll need to put the logic into a single loop, you 
won't be able to put the linear search into a separate function.

btw - just for comparison, and to make sure I had the logic right, I tried 
it in another popular scripting language I am familiar with;  the linear 
search is a bit faster than the Transcript version - but the binary search 
runs approx 1000 times as fast as the linear one - it does the binary 
search of 1 million items in < 1 millisecond.

Are you all *sure* we don't need numeric-indexed arrays or numeric-indexed 
lists ?

-- Alex.

More information about the Use-livecode mailing list