Binary search : was Re: shilling for my feature request 
alex at tweedly.net
Mon Aug 23 10:02:48 CDT 2004
At 09:44 23/08/2004 +0100, Alex Tweedly wrote:
>Now that I'm interested (fatal for me), I'm going to try it myself - so
>look for a post from me later today with a simple binarysearch function,
>and some timing results.
Mark, there's good news and bad news. :-)
The good news is that Transcript again amazes me with its great performance
at chunk expression manipulation; it can iterate through the lines (or
items) of a variable VERY quickly.
The bad news is that it is (relatively) slow at accessing individual lines
So the benefits of a binary search are counterbalanced by the slow access
to the individual lines.
Note - for this purpose, you NEED to have your offsets stored as lines, not
items, because if you try to put 1 million items into a variable you'll run
into the limit on the number of characters in a line.
(took me a while to spot that one - couldn't figure out why I was creating
1 million items but there were only about 9500 of them when I was done !)
Here's the test I did :
- each offset is a value between 1 and 100 million.
- create a list of 1 million offsets (random integers in that range)
(this took 9 seconds)
- sort that list
(took 40 seconds !!)
- linear search for the offset immediately below a random offset
10 times, average of 1.3 seconds
- binary search for same
10 times, average of 1.25 seconds
I would put the test stack up on RevOnline - but I haven't succeeded in
activating my account there yet - in the meantime, it's at
http://www.tweedly.net/RunRev/Binary%20Search.rev if you want to look at it.
The binary search actually does better in smaller searches (400 ms versus
750 ms for an average search in 100,000 offsets) - but not really enough
benefit to be worth the extra complexity.
Not surprisingly, putting the offset list into an array is much slower.
So - unless one of the experienced Transcipters has some tricks up their
sleeve, I think you're stuck with a linear search.
The good news is that the linear search isn't too slow, and you can do any
number of "SoWhat"s in a single scan - so you're looking at maybe 2 seconds
to extract the relevant thousands of messages from million+ total messages.
However, to do this you'll need to put the logic into a single loop, you
won't be able to put the linear search into a separate function.
btw - just for comparison, and to make sure I had the logic right, I tried
it in another popular scripting language I am familiar with; the linear
search is a bit faster than the Transcript version - but the binary search
runs approx 1000 times as fast as the linear one - it does the binary
search of 1 million items in < 1 millisecond.
Are you all *sure* we don't need numeric-indexed arrays or numeric-indexed
More information about the use-livecode