SC, Rev,and RB speed test

Brian Yennie briany at qldlearning.com
Sat Apr 17 04:58:06 EDT 2004


David,

Very clever indeed!

It's amazing how many different optimizations we can find based on the 
type of data. There's the baseline brute force, there's mine which 
takes a constant factor out (by shrinking the text), but would fail if 
you had many different words throughout with the same suffixes (as odd 
as that may seem). There were a couple of methods that fly through 
redundant text. And there's your latest which really whoops it up on 
large texts with certain length search strings.

I wonder on yours, could there be some advantage to be gleaned from the 
fact that longer searches can be broken down into shorter sub-searches 
(i.e. 6 words = 3 words + 3 words)? Could be interesting if there's a 
slick way to utilize that.

 From what I can tell, the algorithms roughly look like:

Brute force: ~500 ticks Dual-G4 baseline, runs faster when there are 
more matches
Redundancy array methods: As little as a few ticks, but slower on 
pseudo-random data
Suffix indexing method: about 60-80% faster at eliminating matches and 
20-40% slower at finding matches than brute force
3-word indexing method: About 10% overhead with 99% speedup on 3 word 
queries, same speed on others

> I took Chris data and examined it. My solution below exploits that 
> knowledge of the data but if it is genuinely a typical search set 
> (most search material being three words or so) then the following does 
> the trick.
>
> Using the algorithm on Chris' web page I process the material in 530 
> ticks on my machine (Chris does it in 494). With the code below this 
> is cut to 51 ticks so it should run in <50 ticks on Chris' machine, an 
> all-time record.
>
> The trick is to exploit "intersect" (look it up) which is amazingly 
> fast at, er, intersecting. The intersection itself executes in around 
> 50 milliSeconds or about 3 ticks. It is all the preparatory work which 
> takes the other 48 ticks.



More information about the use-livecode mailing list