SC, Rev, and RB speed test
Brian Yennie
briany at qldlearning.com
Sat Apr 17 04:58:06 EDT 2004
David,
Very clever indeed!
It's amazing how many different optimizations we can find based on the
type of data. There's the baseline brute force; there's mine, which
takes a constant factor out (by shrinking the text) but would fail if
you had many different words throughout with the same suffixes (as odd
as that may seem). There were a couple of methods that fly through
redundant text. And there's your latest, which really whoops it up on
large texts with search strings of certain lengths.
I wonder, on yours, could there be some advantage to be gleaned from the
fact that longer searches can be broken down into shorter sub-searches
(e.g. 6 words = 3 words + 3 words)? Could be interesting if there's a
slick way to utilize that.
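Something along these lines is what I'm picturing; it's only a rough,
untested sketch, and it assumes a phrase index (pPhraseIndex here) that
maps each 3-word phrase to a comma-separated list of the lines that
contain it:

function linesMatchingSixWords pQuery, @pPhraseIndex, pText
   -- pPhraseIndex is assumed to hold, for each 3-word phrase, a
   -- comma-separated list of the line numbers that contain it
   put word 1 to 3 of pQuery into tFirstHalf
   put word 4 to 6 of pQuery into tSecondHalf
   put pPhraseIndex[tFirstHalf] into tCandidatesA
   put pPhraseIndex[tSecondHalf] into tCandidatesB
   put empty into tHits
   repeat for each item tLineNum in tCandidatesA
      -- only lines containing both halves are worth the full check
      if tLineNum is among the items of tCandidatesB then
         if pQuery is in line tLineNum of pText then
            put tLineNum & comma after tHits
         end if
      end if
   end repeat
   return char 1 to -2 of tHits -- drop the trailing comma
end linesMatchingSixWords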
From what I can tell, the algorithms roughly look like this:
Brute force: ~500 ticks on the dual-G4 baseline; runs faster when there
are more matches (a bare-bones sketch follows this list).
Redundancy-array methods: as little as a few ticks, but slower on
pseudo-random data.
Suffix-indexing method: about 60-80% faster than brute force at
eliminating matches and 20-40% slower at finding them.
3-word indexing method: about 10% overhead, with a ~99% speedup on
3-word queries and the same speed on others.
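By "brute force" I mean essentially a plain offset() walk over the
text, something like the following bare-bones sketch (not Chris'
actual benchmark code):

function bruteForceCount pNeedle, pHaystack
   put 0 into tCount
   put 0 into tSkip
   repeat
      put offset(pNeedle, pHaystack, tSkip) into tPos
      if tPos = 0 then exit repeat
      add 1 to tCount
      -- offset() returns a position relative to the skipped characters,
      -- so advance the skip count by that amount and keep going
      add tPos to tSkip
   end repeat
   return tCount
end bruteForceCount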
> I took Chris' data and examined it. My solution below exploits that
> knowledge of the data, but if it is genuinely a typical search set
> (most search material being three words or so) then the following does
> the trick.
>
> Using the algorithm on Chris' web page I process the material in 530
> ticks on my machine (Chris does it in 494). With the code below, this
> is cut to 51 ticks, so it should run in <50 ticks on Chris' machine,
> an all-time record.
>
> The trick is to exploit "intersect" (look it up), which is amazingly
> fast at, er, intersecting. The intersection itself executes in around
> 50 milliseconds, or about 3 ticks. It is all the preparatory work that
> takes the other 48 ticks.
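If I'm reading your description right, the shape of it is something
like the following. This is just my rough reconstruction of the idea,
not your actual code, and the variable names (tPhraseIndex, tQueries,
tSearchStrings) are my own:

put empty into tPhraseIndex
repeat for each line tLine in tText
   put the number of words of tLine into tWordCount
   repeat with i = 1 to tWordCount - 2
      put i + 2 into j
      -- every consecutive 3-word phrase of the text becomes a key
      -- (assumes single spaces between words; real code would normalize)
      put true into tPhraseIndex[word i to j of tLine]
   end repeat
end repeat

put empty into tQueries
repeat for each line tSearchString in tSearchStrings
   put true into tQueries[tSearchString]
end repeat

-- the fast part: only keys present in both arrays survive, so the
-- remaining keys of tQueries are the search strings found in the text
intersect tQueries with tPhraseIndex
put the keys of tQueries into tFoundSearchStrings

With the per-query work reduced to array lookups, it makes sense that
nearly all of the 51 ticks go into building the phrase index rather
than the intersection itself.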