# Trying to code the fastest algorithm...

Alex Tweedly alex at tweedly.net
Sun Jun 5 16:06:23 CDT 2005

```Alex Tweedly wrote:

>
>>> Since you're asking for "fastest" algorithm, I assume at least one of
>>> these numbers is large - and it's unlikely the same algorithm would
>>> excel at both ends of the spectrum .....
>>>
>>> I have 2 or 3 approaches in mind ... any clues you have on the
>>> characteristics of the data would help decide which ones to pursue
>>> .....
>>> e.g.   build an assoc array of the words which occur in myReferences,
>>> where the content of the array element is the word number within the
>>> references, and lookup each word in the sentence with that ...
>>> should be
>>> faster than "itemOffset" but probably not enough to justify it if the
>>> number of words in myReferences is very small.
>>>
>>>
>>
>>
>> The "array" approach crossed my mind, but I'm afraid it's not feasable :
>> the set of reference words is unpredictable. It is actually derived
>> from a
>> sentence entered by the end user, and after some processing / selection
>> a set words is extracted from that sentence and then compared with other
>> sentences in a data base...
>>
>>
>>
> Still feasible ....( typing straight into email - typos possible )
>
> put empty into myArray
> put 0 into count
> repeat for each item W in myReference
>    put W && count & cr after myArray
> end repeat
> split myArray by CR and space
>
> I'll try some simple experiments based on the figures you gave....
> more later.

I tried :
20 words in the reference list
20 - 30 words in each sentence
1000 sentences  (actually, two sentences, repeated 500 times :-)
500 matches (each match is of 6 words from the ref list)

your original algorithm took 305 millisecs (once I remembered to set
wholeMatches to true !!)
using the array split method as above took 28 millisecs  (including the
time to set up and split the array)

Hope that's fast enough - but if not, let me know and I'll try another
approach ....

--
Alex Tweedly       http://www.tweedly.net

--
No virus found in this outgoing message.
Checked by AVG Anti-Virus.
Version: 7.0.323 / Virus Database: 267.6.2 - Release Date: 04/06/2005

```