Trying to code the fastest algorithm...

jbv jbv.silences at Club-Internet.fr
Sun Jun 5 15:47:01 EDT 2005



Alex,

> >    repeat for each item i in it
> >        get itemoffset(i,myReference)
> >        if it>0 then
> >            put it & itemdelimiter after b
> >        end if
> >    end repeat
> >
> >
> >
> Shouldn't that be "repeat for each item i in a" rather than "... in it",
> since "it" gets over-written within the loop ?

Yes, you're right. That was a typo.
Actually my script is slightly more complex, and I had to strip
out a couple of things to make the problem easier to explain,
hence the typo.

So here's a clean version of my code:

put "" into newList
repeat for each line j in mySentences
    put j into a
    replace " " with itemdelimiter in a
    put "" into b

    repeat for each item i in a
        -- itemOffset returns 0 when the word is not in the reference
        get itemoffset(i, myReference)
        if it > 0 then
            put it & itemdelimiter after b
        end if
    end repeat
    delete last char of b -- drop the trailing delimiter

    -- at least two hits leave a delimiter inside b
    if b contains itemdelimiter then
        put b & cr after newList
    end if
end repeat
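For readers who don't use LiveCode, here is the same logic sketched in Python. The function and variable names (`sentences_with_two_hits`, `my_sentences`, `my_reference`) are placeholders, and the reference is assumed to be a list of words; only the shape of the algorithm mirrors the script above:

```python
def sentences_with_two_hits(my_sentences, my_reference):
    """Keep sentences whose words match 2+ entries in the reference list.

    my_sentences: list of sentence strings (hypothetical input format)
    my_reference: list of reference words
    Returns, for each kept sentence, the 1-based positions of the matched
    words, mirroring what the LiveCode script accumulates in b.
    """
    new_list = []
    for sentence in my_sentences:
        hits = []
        for word in sentence.split(" "):
            try:
                # itemOffset is 1-based; 0 means "not found"
                offset = my_reference.index(word) + 1
            except ValueError:
                offset = 0
            if offset > 0:
                hits.append(offset)
        # "b contains itemdelimiter" <=> at least two offsets in b
        if len(hits) >= 2:
            new_list.append(hits)
    return new_list
```

Like the script, this rescans the whole reference for every word of every sentence, which is why the repeated lookup dominates the running time.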

>
> I don't see anything here which "keeps only those sentences containing
> more than 1 word" ?

It's the test:
    if b contains itemdelimiter then
which runs about 25% to 30% faster than
    if number of items of b > 1 then
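The two tests agree on every value b can hold here: once the trailing delimiter has been deleted, b contains a delimiter exactly when it holds more than one offset. A quick Python sanity check of that equivalence (using "," as the item delimiter, and treating an empty string as having zero items, as LiveCode does):

```python
DELIM = ","  # stand-in for the LiveCode itemDelimiter

def contains_delim(b):
    # LiveCode: if b contains itemdelimiter
    return DELIM in b

def more_than_one_item(b):
    # LiveCode: if the number of items of b > 1
    # (an empty string has zero items)
    return len(b.split(DELIM)) > 1 if b else False

# the two predicates agree on empty, single-offset and multi-offset strings
for b in ["", "7", "7,12", "7,12,3"]:
    assert contains_delim(b) == more_than_one_item(b)
```

The speed difference is plausible because `contains` is a plain substring scan, while counting items has to tokenize the string.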

>
>
> Do you have a "typical" dataset in mind ?
> or know any characteristics of it ?

As I said: only sentences, in French or English, with a maximum
of 30 words.

>
> how many words in the reference list ?

It can range between 1 and 25.

>
> how many sentences, and how long is a typical one ?

From 1 to several hundred (hence the need for speed).

>
> how many "hits" vs "misses" ?

unpredictable...

>
>
> Since you're asking for "fastest" algorithm, I assume at least one of
> these numbers is large - and it's unlikely the same algorithm would
> excel at both ends of the spectrum .....
>
> I have 2 or 3 approaches in mind ... any clues you have on the
> characteristics of the data would help decide which ones to pursue .....
> e.g.   build an assoc array of the words which occur in myReferences,
> where the content of the array element is the word number within the
> references, and lookup each word in the sentence with that ... should be
> faster than "itemOffset" but probably not enough to justify it if the
> number of words in myReferences is very small.
>
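Alex's associative-array idea can be sketched in Python as follows. The names (`build_word_index`, `offsets_in_sentence`, `my_references`) are hypothetical; the dict maps each reference word to its 1-based position, so each lookup replaces a full `itemOffset` scan:

```python
def build_word_index(my_references):
    """Build the assoc array once per query: word -> 1-based position.

    my_references is a hypothetical list of reference words. Note that
    with duplicate reference words this keeps the last position, whereas
    itemOffset would return the first.
    """
    return {word: pos for pos, word in enumerate(my_references, start=1)}

def offsets_in_sentence(sentence, index):
    # a constant-time dict lookup replaces the repeated itemOffset() scan
    return [index[w] for w in sentence.split(" ") if w in index]
```

Building the dict costs one pass over the references, so with up to 25 reference words the setup is cheap; whether the per-lookup saving pays off depends on how many sentences follow, as Alex notes.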

The "array" approach crossed my mind, but I'm afraid it's not feasible:
the set of reference words is unpredictable. It is actually derived from a
sentence entered by the end user; after some processing / selection,
a set of words is extracted from that sentence and then compared with other
sentences in a database...

Thanks for your time & collaboration anyway.
JB
