Trying to code the fastest algorithm...
jbv
jbv.silences at Club-Internet.fr
Sun Jun 5 15:47:01 EDT 2005
Alex,
> > repeat for each item i in it
> > get itemoffset(i,myReference)
> > if it>0 then
> > put it & itemdelimiter after b
> > end if
> > end repeat
> >
> Shouldn't that be "repeat for each item i in a" rather than "... in it",
> since "it" gets over-written within the loop ?
Yes, you're right, that was a typo.
Actually my script is slightly more complex, and I had to strip
off a couple of things to make the problem easier to explain...
hence the typo...
So here's a clean version of my code:
put "" into newList
repeat for each line j in mySentences
put j into a
replace " " with itemdelimiter in a
put "" into b
repeat for each item i in a
get itemoffset(i,myReference)
if it>0 then
put it & itemdelimiter after b
end if
end repeat
delete last char in b
if b contains itemdelimiter then
put b & cr after newList
end if
end repeat
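For anyone not fluent in LiveCode chunk syntax, here is a rough Python translation of the loop above (this is a sketch of mine, not the original script; the function name and the choice of comma and newline delimiters are assumptions mirroring LiveCode's defaults):

```python
def matching_sentences(my_sentences, my_reference):
    """Rough Python equivalent of the LiveCode loop above.

    my_sentences: one sentence per line, as a single string
    my_reference: a comma-separated list of reference words
    """
    ref_items = my_reference.split(",")
    new_list = []
    for sentence in my_sentences.split("\n"):
        offsets = []
        for word in sentence.split(" "):
            # itemOffset(word, myReference): 1-based item position, 0 if absent
            offset = ref_items.index(word) + 1 if word in ref_items else 0
            if offset > 0:
                offsets.append(offset)
        # keep only sentences that matched more than one reference word
        # (the "b contains itemDelimiter" test in the LiveCode version)
        if len(offsets) > 1:
            new_list.append(",".join(str(n) for n in offsets))
    return "\n".join(new_list)
```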
>
> I don't see anything here which "keeps only those sentences containing
> more than 1 word" ?
It's the test:
  if b contains itemDelimiter
which runs about 25% to 30% faster than
  if the number of items of b > 1 then
(b contains a delimiter only when at least two offsets were appended, since the trailing delimiter is deleted.)
>
>
> Do you have a "typical" dataset in mind ?
> or know any characteristics of it ?
As I said, only sentences, in French or English, with a maximum
of 30 words.
>
> how many words in the reference list ?
Can range between 1 and 25.
>
> how many sentences, and how long is a typical one ?
Can range from 1 to several hundred (hence the need for speed).
>
> how many "hits" vs "misses" ?
Unpredictable...
>
>
> Since you're asking for "fastest" algorithm, I assume at least one of
> these numbers is large - and it's unlikely the same algorithm would
> excel at both ends of the spectrum .....
>
> I have 2 or 3 approaches in mind ... any clues you have on the
> characteristics of the data would help decide which ones to pursue .....
> e.g. build an assoc array of the words which occur in myReferences,
> where the content of the array element is the word number within the
> references, and lookup each word in the sentence with that ... should be
> faster than "itemOffset" but probably not enough to justify it if the
> number of words in myReferences is very small.
>
The "array" approach crossed my mind, but I'm afraid it's not feasable :
the set of reference words is unpredictable. It is actually derived from a
sentence entered by the end user, and after some processing / selection
a set words is extracted from that sentence and then compared with other
sentences in a data base...
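For reference, the lookup-table approach Alex describes might look like the following sketch (Python again, all names mine), with the table rebuilt from whatever reference words the user's sentence yields on each query:

```python
def matching_sentences_dict(my_sentences, my_reference):
    """Sketch of Alex's suggestion: an associative array mapping each
    reference word to its item number, rebuilt for every query.

    Note: if a word repeats in the reference list, this keeps the last
    position, whereas itemOffset would return the first.
    """
    # word -> 1-based position within the comma-delimited reference list
    word_pos = {w: n for n, w in enumerate(my_reference.split(","), start=1)}
    new_list = []
    for sentence in my_sentences.split("\n"):
        # a dictionary lookup replaces the per-word itemOffset scan
        offsets = [word_pos[w] for w in sentence.split(" ") if w in word_pos]
        if len(offsets) > 1:
            new_list.append(",".join(str(n) for n in offsets))
    return "\n".join(new_list)
```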
Thanks for your time & collaboration anyway.
JB
More information about the use-livecode
mailing list