Need for Speed (postscript)
Dick Kriesel
dick.kriesel at mail.com
Mon Jul 16 22:17:53 EDT 2007
On 7/15/07 12:58 PM, "Beynon, Rob" <R.Beynon at liverpool.ac.uk> wrote:
> I ran a brutal test, of 45,000 lines in massList and 16,000 lines in seqDB
>
> My crude attempt seems to be capable, even running within the Rev IDE, of
> completing the 720 million comparisons in about 30 minutes (OK, admittedly
> CoreDuo 2.66GHz, 2GB RAM). That's 24 million a minute! (I deliberately put some
> searches that would match at the end of seqDB, to be sure I searched through
> most of the file each time). I am pretty happy with this, and I'd be looking
> for at least a 10-fold gain in speed to code up a harder solution.
>
>
>
> Do you experts think a 10-fold gain is feasible? 100-fold?
I'd be surprised to get that big a gain. Nonetheless, here's a version that
emphasizes arrays over lists. In the following code, the reasons why this
version may run faster appear as comments. If you'll try it and report the
results, we'll learn whether the extra programming is justified.
on mouseUp
  global gSeqDB
  -- assumption: gSeqDB is a global filled elsewhere; drop this line
  -- if the script already declares it outside the handlers
  put the milliseconds into tMilliseconds1
  put gSeqDB into tPeptideArray
  split tPeptideArray using cr
  repeat for each key tPeptideKey in tPeptideArray
    put word 3 of tPeptideArray[tPeptideKey] \
        into tPeptideMassArray[tPeptideKey]
    -- note: get word 3 once for each peptide
    -- rather than once for each combination of peptide and mass
  end repeat
  put field "ppm" into tPPM
  put tPPM / 1000000 into tFactor
  -- note: divide by a million once for each mouseUp
  -- rather than once for each mass
  put field "massList" into tMassArray
  split tMassArray using cr
  put the milliseconds into tMilliseconds2
  put tMilliseconds2 - tMilliseconds1 && "milliseconds for setup" \
      & cr into msg
  -- note: this first timing replaces the message box contents;
  -- the later timings are appended after it
  repeat for each key tMassKey in tMassArray
    put tMassArray[tMassKey] into tMass
    put tMass * tFactor into tMassThreshold
    repeat for each key tPeptideKey in tPeptideMassArray
      if abs(tMass - tPeptideMassArray[tPeptideKey]) <= tMassThreshold then
        put empty into tOutputArray[tMassKey,tPeptideKey]
      end if
      -- note: minimize the work of the innermost loop
    end repeat
  end repeat
  put the milliseconds into tMilliseconds3
  put tMilliseconds3 - tMilliseconds2 && "milliseconds for filtering" \
      & cr after msg
  put the keys of tOutputArray into tKeys
  sort tKeys numeric using item 2 of each
  sort tKeys numeric using item 1 of each
  repeat for each line tKey in tKeys
    -- note: ignore any mass for which every peptide failed the test
    put item 1 of tKey into tMassKey
    if tMassKey <> tMassKeyPrev then
      add 1 to i
      put "NEW SEARCH, MASS = " & tMassArray[tMassKey] \
          & " at " & tPPM & " ppm error" into tOutputData[i]
      put tMassKey into tMassKeyPrev
    end if
    add 1 to i
    put item 2 of tKey into tPeptideKey
    -- note: look up the mass by key here; tMass still holds
    -- whichever mass the filtering loop visited last
    put tPeptideArray[tPeptideKey] & "K" & tab \
        & tPeptideMassArray[tPeptideKey] - tMassArray[tMassKey] \
        into tOutputData[i]
    -- note: don't bother with the line of equal signs
  end repeat
  combine tOutputData using cr
  put tOutputData into field "Output"
  put the milliseconds into tMilliseconds4
  put tMilliseconds4 - tMilliseconds3 && "milliseconds for output" \
      & cr after msg
  put tMilliseconds4 - tMilliseconds1 && "milliseconds total" \
      & cr after msg
end mouseUp
If I've introduced bugs you'd like me to squash, please let me know.
The fact that you have a field "ppm" rather than a constant in the program
causes me to wonder whether you change the value and run the program again.
If so, you could use another nested repeat over the ppm values instead.
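As a rough, untested sketch of that extra repeat: the fragment below would
stand in for the filtering stage of the handler above. It assumes, purely for
illustration, that field "ppm" holds several values, one per line; nothing in
the thread says it does. The setup work (splitting the peptides and masses)
stays outside the new loop, so it still runs only once per mouseUp.

  -- assumption: field "ppm" lists one tolerance per line
  repeat for each line tPPM in field "ppm"
    put tPPM / 1000000 into tFactor
    -- note: still one division per ppm value, not one per mass
    repeat for each key tMassKey in tMassArray
      put tMassArray[tMassKey] into tMass
      put tMass * tFactor into tMassThreshold
      repeat for each key tPeptideKey in tPeptideMassArray
        if abs(tMass - tPeptideMassArray[tPeptideKey]) <= tMassThreshold then
          -- the ppm value becomes the first item of the key
          put empty into tOutputArray[tPPM,tMassKey,tPeptideKey]
        end if
      end repeat
    end repeat
  end repeat

With the ppm value as item 1 of each key, the output stage would need to
change to match: sort on item 3, then item 2, then item 1, and read the mass
key and peptide key from items 2 and 3.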
-- Dick