Need for Speed (postscript)

Dick Kriesel dick.kriesel at mail.com
Mon Jul 16 22:17:53 EDT 2007


On 7/15/07 12:58 PM, "Beynon, Rob" <R.Beynon at liverpool.ac.uk> wrote:

> I ran a brutal test, of 45,000 lines in massList and 16,000 lines in seqDB
> 
> My crude attempt seems to be capable, even running within the Rev IDE, of
> completing the 720 million comparisons in about 30 minutes (OK, admittedly
> CoreDuo 2.66GHz, 2GB RAM). That's 24 million a minute! (I deliberately put some
> searches that would match at the end of seqDB, to be sure I searched through
> most of the file each time). I am pretty happy with this, and I'd be looking
> for at least a 10-fold gain in speed to code up a harder solution.
> 
>  
> 
> Do you experts think a 10-fold gain is feasible? 100-fold?

I'd be surprised to get that big a gain.  Nonetheless, here's a version that
emphasizes arrays over lists.  In the following code, the reasons why this
version may run faster appear as comments.  If you'll try it and report the
results, we'll learn whether the extra programming is justified.

on mouseUp
  put the milliseconds into tMilliseconds1
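  global gSeqDB
  -- note: assumes gSeqDB is a global that was filled elsewhere; without a
  --          declaration the handler would not see its contents
  put gSeqDB into tPeptideArray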
  split tPeptideArray using cr
  repeat for each key tPeptideKey in tPeptideArray
    put word 3 of tPeptideArray[tPeptideKey] \
        into tPeptideMassArray[tPeptideKey]
    -- note: get word 3 once for each peptide
    --          rather than once for each combination of peptide and mass
  end repeat
  put field "ppm" into tPPM
  put tPPM / 1000000 into tFactor
  -- note: divide by a million once for each mouseUp
  --          rather than once for each mass
  put field "massList" into tMassArray
  split tMassArray using cr
  put the milliseconds into tMilliseconds2
  put tMilliseconds2 - tMilliseconds1 && "milliseconds for setup" \
      & cr into msg
  
  repeat for each key tMassKey in tMassArray
    put tMassArray[tMassKey] into tMass
    put tMass * tFactor into tMassThreshold
    repeat for each key tPeptideKey in tPeptideMassArray
      if abs(tMass - tPeptideMassArray[tPeptideKey]) <= tMassThreshold then
        put empty into tOutputArray[tMassKey,tPeptideKey]
      end if
      -- note: minimize the work of the innermost loop
    end repeat
  end repeat
  put the milliseconds into tMilliseconds3
  put tMilliseconds3 - tMilliseconds2 && "milliseconds for filtering" \
      & cr after msg
  
  put the keys of tOutputArray into tKeys
  sort tKeys numeric using item 2 of each
  sort tKeys numeric using item 1 of each
  repeat for each line tKey in tKeys
    -- note: ignore any mass for which every peptide failed the test
    put item 1 of tKey into tMassKey
    if tMassKey <> tMassKeyPrev then
      add 1 to i
      put "NEW SEARCH, MASS = " & tMassArray[tMassKey] \
          & " at " & tPPM & " ppm error" into tOutputData[i]
      put tMassKey into tMassKeyPrev
    end if
    add 1 to i
    put item 2 of tKey into tPeptideKey
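    put tPeptideArray[tPeptideKey] & "K" & tab \
        & tPeptideMassArray[tPeptideKey] - tMassArray[tMassKey] \
        into tOutputData[i]
    -- note: look up the mass for this key; tMass still holds the last
    --          value left over from the filtering loop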
    -- note: don't bother with the line of equal signs
  end repeat
  combine tOutputData using cr
  put tOutputData into field "Output"
  put the milliseconds into tMilliseconds4
  put tMilliseconds4 - tMilliseconds3 && "milliseconds for output" \
      & cr after msg
  
  put tMilliseconds4 - tMilliseconds1 && "milliseconds total" \
      & cr after msg
end mouseUp

If I've introduced bugs you'd like me to squash, please let me know.

The fact that you have a field "ppm" rather than a constant in the program
causes me to wonder whether you change the value and run the program again.
If so, you could use another nested repeat instead.
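
One possible reading of that suggestion, as an untested sketch: keep the
setup as it is and wrap the filtering loops in one more repeat over a list of
ppm values. The function name, its parameters, and the return-delimited list
of ppm values are assumptions made for illustration; they are not part of the
handler above.

function matchesForAllPPM pMassArray, pPeptideMassArray, pPPMList
  -- pPPMList: hypothetical list of ppm values, one per line
  repeat for each line tPPM in pPPMList
    if tPPM is not a number then next repeat
    -- note: skip blank or malformed lines
    put tPPM / 1000000 into tFactor
    repeat for each key tMassKey in pMassArray
      put pMassArray[tMassKey] into tMass
      put tMass * tFactor into tMassThreshold
      repeat for each key tPeptideKey in pPeptideMassArray
        if abs(tMass - pPeptideMassArray[tPeptideKey]) \
            <= tMassThreshold then
          put empty into tOutputArray[tPPM,tMassKey,tPeptideKey]
        end if
      end repeat
    end repeat
  end repeat
  return tOutputArray
end matchesForAllPPM

Each key of the result has the form ppm,massKey,peptideKey, so an output step
could sort on item 1 to group the matches by ppm. This sketch repeats every
comparison once per ppm value, so the filtering time should grow roughly in
proportion to the number of values in the list.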

-- Dick




