Need for speed...
R.Beynon at liverpool.ac.uk
Sat Jul 14 02:32:13 CDT 2007
I don't want to be accused of passing effort out to the list, but I would like to know, from the experts, the fastest way to execute some code.
I have a file of tab-delimited items, 16,000 lines long, read into a single variable. (The leading right angle bracket is part of the file, not a quote mark.) It looks like this:
>DAPA_ECO57 MFTGSIVAIVTPMDEKGNVCR 2267.1049 1
>DAPA_ECOL6 MFTGSIVAIVTPMDEKGNVCR 2267.1049 1
>DAPA_ECOLI MFTGSIVAIVTPMDEKGNVCR 2267.1049 1
>(-M)YEGS_ECOLI AEFPASLLILNGKSTDNLPLR 2268.2398 1
>GLTS_ECOLI MFHLDTLATLVAATLTLLLGR 2269.28372 0
>(-M)YDIV_ECOLI KIFLENLYHSDCYFLPIR 2270.1462 1
>YBAA_ECO57 MKYVDGFVVAVPADKKDAYR 2271.16256 3
>YBAA_ECOL6 MKYVDGFVVAVPADKKDAYR 2271.16256 3
>YBAA_ECOLI MKYVDGFVVAVPADKKDAYR 2271.16256 3
I now want to search each line and match an input number (obsMass) against the third item of each line (pepMass).
If it matches within a certain tolerance, I want to do something with that line. The tolerance is expressed in parts per million (ppm), so it changes with the magnitude of the number. Ties (or multiple hits to the same obsMass) are OK.
I then want to repeat that process for several thousand different values of obsMass. In other words, I will evaluate the mass matching loop tens of millions of times.
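To make the task concrete, here is a minimal sketch of the straightforward scan described above, written in Python purely for illustration (it is not Rev script). The function names `within_ppm` and `match_lines` are hypothetical, and the sketch assumes the ppm tolerance is taken relative to obsMass, which the description above does not pin down.

```python
def within_ppm(obs_mass, pep_mass, ppm):
    """Return True if pep_mass is within +/- ppm parts per million of obs_mass.

    Assumption: the tolerance window is computed relative to obs_mass.
    """
    return abs(pep_mass - obs_mass) <= obs_mass * ppm / 1e6


def match_lines(data, obs_mass, ppm):
    """Scan every line of the tab-delimited text and collect the matching lines."""
    hits = []
    for line in data.splitlines():
        items = line.split("\t")
        if len(items) < 3:
            continue  # skip blank or malformed lines
        if within_ppm(obs_mass, float(items[2]), ppm):
            hits.append(line)
    return hits
```

Each call walks all 16,000 lines, so several thousand obsMass values give the tens of millions of comparisons mentioned above.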
Some questions - is there an advantage in sorting the list by mass first?
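One way sorting could help, sketched here in Python as an illustration only: sort once by pepMass, then use a binary search to jump straight to the window of candidate masses for each obsMass instead of scanning every line. The names `build_index` and `find_matches` are hypothetical, and the tolerance is again assumed to be relative to obsMass. Whether the same gain carries over to Rev would need to be measured.

```python
from bisect import bisect_left, bisect_right


def build_index(data):
    """Parse the tab-delimited text once and return (masses, lines) sorted by mass."""
    rows = []
    for line in data.splitlines():
        items = line.split("\t")
        if len(items) >= 3:
            rows.append((float(items[2]), line))
    rows.sort(key=lambda row: row[0])
    masses = [mass for mass, _ in rows]
    lines = [line for _, line in rows]
    return masses, lines


def find_matches(masses, lines, obs_mass, ppm):
    """Return every line whose mass lies within +/- ppm of obs_mass."""
    tol = obs_mass * ppm / 1e6  # assumption: tolerance relative to obs_mass
    lo = bisect_left(masses, obs_mass - tol)
    hi = bisect_right(masses, obs_mass + tol)
    return lines[lo:hi]
```

The index is built once; each lookup then costs two binary searches plus the size of the hit window, and ties fall out naturally because the whole window is returned.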
Would I gain speed by using integer arithmetic (is that even possible?) or by matching the numbers as strings?
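On the integer question, the following Python sketch shows only that the comparison can be done without floating point; it makes no claim about speed in Rev. It assumes masses carry at most five decimal places (as in the sample lines) and that the ppm value is a whole number. The names `SCALE`, `to_scaled_int` and `within_ppm_int` are hypothetical.

```python
SCALE = 100_000  # assumption: masses have at most 5 decimal places


def to_scaled_int(mass_text):
    """Convert a mass string such as '2269.28372' to an integer count of 1e-5 units."""
    return round(float(mass_text) * SCALE)


def within_ppm_int(obs_int, pep_int, ppm):
    """Integer-only ppm test: |pep - obs| / obs <= ppm / 1e6, cross-multiplied.

    Both masses must use the same scaling; ppm is assumed to be an integer.
    """
    return abs(pep_int - obs_int) * 1_000_000 <= obs_int * ppm
```

The conversion would be done once per line when the file is loaded, so the inner loop sees only integer subtraction, multiplication and comparison.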
And most importantly:
Is there an elegant Rev way of handling this that I don't even know about?
Prof R J Beynon
Proteomics and Functional Genomics Group
Faculty of Veterinary Science
University of Liverpool
Crown Street, Liverpool L69 7ZJ
Phone: +44 151 794 4312
Fax: +44 151 794 4243
Email: r.beynon at liv.ac.uk