deleteDups() -- split is faster

Geoff Canyon gcanyon at inspiredlogic.com
Wed Nov 16 15:17:40 CST 2005

```
D'oh! I made a mistake. See below:

On Nov 16, 2005, at 10:03 AM, HyperChris at aol.com wrote:

> Thanks Geoff. Actually, I am getting split as four times faster
> than 'for each' (under v2.6.0 & 2.6.1)

On Nov 16, 2005, at 10:51 AM, Mark Smith wrote:
> I'm getting the split version as about 10% faster than 'repeat for
> each'

If you're using Eric's version, this line is hurting your performance, because it searches the whole accumulated list on every pass through the loop:

if tLine is not among the lines of tStrippedList

This line is faster, and doesn't degrade as quickly with larger lists, because it is a single keyed array access instead of a search:

put 1 into y[L]

That said, I re-did the test with the actual functions:

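function deleteDups pList -- does _not_ retain input order
   -- each line becomes an array key, so duplicate lines collapse into one key
   -- (any text after a numToChar(3) in a line would become that key's value)
   split pList by cr and numToChar(3)
   return keys(pList)
end deleteDups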

function deleteDupes pList -- does _not_ retain input order
   repeat for each line L in pList
      put 1 into x[L] -- a repeated line just overwrites the same key
   end repeat
   return the keys of x
end deleteDupes

I found that variable initialization is a big factor, and a simple "do this 100 times" repeat loop doesn't take it into account (calling the actual functions starts each call with fresh local variables). Putting both of the functions above to the test, I get much the same result as Mark Smith: the split version was about 20% faster for me, 9 ticks vs. 11 ticks (a tick is 1/60 of a second).

If you want to retain the input order, this version took about 14 ticks on the same test:

function deDupe pList -- retains input order
   repeat for each line L in pList
      -- first time we see L: append it to the result
      if x[L] is empty then put L & cr after tReturn
      put 1 into x[L]
   end repeat
   return char 1 to -2 of tReturn -- drop the trailing cr
end deDupe

If you are certain that duplicates are always adjacent (for example, because the list is already sorted), this takes only 5 ticks:

function deDupe2 pList -- retains input order, assumes dupes are sequential
   put empty into tLast
   repeat for each line L in pList
      -- note: because tLast starts out empty, an empty first line is skipped
      if L is tLast then next repeat
      put L & cr after tReturn
      put L into tLast
   end repeat
   return char 1 to -2 of tReturn -- drop the trailing cr
end deDupe2

regards,

Geoff

```
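The gap between `is among the lines of` and `put 1 into y[L]` is the gap between a linear search and a hashed lookup. Below is a rough Python analogue. It is not LiveCode and not Eric's or Geoff's code, and the function names are hypothetical.

- `dedupe_scan` tests each line against the lines kept so far, so total work grows roughly with the square of the list length.
- `dedupe_hashed` uses a set, the closest Python counterpart to a keyed array.

The post doesn't show Eric's full function, so `dedupe_scan` assumes his version appends each new line to its result.

```python
from functools import partial
import timeit


def dedupe_scan(text: str) -> str:
    """Keep first occurrences; membership is a linear scan of the kept lines.

    Loosely like `if tLine is not among the lines of tStrippedList`.
    """
    kept: list[str] = []
    for line in text.split("\n"):
        if line not in kept:  # walks the whole list built so far
            kept.append(line)
    return "\n".join(kept)


def dedupe_hashed(text: str) -> str:
    """Keep first occurrences; membership is a hashed set lookup.

    Loosely like checking a keyed array element such as y[L].
    """
    seen: set[str] = set()
    kept: list[str] = []
    for line in text.split("\n"):
        if line not in seen:  # average constant-time lookup
            seen.add(line)
            kept.append(line)
    return "\n".join(kept)


if __name__ == "__main__":
    # Hypothetical test data: 2,000 lines drawn from 500 distinct values.
    sample = "\n".join(str(i % 500) for i in range(2000))
    assert dedupe_scan(sample) == dedupe_hashed(sample)
    for fn in (dedupe_scan, dedupe_hashed):
        seconds = timeit.timeit(partial(fn, sample), number=10)
        print(f"{fn.__name__}: {seconds:.3f} s for 10 runs")
```

Both return the same text. Only the cost of the membership test differs, and the difference widens as the list grows.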
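`deleteDups` and `deleteDupes` both reduce the list to the keys of an array. One does it with a single `split` and the other with an explicit `repeat for each` loop. A rough Python analogue, again with hypothetical names, builds a set in one bulk call versus filling it line by line. As with LiveCode's `keys`, the order of the result is unspecified in both.

```python
def dedupe_bulk(text: str) -> str:
    """Unordered dedupe in one bulk step, loosely like deleteDups.

    set() plays the role of split + keys(): duplicate lines collapse,
    and the order of the result is arbitrary.
    """
    return "\n".join(set(text.split("\n")))


def dedupe_loop(text: str) -> str:
    """Unordered dedupe with an explicit loop, loosely like deleteDupes."""
    seen: set[str] = set()
    for line in text.split("\n"):
        seen.add(line)  # adding a line that is already present is a no-op
    return "\n".join(seen)
```

Because neither result has a defined order, compare them after sorting their lines rather than comparing the joined strings directly.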
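`deDupe` needs two structures because array keys carry no order: an array remembers what has been seen, and the separate `tReturn` string remembers the input order. Python dicts, by contrast, iterate in insertion order (guaranteed since Python 3.7), so a sketch of the same behavior can let the keys do both jobs. `dedupe_ordered` is a hypothetical name, and this is a Python analogue rather than a translation of Geoff's code.

```python
def dedupe_ordered(text: str) -> str:
    """Keep the first occurrence of each line, in input order (like deDupe).

    dict.fromkeys keeps one entry per distinct line at the position of its
    first occurrence, and dicts iterate in insertion order. join() adds no
    trailing separator, so nothing needs trimming (unlike char 1 to -2).
    """
    return "\n".join(dict.fromkeys(text.split("\n")))
```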
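`deDupe2` compares each line only with the one before it. That means one string comparison per line and no array, which is why it is the fastest of the four. In the Python sketch below (the name `dedupe_adjacent` is hypothetical), the previous line starts as `None` rather than an empty string, so a leading empty line is not mistaken for a duplicate. If the input isn't already grouped, sorting it first would make this approach applicable, at the cost of input order.

```python
from typing import Optional


def dedupe_adjacent(text: str) -> str:
    """Drop a line when it equals the line just before it (like deDupe2).

    Only runs of identical adjacent lines collapse; non-adjacent repeats
    survive, so this matches a full dedupe only when duplicates are grouped.
    """
    kept: list[str] = []
    last: Optional[str] = None  # sentinel that no real line can equal
    for line in text.split("\n"):
        if line == last:
            continue
        kept.append(line)
        last = line
    return "\n".join(kept)
```

For example, `"a\na\nb\nb\nb\na"` becomes `"a\nb\na"`: the final `a` is kept because it is not adjacent to the first run.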