SEX contributions anyone

Ben Rubinstein benr_mc at cogapp.com
Thu Sep 4 13:06:00 EDT 2003


on 4/9/03 5:17 pm, MisterX wrote

> Has anyone got a "Fast" remove duplicate lines script?
> Best I get is 12ms per line... Any line being a word.

If split worked the way I think it should (and still could, quite compatibly
- perhaps I'll file in Bugzilla the suggestion I made long ago) then a
split/combine would probably do this almost instantaneously.

Even without that, I've found Rev/MC's hashed arrays fantastically
efficient.  Have you tried simply:

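    -- use each line as an array key; duplicate lines collapse into one key
    put empty into aTemp
    repeat for each line t in tManyLines
       put true into aTemp[t]
    end repeat
    -- the keys come back one per line, in no particular order
    put the keys of aTemp into tFewerLines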

Of course that will lose the order, but I'd expect it to be very fast.  If
you want to keep the sequence (order of first appearance) then:

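    put empty into tFewerLines
    put empty into aTemp
    repeat for each line t in tManyLines
       -- an empty array element means this line has not been seen yet
       if aTemp[t] = empty then
           put t & return after tFewerLines
           put true into aTemp[t]
       end if
    end repeat
    -- note: tFewerLines ends with a trailing return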
 
should work, albeit a bit more slowly.
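
Wrapped up as a function, the order-preserving version might look something
like the sketch below.  The handler name uniqueLines, its parameter name and
the field names in the usage example are made up for illustration.  Blank
lines are skipped on the assumption that every line is a word, and the
trailing return is trimmed.  Untested, so treat it as a starting point:

    function uniqueLines pLines
       local tSeen, tResult
       repeat for each line tLine in pLines
          -- assumption: blank lines carry no data, so skip them
          if tLine is empty then next repeat
          -- an empty array element means first appearance
          if tSeen[tLine] is empty then
             put tLine & return after tResult
             put true into tSeen[tLine]
          end if
       end repeat
       -- drop the trailing return left by the loop
       delete last char of tResult
       return tResult
    end uniqueLines

To time it against the 12ms per line figure, something like this in a button
script should do (fields "Input" and "Output" are hypothetical):

    on mouseUp
       put the milliseconds into tStart
       put uniqueLines(field "Input") into field "Output"
       -- elapsed time goes to the message box
       put the milliseconds - tStart && "ms"
    end mouseUp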
 
  Ben Rubinstein               |  Email: benr_mc at cogapp.com
  Cognitive Applications Ltd   |  Phone: +44 (0)1273-821600
  http://www.cogapp.com        |  Fax  : +44 (0)1273-728866



