Counting and numbering duplicates in a list

Peter M. Brigham, MD pmbrig at gmail.com
Wed Sep 28 23:55:08 EDT 2011


Just timed the script below on a 10,000-line (10^4) list: 1.964 seconds. Not great if you're dealing with 10^5 or more items. Can someone do better?

-- Peter

Peter M. Brigham
pmbrig at gmail.com
http://home.comcast.net/~pmbrig

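> function flagDupes tList
>   -- scratchList holds the lines not yet visited
>   put tList into scratchList
>   repeat for each line t in tList
>      -- drop the current line first, so the "among" test below
>      -- only sees the lines that come after it
>      delete line 1 of scratchList
>      if t is among the lines of scratchList or \
>             freqArray[t] > 0 then
>         -- t occurs again later, or has already been numbered
>         add 1 to freqArray[t]
>         put cr & t & "-" & freqArray[t] after outputList
>      else
>         put cr & t after outputList
>      end if
>   end repeat
>   delete char 1 of outputList
>   return outputList
> end flagDupes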
> 
> I *think* this should be fast with large lists.
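
For comparison, here is a minimal two-pass sketch. It assumes the list holds one number per line, as in Roger's example, with no blank lines and no entries that differ only in case (LiveCode array keys are case-insensitive by default). The handler name flagDupesTwoPass is hypothetical. The first pass counts every line in an array. The second pass walks the list in its original order and numbers only the lines whose count is above 1. In the handler above, "is among the lines of" scans the remaining list on every iteration, so the total work grows roughly with the square of the list length. Here each line costs a couple of array lookups, so the time should grow roughly linearly.

```livecode
-- Hypothetical handler: a two-pass sketch, not code from this thread.
-- Assumes pList holds one number per line with no blank lines.
function flagDupesTwoPass pList
   local tCount, tSeen, tOut
   -- Pass 1: count how many times each line occurs
   repeat for each line tLine in pList
      add 1 to tCount[tLine]
   end repeat
   -- Pass 2: keep the original order and number only the lines
   -- that occur more than once
   repeat for each line tLine in pList
      if tCount[tLine] > 1 then
         add 1 to tSeen[tLine]
         put cr & tLine & "-" & tSeen[tLine] after tOut
      else
         put cr & tLine after tOut
      end if
   end repeat
   -- Drop the leading return added before the first line
   delete char 1 of tOut
   return tOut
end flagDupesTwoPass
```

With Roger's six-line list, it should return 12345-1, 12345-2, 12344, 12333-1, 10112, 12333-2, one per line.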

On Sep 28, 2011, at 10:52 PM, Roger Eller wrote:

> There are several ways I could approach this, but I'm unsure which is
> best. I have a list of numbers that 'may' contain duplicates. I need to
> sequence ONLY the duplicates without changing the order of the list. If
> a number occurs only once, it does not need to be sequenced.
> 
> Should I just repeat, keep the content of line x in a variable, and add
> 1 to a sequence variable if the number is encountered again? Is there a
> better way? Simple stuff, I know, but these lists can be really long, and I
> want it to process as quickly as possible.
> 
> 12345
> 12345
> 12344
> 12333
> 10112
> 12333
> 
> must become:
> 
> 12345-1
> 12345-2
> 12344
> 12333-1
> 10112
> 12333-2
> 
> ~Roger




More information about the use-livecode mailing list