There should be a "unique" option on sort . . .

Geoff Canyon gcanyon at gmail.com
Tue Jan 7 00:51:48 EST 2014


I *think* I tested up to 100,000 lines -- spent the day traveling back from
Boston to St. Louis, a little groggy -- *but* my keys were always integers.
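
I don't have the exact handler in front of me, but the array version boils down to the usual keys trick -- each line becomes an array key, so duplicates collapse for free. Roughly this (names invented here, so treat it as a sketch):

function UniqueListFromArray pData
   -- every line becomes a key; assigning the same key twice is harmless
   repeat for each line tLine in pData
      put 1 into tSeenA[tLine]
   end repeat
   -- the keys come back unordered, so sort to match the chunk version
   put the keys of tSeenA into tResult
   sort lines of tResult
   return tResult
end UniqueListFromArray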

On Mon, Jan 6, 2014 at 10:13 AM, Richard Gaskin
<ambassador at fourthworld.com> wrote:

> function UniqueListFromChunks pData
>    sort lines of pData
>    # put line 1 of pData is false into tLastLine -- what does this do?
>    put empty into tLastLine
>    repeat for each line tLine in pData
>       if tLine is tLastLine then next repeat
>

NOTE: I corrected the variable names in the above.

Ha ;-) On any day other than yesterday I would have done what you did, or
skipped the initialization altogether. BUT -- if the first line of data in
pData is empty (your version) or happens to be the literal string
"tLastLine" (mine, with the initialization skipped), the above will
incorrectly omit it. So the line I gave, "put line 1 of pData is false
into lastLine", guarantees that tLastLine can never match the first line,
so the first line is always included, without needing a conditional inside
the repeat that only matters on the first iteration.
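
And in case it helps to see it in one piece, the rest of the handler is just the obvious bookkeeping (untested here, so take it as a sketch):

function UniqueListFromChunks pData
   sort lines of pData
   -- tLastLine starts out as true or false, so it can never equal line 1
   put line 1 of pData is false into tLastLine
   put empty into tResult
   repeat for each line tLine in pData
      if tLine is tLastLine then next repeat
      put tLine & cr after tResult
      put tLine into tLastLine
   end repeat
   delete the last char of tResult -- trailing cr
   return tResult
end UniqueListFromChunks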

Huh, that's weird. I copy/pasted your code into a button and ran it:

10 iterations on 100 lines of 5 or fewer chars:
Array: 2 ms (0.2 ms per iteration)
Chunks: 1 ms (0.1 ms per iteration)
Results match - Each list has 95 lines


I ran that several times, and the winner flip-flopped repeatedly. So I
switched to the long seconds. With that, the "Chunks" version is almost
always the winner (if only by a few ten-thousandths of a second). Typical
result:


10 iterations on 100 lines of 5 or fewer chars:
Array: 0.001466 seconds (0.000147 seconds per iteration)
Chunks: 0.001218 seconds (0.000122 seconds per iteration)
Results match - Each list has 97 lines
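
For reference, the timing loop itself is nothing fancy -- roughly this, with tData and the handler name standing in for whatever the test button actually uses; the long seconds just gives the sub-millisecond resolution that the milliseconds rounds away:

put the long seconds into tStart
repeat 10 times
   get UniqueListFromChunks(tData)
end repeat
put the long seconds - tStart into tElapsed
-- tElapsed is the total; divide by 10 for the per-iteration figure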


Curiouser and curiouser:


10 iterations on 100 lines of 250 or fewer chars:
Array: 0.002393 seconds (0.000239 seconds per iteration)
Chunks: 0.001738 seconds (0.000174 seconds per iteration)
Results match - Each list has 97 lines



With 1000 lines it favors the array more often, but I still saw outcomes
where chunks won (obviously not in this result -- I'm trying to show
representative numbers):


10 iterations on 1000 lines of 5 or fewer chars:
Array: 0.007609 seconds (0.000761 seconds per iteration)
Chunks: 0.007894 seconds (0.000789 seconds per iteration)
Results match - Each list has 617 lines


And then back to chunks (mostly):


10 iterations on 1000 lines of 250 or fewer chars:
Array: 0.015478 seconds (0.001548 seconds per iteration)
Chunks: 0.015227 seconds (0.001523 seconds per iteration)
Results match - Each list has 740 lines


At 10,000 lines and beyond the flip-flopping stops, and the array version
wins consistently:


10 iterations on 10000 lines of 5 or fewer chars:
Array: 0.029378 seconds (0.002938 seconds per iteration)
Chunks: 0.06806 seconds (0.006806 seconds per iteration)
Results match - Each list has 988 lines


10 iterations on 10000 lines of 250 or fewer chars:
Array: 0.071169 seconds (0.007117 seconds per iteration)
Chunks: 0.148104 seconds (0.01481 seconds per iteration)
Results match - Each list has 1492 lines


10 iterations on 100000 lines of 5 or fewer chars:
Array: 0.229239 seconds (0.022924 seconds per iteration)
Chunks: 0.732289 seconds (0.073229 seconds per iteration)
Results match - Each list has 985 lines


10 iterations on 100000 lines of 250 or fewer chars:
Array: 0.604814 seconds (0.060481 seconds per iteration)
Chunks: 2.04249 seconds (0.204249 seconds per iteration)
Results match - Each list has 1494 lines


