Finding duplicates in a list
Ian Wood
revlist at azurevision.co.uk
Wed Jan 9 00:44:41 EST 2008
The problem: I'm trying to find duplicate files in a database (Apple
Aperture), and I've found a checksum column for all the image files.
I've had a go at writing a handler to find the dupes and it does OK,
but I wondered if the bright sparks on the list have any advice on
speeding it up...
The handler:
====================
put the milliseconds into tt
put ijwAPLIB_getAllChecksums() into tList -- returns the list of checksums, 10k in my sample DB, over 40k in the 'real' DB
put number of lines of tList into tNumLines
sort tList
put 0 into x
repeat tNumLines times
   add 1 to x
   if last char of x is 1 then set the cursor to busy -- removing this speeds it up by roughly 10%
   put line x of tList into tCheck
   if tCheck is empty then next repeat
   put x + 1 into y
   repeat (tNumLines - x) times
      put line y of tList into tOther
      if tCheck is tOther then
         put x & tab & y & tab & tCheck & return after tRet
      else
         put y - 1 into x -- back up one so the outer loop's "add 1 to x" restarts at line y
         exit repeat
      end if
      add 1 to y
   end repeat
end repeat
put the milliseconds - tt & return & "number of files:" && tNumLines & return & return & tRet
====================
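
One variation I haven't properly benchmarked yet: since the list is sorted, duplicates end up on adjacent lines, so a single "repeat for each line" pass should be able to replace the nested loops and the repeated "line x of tList" lookups (which re-scan the list on every access). A rough sketch, assuming the same ijwAPLIB_getAllChecksums() output and reporting pairs in the same format as above:
====================
put the milliseconds into tt
put ijwAPLIB_getAllChecksums() into tList
put number of lines of tList into tNumLines
sort tList
put 0 into tLineNum
put empty into tPrev
repeat for each line tCheck in tList
   add 1 to tLineNum
   if tCheck is not empty and tCheck is tPrev then
      -- same output format as above: first line of the group, this line, checksum
      put tFirst & tab & tLineNum & tab & tCheck & return after tRet
   else
      put tLineNum into tFirst -- start of a new group of identical checksums
   end if
   put tCheck into tPrev
end repeat
put the milliseconds - tt & return & "number of files:" && tNumLines & return & return & tRet
====================
Because it's a single pass over the sorted list, the time should grow roughly in line with the number of checksums rather than with the number of pairwise comparisons.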
Sample results:
9804
number of files: 8708
116 117 027351c1bed597af774536af8e982363
119 120 0292d175c04d790f50246a5ee043a599
162 163 03d6313ee21a91ed0b0343f339c583e4
185 186 046ddab379a8f44955f1d5605c294605
230 231 05a77db5e76eb02f8d439e13286d3620
245 246 065474aa9bba7e2f24c7435863f5f2ff
314 315 0884f4b24b5bd99ddefdb100fde58a31
333 334 0918ce2135933d6c8f0ee2860837b5f9
360 361 0a2525bef1a46a329b7e902981ef94e2
360 362 0a2525bef1a46a329b7e902981ef94e2
360 363 0a2525bef1a46a329b7e902981ef94e2
360 364 0a2525bef1a46a329b7e902981ef94e2
Ian