Finding duplicates in a list
Eric Chatonet
eric.chatonet at sosmartsoftware.com
Wed Jan 9 06:27:50 EST 2008
Hi Ian,
I just tried a simple "repeat for each" loop:
function Dups pList
   local tList2, tList3, tTimer, tStart
   -----
   ShowProgress 0, the number of lines of pList -- reset the progress bar and set its range
   put the milliseconds into tStart
   put 0 into tTimer
   repeat for each line tLine in pList
      if tTimer mod 100 = 0 then ShowProgress tTimer -- update the bar every 100 lines
      add 1 to tTimer
      -- "is in" is a substring test; it is safe here because all checksums have
      -- the same length, but it searches the whole of tList2 on every pass
      if tLine is not in tList2 then put tLine & cr after tList2
      else put tLine & cr after tList3
   end repeat
   ShowProgress 0 -- reset the progress bar
   -- elapsed time, number of lines checked, number of duplicates, then the duplicates
   return the milliseconds - tStart && "ms" & cr & the number of lines of pList & cr & the number of lines of tList3 & cr & tList3
end Dups
-------------------------------
on ShowProgress pPos, pEnd
   set the thumbpos of sb "Progress" to pPos
   if pEnd <> empty then set the endvalue of sb "Progress" to pEnd
end ShowProgress
This ran in about 5 seconds on my Vista machine using your list and
returned 686 duplicates among 8708 references.
The problem with such a method is that it slows down as the check
progresses, because tList2 keeps growing and every "is in" test has to
search through all of it :-(
I tried to think of another solution using arrays.
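A minimal sketch of what an array-based version could look like, untested and written only as an illustration: each checksum becomes a key of an array whose value counts its occurrences, so checking a line is a key lookup rather than a search through a growing string like tList2. The handler name CountDups is hypothetical, and it returns one line per duplicated checksum (checksum, tab, number of copies) instead of the format Dups returns.

```livecode
-- Hypothetical handler (not from the original thread): counts each
-- checksum with an array, then reports the ones seen more than once
function CountDups pList
   local tCount, tReport
   repeat for each line tLine in pList
      if tLine is empty then next repeat
      add 1 to tCount[tLine] -- a missing element is treated as 0
   end repeat
   repeat for each key tKey in tCount
      if tCount[tKey] > 1 then
         put tKey & tab & tCount[tKey] & cr after tReport
      end if
   end repeat
   if tReport is not empty then
      delete the last char of tReport -- drop the trailing return
      sort lines of tReport -- key order is undefined, so sort the report
   end if
   return tReport
end CountDups
```

Called as put CountDups(ijwAPLIB_getAllChecksums()), it would list each duplicated checksum once; adding up (count - 1) over all lines should give the same total that Dups reports as its number of duplicates.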
Best regards from Paris,
Eric Chatonet.
On 9 Jan 08 at 06:44, Ian Wood wrote:
> The problem: I'm trying to find duplicate files in a database (Apple
> Aperture), and I have found a checksum column for all the image files.
>
> I've had a go at writing a handler to find the dupes and it does
> OK, but I wondered if the bright sparks on the list have any advice
> on speeding it up...
>
> The handler:
>
> ====================
>
> put the milliseconds into tt
> put ijwAPLIB_getAllChecksums() into tList -- returns the list of checksums: 10k in my sample DB, over 40k in the 'real' DB
> put the number of lines of tList into tNumLines
> sort tList
> put 0 into x
> repeat tNumLines times
>    add 1 to x
>    if last char of x is 1 then set the cursor to busy -- removing this speeds it up by roughly 10%
>    put line x of tList into tCheck
>    if tCheck is empty then next repeat
>    put x + 1 into y
>    repeat (tNumLines - x) times
>       put line y of tList into tOther
>       if tCheck is tOther then
>          put x & tab & y & tab & tCheck & return after tRet
>       else
>          put y - 1 into x -- the outer loop adds 1, so the next check starts at line y
>          exit repeat
>       end if
>       add 1 to y
>    end repeat
> end repeat
> put the milliseconds - tt & return & "number of files:" && tNumLines & return & return & tRet
>
> ====================
>
> Sample results:
>
> 9804
> number of files: 8708
>
> 116 117 027351c1bed597af774536af8e982363
> 119 120 0292d175c04d790f50246a5ee043a599
> 162 163 03d6313ee21a91ed0b0343f339c583e4
> 185 186 046ddab379a8f44955f1d5605c294605
> 230 231 05a77db5e76eb02f8d439e13286d3620
> 245 246 065474aa9bba7e2f24c7435863f5f2ff
> 314 315 0884f4b24b5bd99ddefdb100fde58a31
> 333 334 0918ce2135933d6c8f0ee2860837b5f9
> 360 361 0a2525bef1a46a329b7e902981ef94e2
> 360 362 0a2525bef1a46a329b7e902981ef94e2
> 360 363 0a2525bef1a46a329b7e902981ef94e2
> 360 364 0a2525bef1a46a329b7e902981ef94e2
>
> Ian
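Since the list is sorted first, duplicates end up on adjacent lines, so a single pass with "repeat for each line" should be enough. As far as I know, each "line x of tList" in the handler above has to count lines from the start of tList, which likely accounts for much of the time on 40k lines. The sketch below is untested; the handler name FindSortedDups is hypothetical, and it reports lines in the same x, tab, y, tab, checksum format, where x and y are positions in the sorted list.

```livecode
-- Hypothetical handler (not from the original thread): one pass over
-- the sorted list, comparing each line with the start of the current run
function FindSortedDups pList
   local tPrev, tFirst, tLineNum, tRet
   sort lines of pList
   put 0 into tLineNum
   repeat for each line tLine in pList
      add 1 to tLineNum
      if tLine is empty then next repeat
      -- "is" ignores case by default, which is fine for hex checksums
      if tLine is tPrev then
         -- same checksum as the run's first line: report both positions
         put tFirst & tab & tLineNum & tab & tLine & cr after tRet
      else
         -- a new checksum starts a new run
         put tLine into tPrev
         put tLineNum into tFirst
      end if
   end repeat
   return tRet
end FindSortedDups
```

For a run of equal checksums on sorted lines 360 to 364 this yields 360/361, 360/362, 360/363 and 360/364, matching the sample results above.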
----------------------------------------------------------------
Plugins and tutorials for Revolution: http://www.sosmartsoftware.com/
Email: eric.chatonet at sosmartsoftware.com
----------------------------------------------------------------
More information about the use-livecode mailing list