Finding duplicates in a list
Bill Marriott
wjm at wjm.org
Wed Jan 9 06:33:34 EST 2008
on mouseUp
   -- load the list of checksums, one per line
   put url "file:D:/desktop/dupeslist.txt" into tList
   set the itemdelimiter to tab
   put the milliseconds into tt
   -- pass 1: collect "lineNumber <tab> text" entries under each distinct line
   put 0 into n
   repeat for each line tCheck in tList
      add 1 to n
      put n & tab & tCheck & return after tCheckArray[tCheck]
   end repeat
   -- pass 2: any key with more than one entry is a duplicate
   put empty into tListResult
   repeat for each key theKey in tCheckArray
      if the number of lines in tCheckArray[theKey] > 1 then
         repeat with i = 2 to the number of lines in tCheckArray[theKey]
            -- first line number, duplicate line number, duplicated text
            put item 1 of line 1 of tCheckArray[theKey] & tab & \
                  item 1 of line i of tCheckArray[theKey] & tab & \
                  theKey & return after tListResult
         end repeat
         -- put tCheckArray[j] & return after tListResult
      end if
   end repeat
   -- show elapsed time, line count, and the duplicates
   put the milliseconds - tt & return & "number of files:" && \
         the number of lines in tList & return & return & tListResult
end mouseUp
64 milliseconds on my computer, versus 5023 for yours.
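
The handler above groups line numbers under each distinct line and then reports every group holding more than one entry. For comparison, here is a rough single-pass sketch of the same idea that remembers only the first line number seen for each value. The function name findDupes and the variable tFirstSeen are made up for illustration and do not appear in the original post, and the sketch has not been timed against the sample list.

```livecode
-- Hypothetical helper, not from the original post: single-pass variant.
-- Returns one line per duplicate: first line number, tab,
-- duplicate line number, tab, the duplicated text.
function findDupes pList
   local tFirstSeen, tResult, n
   put 0 into n
   repeat for each line tLine in pList
      add 1 to n
      -- skip blank lines so an empty string is never used as an array key
      if tLine is empty then next repeat
      if tFirstSeen[tLine] is empty then
         -- first time this value appears: remember its line number
         put n into tFirstSeen[tLine]
      else
         -- seen before: report it against the first occurrence
         put tFirstSeen[tLine] & tab & n & tab & tLine & return after tResult
      end if
   end repeat
   return tResult
end findDupes
```

It could be called as: put findDupes(url "file:D:/desktop/dupeslist.txt"). The rows come out in the order the duplicates occur in the list, whereas the handler above emits them in whatever order the array keys are returned.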
"Ian Wood" <revlist at azurevision.co.uk> wrote
in message news:5461718F-FCCB-45BA-804C-4D4F4DB3A089 at azurevision.co.uk...
> Hi Bill,
>
> <http://azurevision.co.uk/rev/dupeslist.txt>
>
> Yes, I wish to end up with a list of duplicates, not a list of uniques.
>
> Ian
>
> On 9 Jan 2008, at 08:57, Bill Marriott wrote:
>
>> Hi Ian,
>>
>> I have a couple ideas... Would you be able to upload a sample list
>> somewhere -- the result of ijwAPLIB_getAllChecksums() -- so we could try it
>> out and measure? Also, just want to double-check that you intend to produce
>> a list of duplicate items, not a list of uniques.
>>
>> - Bill
>>
>> "Ian Wood" <revlist at azurevision.co.uk> wrote
>> in message news:3E1F0603-98FC-4F54-8259-E56C048FF3C9 at azurevision.co.uk...
>>> The problem - trying to find duplicate files in a database (Apple
>>> Aperture), and have found a checksum column for all the image files.
>>>
>>> I've had a go at writing a handler to find the dupes and it does OK, but
>>> wondered if the bright sparks on the list have any advice on speeding it
>>> up...
>>>
>>> [snip]