Deleting Data Woefully Slow
Kay C Lan
lan.kc.macmail at gmail.com
Fri Mar 26 00:59:21 EDT 2010
Folks,
This doesn't really apply to the job at hand but more to the discovery that
it can be faster to rebuild data than just delete 1 line. I realised that if
I knew I only had one line to delete then the 'repeat with x = 1' code could
be optimised. After the line was deleted there'd be no reason to continue
the repeat.
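
A minimal sketch of that early-exit idea for the single-line case. The handler name deleteFirstMatch and its parameters are made up for illustration; they are not part of the test script in this post.

```livecode
-- Hypothetical helper: delete only the first line of pData that contains pNeedle.
-- pData is passed by value, so the caller's copy is left untouched.
function deleteFirstMatch pData, pNeedle
   repeat with x = 1 to the number of lines of pData
      if line x of pData contains pNeedle then
         delete line x of pData
         -- nothing else to delete, so stop scanning straight away
         exit repeat
      end if
   end repeat
   return pData
end deleteFirstMatch
```

It would be called along the lines of: put deleteFirstMatch(tData, "1") into tData1. The cost depends on how far down the match sits, because 'line x of pData' has to count lines from the top each time round the loop.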
So with that in mind I rewrote the code for a single-delete case, although
it still deletes 3 lines in total (the matching line and the two after it,
in one delete). Unsurprisingly, if the line to find is the very first line,
the 'repeat with x = 1' case is way faster than using 'repeat for each' to
rebuild the data.
So in a 'this should put a smile on Richard's face' frame of mind, I set
about writing a repeating test that moved the line to be found further and
further into the data until 'repeat with x = 1' and a single delete was no
longer faster than rebuilding the data.
Here are my results:
The line to be found is line 2021 of 100000
Single delete = 126 ms
repeat for each = 116 ms
For 100000 lines of data, if the line to be found is in the first 2.03% of
lines a single search and delete is faster than rebuilding the entire data.
The line to be found is line 641 of 10000
Single delete = 12 ms
repeat for each = 11 ms
For 10000 lines of data, if the line to be found is in the first 6.5% of
lines a single search and delete is faster than rebuilding the entire data.
The line to be found is line 201 of 1000
Single delete = 2 ms
repeat for each = 1 ms
For 1000 lines of data, if the line to be found is in the first 21% of lines
a single search and delete is faster than rebuilding the entire data.
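
As a cross-check on those percentages, here is a small sketch that divides each reported crossover line by its total line count. The positions and sizes are copied from the results above; the variable names are only illustrative.

```livecode
on mouseUp
   -- crossover line number, total lines (taken from the results above)
   put "2021,100000" & cr & "641,10000" & cr & "201,1000" into tRuns
   put empty into tReport
   repeat for each line tRun in tRuns
      put item 1 of tRun into tPos
      put item 2 of tRun into tSize
      -- position of the crossover line as a percentage of the data
      put tPos & " of " & tSize & " = " & round(tPos / tSize * 100, 2) & "%" & cr after tReport
   end repeat
   put tReport
end mouseUp
```

That works out to roughly 2.02%, 6.41% and 20.1%. The summary lines above read slightly higher (2.03%, 6.5%, 21%) because the script below adds the step of 10 to tPrefix once more before it calculates the percentage, so it divides 2030, 650 and 210 by the totals.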
Here is the script I used:
on mouseUp
   put empty into msg
   put 0 into tPrefix --CHANGE START POINT HERE
   put 9997 into tSuffix --CHANGE TOTAL LINES
   -- start both timings at 0 so the loop condition is false on the first pass
   put 0 into tTotal1
   put 0 into tTotal2
   repeat until tTotal2 < tTotal1
      -- build the data: tPrefix filler lines, the 3 target lines, tSuffix filler lines
      put empty into tData
      repeat tPrefix times
         put "another speed test" & cr after tData
      end repeat
      put "the 1 line I'm looking for" & cr after tData
      put "and this line I don't want" & cr after tData
      put "and especially this line needs to be deleted" & cr after tData
      repeat tSuffix times
         put "What will be the result" & cr after tData
      end repeat
      put word 1 to -1 of tData into tData
      put the number of lines of tData into tLineCount
      put tData into tData1
      --test 1: find the line, delete it and the 2 after it, then stop
      put the millisec into tStart
      repeat with x = 1 to tLineCount
         if (line x of tData contains "1") then
            delete line x to (x+2) of tData1
            exit repeat
         end if
      end repeat
      put word 1 to -1 of tData1 into tData1
      put the millisec into tEnd
      put tEnd - tStart into tTotal1
      --test 2: rebuild the data, skipping the matching line and the 2 after it
      put 3 into tSkip
      put empty into tData2
      put the millisec into tStart
      repeat for each line tLine in tData
         switch
            case (tSkip < 3)
               put tSkip + 1 into tSkip
               break
            case (tLine contains "1")
               put 1 into tSkip
               break
            default
               put tLine & cr after tData2
         end switch
      end repeat
      put word 1 to -1 of tData2 into tData2
      put the millisec into tEnd
      put tEnd - tStart into tTotal2
      -- both methods must produce identical data
      if (tData1 <> tData2) then
         answer "Error"
         breakpoint
      end if
      put "The line to be found is line " & (tPrefix + 1) & " of " & \
            (tPrefix + tSuffix + 3) & cr after msg
      put "Single delete = " & tTotal1 & " ms" & cr after msg
      put "repeat for each = " & tTotal2 & " ms" & cr & cr after msg
      put tPrefix + 10 into tPrefix --CHANGE STEP SIZE HERE
      put tSuffix - 10 into tSuffix --CHANGE STEP SIZE HERE
   end repeat
   put "For " & (tPrefix + tSuffix + 3) & " lines of data, if the line to be " & \
         "found is in the first " & (tPrefix / (tPrefix + tSuffix + 3) * 100) & "% of " & \
         "lines a single search and delete is faster than rebuilding the entire data." \
         after msg
end mouseUp