Deleting Data Woefully Slow

Kay C Lan lan.kc.macmail at gmail.com
Fri Mar 26 00:59:21 EDT 2010


Folks,

This doesn't really apply to the job at hand; it follows from the discovery
that rebuilding the data can be faster than deleting just 1 line. I realised
that if I knew there was only one line to delete, the 'repeat with x = 1'
code could be optimised: once the line was deleted there'd be no reason to
continue the repeat.
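
As a rough sketch of that early exit on its own (deleteFirstMatch, pData and
pNeedle are made-up names for illustration, not part of the test script):

function deleteFirstMatch pData, pNeedle
   -- hypothetical helper: removes only the first line containing pNeedle
   put the number of lines of pData into tLineCount
   repeat with x = 1 to tLineCount
      if line x of pData contains pNeedle then
         delete line x of pData
         exit repeat -- only one line to delete, so stop scanning here
      end if
   end repeat
   return pData
end deleteFirstMatch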

So with that in mind I rewrote the code for the single-delete case (one
delete command, although it still removes 3 lines in total). Unsurprisingly,
if the line to find is the very first line, the 'repeat with x = 1' case is
way faster than using 'repeat for each' to rebuild the data.

So in a 'this should put a smile on Richard's face' frame of mind, I set
about writing a repeating test that moved the line to be found further and
further into the data until 'repeat with x = 1' and a single delete was no
longer faster than rebuilding the data.

Here are my results:

The line to be found is line  2021 of 100000
Single delete = 126 ms
repeat for each = 116 ms

For 100000 lines of data, if the line to be found is in the first 2.03% of
lines a single search and delete is faster than rebuilding the entire data.

The line to be found is line  641 of 10000
Single delete = 12 ms
repeat for each = 11 ms

For 10000 lines of data, if the line to be found is in the first 6.5% of
lines a single search and delete is faster than rebuilding the entire data.

The line to be found is line  201 of 1000
Single delete = 2 ms
repeat for each = 1 ms

For 1000 lines of data, if the line to be found is in the first 21% of lines
a single search and delete is faster than rebuilding the entire data.

Here is the script I used:

on mouseup
   put empty into msg

   put 0 into tPrefix --CHANGE START POINT HERE
   put 9997 into tSuffix --CHANGE TOTAL LINES

   put 0 into tTotal1 --INITIALISE SO THE FIRST LOOP TEST COMPARES NUMBERS
   put 0 into tTotal2
   repeat until tTotal2 < tTotal1
      put empty into tData
      repeat tPrefix times
         put "another speed test" & cr after tData
      end repeat
      put "the 1 line I'm looking for" & cr after tData
      put "and this line I don't want" & cr after tData
      put "and especially this line needs to be deleted" & cr after tData
      repeat tSuffix times
         put "What will be the result" & cr after tData
      end repeat
      put word 1 to -1 of tData into  tData
      put the number of lines of tData into tLineCount


      put tData into tData1
      --test 1

      put the millisec into tStart
      repeat with x = 1 to tLineCount
         if (line x of tData contains "1") then
            delete line x to (x+2) of tData1
            exit repeat
         end if
      end repeat
      put word 1 to -1 of tData1 into tData1
      put the millisec into tEnd
      put tEnd - tStart into tTotal1



      --test 2
      put 3 into tSkip
      put empty into tData2

      put the millisec into tStart
      repeat for each line tLine in tData
         switch
            case (tSkip < 3)
               put tSkip +1 into tSkip
               break
            case (tLine contains "1")
               put 1 into tSkip
               break
            default
               put tLine & cr after tData2
         end switch
      end repeat
      put word 1 to -1 of tData2 into tData2
      put the millisec into tEnd
      put tEnd - tStart into tTotal2

      if (tData1 <> tData2)  then
         answer "Error"
         breakpoint
      end if

      put "The line to be found is line  " & (tPrefix + 1) & " of " & \
            (tPrefix + tSuffix + 3) & cr after msg
      put "Single delete = " & tTotal1 & " ms" & cr after msg
      put "repeat for each = " & tTotal2 & " ms" & cr & cr after msg

      put tPrefix + 10 into tPrefix --CHANGE STEP SIZE HERE
      put tSuffix - 10 into tSuffix --CHANGE STEP SIZE HERE

   end repeat
   put "For " & (tPrefix + tSuffix + 3) & " lines of data, if the line to be " & \
         "found is in the first " & (tPrefix / (tPrefix + tSuffix + 3) * 100) & \
         "% of lines a single search and delete is faster than rebuilding " & \
         "the entire data." after msg
end mouseup


