Deleting Data Woefully Slow

Brian Yennie briany at qldlearning.com
Thu Mar 25 10:35:23 EDT 2010


I'd be surprised if you could do much better. A tad, maybe, but I don't think there is another hidden Rev function that will work any further magic. If you need another major speed jump, I'm afraid you will have to look at the data itself: can you index it, can you operate on a subset at a time, is it time for a database or a new data structure, and so on. Reading text line by line is eventually going to hit a wall: at best the cost grows in proportion to the amount of data, and no trick gets you below that. I realize it's not always practical to rework things at that level.
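
For what it's worth, here is roughly what "index it" could look like. This is only a sketch and not something I have timed: the handler name removeBlocks and its three parameters are made up for illustration. The idea is that "line x of tData" has to count returns from the top of the text on every call, whereas an array element is looked up by key, so splitting the text into an array once makes it cheap to reach any line by number.

```livecode
-- Hypothetical helper (name and parameters are illustrative only).
-- Returns pData without each line that contains pMarker and without
-- the (pBlockSize - 1) lines that follow it.
function removeBlocks pData, pMarker, pBlockSize
   local tLines, tCount, tResult, x
   put the number of lines of pData into tCount
   put pData into tLines
   split tLines by cr  -- tLines[1], tLines[2], ... one element per line
   put 1 into x
   repeat while x <= tCount
      if tLines[x] contains pMarker then
         add pBlockSize to x  -- jump over the whole block
      else
         put tLines[x] & cr after tResult
         add 1 to x
      end if
   end repeat
   -- drop the trailing return added by the loop
   if tResult is not empty then delete the last char of tResult
   return tResult
end removeBlocks
```

Called as removeBlocks(tData, "a", 3), it is meant to drop each "a" line plus the two lines after it, which are the same lines the tests in the quoted script remove. Whether it beats a plain "repeat for each" is a separate question, since the split is itself another full pass over the data.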

> On Thu, Mar 25, 2010 at 3:51 PM, Kay C Lan <lan.kc.macmail at gmail.com> wrote:
> 
>> 
>> But something Brian wrote has got me thinking and testing. Instead of
>> 'delete lines 12345 to 12347...' I'm looking at 'put lines 1 to 12344... & cr
>> & lines 12348 to -1...'. Fingers crossed.
>> 
> Not that fruitful, still slow.
> 
> Test 1 uses 'delete line x to (x+2) of tData1'.
> 
> Test 2 uses 'put line 1 to (x-1) of tData2 & cr & line (x+3) to -1 of
> tData2 into tData2'.
> 
> Test 3 uses a simple 'repeat with x = ...' loop and skips lines.
> 
> Test 4 is the 'repeat for each' that more closely resembles what I'm
> actually using to skip lines. Even with the switch statement it is still
> significantly faster to recreate 99.9% of your data than to delete 0.1%.
> 
> If anyone thinks they can delete data faster than my Test 4 I'd sure
> appreciate some pointers :-)
> 
> MBP 2.16GHz, 2 GB RAM, OS X 10.6.2, Rev Studio 4.0.0 Build 950
> 
> Removing 0.1% of data in 50000 Lines
> In a 50000 repeat loop
> Delete line x to x+2 = 11478 ms
> put line 1 to (x-2) = 11595 ms
> repeat with x...skip lines = 23010 ms
> repeat for each = 37 ms --create 99.9%
> 
> on mouseup
>  put 50 into tHowMany
>  put tHowMany * 1000 into tRepeats
>  --because Mark Wieder doesn't like random;-)
>  repeat tHowMany times
>     put "a" & cr after tData
>     repeat with x = 1 to 999
>        put x & cr after tData
>     end repeat
>  end repeat
> 
>  put tData into tData1
>  --test 1
> 
>  put the millisec into tStart
>  repeat with x = the number of lines of tData down to 1
>     if (line x of tData contains "a") then
>        delete line x to (x+2) of tData1
>     end if
>  end repeat
>  put word 1 to -1 of tData1 into tData1
>  put the millisec into tEnd
>  put tEnd - tStart into tTotal1
> 
> 
>  put tData into tData2
>  --test 2
> 
>  put the millisec into tStart
>  repeat with x = the number of lines of tData down to 1
>     if (line x of tData contains "a") then
>        put line 1 to (x-1) of tData2 & cr & line (x+3) to -1 of tData2 into tData2
>     end if
>  end repeat
>  put word 1 to -1 of tData2 into tData2
>  put the millisec into tEnd
>  put tEnd - tStart into tTotal2
> 
>  if (tData1 <> tData2) then
>     answer "Error"
>     breakpoint
>  end if
> 
> 
>  --test 3
> 
>  put the millisec into tStart
>  repeat with x = 1 to the number of lines of tData
>     if (line x of tData contains "a") then
>        put x + 2 into x
>     else
>        put line x of tData & cr after tData3
>     end if
>  end repeat
>  put word 1 to -1 of tData3 into tData3
>  put the millisec into tEnd
>  put tEnd - tStart into tTotal3
> 
>  if (tData1 <> tData2) or (tData1 <> tData3) then
>     answer "Error"
>     breakpoint
>  end if
> 
> 
>  --test 4
>  put 3 into tSkip
> 
>  put the millisec into tStart
>  repeat for each line tLine in tData
>     switch
>        case (tSkip < 3)
>           put tSkip +1 into tSkip
>           break
>        case (tLine contains "a")
>           put 1 into tSkip
>           break
>        default
>           put tLine & cr after tData4
>     end switch
>  end repeat
>  put word 1 to -1 of tData4 into tData4
>  put the millisec into tEnd
>  put tEnd - tStart into tTotal4
> 
>  if (tData1 <> tData2) or (tData1 <> tData3)  or (tData1 <> tData4) then
>     answer "Error"
>     breakpoint
>  end if
> 
>  put "Removing 0.1% of data in " & tRepeats & " Lines" & cr into msg
>  put "In a " & tRepeats & " repeat loop" & cr after msg
>  put "Delete line x to x+2 = " & tTotal1 & " ms" & cr after msg
>  put "put line 1 to (x-2) = " & tTotal2 & " ms" & cr after msg
>  put "repeat with x...skip lines = " & tTotal3 & " ms" & cr after msg
>  put "repeat for each = " & tTotal4 & " ms" after msg
> end mouseup



More information about the use-livecode mailing list