Deleting Data Woefully Slow

Kay C Lan lan.kc.macmail at gmail.com
Thu Mar 25 02:01:15 EDT 2010


On Thu, Mar 25, 2010 at 12:33 PM, Mark Wieder <mwieder at ahsoftware.net> wrote:

>
> I'm actually more concerned that tests 9, 10, and 11 all rely on the
> *content* of lines containing random data. I think you should be using
> the line number mod 10 rather than the line content to provide any
> meaningful statistics. There's no way you can fill random numbers into
> lines of a variable and expect with any certainty that the number "1"
> will appear in 10% of the lines.
>
>
If the numbers were remotely close, i.e. if I wanted to prove it was faster
to create 99% than to delete 1%, then yes, I would totally agree. But the
variation in my random 10% is insignificant compared to the slowness of
using 'delete line x', especially on really large data sets.


> Also, in tests 9, 10, and 11 you should empty the variable before
> adding more data to it in order to provide better comparisons.
> Otherwise the variable will be continuing to increase and scaling will
> throw off the values.
>
tData never changes.
tData1 is empty from Test 8, so it should not affect Test 9.
tData2 is created in Test 10 so it is empty to start with; the same goes for
tData3 and Test 11.
tLine in Test 9 will initially be a single digit, only to be replaced with
another single digit.
tCounter will be the number 1 at the start of Test 10, the exact number
that's about to be put into it, whilst at the start of Test 11 tCounter will
be 5000 (or 50000), again the exact number that's about to be put into it.

Yes, I could empty these, but I don't see how that matches your statement
that 'the variable will be continuing to increase'. Memory, now that's
another issue, and maybe I should empty the variables prior to the next
test, but this is how I got here in the first place. I thought Rev was
struggling with the HUGE data sets I was using, and thought it would help if
I deleted lines in a variable that I knew were invalid. What I've
discovered, though, is the exact opposite: Rev will happily create a 117 MB
variable, in addition to the 130 MB variable it is already dealing with, yet
struggle to whittle a single 130 MB variable down to 117 MB.

And just to prove the point, I moved Test 9 to the very end, after all the
other variables were full, so it was supposedly handicapped, and got these
times for 50000 repeats:

Create 90% - repeat for each = 26 ms --Test 9 run last
Create 90% - repeat with x = 33191 ms
Delete 10% = 19848 ms

Absolutely no change for Test 9.
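
For reference, each figure above is just a millisecond delta taken around
the test in question, along these lines (a sketch, not my exact harness):

   put the milliseconds into tStart
   -- run the test being measured here
   put the milliseconds - tStart into tElapsed
   -- tElapsed is the figure reported above, in ms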



> Test 9 isn't quite a fair comparison (or is, depending on your point
> of view): if you want to prove that "repeat for each" is faster than
> "repeat with" it's fine (there's about a tenfold speed increase).
>

What I'm after is the fastest way to take a HUGE amount of data and reduce
it by roughly 5-10%. The repeat for each code I supplied seems to do that;
if anyone has any code that is faster, PLEASE provide it. As I said at the
beginning of this thread, I'm dealing with two nested repeat loops, each
handling 1.4 million cycles!! Something tenfold slower is NOT what I'm
after, but that is what I was seeing because I was using delete.
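
For anyone wanting to compare, the repeat for each approach boils down to
something like this (again a sketch; tData and the "is 1" test stand in for
my real data and validity check):

   -- fast: rebuild the ~90% of lines to keep into a fresh variable
   put empty into tKept
   repeat for each line tLine in tData
      if tLine is not "1" then
         put tLine & return after tKept
      end if
   end repeat
   delete the last char of tKept -- drop the trailing return

It never touches tData at all, it just appends the keepers to a new
variable, which seems to be exactly why it scales where delete doesn't.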


