Making Revolution faster with really big arrays

Dennis Brown see3d at writeme.com
Tue Apr 12 22:36:03 EDT 2005


Thanks Alex,

This is quite an imaginative suggestion.  It would work great if I just 
had two lists, but I have two arrays of 10,000 lines with 2500 items in 
each line.  It would be a lot more complex to combine them into a 
single array (I need to pair the items together), but it could be done 
with a lot more shuffling.   Your timings point out the main problem 
with the chunk-counting method used in Rev for large arrays --that the 
time it takes to find a line is proportional to the number of characters in 
the array up to that line, so access to higher line numbers takes 
longer than to lower line numbers.  It takes twice as long to find line 2 
as line 1, and twice as long to find line 4 as line 2, etc.  Looping over 
every line by index therefore grows quadratically with the number of 
lines, instead of linearly.  I just tried some tests 
to see what it takes to do 10 million operations in an optimal repeat 
loop, with data value = 1:

   75 sec  create and store a value in an array[i]
   35 sec  read an indexed value from an array[i]
    6 sec  read "each" element from an array
   11 sec  build a line-delimited array
    7 sec  read "each" line from the line-delimited array
>5000 sec  read line i from the line-delimited array --got tired of waiting

Line-delimited arrays are better for reading from and saving to files 
because that is a single fast operation.  Indexed arrays require 
unpacking and repacking into a format for saving --more code and more 
time, but not out of the question.
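
To illustrate the scaling difference outside Rev, here is a small Python sketch (Python used purely for illustration; function names are my own, and Rev's actual chunk code differs).  Indexed access mimics the chunk-counting lookup by re-scanning the text from the start for every line, while the "each"-style pass walks the data once:

```python
# Illustrative sketch (not Rev code): why "line i of tList" inside a
# loop is quadratic overall, while "repeat for each line" is linear.

def line_i(text, i):
    # Mimic chunk counting: scan from the start of the text, counting
    # newlines, until line i is reached -- cost proportional to i.
    start = 0
    for _ in range(i - 1):
        start = text.index("\n", start) + 1
    end = text.find("\n", start)
    return text[start:] if end == -1 else text[start:end]

def sum_by_index(text, n):
    # Quadratic total: each lookup re-scans from the beginning.
    return sum(int(line_i(text, i)) for i in range(1, n + 1))

def sum_each(text):
    # Linear: one pass over the data, like "repeat for each line".
    return sum(int(line) for line in text.split("\n"))

n = 1000
data = "\n".join(str(i) for i in range(n))
assert sum_by_index(data, n) == sum_each(data) == n * (n - 1) // 2
```

Both functions produce the same sum; only the indexed version slows down dramatically as n grows.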

I will study your suggestion and try some different things.

Thanks,
Dennis


On Apr 12, 2005, at 9:03 PM, Alex Tweedly wrote:

> In the meantime, here is a possible workaround for you.
>
> I wrote most of this email, and decided it looked so outlandish you 
> wouldn't believe it (and no-one else would either). So I had to tidy 
> up the code and verify the timings.  First the timing result - then 
> the suggestion
>
> Basic test case: fill two variables each with N random numbers (one 
> per line), produce as a result the summed list
>   (though the actual operation could be anything).
>
>            N    1000  2000  5000 10,000 20,000 50,000
> Simple method       9    37   188    696   3343  17831
> Suggestion         14    31    78    159    320    814
>
> As you can see, for small sizes the overhead makes this take 
> (marginally) longer, but for large cases (where it matters) the better 
> scaling makes it much faster.
>
> Simple method:
> loop through one list, using the corresponding "line y" from the other
>
>  put empty into tResult
>  put the millisecs into t1
>  put 1 into y
>  repeat for each line XL in Xlist
>    put XL + line y of Ylist & cr after tResult
>    add 1 to y
>  end repeat
>  put the number of lines in tResult && line 1 of tResult && the last line of tResult && the millisecs-t1 & cr after field "Field"
>
> Suggestion:
> create a combined list, prefixing each line from list 1 with 
> 1,3,5,7,... and each line from list 2 with 2,4,6,8,...
> sort (numeric) this combined list
> process the combined list, alternately storing and adding
>    (note this generalizes quite easily to handle more than two 
> lists, extracting the data at each turn)
>
>  put empty into tResult
>  put empty into tInter
>  put the millisecs into t1
>  put 1 into y
>  repeat for each line XL in Xlist
>    put y, XL & cr after tInter
>    add 2 to y
>  end repeat
>  put 2 into y
>  repeat for each line YL in Ylist
>    put y, YL & cr after tInter
>    add 2 to y
>  end repeat
>  sort tInter numeric
>  put 1 into y
>  repeat for each line L in tInter
>    switch y
>    case 1
>      put item 2 of L into t
>      put 2 into y
>      break
>    case 2
>      put (item 2 of L)+t & cr after tResult
>      put 1 into y
>    end switch
>  end repeat
>  put the number of lines in tResult && line 1 of tResult && the last line of tResult && the millisecs-t1 & cr after field "Field"
>
> caveat: this suggestion manipulates the data more, so it might not be 
> as effective if the lines in question were longer than in this test case 
> - but a quick test  (put random(C) && "xxxxxxxxxxxxxxxxxxxxxxx" into 
> each line) suggests it won't be too bad.
>
> -- 
> Alex Tweedly       http://www.tweedly.net
>
>
> _______________________________________________
> use-revolution mailing list
> use-revolution at lists.runrev.com
> http://lists.runrev.com/mailman/listinfo/use-revolution
>
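
For readers following along outside Rev, Alex's interleave-and-sort pairing can be sketched in Python (illustrative only; names and structure are my own, not the original Rev code):

```python
# Sketch of the interleave-and-sort pairing technique.  Lines from the
# first list get keys 1,3,5,... and lines from the second list get keys
# 2,4,6,...; after a numeric sort on the key the two lists alternate,
# so one linear pass can pair them up with no indexed "line y" lookups.

def pair_by_interleave(xlist, ylist):
    inter = []
    inter += [(2 * i + 1, x) for i, x in enumerate(xlist)]  # 1, 3, 5, ...
    inter += [(2 * i + 2, y) for i, y in enumerate(ylist)]  # 2, 4, 6, ...
    inter.sort(key=lambda kv: kv[0])     # the "sort tInter numeric" step
    result, held = [], None
    for key, value in inter:
        if key % 2 == 1:                 # odd key: store (case 1)
            held = value
        else:                            # even key: add and emit (case 2)
            result.append(held + value)
    return result

assert pair_by_interleave([1, 2, 3], [10, 20, 30]) == [11, 22, 33]
```

Each element of the result is the sum of the corresponding lines of the two input lists, matching the "summed list" of the test case above.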


