Making Revolution faster with really big arrays

Dennis Brown see3d at writeme.com
Tue Apr 12 16:36:02 EDT 2005


Hi all,

I just joined this list.  What a great resource for sharing ideas and 
getting help.

I am actively writing a bunch of Transcript code to sequentially 
process some very large arrays.  I had to figure out how to handle a 
gig of data.  At first I tried to load the file data into a data 
array[X,Y,Z], but it takes a while to load and to process for random 
access, and the array structure takes a lot of extra space.  I also 
could never get all the data loaded without crashing Revolution and 
my whole system (yes, I have plenty of extra RAM).

The scheme I ended up with is based on the fact that the only fast way 
I could find to process a large amount of data is with the repeat for 
each control structure.  I broke my data into a bunch of 10,000 line by 
2500 item arrays.  Each one holds a single data item (in this case it 
relates to stock market data).  That way I can process a single data 
item in one sequential pass through the array (usually building another 
array in the process).  I was impressed at how fast it goes for these 
40MB files.  However, this technique only covers a subset of the type 
of operations I need to do.  The problem is that you can only specify a 
single item at a time to work with the repeat for each.  In many cases, 
I need to have two or more data items available for the calculations.  
I have to pull a few rabbits out of my hat and jump through a lot of 
hoops to do this and still go faster than a snail.  That is a crying 
shame.  I believe (but don't know for sure) that all the primitive 
operations are in the runtime to make it possible to do this in a 
simple way if we could just access them from the compiler. So I came up 
with an idea for a proposed language extension.  I put the idea in 
Bugzilla yesterday, then today I thought I should ask others if they 
liked the idea, had a better idea, or could help me work around not 
having this feature in the meantime, since I doubt I would see it 
implemented in my lifetime, based on the speed at which I see things 
getting addressed in the Bugzilla list.
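
One possible stopgap with the language as it stands is to keep repeat 
for each on one list and turn the second list into a numerically keyed 
array with split, then look up the matching element with a counter.  
This is only a sketch under assumptions: the handler name mergeLists 
and its parameter names are made up for illustration, both lists are 
assumed to be return-delimited, and the split copy costs extra memory, 
so it may not suit the largest files.

```livecode
-- Hypothetical helper: walks pListX with repeat for each and pairs each
-- line with the same-numbered line of pListY.
function mergeLists pListX, pListY
   local tArrayY, tIndex, tMerged
   put pListY into tArrayY
   -- split with one delimiter gives tArrayY[1], tArrayY[2], ...
   split tArrayY by return
   put 0 into tIndex
   repeat for each line tX in pListX
      add 1 to tIndex
      -- a missing element of tArrayY simply evaluates to empty
      put tX & comma & tArrayY[tIndex] & return after tMerged
   end repeat
   return tMerged
end mergeLists
```

Calling mergeLists(arrayX, arrayY) would return one "X,Y" pair per line 
of arrayX.  The array lookup is a hash access rather than a line count 
from the top of the data, which is what makes "line N of arrayY" slow 
inside a loop.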

The idea is to break apart the essential functional elements of the 
repeat for each control to allow more flexibility.  This sample has a 
bit more refinement than what I posted yesterday in Bugzilla.

The new keyword would be "access", but could be something else.

An example of the new keyword's syntax would be:

access each line X in arrayX --initial setup of pointers and X value
access each item Y in arrayY --initial setup of pointers and Y value
repeat for number of lines of arrayX times --same as a repeat for each
    put X & comma & Y & return after ArrayXY --merged array
    next line X --puts the next line value in X
    --if arrayY has fewer elements than arrayX, then empty is supplied;
    --could also put "End of String" in the result
    next item Y
end repeat

Another advantage of this syntax is that it provides more 
flexibility in the structure of loops.  You could repeat forever, then 
exit repeat when you run out of values (based on getting an empty back).  
The possibilities for high-speed sequential-access data processing are 
much expanded, which opens up more possibilities for Revolution.
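
The "repeat forever, then exit repeat" pattern can be roughly 
approximated today by keeping a character cursor per list and using the 
offset function with its chars-to-skip parameter.  This is a sketch, 
not the proposed feature: nextChunk and mergeWithCursors are 
hypothetical names, the data is passed by reference to avoid copying, 
and the handler call on every element adds overhead that a built-in 
"next" would not have.

```livecode
-- Hypothetical helper: returns the chunk that starts after pSkip chars
-- of pData and advances pSkip past the following delimiter.
function nextChunk @pData, @pSkip, pDelim
   local tOffset, tChunk
   -- offset's result is relative to the skipped characters
   put offset(pDelim, pData, pSkip) into tOffset
   if tOffset = 0 then
      -- no delimiter left: take whatever remains (possibly empty)
      put char (pSkip + 1) to -1 of pData into tChunk
      put the length of pData into pSkip
   else
      put char (pSkip + 1) to (pSkip + tOffset - 1) of pData into tChunk
      add tOffset to pSkip
   end if
   return tChunk
end nextChunk

-- Hypothetical usage: merge lines of X with items of Y.
function mergeWithCursors pLinesX, pItemsY
   local tSkipX, tSkipY, tX, tY, tMerged
   put 0 into tSkipX
   put 0 into tSkipY
   repeat forever
      -- stop when every character of X has been consumed
      if tSkipX >= the length of pLinesX then exit repeat
      put nextChunk(pLinesX, tSkipX, return) into tX
      -- tY is empty once the Y data is exhausted
      put nextChunk(pItemsY, tSkipY, comma) into tY
      put tX & comma & tY & return after tMerged
   end repeat
   return tMerged
end mergeWithCursors
```

Testing the cursor against the length of the data, rather than testing 
for an empty value, avoids confusing a genuinely empty element with the 
end of the list.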

I would love to get your feedback or other ideas about solving this 
problem.

Dennis


