Making Revolution faster using dimensioned arrays
Dennis Brown
see3d at writeme.com
Mon Jul 4 22:33:15 EDT 2005
To all the speed freaks,
I know that I have pushed for faster array processing and have even
proposed an "Array sub-processor" as a possible solution. However,
after giving this much thought, I don't believe that a separate array
sub-processor is needed to solve the problem of speed. Transcript
could provide the speed needed for processing arrays in a simpler way.
If Transcript had an array declaration command that allowed the user
to fix the dimensions and data size/type for an array and limited the
"keys" to integer indexes, then high speed array processing could be
built-in.
Normally it would be a pain to have to dim all arrays. Arrays in
Transcript are sparsely populated and very general now --and could be
further extended to make them even better for many general purpose
solutions. However, there is a class of problems that need speed and
generally lend themselves to fixing the type and size. These
problems are quite common, but rarely seen implemented in Transcript
because of the slow array processing speed.
The key is that once an array is fixed in size and type, the runtime
does not need to do extensive checking of the data to determine the
operation code, and the keys do not have to be looked up in a hash
table to find the location of the element --just use the indexes
directly. The compiler can then generate the code for a much simpler
runtime operator. These declared arrays would run 10 to 100 times
faster than the type-less ones for the kind of operations that are
required by image processing, AI, statistics, and many other
applications. This is the main reason array processing runs so much
faster in other languages.
Having fast array structures then opens up the possibility for a
larger set of array operators or even an efficient way to pass the
array pointers directly to an external. The links between the rest
of the data structures in Transcript and these dimensioned arrays
become seamless.
Conceptually, the implementation of a dimensioned array would be as
if it were a single string variable. In other words, just a
contiguous chunk of memory that you can index into based on the
supplied dimensions. We already have the concept of a pointer in
Transcript. When we say "get char (expression) of variable",
"(expression)" is a pointer. That expression can be the index into
a conceptual n dimensional character array. However, at present we
are limited to directly accessing only a single byte. We could say
"get char (expression) to (expression+4) of variable", to get 32 bits
of information, but even if we know that it is supposed to represent
a 32 bit floating point number, we have no way of telling Transcript
that, so it will just hand us back 4 characters that could at best be
translated into a 4 digit (including decimal point) number like
3.14. However, if we told the compiler that we want our variable to
have 1000 rows by 1000 columns of 4 byte words, then the "get
variable[25,64]" could get a 4 byte string of characters in lightning
speed. We could say that If we initialized our array with strings,
we would get strings back. If we initialized it with numbers, we
would get numbers back --which is very Transcript like, or we could
just specify the type. The point is that a single dimensioned array,
like a single variable, can hold only one data type at a time.
The advantage of such an arrangement is not only speed, but also
compactness. If we are interested in speed it is likely that we have
a large quantity of data. Arrays in Transcript today have a large
overhead due to the key structure. Dimensioned arrays could easily
take half the space or even less. The syntax could be very simple like:
global myArray[1000,1000]=0 --numeric array
local myArray[1000,1000,5]=false --Boolean array
local myArray[1000]="" --empty character array
local myArray[100,300]="abcd" --4 character word array
Dennis
More information about the use-livecode
mailing list