Making Revolution faster with really big arrays

Frank D. Engel, Jr. fde101 at fjrhome.net
Tue Apr 12 18:27:17 EDT 2005


That is only significant if Rev takes advantage of the 64-bit address 
space, which I seriously doubt.  Your Rev process will still be limited 
to 2GB of address space, regardless of how much RAM is in the 
system.  Until they release a 64-bit version of Rev, of course...

If your task is that processor-intensive and your data set that large, 
you should consider a lower-level language like Pascal or Ada.  A 
scripting language, no matter how fast it is, is not ideal for such 
intensive operations on large data sets.


On Apr 12, 2005, at 5:30 PM, Dennis Brown wrote:

> Thanks Frank,
>
> Actually I have a 3.5GB 64-bit G5 machine that can handle that much 
> data, and I could add a couple more gig if I needed to.  It crashes 
> when I get less than 1GB into RAM (I can monitor the number of free 
> pages of RAM).  I tried processing it like you suggest.  However, at 
> the speed it was going, it would have taken 4 or 5 days to finish the 
> first pass over my data.  That is because if you specify a line or 
> item chunk in a big array, Rev counts separators from the beginning 
> to find the spot you want each time, even if you just want the next 
> line.  That means that, on average, you end up scanning the array 
> thousands of times more than the single-pass repeat for each does.  
> The way I wrote it, it only required about two hours for the 
> initial pass, and about two minutes for single passes through one data 
> item in the array.  However, now I need to process more than one data 
> item at a time, and that means I can use the repeat for each on only 
> one item and I will have to use the chunk expressions for the others.  
> That will slow me back down to many days per pass, and I have hundreds 
> of passes to do --not very interactive!  See you in a few years...
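>
> To make the difference concrete, here is a minimal sketch of the two 
> access patterns (tData, tResult, and tLine are just placeholder names):
>
>   -- slow: "line i of" counts separators from the start of tData
>   -- on every pass through the loop
>   repeat with i = 1 to the number of lines of tData
>     put line i of tData & return after tResult
>   end repeat
>
>   -- fast: "repeat for each" walks the string once, handing back
>   -- each line in turn, so the whole thing is a single pass
>   repeat for each line tLine in tData
>     put tLine & return after tResult
>   end repeat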
>
> Dennis
>
>
> On Apr 12, 2005, at 5:04 PM, Frank D. Engel, Jr. wrote:
>
>> Rev's arrays are associative.  When using an array with an index like 
>> [X, Y, Z], you are really saying, make a string whose contents are X, 
>> Y, and Z separated by commas, then use that as the index for the 
>> array.  These array indexes take up memory, along with your data.  In 
>> fact, depending on what type of data you are trying to process, they 
>> likely take up more.  Even without the overhead of the structures 
>> used to represent the arrays, your array will likely take up well 
>> over 2GB of RAM.  On a 32-bit system, you are normally limited to 
>> either 2GB or 3GB of memory per process (almost always 2GB, but some 
>> Windows versions -- mostly server versions -- can be configured for 
>> 3GB per process), so that array would take more memory than all of 
>> your data PLUS Revolution PLUS your stack(s) PLUS some code used by 
>> the runtime libraries from the OS ... you get the idea.
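>>
>> For example, a quick sketch of what that index really is (tArray is 
>> just a placeholder name):
>>
>>   put "hello" into tArray[3, 7, 2]
>>   put tArray["3,7,2"]  -- same element: the key is just the string "3,7,2"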
>>
>> You'll never be able to fit that entire array into memory *as an 
>> array* in Rev.
>>
>> Have you considered loading it into a single string and parsing the 
>> data inline while managing it in your code?
>>
>> Try something like:
>>
>> put URL "file:/path/to/MyFile.txt" into x
>>
>> Then parse the data from x:
>>
>> put word 1 of item 2 of line 6 of x into y
>>
>> And so on...
>>
>>
>> On Apr 12, 2005, at 4:36 PM, Dennis Brown wrote:
>>
>>> Hi all,
>>>
>>> I just joined this list.  What a great resource for sharing ideas 
>>> and getting help.
>>>
>>> I am actively writing a bunch of Transcript code to sequentially 
>>> process some very large arrays.  I had to figure out how to handle a 
>>> gig of data.  At first I tried to load the file data into a data 
>>> array[X,Y,Z], but it takes a while to load and process for random 
>>> access, and it takes a lot of extra space for the structure.  I also 
>>> could never get all the data loaded in without crashing Revolution 
>>> and my whole system (yes, I have plenty of extra RAM).
>>>
>>> The scheme I ended up with is based on the fact that the only fast 
>>> way I could find to process a large amount of data is with the 
>>> repeat for each control structure.  I broke my data into a bunch of 
>>> 10,000 line by 2500 item arrays.  Each one holds a single data item 
>>> (in this case it relates to stock market data).  That way I can 
>>> process a single data item in one sequential pass through the array 
>>> (usually building another array in the process).  I was impressed at 
>>> how fast it goes for these 40MB files.  However, this technique only 
>>> covers a subset of the operations I need to do.  The problem is 
>>> that repeat for each can only walk through a single data item at a 
>>> time.  In many cases, I need to have two or more data 
>>> items available for the calculations.  I have to pull a few rabbits 
>>> out of my hat and jump through a lot of hoops to do this and still 
>>> go faster than a snail.  That is a crying shame.  I believe (but 
>>> don't know for sure) that all the primitive operations are in the 
>>> runtime to make it possible to do this in a simple way if we could 
>>> just access them from the compiler. So I came up with an idea for a 
>>> proposed language extension.  I put the idea in Bugzilla yesterday, 
>>> then today, I thought I should ask others if they liked the idea, 
>>> had a better idea, or could help me work around not having this 
>>> feature in the meantime, since I doubt I will see it implemented 
>>> in my lifetime at the rate things are getting addressed in 
>>> the Bugzilla list.
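>>>
>>> To show where it breaks down, here is a rough sketch of the only way 
>>> I can see to walk two of these arrays in step today (tOpen and 
>>> tClose are just placeholder names for two of the data items):
>>>
>>>   put 0 into tLineNum
>>>   repeat for each line tOpenLine in tOpen
>>>     add 1 to tLineNum
>>>     -- this chunk expression re-counts lines from the top of tClose
>>>     -- on every pass, which is what kills the speed
>>>     put line tLineNum of tClose into tCloseLine
>>>     -- ...do the calculation on tOpenLine and tCloseLine...
>>>   end repeat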
>>>
>>> The idea is to break apart the essential functional elements of the 
>>> repeat for each control structure to allow more flexibility.  This sample has 
>>> a bit more refinement than what I posted yesterday in Bugzilla.
>>>
>>> The new keyword would be "access", but it could be something else.
>>>
>>> An example of the new keyword syntax would be:
>>>
>>> access each line X in arrayX  -- initial setup of pointers and X value
>>> access each item Y in arrayY  -- initial setup of pointers and Y value
>>> repeat for the number of lines of arrayX times  -- same as a repeat for each
>>>    put X & comma & Y & return after arrayXY  -- merged array
>>>    next line X  -- puts the next line value in X
>>>    next item Y  -- if arrayY has fewer elements than arrayX, then
>>>                 -- empty is supplied; could also put "End of String" in the result
>>> end repeat
>>>
>>> Another advantage of this syntax is that it allows more flexibility 
>>> in the structure of loops.  You could repeat forever, then exit 
>>> repeat when you run out of values (based on getting an empty back).  
>>> The possibilities for high-speed sequential data processing are much 
>>> expanded, which opens up more possibilities for Revolution.
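>>>
>>> For instance, here is a sketch of how that might look (again, this 
>>> is the proposed syntax, not something Rev runs today):
>>>
>>>   access each line X in arrayX
>>>   access each item Y in arrayY
>>>   repeat forever
>>>     if X is empty and Y is empty then exit repeat
>>>     put X & comma & Y & return after arrayXY
>>>     next line X
>>>     next item Y
>>>   end repeat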
>>>
>>> I would love to get your feedback or other ideas about solving this 
>>> problem.
>>>
>>> Dennis
>>>
>
-----------------------------------------------------------
Frank D. Engel, Jr.  <fde101 at fjrhome.net>

$ ln -s /usr/share/kjvbible /usr/manual
$ true | cat /usr/manual | grep "John 3:16"
John 3:16 For God so loved the world, that he gave his only begotten 
Son, that whosoever believeth in him should not perish, but have 
everlasting life.
$