Making Revolution faster with really big arrays

Dennis Brown see3d at writeme.com
Tue Apr 12 19:42:18 EDT 2005


On Apr 12, 2005, at 7:37 PM, Dennis Brown wrote:

Frank,

I just copied this from the Rev IDE help info:

A note about entries designated as "Unlimited":
Since each open stack file resides completely in memory, Revolution 
stacks (and all structures within a stack) are effectively limited by 
available memory and by Revolution's total address space of 4G 
(4,294,967,296 bytes) on 32-bit systems, or 16P 
(18,446,744,073,709,551,616 bytes) on 64-bit systems.

My lines are also well within the 64KB limit on line length.

Anyway, my scheme of keeping the data in 40MB files, only a few of 
which need to be in memory at a time, means memory is no longer my 
limiting factor.  All I need is speed...

If I can't do it in Rev, then I will try RB, or write portions in each. 
I have years of experience with X languages, years with microprocessor 
machine code, and nothing in between.  So everything else is a learning 
curve, and I have to have some results from this project in a couple of 
weeks.  My gripe is that Rev lacks only a small enhancement that would 
open up the possibility of quite high-speed data processing.  It is a 
crying shame.

Dennis

On Apr 12, 2005, at 6:27 PM, Frank D. Engel, Jr. wrote:
>
>> That is only significant if Rev takes advantage of the 64-bit address 
>> space, which I seriously doubt.  Your Rev process will still be 
>> limited to 2GB of address space, regardless of how much RAM is in 
>> the system.  Until they release a 64-bit version of Rev, of course...
>>
>> If your task is that processor-intensive and your data set that 
>> large, you should consider a lower-level language like Pascal or Ada. 
>>  A scripting language, no matter how fast it is, is not ideal for 
>> such intensive operations on large data sets.
>>
>>
>> On Apr 12, 2005, at 5:30 PM, Dennis Brown wrote:
>>
>>> Thanks Frank,
>>>
>>> Actually I have a 64-bit G5 machine with 3.5GB of RAM that can 
>>> handle that much data, and I could add a couple more gigabytes if I 
>>> needed to.  It crashes when I have loaded less than 1GB into RAM (I 
>>> can monitor the number of free pages of RAM).  I tried processing it 
>>> the way you suggest.  However, at the speed it was going, it was 
>>> going to take 4 or 5 days to get the first pass of my data 
>>> processed.  That is because if you specify a line or item chunk in a 
>>> big array, Rev counts separators from the beginning to find the spot 
>>> you want each time, even if you just want the next line.  That means 
>>> that, on average, you scan the array thousands of times more than a 
>>> single-pass "repeat for each" does.  The way I wrote it, it only 
>>> required about two hours for the initial pass, and about two minutes 
>>> for single passes through one data item in the array.  However, now 
>>> I need to process more than one data item at a time, and that means 
>>> I can use "repeat for each" on only one item and I will have to use 
>>> chunk expressions for the others.  That will slow me back down to 
>>> many days per pass, and I have hundreds of passes to do.  Not very 
>>> interactive!  See you in a few years...
>>>
>>> Dennis
>>>
>>>
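A minimal sketch of the difference described above, assuming a return-delimited variable whose second item on each line is numeric. The handler names (sumByIndex, sumForEach, sumProducts) are hypothetical and not from the original post. The third handler shows one possible way to step through two lists together: walk one with "repeat for each" and split the other into an array keyed by line number. This trades memory for speed, so it only makes sense for pieces small enough to hold as an array.

```livecode
-- Slow: each "line i of pData" is found by counting returns from
-- the start of pData, so the loop does quadratic work overall.
function sumByIndex pData
  local tTotal, i
  put 0 into tTotal
  repeat with i = 1 to the number of lines of pData
    add item 2 of line i of pData to tTotal
  end repeat
  return tTotal
end sumByIndex

-- Fast: "repeat for each" walks pData once, handing over one line
-- per iteration.
function sumForEach pData
  local tTotal, tLine
  put 0 into tTotal
  repeat for each line tLine in pData
    add item 2 of tLine to tTotal
  end repeat
  return tTotal
end sumForEach

-- Two lists in step (assumes one number per line in each list):
-- walk pListA with "repeat for each" and look up the matching
-- line of pListB by array key.
function sumProducts pListA, pListB
  local tTotal, tLineA, i
  put 0 into tTotal
  put 0 into i
  split pListB by return -- pListB[1], pListB[2], ... one per line
  repeat for each line tLineA in pListA
    add 1 to i
    add tLineA * pListB[i] to tTotal
  end repeat
  return tTotal
end sumProducts
```

Both sumByIndex and sumForEach return the same total; only the amount of scanning differs.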
>>> On Apr 12, 2005, at 5:04 PM, Frank D. Engel, Jr. wrote:
>>>
>>>> Rev's arrays are associative.  When using an array with an index 
>>>> like [X, Y, Z], you are really saying, make a string whose contents 
>>>> are X, Y, and Z separated by commas, then use that as the index for 
>>>> the array.  These array indexes take up memory, along with your 
>>>> data.  In fact, depending on what type of data you are trying to 
>>>> process, they likely take up more.  Even without the overhead of 
>>>> the structures used to represent the arrays, your array will likely 
>>>> take up well over 2GB of RAM.  On a 32-bit system, you are normally 
>>>> limited to either 2GB or 3GB of memory per process (almost always 
>>>> 2GB, but some Windows versions, mostly server versions, can be 
>>>> configured for 3GB per process).  That limit has to cover all of 
>>>> your data PLUS Revolution PLUS your stack(s) PLUS some code used by 
>>>> the runtime libraries from the OS ... you get the idea.
>>>>
>>>> You'll never be able to fit that entire array into memory *as an 
>>>> array* in Rev.
>>>>
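A minimal sketch of the associative indexing described above. The handler and the variable names tGrid and tKeys are hypothetical, chosen only for illustration. It assumes the comma inside the brackets acts as Rev's usual comma operator, which joins its operands with a comma, so that [1,2,3] and ["1,2,3"] name the same element.

```livecode
on mouseUp
  local tGrid, tKeys
  put "a" into tGrid[1,2,3]     -- the key is the string "1,2,3"
  put "b" into tGrid["1,2,3"]   -- same key, so this overwrites "a"
  put the keys of tGrid into tKeys
  -- tKeys should hold a single key, stored as text alongside the data
  answer tKeys & return & tGrid[1,2,3]
end mouseUp
```

Every element carries its key as a string, which is where the extra memory for a large "three-dimensional" array goes.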
>>>> Have you considered loading it into a single string and parsing the 
>>>> data inline while managing it in your code?
>>>>
>>>> Try something like:
>>>>
>>>> put URL "file:/path/to/MyFile.txt" into x
>>>>
>>>> Then parse the data from x:
>>>>
>>>> put word 1 of item 2 of line 6 of x into y
>>>>
>>>> And so on...
>>>>
>>>>
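The load-and-parse suggestion above could be wrapped up roughly as follows. This is a sketch, not a tested implementation. The function name fieldFromFile and its parameters are hypothetical, and it assumes the default itemDelimiter (comma) and that "the result" is non-empty when the file cannot be read.

```livecode
function fieldFromFile pPath, pLine, pItem, pWord
  local tData
  put URL ("file:" & pPath) into tData
  if the result is not empty then
    return empty -- the file could not be read
  end if
  -- A single chunk expression; Rev counts delimiters from the
  -- start of tData to locate the requested line.
  return word pWord of item pItem of line pLine of tData
end fieldFromFile
```

For example, `put fieldFromFile("/path/to/MyFile.txt", 6, 2, 1) into y` corresponds to the "word 1 of item 2 of line 6" expression in the message. Because the whole file is re-read on every call, this suits occasional lookups rather than a full pass over the data.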