Making Revolution faster with really big arrays
Dennis Brown
see3d at writeme.com
Tue Apr 12 19:42:18 EDT 2005
On Apr 12, 2005, at 7:37 PM, Dennis Brown wrote:
Frank,
I just copied this from the Rev IDE help info:
A note about entries designated as "Unlimited":
Since each open stack file resides completely in memory, Revolution
stacks (and all structures within a stack) are effectively limited by
available memory and by Revolution's total address space of 4G
(4,294,967,296 bytes) on 32-bit systems, or 16EB
(18,446,744,073,709,551,616 bytes) on 64-bit systems.
My data lines are also well within the 64KB limit on line length.
Anyway, my scheme of keeping the data in 40MB files, of which I only
need several in memory at a time, means memory is no longer my
limiting factor. All I need is speed...
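Roughly, each pass then looks something like this sketch (tFileList,
tData, and the per-line work are just placeholders, not my actual code):

-- sketch only: tFileList holds one segment-file path per line
repeat for each line tFile in tFileList
  put URL ("file:" & tFile) into tData       -- pull one 40MB file into memory
  repeat for each line tLine in tData        -- one pass, no re-scanning
    put item 2 of tLine & cr after tResults  -- placeholder per-line work
  end repeat
end repeat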
If I can't do it in Rev, then I will try RB, or write portions in each.
I have years of experience with X languages, years with microprocessor
machine code, and nothing in between. So everything else is a learning
curve, and I have to have some results from this project in a couple of
weeks. My gripe is that Rev lacks only a small enhancement that would
open up the possibility of quite high-speed data processing; it is a
crying shame.
Dennis
On Apr 12, 2005, at 6:27 PM, Frank D. Engel, Jr. wrote:
>> That is only significant if Rev takes advantage of the 64-bit address
>> space, which I seriously doubt. Your Rev process will still be
>> limited to 2GB of addressing space, regardless of how much RAM is in
>> the system. Until they release a 64-bit version of Rev, of course...
>>
>> If your task is that processor-intensive and your data set that
>> large, you should consider a lower-level language like Pascal or Ada.
>> A scripting language, no matter how fast it is, is not ideal for
>> such intensive operations on large data sets.
>>
>>
>> On Apr 12, 2005, at 5:30 PM, Dennis Brown wrote:
>>
>>> Thanks Frank,
>>>
>>> Actually, I have a 64-bit G5 machine with 3.5GB of RAM that can
>>> handle that much data, and I could add a couple more gigabytes if I
>>> needed to. It crashes before I even get 1GB loaded into RAM (I can
>>> monitor the number of free RAM pages). I tried processing the data
>>> as you suggest. However, at the speed it was going, the first pass
>>> over my data would have taken 4 or 5 days. That is because if you
>>> specify a line or item chunk in a big array, Rev counts separators
>>> from the beginning to find the spot you want each time, even if you
>>> just want the next line. That means that, on average, you end up
>>> scanning the array thousands of times more than a single-pass
>>> repeat for each does. The way I wrote it, it required only about
>>> two hours for the initial pass, and about two minutes for single
>>> passes through one data item in the array. However, now I need to
>>> process more than one data item at a time, which means I can use
>>> repeat for each on only one item and will have to use chunk
>>> expressions for the others. That will slow me back down to many
>>> days per pass, and I have hundreds of passes to do --not very
>>> interactive! See you in a few years...
>>>
>>> Dennis
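To make the difference concrete, here are the two access patterns side
by side (tData stands for any large line-delimited variable, and the
per-line work is a placeholder):

-- slow: each "line i of tData" counts return characters from the top
repeat with i = 1 to the number of lines of tData
  put item 2 of line i of tData & cr after tOutput
end repeat

-- fast: "repeat for each" keeps its place and walks tData only once
repeat for each line tLine in tData
  put item 2 of tLine & cr after tOutput
end repeat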
>>>
>>>
>>> On Apr 12, 2005, at 5:04 PM, Frank D. Engel, Jr. wrote:
>>>
>>>> Rev's arrays are associative. When using an array with an index
>>>> like [X, Y, Z], you are really saying: make a string whose contents
>>>> are X, Y, and Z separated by commas, then use that as the index for
>>>> the array. These array indexes take up memory, along with your
>>>> data. In fact, depending on what type of data you are trying to
>>>> process, they likely take up more. Even without the overhead of
>>>> the structures used to represent the arrays, your array will likely
>>>> take up well over 2GB of RAM. On a 32-bit system, you are normally
>>>> limited to either 2GB or 3GB of memory per process (almost always
>>>> 2GB, but some Windows versions -- mostly server versions -- can be
>>>> configured for 3GB per process), so that array would take more
>>>> memory than all of your data PLUS Revolution PLUS your stack(s)
>>>> PLUS some code used by the runtime libraries from the OS ... you
>>>> get the idea.
>>>>
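In other words, something like this (a tiny sketch of the key behavior
described above; the values are arbitrary):

put "hello" into tArray[1,2,3]
put the keys of tArray into tKeys   -- tKeys holds the single key "1,2,3"
put tArray["1,2,3"] into tCheck     -- the same element: tCheck is "hello"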
>>>> You'll never be able to fit that entire array into memory *as an
>>>> array* in Rev.
>>>>
>>>> Have you considered loading it into a single string and parsing the
>>>> data inline while managing it in your code?
>>>>
>>>> Try something like:
>>>>
>>>> put URL "file:/path/to/MyFile.txt" into x
>>>>
>>>> Then parse the data from x:
>>>>
>>>> put word 1 of item 2 of line 6 of x into y
>>>>
>>>> And so on...
>>>>
>>>>
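One way to keep that parsing while still walking the data in a single
pass is to apply the chunk expressions to the current line instead of
to x (a sketch building on the example above; the file path is the same
placeholder):

put URL "file:/path/to/MyFile.txt" into x
repeat for each line tLine in x          -- one pass through x
  put word 1 of item 2 of tLine into y   -- chunking a short line is cheap
  -- do something with y here ...
end repeat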