Making Revolution faster with really big arrays

Wed Apr 13 12:36:53 EDT 2005

Dennis,

I guess it's half of each, as I see it (and I think I misread some of 
this).
You only need sequential access to the lines and items, but random 
access would solve your problems. With an index, random access can be 
just as fast as sequential access, and it can do everything sequential 
access can.

From your syntax example:

> access each line X in arrayX --initial setup of pointers and X value
> access each item Y in arrayY --initial setup of pointers and Y value
> repeat for number of lines of arrayX times --same as a repeat for each
>    put X & comma & Y & return after ArrayXY --merged array
>    next line X --puts the next line value in X
>    next item Y --if arrayY has fewer elements than arrayX, then empty
>                --is supplied, could also put "End of String" in the result
> end repeat

In SQL:

-- pair line N with item N; a missing item becomes empty
SELECT CONCAT(lineValue, ',', COALESCE(itemValue, ''))
FROM lines LEFT JOIN items ON itemNumber = lineNumber
ORDER BY lineNumber;

Give your DB enough RAM to work with and this will fly. The fields will 
be indexed once, which is basically what your "access" statements are 
asking for. After that they'll be retrieved very quickly, and the sort 
should be trivial if the records are already in order in your database. 
You could work with a dozen tables at once if you wanted.
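
As a rough sketch of the pairing in your loop (line N with item N, 
empty when the items run out), here is a small Python script using the 
built-in sqlite3 module with an in-memory database. Python and SQLite 
are only stand-ins to keep the example self-contained. The function 
name merge_with_sql is made up for illustration, the table and column 
names follow the query above, and SQLite's || operator is used for 
concatenation.

```python
import sqlite3


def merge_with_sql(line_values, item_values):
    """Pair line N with item N; return a list of 'line,item' strings."""
    conn = sqlite3.connect(":memory:")  # assumption: data fits in RAM
    # INTEGER PRIMARY KEY columns are indexed automatically by SQLite.
    conn.execute(
        "CREATE TABLE lines (lineNumber INTEGER PRIMARY KEY, lineValue TEXT)"
    )
    conn.execute(
        "CREATE TABLE items (itemNumber INTEGER PRIMARY KEY, itemValue TEXT)"
    )
    # Number the rows from 1, the way lines and items are counted.
    conn.executemany(
        "INSERT INTO lines VALUES (?, ?)", enumerate(line_values, start=1)
    )
    conn.executemany(
        "INSERT INTO items VALUES (?, ?)", enumerate(item_values, start=1)
    )
    # LEFT JOIN keeps every line; COALESCE turns a missing item into ''.
    rows = conn.execute(
        "SELECT lineValue || ',' || COALESCE(itemValue, '') "
        "FROM lines LEFT JOIN items ON itemNumber = lineNumber "
        "ORDER BY lineNumber"
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]
```

Calling merge_with_sql(["a", "b", "c"], ["1", "2"]) pairs the first two 
lines with their items and leaves the third line with an empty item.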

If you're still committed to a scripted solution, could you post some of 
your actual code, even if the calculations are hidden? Are you using 
nested repeat for each loops? This group (myself included) has been 
known to tweak these sorts of things pretty well in the past when there 
is real code to work with...
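
On the nesting question: the merge in your syntax example needs only a 
single pass if the second list is advanced in step with the first, 
rather than looped over inside it. A minimal sketch of that idea, 
written in Python purely as a stand-in (it is not Transcript, and 
merge_lines_and_items is a hypothetical name):

```python
def merge_lines_and_items(array_x: str, array_y: str) -> str:
    """Pair each line of array_x with the next item of array_y."""
    lines = array_x.split("\n")
    # One iterator over the items, advanced once per line ("next item Y").
    item_iter = iter(array_y.split(","))
    merged = []
    for x in lines:
        # Supply empty when array_y has fewer elements than array_x.
        y = next(item_iter, "")
        merged.append(x + "," + y + "\n")
    # Join once at the end instead of growing a string inside the loop.
    return "".join(merged)
```

Each line and each item is touched once, so the work grows with the 
number of lines rather than with lines times items.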

- Brian

> Thanks Brian,
>
> I don't require random access to the data.  I only need sequential 
> access.  That is why the repeat for each operator works so fast --less 
> than a microsecond per data item.  I'm not going to match that with 
> anything other than RAM.
>
> Dennis
>
> On Apr 12, 2005, at 10:06 PM, Brian Yennie wrote:
>
>> Dennis,
>>
>> I have to agree with Pierre here. If you are looking for random 
>> access to many thousands of records taking up gigabytes of memory, a 
>> database engine is, IMO, the only logical choice.
>>
>> A simple MySQL/PostgreSQL/Valentina/etc database indexed by line 
>> number (or stock symbol) would be very fast.
>>
>> Without indexing your data or fitting all of it into random-access 
>> in-memory data structures, you're fighting a painful battle. If your 
>> algorithm is scaling out linearly, you'll just run too slow, and if 
>> your data size is doing the same you'll run out of memory. On the 
>> other hand, database engines can potentially handle _terabytes_ of 
>> data and give you random access in milliseconds. You simply won't 
>> beat that in Transcript.
>>
>> One thing you could consider if you don't want a whole database 
>> engine to deal with, is the feasibility of indexing the data yourself 
>> - which will give you some of the algorithmic benefits of a database 
>> engine. That is, make one pass where you store the offsets of each 
>> line in an index, and then use that to grab lines. Something like 
>> (untested):
>>
>> ## index the line starts and ends in one sequential pass
>> put 1 into lineNumber
>> put 1 into charNum
>> put 1 into lineStarts[1]
>> repeat for each char c in tData
>>     if (c = return) then
>>        ## this line ends just before the return, the next starts after it
>>        put (charNum - 1) into lineEnds[lineNumber]
>>        put (charNum + 1) into lineStarts[lineNumber + 1]
>>        add 1 to lineNumber
>>     end if
>>     add 1 to charNum
>> end repeat
>> ## charNum is now one past the last char, so the last line ends at charNum - 1
>> if (c <> return) then put (charNum - 1) into lineEnds[lineNumber]
>>
>> ## get line x via random char access
>> put char lineStarts[x] to lineEnds[x] of tData into lineX
>>
>> - Brian
>>
>>> Thanks Pierre,
>>>
>>> I considered that also.  A Database application would certainly 
>>> handle the amount of data, but they are really meant for finding and 
>>> sorting various fields, not for doing the kind of processing I am 
>>> doing.  The disk accessing would slow down the process.
>>>
>>> Dennis
>>>
>>> On Apr 12, 2005, at 5:27 PM, Pierre Sahores wrote:
>>>