Making Revolution faster with really big arrays

Wed Apr 13 13:23:26 EDT 2005

Brian,

Can you create a 10000 by 2500 by 100 array of random numbers (sparsely 
populated, i.e. some values are null) in SQL?

Can you put the sum of two randomly accessed numbers (except that if one 
is null, the sum is null) into a third random location?

How fast can you do this last thing 25000000 times in SQL?
This is the simplest of the things I am doing.
If you can do that in SQL in less than a minute, you've got my 
attention :-)

Dennis
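
A minimal sketch of the operation described above, written in Python against an in-memory SQLite database. The table name `cells`, its column names, the helper names, and the small dimensions are all assumptions chosen for illustration. It shows only the null-propagating sum (an absent or NULL source makes the result NULL) and makes no claim about timing at the 10000 by 2500 by 100 scale.

```python
import random
import sqlite3


def build_sparse_table(conn, dims, fill=0.5, seed=0):
    """Create cells(x, y, z, v) and fill roughly `fill` of the positions."""
    rng = random.Random(seed)
    conn.execute(
        "CREATE TABLE cells (x INTEGER, y INTEGER, z INTEGER, v REAL, "
        "PRIMARY KEY (x, y, z))"
    )
    rows = [
        (x, y, z, rng.random())
        for x in range(dims[0])
        for y in range(dims[1])
        for z in range(dims[2])
        if rng.random() < fill  # skipped positions stay absent, i.e. null
    ]
    conn.executemany("INSERT INTO cells VALUES (?, ?, ?, ?)", rows)


def store_sum(conn, a, b, target):
    """Write cells[a] + cells[b] into cells[target].

    A scalar subquery that finds no row yields NULL, and NULL + number is
    NULL in SQL, so a missing source makes the stored sum NULL.
    """
    conn.execute(
        "INSERT OR REPLACE INTO cells (x, y, z, v) VALUES (?, ?, ?, "
        "(SELECT v FROM cells WHERE x = ? AND y = ? AND z = ?) + "
        "(SELECT v FROM cells WHERE x = ? AND y = ? AND z = ?))",
        (*target, *a, *b),
    )


def get_value(conn, pos):
    """Return the value at pos, or None if it is absent or NULL."""
    row = conn.execute(
        "SELECT v FROM cells WHERE x = ? AND y = ? AND z = ?", pos
    ).fetchone()
    return None if row is None else row[0]


def run_random_sums(conn, dims, count, seed=1):
    """Repeat the random two-source, one-target sum `count` times."""
    rng = random.Random(seed)

    def pick():
        return tuple(rng.randrange(d) for d in dims)

    for _ in range(count):
        store_sum(conn, pick(), pick(), pick())


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    dims = (20, 10, 5)  # deliberately tiny compared with the real array
    build_sparse_table(conn, dims)
    run_random_sums(conn, dims, 1000)
    filled = conn.execute(
        "SELECT COUNT(*) FROM cells WHERE v IS NOT NULL"
    ).fetchone()[0]
    print("non-null cells after the run:", filled)
```

Wrapping `run_random_sums` in a timer and scaling `dims` and `count` up would give a rough measurement for a given machine. The primary key on (x, y, z) is what keeps each lookup indexed.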

On Apr 13, 2005, at 12:36 PM, Brian Yennie wrote:

> Dennis,
>
> I guess it's half of each, as I see it (and I misread some of this I 
> think).
> You only need sequential access to the lines and items, but random 
> access would solve your problems. Random access is even faster than 
> sequential, and can do everything sequential can.
>
> From your syntax example:
>
>> access each line X in arrayX--initial setup of pointers and X value
>> access each item Y in arrayY --initial setup of pointers and Y value
>> repeat for number of lines of arrayX times --same as a repeat for each
>>    put X & comma & Y & return after ArrayXY --merged array
>>    next line X --puts the next line value in X
>>    next item Y --if arrayY has fewer elements than arrayX, then empty 
>> is supplied, could also put "End of String" in the result
>> end repeat
>
> In SQL:
>
> SELECT CONCAT(lineValue, ',', COALESCE(itemValue, '')) FROM lines
> LEFT JOIN items ON itemNumber = lineNumber ORDER BY lineNumber;
>
> Give your DB enough RAM to work with and this will fly. The fields 
> will be indexed once- that's what you are basically saying by "access" 
> ... "index". Then they'll be retrieved super fast and the sort should 
> be trivial if the records are already in order in your database. You 
> could work with a dozen tables at once if you wanted.
>
> If you're still committed to a scripted solution, could you post some of 
> your actual code- even if the calculations are hidden? Are you using 
> nested repeat for each? This group (myself included) has been known to 
> tweak these sorts of things pretty well in the past when there is real 
> code to work with...
>
>
> - Brian
>
>
>> Thanks Brian,
>>
>> I don't require random access to the data.  I only need sequential 
>> access.  That is why the repeat for each operator works so fast 
>> --less than a microsecond per data item.  I'm not going to match that 
>> with anything other than RAM.
>>
>> Dennis
>>
>> On Apr 12, 2005, at 10:06 PM, Brian Yennie wrote:
>>
>>> Dennis,
>>>
>>> I have to agree with Pierre here. If you are looking for random 
>>> access to many thousands of records taking up gigabytes of memory, a 
>>> database engine is, IMO, the only logical choice.
>>>
>>> A simple MySQL/PostgreSQL/Valentina/etc database indexed by line 
>>> number (or stock symbol) would be very fast.
>>>
>>> Without indexing your data or fitting all of it into random-access 
>>> in-memory data structures, you're fighting a painful battle. If your 
>>> algorithm is scaling out linearly, you'll just run too slow, and if 
>>> your data size is doing the same you'll run out of memory. On the 
>>> other hand, database engines can potentially handle _terabytes_ of 
>>> data and give you random access in milliseconds. You simply won't 
>>> beat that in Transcript.
>>>
>>> One thing you could consider if you don't want a whole database 
>>> engine to deal with, is the feasibility of indexing the data 
>>> yourself - which will give you some of the algorithmic benefits of a 
>>> database engine. That is, make one pass where you store the offsets 
>>> of each line in an index, and then use that to grab lines. Something 
>>> like (untested):
>>>
>>> ## index the line starts and ends
>>> put 1 into lineNumber
>>> put 1 into charNum
>>> put 1 into lineStarts[1]
>>> repeat for each char c in tData
>>>     if (c = return) then
>>>        put (charNum - 1) into lineEnds[lineNumber]
>>>        put (charNum + 1) into lineStarts[lineNumber + 1]
>>>        add 1 to lineNumber
>>>     end if
>>>     add 1 to charNum
>>> end repeat
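>>> ## close the last line if the data does not end with a return;
>>> ## charNum is now one past the final char
>>> if (c <> return) then put (charNum - 1) into lineEnds[lineNumber]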
>>>
>>> ## get line x via random char access
>>> put char lineStarts[x] to lineEnds[x] of tData into lineX
>>>
>>> - Brian
>>>
>>>> Thanks Pierre,
>>>>
>>>> I considered that also.  A Database application would certainly 
>>>> handle the amount of data, but they are really meant for finding 
>>>> and sorting various fields, not for doing the kind of processing I 
>>>> am doing.  The disk accessing would slow down the process.
>>>>
>>>> Dennis
>>>>
>>>> On Apr 12, 2005, at 5:27 PM, Pierre Sahores wrote:
>>>>
>>
>> _______________________________________________
>> use-revolution mailing list
>> use-revolution at lists.runrev.com
>> http://lists.runrev.com/mailman/listinfo/use-revolution