Making Revolution faster with really big arrays

Brian Yennie briany at qldlearning.com
Wed Apr 13 20:32:41 EDT 2005


Dennis,

I threw together a really rough test - just some tables with random 
values 0-1000.

SELECT (v10000 + v2500) FROM values10000,values2500 LIMIT 500000;

This ran for me in about 10-15 seconds on a G4 Xserve with 1GB of 
memory. Note that I only had enough free memory to do chunks of 500,000, 
so it looks like around a minute for all 2,500,000 combinations.

This is with all of the database on disk and no optimizations or 
tweaking... so it should be an upper limit.
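
Something like this (untested) is all the Revolution side of that test 
needs, via the revdb library - the connection details (a local MySQL 
database named "test", user and password) are placeholders:

## open a connection, run one 500,000-row chunk of the sum query, time it
put revOpenDatabase("MySQL", "localhost", "test", "user", "password") \
      into tConnID
if tConnID is not a number then answer "Could not connect:" && tConnID
put the milliseconds into tStart
put revDataFromQuery(comma, return, tConnID, \
      "SELECT (v10000 + v2500) FROM values10000,values2500 LIMIT 500000") \
      into tResult
put "chunk took" && (the milliseconds - tStart) && "ms"
revCloseDatabase tConnID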

What timings are you getting from the most recent scripted solution?

- Brian

> Brian,
>
> Can you create a 10000 by 2500 by 100 array of random numbers 
> (sparsely populated, i.e. some values are null) in SQL?
>
> Can you put the sum of two randomly accessed numbers (except that if one 
> is null, then the sum is null) into a third random location?
>
> How fast can you do this last thing 25,000,000 times in SQL?
> This is the simplest of the things I am doing.
> If you can do that in SQL in less than a minute, you've got my 
> attention :-)
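>
> (For concreteness, the inner step amounts to something like this 
> Transcript sketch - untested, with placeholder array and variable names:)
>
> ## sum two randomly chosen cells, null-propagating, into a third cell
> put theArray[x1,y1,z1] into tA
> put theArray[x2,y2,z2] into tB
> if tA is empty or tB is empty then
>    put empty into theArray[x3,y3,z3]
> else
>    put tA + tB into theArray[x3,y3,z3]
> end if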
>
> Dennis
>
> On Apr 13, 2005, at 12:36 PM, Brian Yennie wrote:
>
>> Dennis,
>>
>> I guess it's half of each, as I see it (and I think I misread some of 
>> this).
>> You only need sequential access to the lines and items, but random 
>> access would solve your problems. Random access is even faster than 
>> sequential, and can do everything sequential can.
>>
>> From your syntax example:
>>
>>> access each line X in arrayX--initial setup of pointers and X value
>>> access each item Y in arrayY --initial setup of pointers and Y value
>>> repeat for number of lines of arrayX times --same as a repeat for 
>>> each
>>>    put X & comma & Y & return after ArrayXY --merged array
>>>    next line X --puts the next line value in X
>>>    next item Y --if arrayY has fewer elements than arrayX, then 
>>> empty is supplied, could also put "End of String" in the result
>>> end repeat
>>
>> In SQL:
>>
>> SELECT CONCAT(lineValue, ',', itemValue) FROM lines, items ORDER BY 
>> lineNumber, itemNumber;
>>
>> Give your DB enough RAM to work with and this will fly. The fields 
>> will be indexed once - that's essentially what your "access" step is: 
>> an index. Then they'll be retrieved super fast, and the 
>> sort should be trivial if the records are already in order in your 
>> database. You could work with a dozen tables at once if you wanted.
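>>
>> In Revolution terms that's only a few lines through the revdb library - 
>> an untested sketch, with the connection details, table names and column 
>> names as placeholders:
>>
>> ## index once, then let the database do the whole merge
>> put revOpenDatabase("MySQL", "localhost", "stocks", "user", "password") \
>>       into tConnID
>> get revExecuteSQL(tConnID, "CREATE INDEX idx_line ON lines (lineNumber)")
>> get revExecuteSQL(tConnID, "CREATE INDEX idx_item ON items (itemNumber)")
>> put revDataFromQuery(comma, return, tConnID, \
>>       "SELECT CONCAT(lineValue, ',', itemValue) FROM lines, items" && \
>>       "ORDER BY lineNumber, itemNumber") into tArrayXY
>> revCloseDatabase tConnID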
>>
>> If you're still committed to a scripted solution, could you post some 
>> of your actual code - even if the calculations are hidden? Are you 
>> using nested repeat for each loops? This group (myself included) has been 
>> known to tweak these sorts of things pretty well in the past when 
>> there is real code to work with...
>>
>>
>> - Brian
>>
>>
>>> Thanks Brian,
>>>
>>> I don't require random access to the data.  I only need sequential 
>>> access.  That is why the repeat for each operator works so fast 
>>> --less than a microsecond per data item.  I'm not going to match 
>>> that with anything other than RAM.
>>>
>>> Dennis
>>>
>>> On Apr 12, 2005, at 10:06 PM, Brian Yennie wrote:
>>>
>>>> Dennis,
>>>>
>>>> I have to agree with Pierre here. If you are looking for random 
>>>> access to many thousands of records taking up gigabytes of memory, 
>>>> a database engine is, IMO, the only logical choice.
>>>>
>>>> A simple MySQL/PostgreSQL/Valentina/etc database indexed by line 
>>>> number (or stock symbol) would be very fast.
>>>>
>>>> Without indexing your data or fitting all of it into random-access 
>>>> in-memory data structures, you're fighting a painful battle. If your 
>>>> algorithm scales linearly with the data size, it will just run too 
>>>> slowly, and if the data keeps growing you'll run out of memory. On the 
>>>> other hand, database engines can potentially handle _terabytes_ of 
>>>> data and give you random access in milliseconds. You simply won't 
>>>> beat that in Transcript.
>>>>
>>>> One thing you could consider, if you don't want a whole database 
>>>> engine to deal with, is indexing the data yourself - which gives you 
>>>> some of the algorithmic benefits of a database engine. That is, make 
>>>> one pass where you store the 
>>>> offsets of each line in an index, and then use that to grab lines. 
>>>> Something like (untested):
>>>>
>>>> ## index the line starts and ends
>>>> put 1 into lineNumber
>>>> put 1 into charNum
>>>> put 1 into lineStarts[1]
>>>> repeat for each char c in tData
>>>>     if (c = return) then
>>>>        put (charNum - 1) into lineEnds[lineNumber]
>>>>        put (charNum + 1) into lineStarts[lineNumber + 1]
>>>>        add 1 to lineNumber
>>>>     end if
>>>>     add 1 to charNum
>>>> end repeat
>>>> if (c <> return) then put (charNum - 1) into lineEnds[lineNumber]
>>>>
>>>> ## get line x via random char access
>>>> put char lineStarts[x] to lineEnds[x] of tData into lineX
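>>>>
>>>> Once the index exists, random reads are just char chunking - e.g. 
>>>> (equally untested) a batch of random lookups:
>>>>
>>>> put the number of lines of tData into tLineCount
>>>> repeat 1000 times
>>>>    put random(tLineCount) into x
>>>>    put char lineStarts[x] to lineEnds[x] of tData into lineX
>>>> end repeat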
>>>>
>>>> - Brian
>>>>
>>>>> Thanks Pierre,
>>>>>
>>>>> I considered that also.  A database application would certainly 
>>>>> handle the amount of data, but databases are really meant for finding 
>>>>> and sorting various fields, not for doing the kind of processing I 
>>>>> am doing.  The disk access would slow down the process.
>>>>>
>>>>> Dennis
>>>>>
>>>>> On Apr 12, 2005, at 5:27 PM, Pierre Sahores wrote:
>>>>>
>>>