Making Revolution faster with really big arrays

Dennis Brown see3d at writeme.com
Wed Apr 13 21:15:16 EDT 2005


Brian,

I am running on a 1.8 GHz single processor G5 Tower.
With my data put into an array, it takes about 9 sec to access each 
value.
With my data in a variable, it takes about 1.5 sec to access each value 
sequentially.
I don't have the new approach written yet (got sidetracked on the 
graphing routines), but if I did it the slow way (no split),
it would take about 20 sec to load the array, then 10 sec to access the 
two arrays, or 30 sec total.
Of course, if I could access both arrays at the same time sequentially, 
it would take 3 seconds.
Sounds like the scripting solution is the faster way to go in the 
normal case.
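
For what it's worth, the comparison I have been timing is basically this 
shape (an untested sketch, assuming one numeric value per line; the 
variable names are just placeholders):

put 0 into tTotal
-- sequential access with repeat for each (the fast case)
put the milliseconds into tStart
repeat for each line tLine in tData
   add tLine to tTotal -- stand-in for the real per-value work
end repeat
put the milliseconds - tStart into tChunkTime

-- keyed array access, for comparison
put tData into tCopy
split tCopy by return -- tCopy is now an array keyed 1..N by line
put 0 into tTotal
put the milliseconds into tStart
repeat with i = 1 to the number of lines of tData
   add tCopy[i] to tTotal
end repeat
put the milliseconds - tStart into tArrayTime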

Thanks for running the test.  It finally gives me a basis for deciding 
which way to go.  It is clear that without better sequential
access methods, the difference is far less than the order of magnitude 
I was expecting.

Dennis

On Apr 13, 2005, at 8:32 PM, Brian Yennie wrote:

> Dennis,
>
> I threw together a really rough test - just some tables with random 
> values 0-1000.
>
> SELECT (v10000 + v2500) FROM values10000,values2500 LIMIT 500000;
>
> This ran for me in about 10-15 seconds off of a G4 XServe with 1GB 
> memory. Note that I only had enough free memory to do chunks of 
> 500,000 so it looks like around a minute for all 2,500,000 
> combinations.
>
> This is with all of the database on disk and no optimizations or 
> tweaking... so it should be an upper limit.
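>
> For comparison, the straight scripted version of that cross-join sum 
> would be something like this (untested, assuming the two value sets 
> are return-delimited lists):
>
> repeat for each line tA in tList10000
>    repeat for each line tB in tList2500
>       put tA + tB & return after tSums
>    end repeat
> end repeat
>
> (In practice you would probably accumulate the results in chunks 
> rather than build one giant variable.)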
>
> How is the most recent scripted solution timing for you?
>
> - Brian
>
>> Brian,
>>
>> Can you create a 10000 by 2500 by 100 array of random numbers 
>> (sparsely populated, i.e. some values are null) in SQL?
>>
>> Can you put the sum of two randomly accessed numbers (except that if 
>> one is null, then the sum is null) into a third random location?
>>
>> How fast can you do that last operation 25,000,000 times in SQL?
>> This is the simplest of the things I am doing.
>> If you can do that in SQL in less than a minute, you've got my 
>> attention :-)
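>>
>> In script terms, each one of those operations is roughly this (just a 
>> sketch; I'm using comma-built compound keys, and the index variables 
>> are placeholders):
>>
>> put tArray[x1 & comma & y1 & comma & z1] into tA
>> put tArray[x2 & comma & y2 & comma & z2] into tB
>> if tA is empty or tB is empty then
>>    -- null in, null out
>>    put empty into tArray[x3 & comma & y3 & comma & z3]
>> else
>>    put tA + tB into tArray[x3 & comma & y3 & comma & z3]
>> end if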
>>
>> Dennis
>>
>> On Apr 13, 2005, at 12:36 PM, Brian Yennie wrote:
>>
>>> Dennis,
>>>
>>> I guess it's half of each, as I see it (and I think I misread some 
>>> of this).
>>> You only need sequential access to the lines and items, but random 
>>> access would solve your problems. Random access is even faster than 
>>> sequential, and can do everything sequential can.
>>>
>>> From your syntax example:
>>>
>>>> access each line X in arrayX --initial setup of pointers and X value
>>>> access each item Y in arrayY --initial setup of pointers and Y value
>>>> repeat for number of lines of arrayX times --same as a repeat for 
>>>> each
>>>>    put X & comma & Y & return after ArrayXY --merged array
>>>>    next line X --puts the next line value in X
>>>>    next item Y --if arrayY has fewer elements than arrayX, then 
>>>> empty is supplied, could also put "End of String" in the result
>>>> end repeat
>>>
>>> In SQL:
>>>
>>> SELECT CONCAT(lineValue, ',', itemValue) FROM lines, items ORDER BY 
>>> lineNumber, itemNumber;
>>>
>>> Give your DB enough RAM to work with and this will fly. The fields 
>>> will be indexed once - that's what you are basically saying by 
>>> "access" ... "index". Then they'll be retrieved super fast and the 
>>> sort should be trivial if the records are already in order in your 
>>> database. You could work with a dozen tables at once if you wanted.
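>>>
>>> That said, the merge in your pseudocode can also be scripted with one 
>>> repeat for each plus a split array, something like (untested, reusing 
>>> the names from your example):
>>>
>>> split arrayY by comma -- arrayY becomes an array keyed 1..N of items
>>> put 0 into tCount
>>> repeat for each line X in arrayX
>>>    add 1 to tCount
>>>    -- a missing element of arrayY simply comes back empty
>>>    put X & comma & arrayY[tCount] & return after arrayXY
>>> end repeat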
>>>
>>> If you're still committed to a scripted solution, could you post some 
>>> of your actual code - even if the calculations are hidden? Are you 
>>> using nested repeat for each? This group (myself included) has been 
>>> known to tweak these sorts of things pretty well in the past when 
>>> there is real code to work with...
>>>
>>>
>>> - Brian
>>>
>>>
>>>> Thanks Brian,
>>>>
>>>> I don't require random access to the data.  I only need sequential 
>>>> access.  That is why the repeat for each operator works so fast 
>>>> --less than a microsecond per data item.  I'm not going to match 
>>>> that with anything other than RAM.
>>>>
>>>> Dennis
>>>>
>>>> On Apr 12, 2005, at 10:06 PM, Brian Yennie wrote:
>>>>
>>>>> Dennis,
>>>>>
>>>>> I have to agree with Pierre here. If you are looking for random 
>>>>> access to many thousands of records taking up gigabytes of memory, 
>>>>> a database engine is, IMO, the only logical choice.
>>>>>
>>>>> A simple MySQL/PostgreSQL/Valentina/etc database indexed by line 
>>>>> number (or stock symbol) would be very fast.
>>>>>
>>>>> Without indexing your data or fitting all of it into random-access 
>>>>> in-memory data structures, you're fighting a painful battle. If 
>>>>> your algorithm is scaling out linearly, you'll just run too slow, 
>>>>> and if your data size is doing the same you'll run out of memory. 
>>>>> On the other hand, database engines can potentially handle 
>>>>> _terabytes_ of data and give you random access in milliseconds. 
>>>>> You simply won't beat that in Transcript.
>>>>>
>>>>> One thing you could consider if you don't want a whole database 
>>>>> engine to deal with, is the feasibility of indexing the data 
>>>>> yourself - which will give you some of the algorithmic benefits of 
>>>>> a database engine. That is, make one pass where you store the 
>>>>> offsets of each line in an index, and then use that to grab lines. 
>>>>> Something like (untested):
>>>>>
>>>>> ## index the line starts and ends (one pass over the data)
>>>>> put 1 into lineNumber
>>>>> put 1 into charNum
>>>>> put 1 into lineStarts[1]
>>>>> repeat for each char c in tData
>>>>>     if (c = return) then
>>>>>        -- this line ends just before the return
>>>>>        put (charNum - 1) into lineEnds[lineNumber]
>>>>>        -- the next line starts just after the return
>>>>>        put (charNum + 1) into lineStarts[lineNumber + 1]
>>>>>        add 1 to lineNumber
>>>>>     end if
>>>>>     add 1 to charNum
>>>>> end repeat
>>>>> -- if the data has no trailing return, close off the last line;
>>>>> -- charNum is now one past the last char, so back up by one
>>>>> if (c <> return) then put (charNum - 1) into lineEnds[lineNumber]
>>>>>
>>>>> ## get line x via random char access
>>>>> put char lineStarts[x] to lineEnds[x] of tData into lineX
>>>>>
>>>>> - Brian
>>>>>
>>>>>> Thanks Pierre,
>>>>>>
>>>>>> I considered that also.  A database application would certainly 
>>>>>> handle the amount of data, but databases are really meant for 
>>>>>> finding and sorting on various fields, not for doing the kind of 
>>>>>> processing I am doing.  The disk access would slow down the process.
>>>>>>
>>>>>> Dennis
>>>>>>
>>>>>> On Apr 12, 2005, at 5:27 PM, Pierre Sahores wrote:
>>>>>>
>>>>
>
> _______________________________________________
> use-revolution mailing list
> use-revolution at lists.runrev.com
> http://lists.runrev.com/mailman/listinfo/use-revolution
>


