Making Revolution faster with really big arrays
Brian Yennie
briany at qldlearning.com
Wed Apr 13 20:32:41 EDT 2005
Dennis,
I threw together a really rough test - just some tables with random
values 0-1000.
SELECT (v10000 + v2500) FROM values10000,values2500 LIMIT 500000;
This ran for me in about 10-15 seconds on a G4 XServe with 1GB of
memory. Note that I only had enough free memory to do chunks of 500,000,
so it looks like around a minute for all 2,500,000 combinations.
This was with the entire database on disk and no optimizations or
tweaking... so it should be an upper limit.
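
The setup looks roughly like this - an untested sketch, MySQL-style
syntax assumed; the actual CREATE statements and the script that loaded
the random values aren't shown:

CREATE TABLE values10000 (v10000 INT);   -- 10,000 rows of random values 0-1000
CREATE TABLE values2500  (v2500  INT);   -- 2,500 rows of random values 0-1000

-- Cross join the two tables and sum the two columns, 500,000 rows at a time
SELECT (v10000 + v2500) FROM values10000, values2500 LIMIT 500000;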
How is the most recent scripted solution timing for you?
- Brian
> Brian,
>
> Can you create a 10000 by 2500 by 100 array of random numbers
> (sparsely populated, i.e. some values are null) in SQL?
>
> Can you put the sum of two randomly accessed numbers (except if one is
> null then the sum is null) into a third random location?
>
> How fast can you do this last thing 25,000,000 times in SQL?
> This is the simplest of the things I am doing.
> If you can do that in SQL in less than a minute, you've got my
> attention :-)
>
> Dennis
>
> On Apr 13, 2005, at 12:36 PM, Brian Yennie wrote:
>
>> Dennis,
>>
>> I guess it's half of each, as I see it (and I think I misread some of
>> this).
>> You only need sequential access to the lines and items, but random
>> access would solve your problems. Random access is even faster than
>> sequential, and can do everything sequential can.
>>
>> From your syntax example:
>>
>>> access each line X in arrayX --initial setup of pointers and X value
>>> access each item Y in arrayY --initial setup of pointers and Y value
>>> repeat for number of lines of arrayX times --same as a repeat for each
>>>   put X & comma & Y & return after ArrayXY --merged array
>>>   next line X --puts the next line value in X
>>>   next item Y --if arrayY has fewer elements than arrayX, then empty is supplied; could also put "End of String" in the result
>>> end repeat
>>
>> In SQL:
>>
>> SELECT CONCAT(lineValue, ',' , itemValue) FROM lines, items ORDER BY
>> lineNumber, itemNumber;
>>
>> Give your DB enough RAM to work with and this will fly. The fields
>> will be indexed once - that's basically what your "access" step amounts
>> to: building an index. Then they'll be retrieved super fast, and the
>> sort should be trivial if the records are already in order in your
>> database. You could work with a dozen tables at once if you wanted.
>>
>> If you're still committed to a scripted solution, could you post some
>> of your actual code - even if the calculations are hidden? Are you
>> using nested repeat for each loops? This group (myself included) has been
>> known to tweak these sorts of things pretty well in the past when
>> there is real code to work with...
>>
>>
>> - Brian
>>
>>
>>> Thanks Brian,
>>>
>>> I don't require random access to the data. I only need sequential
>>> access. That is why the repeat for each operator works so fast
>>> --less than a microsecond per data item. I'm not going to match
>>> that with anything other than RAM.
>>>
>>> Dennis
>>>
>>> On Apr 12, 2005, at 10:06 PM, Brian Yennie wrote:
>>>
>>>> Dennis,
>>>>
>>>> I have to agree with Pierre here. If you are looking for random
>>>> access to many thousands of records taking up gigabytes of memory,
>>>> a database engine is, IMO, the only logical choice.
>>>>
>>>> A simple MySQL/PostgreSQL/Valentina/etc database indexed by line
>>>> number (or stock symbol) would be very fast.
>>>>
>>>> Without indexing your data or fitting all of it into random-access
>>>> in-memory data structures, you're fighting a painful battle. If your
>>>> algorithm is scaling linearly, you'll just run too slowly, and if
>>>> your data size is doing the same, you'll run out of memory. On the
>>>> other hand, database engines can potentially handle _terabytes_ of
>>>> data and give you random access in milliseconds. You simply won't
>>>> beat that in Transcript.
>>>>
>>>> One thing you could consider, if you don't want a whole database
>>>> engine to deal with, is indexing the data yourself - which will give
>>>> you some of the algorithmic benefits of
>>>> a database engine. That is, make one pass where you store the
>>>> offsets of each line in an index, and then use that to grab lines.
>>>> Something like (untested):
>>>>
>>>> ## index the line starts and ends
>>>> put 1 into lineNumber
>>>> put 1 into charNum
>>>> put 1 into lineStarts[1]
>>>> repeat for each char c in tData
>>>>   if (c = return) then
>>>>     put (charNum - 1) into lineEnds[lineNumber]
>>>>     put (charNum + 1) into lineStarts[lineNumber + 1]
>>>>     add 1 to lineNumber
>>>>   end if
>>>>   add 1 to charNum
>>>> end repeat
>>>> if (c <> return) then put (charNum - 1) into lineEnds[lineNumber]
>>>>
>>>> ## get line x via random char access
>>>> put char lineStarts[x] to lineEnds[x] of tData into lineX
>>>>
>>>> - Brian
>>>>
>>>>> Thanks Pierre,
>>>>>
>>>>> I considered that also. A database application would certainly
>>>>> handle the amount of data, but databases are really meant for finding
>>>>> and sorting various fields, not for doing the kind of processing I
>>>>> am doing. The disk access would slow down the process.
>>>>>
>>>>> Dennis
>>>>>
>>>>> On Apr 12, 2005, at 5:27 PM, Pierre Sahores wrote:
>>>>>