Making Revolution faster with really big arrays

Alex Tweedly alex at tweedly.net
Tue Apr 12 21:03:45 EDT 2005


Dennis Brown wrote:

> Hi all,
>
> I just joined this list.  What a great resource for sharing ideas and 
> getting help.
>
>      So I came up with an idea for a proposed language extension.  I 
> put the idea in Bugzilla yesterday, then today, I thought I should ask 
> others if they liked the idea, had a better idea, or could help me 
> work around not having this feature in the mean time, since I doubt I 
> would see it implemented in my lifetime based on the speed I see 
> things getting addressed in the Bugzilla list.
>
> The Idea is to break apart the essential functional elements of the 
> repeat for each control to allow more flexibility.  This sample has a 
> bit more refinement than what I posted yesterday in Bugzilla.
>
> The new keyword would be "access" , but could be something else.
>
> An example of the use of the new keywords syntax would be:
>
> access each line X in arrayX--initial setup of pointers and X value
> access each item Y in arrayY --initial setup of pointers and Y value
> repeat for number of lines of arrayX times --same as a repeat for each
>    put X & comma & Y & return after ArrayXY --merged array
>    next line X --puts the next line value in X
>    next item Y --if arrayY has fewer elements than arrayX, then empty 
> is supplied, could also put "End of String" in the result
> end repeat
>
> Another advantage of this syntax is that it provides for more 
> flexibility in structure of loops.  You could repeat forever, then 
> exit repeat when you run out of values (based on getting an empty 
> back).  The possibilities for high speed sequential access data 
> processing are much expanded which opens up more possibilities for 
> Revolution.
>
> I would love to get your feedback or other ideas about solving this 
> problem.

Dennis, I think you have the start of a very powerful suggestion here. I 
will get back to you with some feedback on the abstract idea itself (may 
take me a day or two ...)

In the meantime, here is a possible workaround for you.

I wrote most of this email, and decided it looked so outlandish you 
wouldn't believe it (and no-one else would either). So I had to tidy up 
the code and verify the timings.  First the timing result - then the 
suggestion

Basic test case: fill two variables each with N random numbers (one per 
line), produce as a result the summed list
   (though the actual operation could be anything).

            N    1000  2000  5000 10,000 20,000 50,000
Simple method       9    37   188    696   3343  17831
Suggestion         14    31    78    159    320    814

As you can see, for small sizes, the overhead is makes this take 
(marginally) longer, but for large cases (where it matters) the better 
scaling makes it much better.

Simple method :
look through one list, using the corresponding "line y" from the other

  put empty into tResult
  put the millisecs into t1
  put 1 into y
  repeat for each line XL in Xlist
    put XL + line y of Ylist & cr after tResult
    add 1 to y
  end repeat
  put the number of lines in tResult && line 1 of tResult && the last 
line of tResult && the millisecs-t1 & cr after field "Field"

Suggestion:
create a combined list, prefixing each line from list 1 with 1,3,5,7,... 
and each line from list 2 with 2,4,6,8,...
sort (numeric) this combined list
process the combined list, alternately storing and adding
    (note this generalizes to quite easily to handle more than two 
lists, and extracting the data at each turn)

  put empty into tResult
  put empty into tInter
  put the millisecs into t1
  put 1 into y
  repeat for each line XL in Xlist
    put y, XL & cr after tInter
    add 2 to y
  end repeat
  put 2 into y
  repeat for each line YL in Ylist
    put y, YL & cr after tInter
    add 2 to y
  end repeat
  sort tInter numeric
  put 1 into y
  repeat for each line L in tInter
    switch y
    case 1
      put item 2 of L into t
      put 2 into y
      break
    case 2
      put (item 2 of L)+t & cr after tResult
      put 1 into y
    end switch
  end repeat
  put the number of lines in tResult && line 1 of tResult && the last 
line of tResult && the millisecs-t1 & cr after field "Field"

caveat: this suggestion manipulates the data more, so it might not be so 
effective if the lines in question were longer than this test case - but 
a quick test  (put random(C) && "xxxxxxxxxxxxxxxxxxxxxxx" into each 
line) suggests it won't be too bad.

-- 
Alex Tweedly       http://www.tweedly.net



-- 
No virus found in this outgoing message.
Checked by AVG Anti-Virus.
Version: 7.0.308 / Virus Database: 266.9.6 - Release Date: 11/04/2005



More information about the use-livecode mailing list