Optimization can be tricky

Geoff Canyon gcanyon at gmail.com
Mon Jun 11 20:21:59 EDT 2018


I have a routine that takes about a minute to run for the test case I
created, and who knows how long for the real use case. Given that I want to
run it several thousand times for actual work, I need it to run (much)
faster.

Roughly, the routine gathers together a bunch of data from arrays, sorts
the data based on its relationship to other arrays, and then returns a
subset of the result.

My first pass at the routine looked roughly like this:

   repeat for each line T in the keys of interestArray[uID]
      repeat for each line S in storyArray[T]
         if abs(item 2 of S - item 1 of interestArray[uID][T]) < 20 \
               and userSeenArray[uID][item 1 of S] < 4
         then put (101 + userSeenArray[uID][item 1 of S] * 30 + 5 * \
               abs(item 2 of S - item 1 of interestArray[uID][T]) - \
               item 2 of interestArray[uID][T]),T,S & cr after candidateList
      end repeat
   end repeat
   sort lines of candidateList numeric by random(item 1 of each)

In simple terms: parse through the individual lines of all the entries that
might qualify, calculating a relevance score for each and appending that
score and the line to an interim list, which then gets sorted with a random
bias toward lower scores.

I assumed the problem was all the line-by-line parsing, and I thought of a
clever way to accomplish the same thing. That led to this:

   put storyArray into R
   intersect R with interestArray[uID]
   combine R using cr and comma
   sort lines of R by (101 + userSeenArray[uID][item 2 of each] * 30 + 5 * \
         abs(item 3 of each - item 1 of interestArray[uID][item 1 of each]) - \
         item 2 of interestArray[uID][item 1 of each])
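
For anyone less familiar with those two commands: "intersect" keeps only
the keys the two arrays share, and "combine" with two delimiters flattens
the array into key,value lines. A minimal sketch with made-up keys and
values:

   put "7,42" into tStories["storyA"]
   put "3,99" into tStories["storyB"]
   put "1,5" into tInterests["storyA"]  -- the user only follows storyA
   put tStories into R
   intersect R with tInterests          -- R keeps only the key "storyA"
   combine R using cr and comma         -- R becomes the line "storyA,7,42"

That's why "item 1 of each" is the key and items 2 and 3 are the story data
in that sort expression.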

Much simpler, albeit that's a heck of a "sort by" -- more complex by far
than any I had previously created, and a testament to the power and
flexibility of "each". Alas, despite condensing my code and removing the
parsing and loops, that version took ten seconds more than the previous
one. I'm guessing that's because the previous version has that "if"
statement that weeds out undesirable entries before the sort has to deal
with them.

(I'm writing this email as I parse through this, figuring it out as I go.)

So it turns out that the crazy use of "each" is only part of the problem --
changing that line to:

   sort lines of R by random(10000)

still takes over 20 seconds -- 3x as fast, but still far too slow. It turns
out that the source data runs anywhere from 1,000 to 2,000 lines, so
sorting all of it just to randomly select the several lines I need is a
really bad choice. Removing the sort step but keeping everything else cuts
the execution time down to under a second.
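
A quick way to confirm where the time goes is to bracket the suspect step
with "the milliseconds":

   put the milliseconds into tStart
   sort lines of R by random(10000)  -- or whichever step is under suspicion
   put "sort took" && (the milliseconds - tStart) && "ms"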

Hmm. How to select several lines at weighted random from among a couple
thousand? I'll think on this and post a follow-up.
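
One possibility (just a sketch of my own, not necessarily what the
follow-up will land on) is to skip the sort entirely and walk a running
total of weights, where a lower score gets a larger weight:

   function pickWeightedLine candidateList
      local tTotal, tWeights, tWeight, tTarget, tRunning, tIndex
      put 0 into tTotal
      repeat for each line L in candidateList
         -- lower score (item 1) means a larger weight, so a likelier pick
         put 1 / max(1, item 1 of L) into tWeight
         put tWeight & cr after tWeights
         add tWeight to tTotal
      end repeat
      -- choose a random point along the total weight, then walk to it
      put tTotal * random(100000) / 100000 into tTarget
      put 0 into tRunning
      put 0 into tIndex
      repeat for each line L in candidateList
         add 1 to tIndex
         add line tIndex of tWeights to tRunning
         if tRunning >= tTarget then return L
      end repeat
      return line -1 of candidateList -- guard against rounding at the end
   end pickWeightedLine

Calling that a handful of times, deleting each chosen line before the next
call, would yield the several lines needed with no sort at all.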


