Optimization can be tricky
Ali Lloyd
ali.lloyd at livecode.com
Tue Jun 12 04:08:53 EDT 2018
Hi Geoff,
One thing to try in your original code, which should be significantly
faster if the array is big, is using
> repeat for each key T in interestArray[uID]
instead of
> repeat for each line T in the keys of interestArray[uID]
The latter has to allocate memory for a string containing all the keys, AND
iterate through the lines of that string, whereas repeating for each
key directly
accesses the keys of the array in hash order, and thus no extra allocation
will occur and the iteration is faster than finding the next line delimiter
each time.
On Tue, Jun 12, 2018 at 8:35 AM Clarence Martin via use-livecode <
use-livecode at lists.runrev.com> wrote:
> I know that this might be a different use-case but I have a CA Lottery -
> Fantasy five lottery parser that collects lottery data and loads it into a
> matrix and provides the total number of hits for each number. This also
> has 4 optimization buttons to the show differences. This aupplication was
> optimized by our Pasadena LiveCode user's group. Drop me an email and I
> will send you a zipped copy of the Script with a sample lotto data file
> with approximately 8000 lines of picks.
> Email: chipsm at themartinz.com
>
> -----Original Message-----
> From: use-livecode <use-livecode-bounces at lists.runrev.com> On Behalf Of
> Geoff Canyon via use-livecode
> Sent: Monday, June 11, 2018 5:22 PM
> To: How to use LiveCode <use-livecode at lists.runrev.com>
> Cc: Geoff Canyon <gcanyon at gmail.com>
> Subject: Optimization can be tricky
>
> I have a routine that takes about a minute to run for the test case I
> created, and who knows how long for the real use case. Given that I want to
> run it several thousand times for actual work, I need it to run (much)
> faster.
>
> Roughly, the routine gathers together a bunch of data from arrays, sorts
> the data based on its relationship to other arrays, and then returns a
> subset of the result.
>
> My first pass at the routine looked roughly like this:
>
> repeat for each line T in the keys of interestArray[uID]
> repeat for each line S in storyArray[T]
> if abs(item 2 of S - item 1 of interestArray[uID][T]) < 20 \
> and userSeenArray[uID][item 1 of S] < 4
> then put (101 + userSeenArray[uID][item 1 of S] * 30 + 5 * \
> abs(item 2 of S - item 1 of interestArray[uID][T]) - \
> item 2 of interestArray[uID][T]),T,S & cr after
> candidateList
> end repeat
> end repeat
> sort lines of candidateList numeric by random(item 1 of each)
>
> In simple terms: parse through the individual lines of all the entries
> that possibly work, calculating a relevant value for each and appending
> that value and the line to an interim list, which then gets sorted,
> randomly favoring lower values.
>
> I assumed the problem was all the line-by-line parsing, and I thought of a
> clever way to accomplish the same thing. That led to this:
>
> put storyArray into R
> intersect R with interestArray[uID]
> combine R using cr and comma
> sort lines of R by (101 + userSeenArray[uID][item 2 of each] * 30 + 5 *
> \
> abs(item 3 of each - item 1 of interestArray[uID][item 1 of
> each]) \
> - item 2 of interestArray[uID][item 1 of each])
>
> Much simpler, albeit that's a heck of a "sort by" -- more complex by far
> than any I had previously created, and a testament to the power and
> flexibility of "each". Alas, despite condensing my code and removing
> parsing and loops, that version took ten seconds more than the previous
> version, I'm guessing because the previous version has that "if" statement
> that weeds out undesirable entries before the sort has to deal with them.
>
> (I'm writing this email as I parse through this figuring it out)
>
> So it turns out that the crazy use of "each" is only part of the problem
> -- changing that line to:
>
> sort lines of R by random(10000)
>
> still takes over 20 seconds -- 3x as fast, but still far too slow. It
> turns out that the source data numbers anywhere from 1,000 to 2,000 lines,
> so sorting it in any way to randomly select the several lines I need is a
> really bad choice. removing the sort step but keeping everything else cuts
> the execution time down to under a second.
>
> Hmm. How to select several lines at weighted random from among a couple
> thousand? I'll think on this and post a follow-up.
> _______________________________________________
> use-livecode mailing list
> use-livecode at lists.runrev.com
> Please visit this url to subscribe, unsubscribe and manage your
> subscription preferences:
> http://lists.runrev.com/mailman/listinfo/use-livecode
>
>
> _______________________________________________
> use-livecode mailing list
> use-livecode at lists.runrev.com
> Please visit this url to subscribe, unsubscribe and manage your
> subscription preferences:
> http://lists.runrev.com/mailman/listinfo/use-livecode
More information about the use-livecode
mailing list