Quickest was to compare 2 CR lists?

Richard Gaskin ambassador at fourthworld.com
Tue Nov 4 13:18:38 EST 2008


Jim Ault wrote:

> Note to Richard Gaskin, the benchmark master:  Is there any benefit to using
> 
> sort list1 numeric ascending by length(each)
> sort list2 numeric ascending by length(each)
> --then doing 
> repeat for each
>    ...
> end repeat
> --such that the lists are skewed so that the shortest comparisons will occur
> closest to the beginning of the variable, as opposed to the extreme of the
> shortest match happening at the end of the variable list?  Obviously the
> same number of match operations must occur, but the sorted lists might yield
> more time savings than the sorting operations consume.  Perhaps the reverse
> is true (sort descending).
> 
> My guess is that short lists of short lines will make no difference,  such
> as lists of 2000 lines would be considered short.

Hard to say, though I agree it's useful to note that a given algorithm 
may perform better in some cases and worse in others depending on how 
well it scales.

In general, custom sort functions are computationally intensive, 
measuring only slightly better than less concise syntax to accomplish a 
similar task.  When you ask the engine to work hard, it's going to take 
a lot of work, even if the syntax to trigger it seems lightweight. :)

As for how that all plays out as you describe, I don't know offhand and 
would have to test it out.  Not likely this week, as I have a few 
deadlines to meet.

Unrelated to your specific question but relevant to this thread is a 
reminder that the split command is also computationally intensive, 
despite its seductively trim syntax.  Under the hood it has to walk 
through the whole text, keeping track of delimiters along the way, 
parsing it up and putting those parts into array slots, so it's really 
not much different from "repeat for each". If the data already lives in 
an array that's not a problem of course since you won't be using the 
split command, but if you need to transform data from lists to arrays 
and back again you may notice a performance loss over just keeping it in 
a list and using "repeat for each".

So while I don't have a specific answer to your question and in that 
regard am close to useless here <g>, perhaps the only thing I could 
contribute would be to stress that there is sometimes an inverse 
correlation between the simplicity of syntax and the complexity of the 
algorithm that syntax will invoke.

How that plays out in practice depends on many things, but here are two 
general guidelines I've adopted from my testing:

When a task requires accessing specific elements from a list, using an 
array will be about an order of magnitude faster than using "line x 
of.." or "lineoffset".

However, when you need to traverse an entire data set, "repeat for each 
line" performs surprisingly well, testing here about 20% faster than 
"repeat for each element".

-- 
  Richard Gaskin
  Managing Editor, revJournal
  _______________________________________________________
  Rev tips, tutorials and more: http://www.revJournal.com



More information about the use-livecode mailing list