There should be a "unique" option on sort . . .
Richard Gaskin
ambassador at fourthworld.com
Tue Jan 7 09:29:37 EST 2014
Geoff, I think you may have stumbled across the greatest challenge of
benchmarking: the subtle differences over which we have little or no
control.
Many factors can affect performance, including differences in chip
architectures with regard to instruction set features, caching, etc.,
system memory caching, background processes, and much more. There have
even been times when I've seen that changing the order in which testing
functions are called can affect outcomes. And file I/O tests often
benefit from a cold boot due to system caching - sometimes worked around
by running purge, but that's tedious at best.
When we see stark differences like those between your results and mine
running the same code in the same version of LiveCode (I'm using 6.5.1),
I think we've moved outside of LiveCode and are now at the mercy of
hardware and software nuances beyond our control.
If nothing else it reminds us of the value of testing comparative
benchmarks across multiple machines.
But one fairly constant thing in all this is that when we can hand off
processing to the compiled C++ code in the engine, in many cases (Regex
and others notwithstanding) we can expect a boost in performance.
I would be interested to see the results if you swap out the original
array function with this one, from
<http://lists.runrev.com/pipermail/use-livecode/2014-January/197038.html>:
function UniqueListFromArray3 pData
   set the caseSensitive to true
   split pData using cr and cr
   put the keys of pData into tKeys
   sort lines of tKeys
   return tKeys
end UniqueListFromArray3
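A typical call, assuming tData holds a cr-delimited list (the function
de-duplicates and sorts in one pass):

   put UniqueListFromArray3(tData) into tUniqueSorted

Note that although split modifies its argument, pData is passed by
value here, so the caller's copy is left untouched.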
Under the hood, the split command presumably does a series of steps very
similar to what we'd have to do in script to build an array, but without
the overhead of type coercion and other factors that characterize
dynamic compilation, and using an algo written very specifically for
that task. That is, it still needs to hash every key and move the
contents into the appropriate bucket, but with far less overhead than
running those steps through the interpreter.
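In script form, the de-duplication that split gives us for free would
look something like this - a rough sketch of the idea, not the engine's
actual implementation:

   -- script-level equivalent of: split pData using cr and cr
   -- each line becomes a key; the engine hashes each key into a
   -- bucket, so duplicate lines collapse into a single element
   repeat for each line tLine in pData
      put empty into tArray[tLine]
   end repeat

Every step here runs through the interpreter, which is exactly the
overhead split avoids.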
The only downside to using split is that it differs from most other
array-building methods in that it's always case-sensitive (see the
enhancement request to allow split to use the caseSensitive property
here: <http://quality.runrev.com/show_bug.cgi?id=11651>).
So to pass the sanity check, you'll need to add this line to the other
function as well:
set the caseSensitive to true
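For example, sketching just the relevant lines of the chunk-based
function:

   function UniqueListFromChunks pData
      set the caseSensitive to true -- match split's case handling
      sort lines of pData
      ...
   end UniqueListFromChunks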
I'll wager the beverage of your choice at RevLive in San Diego this
September that the split method will be faster even on your system.
--
Richard Gaskin
Fourth World
LiveCode training and consulting: http://www.fourthworld.com
Webzine for LiveCode developers: http://www.LiveCodeJournal.com
Follow me on Twitter: http://twitter.com/FourthWorldSys
Geoff Canyon wrote:
> I *think* I tested up to 100,000 lines -- spent the day traveling back from
> Boston to St. Louis, a little groggy -- *but* my keys were always integers.
>
> On Mon, Jan 6, 2014 at 10:13 AM, Richard Gaskin
> <ambassador at fourthworld.com>wrote:
>
>> function UniqueListFromChunks pData
>>    sort lines of pData
>>    # put line 1 of pData is false into tLastLine -- what does this do?
>>    put empty into tLastLine
>>    repeat for each line tLine in pData
>>       if tLine is tLastLine then next repeat
>>
>
> NOTE: I corrected the variable names in the above.
>
> Ha ;-) On any day other than yesterday I would have done what you did, or
> skipped the initialization altogether. BUT -- if the first line of data in
> pData is empty (your version) or tLastLine (mine), the above will
> (incorrectly) omit it. So the line I gave, "put line 1 of pData is false
> into lastLine" guarantees that the first line will be included, without
> having to use a conditional that only matters the first iteration inside
> the repeat.
>
> Huh, that's weird. I copy/pasted your code into a button and ran it:
>
> 10 iterations on 100 lines of 5 or fewer chars:
>
> Array: 2 ms (0.2 ms per iteration)
>
> Chunks: 1 ms (0.1 ms per iteration)
>
> Results match - Each list has 95 lines
>
>
> I ran that several times, and the winner flip-flopped several times. So I
> switched to the long seconds. With that, the "Chunks" version is almost
> always the winner (if only by a few ten-thousandths of a second). Typical
> result:
>
>
> 10 iterations on 100 lines of 5 or fewer chars:
>
> Array: 0.001466 seconds (0.000147 seconds per iteration)
>
> Chunks: 0.001218 seconds (0.000122 seconds per iteration)
>
> Results match - Each list has 97 lines
>
>
> Curiouser and curiouser:
>
>
> 10 iterations on 100 lines of 250 or fewer chars:
>
> Array: 0.002393 seconds (0.000239 seconds per iteration)
>
> Chunks: 0.001738 seconds (0.000174 seconds per iteration)
>
> Results match - Each list has 97 lines
>
>
>
> With 1000 lines it favors the array more often, but I still saw outcomes
> where chunks won (not this result, obviously -- trying to be
> representative):
>
>
> 10 iterations on 1000 lines of 5 or fewer chars:
>
> Array: 0.007609 seconds (0.000761 seconds per iteration)
>
> Chunks: 0.007894 seconds (0.000789 seconds per iteration)
>
> Results match - Each list has 617 lines
>
>
> And then back to chunks (mostly):
>
>
> 10 iterations on 1000 lines of 250 or fewer chars:
>
> Array: 0.015478 seconds (0.001548 seconds per iteration)
>
> Chunks: 0.015227 seconds (0.001523 seconds per iteration)
>
> Results match - Each list has 740 lines
>
>
> We start converging at 10,000 and beyond:
>
>
> 10 iterations on 10000 lines of 5 or fewer chars:
>
> Array: 0.029378 seconds (0.002938 seconds per iteration)
>
> Chunks: 0.06806 seconds (0.006806 seconds per iteration)
>
> Results match - Each list has 988 lines
>
>
> 10 iterations on 10000 lines of 250 or fewer chars:
>
> Array: 0.071169 seconds (0.007117 seconds per iteration)
>
> Chunks: 0.148104 seconds (0.01481 seconds per iteration)
>
> Results match - Each list has 1492 lines
>
>
> 10 iterations on 100000 lines of 5 or fewer chars:
>
> Array: 0.229239 seconds (0.022924 seconds per iteration)
>
> Chunks: 0.732289 seconds (0.073229 seconds per iteration)
>
> Results match - Each list has 985 lines
>
>
> 10 iterations on 100000 lines of 250 or fewer chars:
>
> Array: 0.604814 seconds (0.060481 seconds per iteration)
>
> Chunks: 2.04249 seconds (0.204249 seconds per iteration)
>
> Results match - Each list has 1494 lines