How to use an array to solve the following...
Richard Gaskin
ambassador at fourthworld.com
Tue Feb 21 11:48:10 EST 2012
Pete wrote:
> Interesting, and it kinda makes sense. For elements, there's no
> positioning required like with lines/words/items, just a case of cycling
> through the keys - which is what "repeat for each line <x> in the keys of
> <array>" does, I suppose.
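Right - and for what it's worth, the engine also has a form that skips
building the key list as text entirely. A minimal sketch (tArray being
whatever array you're cycling through):

   repeat for each key tKey in tArray
      -- work with tArray[tKey] here
   end repeat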
As with most things in computing, the truly optimal solution comes with
a lot of "depends"; total date size, size of elements, distance from the
start of a chunk to the value being obtained in it, how deeply nested
are the array keys - all those and more play a role in total
performance, which can sometimes yield unexpected results.
One challenge with arrays is their use in CGIs, where total throughput
performance is unusually critical since the app is born, lives, and dies
all in the space of satisfying a single request from the user.
The problem with arrays in that context is that they don't exist when
the routine begins, since even the engine itself needs to be loaded.
Arrays offer blinding speed for random access, but they're able to do
this because they rely on in-memory hash structures, leaving us with
the question: how do we load the array from a cold start?
One can use custom properties, or arrayEncode/arrayDecode, or
split/combine, but all of them are only slightly optimized versions of
what you'd need to do if you had to script it yourself using "repeat for
each line..." and stuffing the array elements sequentially.
So oddly enough, if the context of use requires that you take into
account the loading of the array, total throughput will often be
substantially slower than scooping up a delimited file and using chunk
expressions on it.
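A minimal sketch of that chunk-based alternative, assuming a
tab-delimited file where the field of interest is item 2 of each line:

   set the itemDelimiter to tab
   put URL "file:data.txt" into tData
   repeat for each line tLine in tData
      put item 2 of tLine & cr after tResult
   end repeat

No array ever gets built, so there's no up-front loading cost to
amortize.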
Even outside of a total-throughput context, I've seen other cases where
arrays can be slower than "repeat for each", such as deeply-nested
arrays (say, four levels deep). In such cases, while each traversal of
the hash used to locate an element value is pretty darn fast, you'll
have to do four of those traversals to get at each element, and that
can add up.
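That is, in a hypothetical lookup like this one, each pair of brackets
costs a hash traversal:

   put tSales["region"]["store"]["dept"]["sku"] into tValue

Four levels means four lookups per value retrieved.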
Moreover, arrays can impact memory in ways that chunks don't, because in
a world where we don't yet have structs (see
<http://quality.runrev.com/show_bug.cgi?id=8304>), key names are
replicated for every element. With a tab-delimited list the non-data
overhead is one char per field, but with arrays it's the length of the
key for every field, which can double the size of the data in memory if
the keys are as long as the data.
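To make that concrete with a hypothetical two-field record:

   -- tab-delimited: one delimiter char of overhead per field
   put "Alice" & tab & "Smith" & cr after tList

   -- array: the strings "firstName" and "lastName" are stored
   -- again for every record
   put "Alice" into tPeople[1]["firstName"]
   put "Smith" into tPeople[1]["lastName"]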
So alas, as you folks have done here, many times the only way to know
for sure which solution is optimal is to test it.
If you find yourself doing this sort of thing often, I've put together a
few tips on benchmarking performance in this LiveCode Journal article:
<http://livecodejournal.com/tutorials/benchmarking-revtalk.html>
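The gist of it is a simple timing harness like this (the iteration
count is arbitrary; raise it until the durations are measurable):

   put the milliseconds into tStart
   repeat 100000 times
      -- code under test goes here
   end repeat
   put the milliseconds - tStart into tElapsed
   answer tElapsed && "ms"

Run each candidate approach through the same harness and compare.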
--
Richard Gaskin
Fourth World
LiveCode training and consulting: http://www.fourthworld.com
Webzine for LiveCode developers: http://www.LiveCodeJournal.com
LiveCode Journal blog: http://LiveCodejournal.com/blog.irv