How to use an array to solve the following...

Richard Gaskin ambassador at fourthworld.com
Tue Feb 21 11:48:10 EST 2012


Pete wrote:
> Interesting, and it kinda makes sense.  For elements, there's no
> positioning required like with lines/words/item, just a case of cycling
> through the keys - which is what "repeat for each line <x> in the keys of
> <array> does I suppose.

As with most things in computing, the truly optimal solution comes with 
a lot of "depends": total data size, size of elements, distance from the 
start of a chunk to the value being obtained from it, how deeply nested 
the array keys are - all those and more play a role in total 
performance, which can sometimes yield unexpected results.
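
For reference, here's a minimal sketch of both traversal idioms Pete 
describes (variable names are mine):

   -- cycling through the keys as a return-delimited list:
   repeat for each line tKey in the keys of tArray
      put tArray[tKey] & return after tOutput
   end repeat

   -- the direct form, which skips building the key list as text:
   repeat for each key tKey in tArray
      put tArray[tKey] & return after tOutput
   end repeat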

One challenge with arrays is their use in CGIs, where total throughput 
performance is unusually critical since the app is born, lives, and dies 
all in the space of satisfying a single request from the user.

The problem with arrays in that context is that they don't exist when 
the routine begins: since the engine itself is loaded fresh for each 
request, any array must be built from scratch every time.

Arrays offer blinding speed for random access, but they're able to do 
this because they rely on memory-specific structures, leaving us with 
the question:  how do we load the array from a cold start?

One can use custom properties, or arrayEncode/arrayDecode, or 
split/combine, but all of them are only slightly optimized versions of 
what you'd need to do if you had to script it yourself using "repeat for 
each line..." and stuffing the array elements sequentially.
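
For illustration, minimal sketches of each (paths and property names 
are just placeholders):

   -- split: keys come from the first tab-delimited item of each line
   put URL ("file:" & tPath) into tArray
   split tArray by return and tab

   -- arrayEncode/arrayDecode: serialize to and from a binary file
   put arrayEncode(tArray) into URL ("binfile:" & tPath)
   put arrayDecode(URL ("binfile:" & tPath)) into tArray

   -- a custom property can hold an array value directly
   set the uData of stack "MyData" to tArray
   put the uData of stack "MyData" into tArray

   -- the hand-rolled version they all approximate:
   set the itemDelimiter to tab
   repeat for each line tLine in tData
      put item 2 of tLine into tArray[item 1 of tLine]
   end repeat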

So oddly enough, if the context of use requires that you take into 
account the loading of the array, total throughput will often be 
substantially slower than scooping up a delimited file and using chunk 
expressions on it.
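
For example, a single lookup in a two-column tab-delimited file needs 
no load step at all - something like:

   set the itemDelimiter to tab
   put URL ("file:" & tPath) into tData
   repeat for each line tLine in tData
      if item 1 of tLine is tTargetKey then
         put item 2 of tLine into tValue
         exit repeat
      end if
   end repeat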

Even outside of a total-throughput context, I've seen other cases where 
arrays can be slower than "repeat for each", such as deeply-nested 
arrays (say, four levels deep).  In such cases, while each traversal of 
the hash used to identify the location of an element value is pretty 
darn fast, you'll have to do four of those traversals - one per level - 
to get at each element, and that can add up.
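
That is, every pair of brackets in a read like this one (names are 
hypothetical) costs one hash traversal - four in all:

   put tArray["accounts"]["smith"]["orders"]["total"] into tTotal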

Moreover, arrays can impact memory in ways that chunks don't, because in 
a world where we don't yet have structs (see 
<http://quality.runrev.com/show_bug.cgi?id=8304>), the key names are 
replicated for every element.  With a tab-delimited list the non-data 
overhead is one char per field, but with arrays it's the length of the 
key for every field, which can double the size of the data in memory if 
the keys are as long as the data.
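
A quick illustration of the difference (field names are mine):

   -- tab-delimited: one delimiter character per field
   put "Smith" & tab & "John" & tab & "555-1212" & return after tList

   -- array: the key string is stored alongside every element, here
   -- costing nearly as many bytes as the data itself
   put "Smith" into tRec["rec1"]["lastName"]
   put "John" into tRec["rec1"]["firstName"]
   put "555-1212" into tRec["rec1"]["phone"]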

So alas, as you folks have done here, many times the only way to know 
for sure which solution is optimal is to test it.

If you find yourself doing this sort of thing often, I've put together a 
few tips on benchmarking performance in this LiveCode Journal article:

<http://livecodejournal.com/tutorials/benchmarking-revtalk.html>
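
The basic pattern boils down to something like this (the iteration 
count is arbitrary - use enough repetitions to get a measurable 
difference):

   put the milliseconds into tStart
   repeat 10000 times
      -- code under test goes here
   end repeat
   put the milliseconds - tStart && "ms"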

--
  Richard Gaskin
  Fourth World
  LiveCode training and consulting: http://www.fourthworld.com
  Webzine for LiveCode developers: http://www.LiveCodeJournal.com
  LiveCode Journal blog: http://LiveCodejournal.com/blog.irv



