Repeat For Each Iteration Order (was Re: use-livecode Digest, Vol 169, Issue 18)

Lagi Pittas iphonelagi at gmail.com
Fri Oct 13 06:31:34 EDT 2017


Hi All

I feel my IQ goes up by at lest 10 points when I read some of Mark's
detailed answers concerning Livecode internals - and the thought processes
that go into them.

Mark's and many others' answers are worth the price of admission to this
list - thanks a lot Mark for such an erudite exposition of this and other
arcane Livecode concepts.

I have "Joel on Software"  but a" Mark on software" would be a great read
as well.

Lagi

On 12 October 2017 at 18:16, Mark Waddingham via use-livecode <
use-livecode at lists.runrev.com> wrote:

> On 2017-10-12 18:38, Peter Reid via use-livecode wrote:
>
>> I agree that the redundant indexing is an expensive approach, however
>> I have found that this abnormal structure of a repeat for each loop
>> can be a lot faster than the other loop forms in some circumstances.
>>
>> I can't find the reference that first highlighted to me the lack of
>> guaranteed sequence of chunks, but I've assumed this was a restriction
>> for some time. It would be great if this is incorrect as I've got some
>> heavyweight looping in several of my apps that would benefit from
>> this!
>>
>> Can anyone from LiveCode give a definitive answer to this please?
>>
>
> Yes - the previous posts are correct.
>
> If you are using 'repeat for each' to iterate over something which has an
> order, then the iteration will follow that order.
>
> Specifically:
>
>    - string chunks (char, item, line, word, trueWord, sentence, paragraph,
> token, codepoint, codeunit) are all ordered
>
>    - byte chunks are ordered
>
>    - key chunks are unordered (they use the hash-order of the array, which
> can change arbitrarily even within a session)
>
>    - element chunks are ordered for arrays which are sequences*, unordered
> for all other arrays
>
> * An array is a sequence if: it has only integer keys, and keys range from
> 1 up to the number of elements in the array.
>   e.g. If the keys are 1, 2, 3, 4 - then that is a sequence; 0, 1, 2, 3 is
> not; 1, 3, 4 is not.
>
> Just to reiterate something Richard mentioned - the performance advantage
> of a 'repeat for each' loop is completely lost if you use a normal chunk
> within the loop to fetch another part (or the same part!) of the container:
>
>   put 0 into tIndex
>   repeat for each word tWord in tString
>     add 1 to tIndex
>     put word somefunc(tIndex) of tString into tSameWord
>     ...
>   end repeat
>
> This is because the chunk evaluation must step through somefunc(tIndex)
> words in the string *again*. (It takes N steps to access the N'th
> string/byte chunk in a container - repeat for each gets its performance
> advantage because it remembers where it got the previous chunk from so it
> can fetch the next, so it is just 1 step each iteration).
>
> The only types of access which are guaranteed not to have this N step
> behavior are codeunit, byte and array accesses:
>
>   - a LiveCode string is (internally) an array of codeunits; which means
> that it takes only one step to get any codeunit in it.
>
>   - a LiveCode binary string is (internally) an array of bytes; so it only
> takes one step to get any byte from it
>
>   - LiveCode arrays are hash-tables, so you can look up any key in one step
>
> Note: If you are using the 'byte' chunk, make sure that the thing they are
> accessing *is* actually a binary string before using it on them. The
> conversion from string -> binary string (which exists for compatibility
> with pre-7 scripts) will cause a performance bump.
>
> In terms of the codeunit/codepoint/character chunks - the engine *tries*
> to ensure it does as little work as possible when accessing these.
> Internally, the engine sets flags on a string so that it can optimize these
> lookups. In particular:
>
>   1) If a string contains only Unicode codepoints from the first 64k
> Unicode codes *without* surrogates (they give access to everything above
> 64k) then codepoint access on that string will be 1 step.
>
>   2) If a string contains characters which are only composed of single
> codepoints in the first 64k Unicode code *without* surrogates, then char
> access on that string will be 1 step.
>
> In particular, if the string you are processing actually came from a
> native string; then (1) and (2) are guaranteed to be the case. However, if
> you have arbitrary Unicode strings, then it won't generally be the case:
>
>   a) any characters which have a Unicode code > 64k must be represented by
> two codes < 64k (surrogate pairs); this means that the engine has to step
> through the string codeunit by codeunit to count the codepoints so it can
> find the one you asked for (N step again)
>
>   b) any characters which are composed of multiple codepoints are in the
> same boat
>
> Case (b) can arise quite often even in Roman script languages. For
> example, in Unicode you can write e-acute as EITHER e,combining-acute (two
> codepoints) OR as e-with-acute (one codepoint). For some languages, most
> characters will be multiple codepoints - e.g. the Indic languages. There
> are also a lot of esoteric languages (for scholarly / academic purposes)
> which are entirely defined by Unicode codepoints > 64k.
>
> Upshot is that if you are doing char by char string processing of general
> Unicode strings, then try and use repeat for each char; and if you *can't*
> do that for some reason, then try and work out how you can do the
> processing by using the codeunit chunk.
>
> However, as Knuth is famed for saying - "Premature optimization is the
> root of all evil" - don't spend time trying to optimize an algorithm until
> you have a working version and can intimately analyse where the slow parts
> are. Trying to optimize ahead of time will likely just cause a lot of time
> being spent needlessly, and potential hair loss when figuring out why the
> (probably much more complicated code) doesn't work quite right, or why it
> didn't produce the speed improvement you were expecting!
>
> Warmest Regards,
>
> Mark.
>
> --
> Mark Waddingham ~ mark at livecode.com ~ http://www.livecode.com/
> LiveCode: Everyone can create apps
>
> _______________________________________________
> use-livecode mailing list
> use-livecode at lists.runrev.com
> Please visit this url to subscribe, unsubscribe and manage your
> subscription preferences:
> http://lists.runrev.com/mailman/listinfo/use-livecode
>



More information about the use-livecode mailing list