On Performance of Array Access
Richard Gaskin
ambassador at fourthworld.com
Fri Aug 31 12:45:48 EDT 2018
Impressive. Thank you.
--
Richard Gaskin
Fourth World Systems
Mark Waddingham wrote:
> Generally, I don't tend to like to 'jump the gun' on anything related to
> optimization lest it is not what it seems when running in the real world
> but...
>
> I'm becoming increasingly confident that a recent foray of mine into yet
> another attempt to improve the speed of array access in LC9 might have
> actually born some quite tasty fruit. Pleasingly the code changes
> required are small, highly localized and quite possibly more than
> suitable for including in a 9.0.x maintenance release.
>
> The current WIP PR can be found here:
>
> https://github.com/livecode/livecode/pull/6671
>
> Amusingly this didn't even start as an attempt at optimization, but an
> attempt to see if I could do anything about heap fragmentation!
>
> The patch itself does three things:
>
> 1) Minimizes the in-memory size of the core types - all 'book-keeping'
> records are now <=32 bytes on 64-bit systems
>
> 2) Makes use of tArray[x]...[z] type expressions much more efficient
> when both evaluating and mutating
>
> 3) Makes use of integer indicies in arrays much more efficient
>
> I've been running it on a number of benchmarks, the two most notable
> follow.
>
> LARGE ARRAY CONSTRUCTION
>
> The following simple code example was supplied in Bug 17434 by Richard
> as it appears to show a memory leak (its actually heap fragmentation,
> but that's another story):
>
> put 100 into tMax
> repeat with i = 1 to tMax
> repeat with j = 1 to tMax
> repeat with k = 1 to tMax
> put any item of "bob,carol,ted,alice" into sA[i][j][k]
> end repeat
> end repeat
> end repeat
>
> The code constructs an 100x100x100 matrix (rendered as a nested array,
> rather than a flat one).
>
> Timings in various versions (on my 2018 MBP laptop) are:
>
> 6.7.11: 1117ms
> 9.0.1: 4020ms
> PR6671: 1017ms
>
> This would seem to be a not too uncommon a pattern of code - judging by
> JBV's recent post on here (which was essentially doing the same thing).
>
> We can modify the above slightly to make it more focussed on array
> performance (i.e. taking out the need to construct / copy a random
> portion of a string):
>
> put 100 into tMax
> repeat with i = 1 to tMax
> repeat with j = 1 to tMax
> repeat with k = 1 to tMax
> put "bob,carol,ted,alice" into sA[i][j][k]
> end repeat
> end repeat
> end repeat
>
> Timings for this are:
>
> 6.7.11: 1055ms
> 9.0.1: 3281ms
> PR6671: 497ms
>
> ARRAY RECORD FILTERING
>
> The following code example was derived by me for my recent LCG talk
> based on a real-world code supplied by Mark Talluto. It iterates over a
> sequence of arrays, creating a new one where each record only has a
> subset of the keys. The keys to keep are taken from a stringlist:
>
> private function _BenchmarkArrayFilterRecords pRecords, pColumnNames
> local tFilteredRecords, tFilteredRecord
> repeat for each element tRecord in pRecords
> repeat for each item tColumnName in pColumnNames
> put tRecord[tColumnName] into tFilteredRecord[tColumnName]
> end repeat
> put tFilteredRecord into \
> tFilteredRecords[the number of elements in tFilteredRecords
> + 1]
> end repeat
> return tFilteredRecords
> end _BenchmarkArrayFilterRecords
>
> Timings for 100000 iterations of the above are:
>
> 6.7.11: 16872ms
> 9.0.1: 8305ms
> PR6671: 4315ms
>
> The following is the same as the above, except the keys to keep are
> taken from the keys of an array:
>
> private function _BenchmarkArrayFilterRecordsSet pRecords,
> pColumnNameSet
> local tFilteredRecords, tFilteredRecord
> repeat for each element tRecord in pRecords
> repeat for each key tColumnName in pColumnNameSet
> put tRecord[tColumnName] into tFilteredRecord[tColumnName]
> end repeat
> put tFilteredRecord into \
> tFilteredRecords[the number of elements in tFilteredRecords
> + 1]
> end repeat
> return tFilteredRecords
> end _BenchmarkArrayFilterRecordsSet
>
> Timings for 100000 iterations of the above are:
>
> 6.7.11: 16508ms
> 9.0.1: 6397ms
> PR6671: 3001ms
>
> REAL WORLD CASE
>
> Now, I'm always a little skeptical about using synthetic benchmarks for
> performance. However, both of the above are actually real-world
> examples. Furthermore, when running a rather large LCS project on an
> engine with PR6671, I got a 2x speed up - one particular input took
> 3mins to process, rather than 6mins (one phase of processing actually
> saw a 5x speed up!).
>
> Anyway, thought this might make some potentially pleasing Friday
> afternoon news :)
>
> Warmest Regards,
>
> Mark.
More information about the use-livecode
mailing list