On Performance of Array Access

Fri Aug 31 12:45:48 EDT 2018

Impressive. Thank you.

-- 
  Richard Gaskin
  Fourth World Systems

Mark Waddingham wrote:
> Generally, I don't tend to like to 'jump the gun' on anything related to 
> optimization lest it is not what it seems when running in the real world 
> but...
> 
> I'm becoming increasingly confident that a recent foray of mine into yet 
> another attempt to improve the speed of array access in LC9 might have 
> actually born some quite tasty fruit. Pleasingly the code changes 
> required are small, highly localized and quite possibly more than 
> suitable for including in a 9.0.x maintenance release.
> 
> The current WIP PR can be found here:
> 
>    https://github.com/livecode/livecode/pull/6671
> 
> Amusingly this didn't even start as an attempt at optimization, but an 
> attempt to see if I could do anything about heap fragmentation!
> 
> The patch itself does three things:
> 
>    1) Minimizes the in-memory size of the core types - all 'book-keeping' 
> records are now <=32 bytes on 64-bit systems
> 
>    2) Makes use of tArray[x]...[z] type expressions much more efficient 
> when both evaluating and mutating
> 
>    3) Makes use of integer indicies in arrays much more efficient
> 
> I've been running it on a number of benchmarks, the two most notable 
> follow.
> 
> LARGE ARRAY CONSTRUCTION
> 
> The following simple code example was supplied in Bug 17434 by Richard 
> as it appears to show a memory leak (its actually heap fragmentation, 
> but that's another story):
> 
>          put 100 into tMax
> 	repeat with i = 1 to tMax
> 		repeat with j = 1 to tMax
> 			repeat with k = 1 to tMax
> 				put any item of "bob,carol,ted,alice" into sA[i][j][k]
> 			end repeat
> 		end repeat
> 	end repeat
> 
> The code constructs an 100x100x100 matrix (rendered as a nested array, 
> rather than a flat one).
> 
> Timings in various versions (on my 2018 MBP laptop) are:
> 
>    6.7.11: 1117ms
>    9.0.1:  4020ms
>    PR6671: 1017ms
> 
> This would seem to be a not too uncommon a pattern of code - judging by 
> JBV's recent post on here (which was essentially doing the same thing).
> 
> We can modify the above slightly to make it more focussed on array 
> performance (i.e. taking out the need to construct / copy a random 
> portion of a string):
> 
>          put 100 into tMax
> 	repeat with i = 1 to tMax
> 		repeat with j = 1 to tMax
> 			repeat with k = 1 to tMax
> 				put "bob,carol,ted,alice" into sA[i][j][k]
> 			end repeat
> 		end repeat
> 	end repeat
> 
> Timings for this are:
> 
>    6.7.11: 1055ms
>    9.0.1:  3281ms
>    PR6671: 497ms
> 
> ARRAY RECORD FILTERING
> 
> The following code example was derived by me for my recent LCG talk 
> based on a real-world code supplied by Mark Talluto. It iterates over a 
> sequence of arrays, creating a new one where each record only has a 
> subset of the keys. The keys to keep are taken from a stringlist:
> 
> private function _BenchmarkArrayFilterRecords pRecords, pColumnNames
>     local tFilteredRecords, tFilteredRecord
>     repeat for each element tRecord in pRecords
>        repeat for each item tColumnName in pColumnNames
>           put tRecord[tColumnName] into tFilteredRecord[tColumnName]
>        end repeat
>        put tFilteredRecord into \
>              tFilteredRecords[the number of elements in tFilteredRecords 
> + 1]
>     end repeat
>     return tFilteredRecords
> end _BenchmarkArrayFilterRecords
> 
> Timings for 100000 iterations of the above are:
> 
>    6.7.11: 16872ms
>    9.0.1:  8305ms
>    PR6671: 4315ms
> 
> The following is the same as the above, except the keys to keep are 
> taken from the keys of an array:
> 
> private function _BenchmarkArrayFilterRecordsSet pRecords, 
> pColumnNameSet
>     local tFilteredRecords, tFilteredRecord
>     repeat for each element tRecord in pRecords
>        repeat for each key tColumnName in pColumnNameSet
>           put tRecord[tColumnName] into tFilteredRecord[tColumnName]
>        end repeat
>        put tFilteredRecord into \
>              tFilteredRecords[the number of elements in tFilteredRecords 
> + 1]
>     end repeat
>     return tFilteredRecords
> end _BenchmarkArrayFilterRecordsSet
> 
> Timings for 100000 iterations of the above are:
> 
>    6.7.11: 16508ms
>    9.0.1:  6397ms
>    PR6671: 3001ms
> 
> REAL WORLD CASE
> 
> Now, I'm always a little skeptical about using synthetic benchmarks for 
> performance. However, both of the above are actually real-world 
> examples. Furthermore, when running a rather large LCS project on an 
> engine with PR6671, I got a 2x speed up - one particular input took 
> 3mins to process, rather than 6mins (one phase of processing actually 
> saw a 5x speed up!).
> 
> Anyway, thought this might make some potentially pleasing Friday 
> afternoon news :)
> 
> Warmest Regards,
> 
> Mark.