On Performance of Array Access

Fri Aug 31 09:58:04 EDT 2018

Generally, I don't tend to like to 'jump the gun' on anything related to 
optimization lest it is not what it seems when running in the real world 
but...

I'm becoming increasingly confident that a recent foray of mine into yet 
another attempt to improve the speed of array access in LC9 might have 
actually born some quite tasty fruit. Pleasingly the code changes 
required are small, highly localized and quite possibly more than 
suitable for including in a 9.0.x maintenance release.

The current WIP PR can be found here:

   https://github.com/livecode/livecode/pull/6671

Amusingly this didn't even start as an attempt at optimization, but an 
attempt to see if I could do anything about heap fragmentation!

The patch itself does three things:

   1) Minimizes the in-memory size of the core types - all 'book-keeping' 
records are now <=32 bytes on 64-bit systems

   2) Makes use of tArray[x]...[z] type expressions much more efficient 
when both evaluating and mutating

   3) Makes use of integer indicies in arrays much more efficient

I've been running it on a number of benchmarks, the two most notable 
follow.

LARGE ARRAY CONSTRUCTION

The following simple code example was supplied in Bug 17434 by Richard 
as it appears to show a memory leak (its actually heap fragmentation, 
but that's another story):

         put 100 into tMax
	repeat with i = 1 to tMax
		repeat with j = 1 to tMax
			repeat with k = 1 to tMax
				put any item of "bob,carol,ted,alice" into sA[i][j][k]
			end repeat
		end repeat
	end repeat

The code constructs an 100x100x100 matrix (rendered as a nested array, 
rather than a flat one).

Timings in various versions (on my 2018 MBP laptop) are:

   6.7.11: 1117ms
   9.0.1:  4020ms
   PR6671: 1017ms

This would seem to be a not too uncommon a pattern of code - judging by 
JBV's recent post on here (which was essentially doing the same thing).

We can modify the above slightly to make it more focussed on array 
performance (i.e. taking out the need to construct / copy a random 
portion of a string):

         put 100 into tMax
	repeat with i = 1 to tMax
		repeat with j = 1 to tMax
			repeat with k = 1 to tMax
				put "bob,carol,ted,alice" into sA[i][j][k]
			end repeat
		end repeat
	end repeat

Timings for this are:

   6.7.11: 1055ms
   9.0.1:  3281ms
   PR6671: 497ms

ARRAY RECORD FILTERING

The following code example was derived by me for my recent LCG talk 
based on a real-world code supplied by Mark Talluto. It iterates over a 
sequence of arrays, creating a new one where each record only has a 
subset of the keys. The keys to keep are taken from a stringlist:

private function _BenchmarkArrayFilterRecords pRecords, pColumnNames
    local tFilteredRecords, tFilteredRecord
    repeat for each element tRecord in pRecords
       repeat for each item tColumnName in pColumnNames
          put tRecord[tColumnName] into tFilteredRecord[tColumnName]
       end repeat
       put tFilteredRecord into \
             tFilteredRecords[the number of elements in tFilteredRecords 
+ 1]
    end repeat
    return tFilteredRecords
end _BenchmarkArrayFilterRecords

Timings for 100000 iterations of the above are:

   6.7.11: 16872ms
   9.0.1:  8305ms
   PR6671: 4315ms

The following is the same as the above, except the keys to keep are 
taken from the keys of an array:

private function _BenchmarkArrayFilterRecordsSet pRecords, 
pColumnNameSet
    local tFilteredRecords, tFilteredRecord
    repeat for each element tRecord in pRecords
       repeat for each key tColumnName in pColumnNameSet
          put tRecord[tColumnName] into tFilteredRecord[tColumnName]
       end repeat
       put tFilteredRecord into \
             tFilteredRecords[the number of elements in tFilteredRecords 
+ 1]
    end repeat
    return tFilteredRecords
end _BenchmarkArrayFilterRecordsSet

Timings for 100000 iterations of the above are:

   6.7.11: 16508ms
   9.0.1:  6397ms
   PR6671: 3001ms

REAL WORLD CASE

Now, I'm always a little skeptical about using synthetic benchmarks for 
performance. However, both of the above are actually real-world 
examples. Furthermore, when running a rather large LCS project on an 
engine with PR6671, I got a 2x speed up - one particular input took 
3mins to process, rather than 6mins (one phase of processing actually 
saw a 5x speed up!).

Anyway, thought this might make some potentially pleasing Friday 
afternoon news :)

Warmest Regards,

Mark.

-- 
Mark Waddingham ~ mark at livecode.com ~ http://www.livecode.com/
LiveCode: Everyone can create apps