Slow stack problem

Mark Waddingham mark at livecode.com
Sat Jun 29 05:27:19 EDT 2024


On 2024-06-29 08:53, Neville Smythe via use-livecode wrote:
> Is it not the case that the processing time for looping over the number 
> of lines and getting the k-th line in each iteration, for some 
> arbitrary k, is going to be
> 
> Order(N^2) * C * T
> 
> where
> 
> N = the number of lines in the text
> 
> C = the number of codepoints in each line
> 
> T =  the (average) time for processing each codepoint to check for a 
> return character
> 
> Now N and C are the same whether the text is ascii or unicode

Largely - yes - although for stuff like this you need to think in terms 
of bytes not codepoints (as memory throughput becomes 'a thing' when the 
strings are anything longer than a few characters) - so unicode is 
2*ascii in this regard

[ It's actually more than 2x for longer strings, but how much more depends 
on CPU/memory architecture - CPUs can only read from their level 1 
cache, and there's a cost to a cache miss, and you get 2x as many cache 
misses with unicode data as native data, assuming the data is larger 
than a single level 1 cache line. ]

> Test 1
> If I get rid of the red herring by replacing the matchChunk call with a 
> simple
> ...
> Which would appear to mean that processing unicode codepoints for the 
> apparently simple task of checking for return takes 100 times as much 
> time as processing ascii. That seems excessive, to put it mildly!

It's a lot slower, certainly, but then searching unicode text for a string 
is (in the general case) a lot more complex than searching native/ascii 
text for a string.

> Test 2
> With the original code using matchChunk, which of course is going to 
> have its own internal loop on code points so multiply by another 8 (it 
> only searches the first few characters)
> and will not always return true so more lines must be processed — but 
> again these are the same multipliers whether ascii or unicode
> ...
> Plain ascii takes   0.07 seconds
> Unicode takes    19.9 seconds, a multiplier of nearly 300. — I can 
> easily believe matchChunk takes 3 times as long to process unicode as 
> ascii, this is the sort of factor I would have expected in Test 1.

So 'Test 2' is slightly misleading - as it still suggests matchChunk is 
causing a slowdown - which it isn't.

The difference here is that Test 2 is doing more work, as it isn't 
always exiting early. If you test:

   get line k of fff
   put true into tFound

I suspect you'll find that the time to process your data, if it 
contains unicode, is pretty similar to the time when matchChunk is also 
called.

In my quick test (32 index lines, 200 fff lines) I get about 10ms (no 
unicode) vs 1400ms (unicode).
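
For anyone wanting to reproduce that comparison, something along these 
lines is all that's needed (a rough sketch only - the handler and 
parameter names are hypothetical, as the original scripts weren't posted 
in full):

   on testLineAccess pIndex, pData
      local tStart, tFound, k
      put the milliseconds into tStart
      repeat with k = 1 to the number of lines in pIndex
         get line k of pData
         put true into tFound -- stands in for the matchChunk call
      end repeat
      answer (the milliseconds - tStart) && "ms"
   end testLineAccess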

> OK Mark, hit me again, I am a glutton for punishment, what is wrong 
> with this analysis?

Nothing in particular - apart from thinking that matchChunk is actually 
a relevant factor here ;)

This delimiter search operation on unicode strings is so much slower 
than on native strings for two reasons:
   1) We (well, I) heavily optimized the core native/ascii string 
operations in 2015 to make sure they were as fast as possible
   2) Searching a unicode string for another string (which is what is 
going on here) is much more complex than doing the same for native/ascii

Native/ascii strings have some very pleasant properties:
   - one byte (codeunit) is one character - always.
   - each character has only one representation - its byte value
   - casing is a simple mapping between lower and upper case characters - 
and only about 25% of characters are affected

Unicode has none of these properties:
   - a unicode character (grapheme) can be arbitrarily many codeunits 
(2-byte quantities) long
   - characters can have multiple representations - e.g. e-acute vs 
e,combining-acute
   - casing is not (in general) a simple mapping of one codeunit to 
another

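As a quick illustration of the 'multiple representations' point, here is 
a minimal sketch (assuming a 7.0+ engine, where numToCodepoint, the 
codeunit chunk and normalizeText are all available):

   on showNormalizationExample
      local tPrecomposed, tDecomposed
      put numToCodepoint(233) into tPrecomposed -- U+00E9: e-acute as one codepoint
      put "e" & numToCodepoint(769) into tDecomposed -- U+0065 + U+0301: e + combining acute
      -- the two forms have different codeunit lengths...
      answer (the number of codeunits in tPrecomposed) && "vs" && \
            (the number of codeunits in tDecomposed) -- "1 vs 2"
      -- ...yet a general unicode comparison has to treat them as the same
      -- character, which is the kind of work the engine has to allow for
      -- on every unicode search
      answer normalizeText(tPrecomposed, "NFC") is normalizeText(tDecomposed, "NFC") -- true
   end showNormalizationExample
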
Currently the unicode operations in the engine are largely unoptimized - 
they assume the general case in all things so even searching a string 
for LF (which is the case here) is still done under the assumption that 
it might need that (very hefty) extra processing.

Of course it would be nice to have highly optimized low-level unicode 
string operations, but you have to pick your battles (particularly when 
the battles are incredibly technical ones!). The reality is that this 
(admittedly large!) speed difference is only really noticeable 'at 
scale' - and when scale is involved, there's pretty much always an 
algorithmic change which can make those 'low-level' performance 
differences irrelevant.

The case here is a really good example.

The line-chunk based code gives (no matchChunk / with matchChunk):

   ASCII 300 lines - 13ms / 22ms
   ASCII 3000 lines - 986ms / 1104ms
   ASCII 10000 lines - 10804ms / 11213ms

The array based code gives (no matchChunk / with matchChunk):

   ASCII 300 lines - 2ms / 11ms
   ASCII 3000 lines - 19ms / 101ms
   ASCII 10000 lines - 69ms / 336ms

   UNICODE 300 lines - 7ms / 12ms
   UNICODE 3000 lines - 52ms / 108ms
   UNICODE 10000 lines - 170ms / 359ms
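
The exact scripts aren't reproduced above, but the two shapes being 
compared look roughly like this (hypothetical names: tLineNumbers holds 
the line numbers to check, fff the data, tPattern the regex passed to 
matchChunk - the 'no matchChunk' timings simply drop that call):

   -- line-chunk version: every "line k of fff" scans fff from the start
   -- counting delimiters, so the total cost grows roughly as O(N^2)
   repeat for each line k in tLineNumbers
      get line k of fff
      if matchChunk(it, tPattern) then
         put true into tFound
         exit repeat
      end if
   end repeat

   -- array version: split fff once (a single pass over the data - note
   -- it turns fff into an array in place), after which each lookup is a
   -- constant-time access by numeric key
   split fff by return
   repeat for each line k in tLineNumbers
      get fff[k]
      if matchChunk(it, tPattern) then
         put true into tFound
         exit repeat
      end if
   end repeat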

Warmest Regards,

Mark.

-- 
Mark Waddingham ~ mark at livecode.com ~ http://www.livecode.com/
LiveCode: Build Amazing Things


