Accumulating text is *VERY* slow in LC9 on Windows

Ben Rubinstein benr_mc at cogapp.com
Wed Aug 25 13:15:10 EDT 2021


Some 20 months ago, I reported that I was in a situation where an app written 
in 6.7 needed to be updated to access 64bit drivers, which meant updating to 
9.5 - which displayed a horrifying increase in processing time.

In fact I was able to put off the evil day - but now it has returned, and can 
be put off no longer. A process that normally takes 2 hours is currently 
taking 9. The core processing stage has gone from around ten minutes to over 
six hours.

After way too long, I've finally got down to at least one smoking gun, which 
is as simple as can be.

Part of what took me so long is a confusion; in production the process runs on 
Windows, but I develop on Mac. Although on Mac the overall process does take 
about a third longer in LC9 than LC6, the simple tests I've finally isolated 
actually run much _quicker_ in LC9 than LC6. So switching between LC6 and LC9 
on Mac as I tried to isolate the issue was giving confusing signals. But 
unmistakeably it's *much* slower on Windows.

A simple routine which loops over a load of tab- and return-delimited data 
loaded from a TSV file, to truncate a particular field, had the following 
results processing a 70MB file of approximately 257,000 rows:

	6.7.11 MacOS	  9 seconds
	6.7.11 Win32	 10 seconds
	9.6.3  MacOS	  2 seconds
	9.6.3  Win32	498 seconds

I simplified it down to this (pointless) loop which just rebuilds a table one 
line at a time:

    local tNewTable
    repeat for each line tRow in tWorkTable
       put tRow & return after tNewTable
    end repeat

with these results:

	6.7.11 MacOS	  8 seconds
	6.7.11 Win32	  7 seconds
	9.6.3  MacOS	  0 seconds
	9.6.3  Win32	591 seconds

(there's obviously a lot of variability in these - both were running in IDE, 
on a logged-in computer, so stuff was probably going on in the background; but 
I know the overall effect is similar when built as standalone and running by 
schedule on an unattended machine. But the key thing is: for this task, LC9 is 
dramatically slower on Windows!)
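
For anyone wanting to reproduce the comparison, here is a minimal sketch of how 
the rebuild loop could be timed. The handler name timeRebuild and the parameter 
pWorkTable are hypothetical, chosen for illustration; it assumes the TSV data 
has already been loaded into a variable, and it is not necessarily how the 
figures above were measured.

```livecode
-- Hypothetical helper: rebuilds pWorkTable one line at a time
-- and returns the elapsed time in milliseconds.
function timeRebuild pWorkTable
   local tNewTable, tStart
   put the milliseconds into tStart
   repeat for each line tRow in pWorkTable
      -- the accumulating step under suspicion
      put tRow & return after tNewTable
   end repeat
   return the milliseconds - tStart
end timeRebuild
```

Calling it as "put timeRebuild(tWorkTable)" under each engine version and 
platform would give directly comparable numbers.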

Have others seen something like this?

When I posted about this before (thread: "OMG text processing performance 6.7 
- 9.5") Mark Waddingham suggested that it might be to do with a hidden cost of 
binary<->text transforms. That makes some sense; but given that the text 
already exists, I'm wondering whether taking a line out of text would cause it 
to be transformed, only to be transformed again when appending? And in 
particular, why this would affect Windows only.

I have also added tests using "is strictly a binary string" in the code above, 
and this was true for neither input 'tWorkTable', nor the output 'tNewTable', 
nor any of the 257,000 extracted lines.

However it is definitely the accumulating of text that is the issue - simply 
looping over the lines - even with testing each one to see if it is "strictly 
a binary string" - takes a second or less on Windows in LC9.
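
A minimal sketch of that control test, for comparison with the accumulating 
loop. The handler name timeScanOnly and the counter tBinaryCount are 
hypothetical; the sketch only assumes the same loaded table is passed in.

```livecode
-- Hypothetical control: walks the lines without accumulating anything,
-- counting how many are strictly binary strings.
-- Returns "elapsedMilliseconds,binaryLineCount".
function timeScanOnly pWorkTable
   local tStart, tBinaryCount
   put 0 into tBinaryCount
   put the milliseconds into tStart
   repeat for each line tRow in pWorkTable
      if tRow is strictly a binary string then
         add 1 to tBinaryCount
      end if
   end repeat
   return (the milliseconds - tStart) & comma & tBinaryCount
end timeScanOnly
```

If this returns quickly while the rebuild loop does not, the cost lies in the 
"put ... after" step rather than in extracting the lines.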

Has anyone had similar experiences? Suggestions for how this could be avoided?

Many thanks in advance,

Ben



