Accumulating text is *VERY* slow in LC9 on Windows
benr_mc at cogapp.com
Wed Aug 25 13:15:10 EDT 2021
Some 20 months ago, I reported that an app written in 6.7 needed to be
updated to access 64-bit drivers, which meant updating to 9.5 - which
displayed a horrifying increase in processing time.
In fact I was able to put off the evil day - but now it has returned, and can
be put off no longer. A process that normally takes 2 hours is currently
taking 9. The core processing stage has gone from around ten minutes to over
After way too long, I've finally got down to at least one smoking gun, which
is as simple as can be.
Part of what took me so long is a confusion; in production the process runs on
Windows, but I develop on Mac. Although on Mac the overall process does take
about a third longer in LC9 than LC6, the simple tests I've finally isolated
actually run much _quicker_ in LC9 than LC6. So switching between LC6 and LC9
on Mac as I tried to isolate the issue was giving confusing signals. But
unmistakeably it's *much* slower on Windows.
A simple routine which loops over tab- and return-delimited data loaded from
a TSV file, truncating one particular field, had the following results when
processing a 70MB file of approximately 257,000 rows:
6.7.11  MacOS    9 seconds
6.7.11  Win32   10 seconds
9.6.3   MacOS    2 seconds
9.6.3   Win32  498 seconds
I simplified it down to this (pointless) loop which just rebuilds a table one
line at a time:
repeat for each line tRow in tWorkTable
   put tRow & return after tNewTable
end repeat
with these results:
6.7.11  MacOS    8 seconds
6.7.11  Win32    7 seconds
9.6.3   MacOS    0 seconds
9.6.3   Win32  591 seconds
(There's obviously a lot of variability in these - both were running in the
IDE, on a logged-in computer, so stuff was probably going on in the
background; but I know the overall effect is similar when built as a
standalone and run on a schedule on an unattended machine. The key thing is:
for this task, LC9 is dramatically slower on Windows!)
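For anyone wanting to reproduce this, a minimal sketch of how the accumulation
loop might be timed is given here. The handler name, the file path and the
tStart variable are placeholders for illustration, not taken from the real
script:

```livecode
-- Sketch only: handler name and file path are placeholders
on timeAccumulation
   local tWorkTable, tNewTable, tRow, tStart
   -- load the tab/return-delimited data
   put url "file:/path/to/data.tsv" into tWorkTable
   put the milliseconds into tStart
   -- rebuild the table one line at a time
   repeat for each line tRow in tWorkTable
      put tRow & return after tNewTable
   end repeat
   -- report the elapsed time in the message box
   put "Accumulate:" && (the milliseconds - tStart) && "ms"
end timeAccumulation
```

Running the same handler in each version and on each platform should give
directly comparable figures.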
Have others seen something like this?
When I posted about this before (thread: "OMG text processing performance 6.7
- 9.5") Mark Waddingham suggested that it might be to do with a hidden cost of
binary<->text transforms. That makes some sense; but given that the text
already exists, I'm wondering whether taking a line out of text would cause it
to be transformed, only to be transformed again when appending? And in
particular, why this would affect Windows only.
I have also added tests using "is strictly a binary string" in the code above,
and this was true for neither input 'tWorkTable', nor the output 'tNewTable',
nor any of the 257,000 extracted lines.
However, it is definitely the accumulating of text that is the issue - simply
looping over the lines - even with testing each one to see if it is "strictly
a binary string" - takes a second or less on Windows in LC9.
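The loop-only comparison could be sketched along these lines. Again the
handler name, file path and the tBinaryCount counter are illustrative
assumptions rather than the actual test code:

```livecode
-- Sketch only: handler name and file path are placeholders
on timeLoopOnly
   local tWorkTable, tRow, tStart, tBinaryCount
   put url "file:/path/to/data.tsv" into tWorkTable
   put 0 into tBinaryCount
   put the milliseconds into tStart
   -- iterate and test each line, but never append to anything
   repeat for each line tRow in tWorkTable
      if tRow is strictly a binary string then add 1 to tBinaryCount
   end repeat
   put "Loop only:" && (the milliseconds - tStart) && "ms;" && \
         tBinaryCount && "binary lines"
end timeLoopOnly
```

Since this differs from the accumulation loop only in the missing "put ...
after", any gap between the two timings can be put down to the append itself.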
Has anyone had similar experiences? Suggestions for how this could be avoided?
Many thanks in advance,