OMG text processing performance 6.7 - 9.5
Mark Waddingham
mark at livecode.com
Thu Jan 30 09:04:24 EST 2020
On 2020-01-30 13:20, Ben Rubinstein via use-livecode wrote:
> The context is that I'm finally forced to replace an app that's been
> processing data for a client for well over a decade. To date the
> standalone has been built on LC 6.7.11; but now we need to put it on a
> new platform with 64-bit database drivers. The performance has gone
> through the floor, through the floors below, through the foundations,
> and is on its way to the centre of the earth.
What's the need for 64-bit database drivers? i.e. what are you currently
using to talk to the database, and why can you not continue to use a
32-bit Windows standalone?
> The first stage of the app - which retrieves a load of data from
> various databases and online sources, does minimal processing on it,
> and dumps it to cache files - is approx 2x slower. The main core of
> the app, which loads this data in and does a vast amount of processing
> on it to generate various output data and reports, has gone from 12
> minutes to over *six hours*.
I suspect it is probably a couple of things which are being done
uniformly causing the problem, rather than lots of things all over the
place...
Where exactly is the data coming from? At a high level, what sorts of
operations are being performed on it? What sort of I/O is being
performed?
The main one I can think of is implicit binary<->text conversions. In
6.7 and below, binary data and text were the same thing; in 7+ they are
distinct types which require a conversion operation. The functions which
were always really returning/taking binary data now actually do.
e.g. textEncode / textDecode, compress / decompress, binaryEncode /
binaryDecode, the byte chunk, repeat for each byte, numToByte
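One way to check whether this affects a given routine is to time the same pass over the data with byte chunks and with char chunks. What follows is only a minimal sketch, not a diagnosis of the app in question: the handler name compareByteAndCharLoops and its parameter pData are hypothetical, and it assumes pData really is binary (for example, read via a "binfile:" URL or returned by decompress) and that the engine is 7 or later, where the byte chunk and byteToNum exist.

```livecode
-- Hypothetical helper: times a byte-wise loop against a char-wise loop
-- over the same data. pData is assumed to be binary data.
function compareByteAndCharLoops pData
   local tStart, tSum, tByte, tChar, tReport

   -- Byte chunks operate on the data as binary (LiveCode 7+).
   put the milliseconds into tStart
   put 0 into tSum
   repeat for each byte tByte in pData
      add byteToNum(tByte) to tSum
   end repeat
   put "byte loop:" && (the milliseconds - tStart) && "ms" & return into tReport

   -- Char chunks operate on text, so binary data has to be converted
   -- to text before the loop can run.
   put the milliseconds into tStart
   put 0 into tSum
   repeat for each char tChar in pData
      add charToNum(tChar) to tSum
   end repeat
   put "char loop:" && (the milliseconds - tStart) && "ms" after tReport

   return tReport
end compareByteAndCharLoops
```

It could be called from the message box as, say, put compareByteAndCharLoops(url ("binfile:" & tCachePath)), where tCachePath is a placeholder for one of the cache files. A single conversion may well be cheap; the case to look for is code that flips the same value between binary and text inside a loop.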
Given the app is of 6.7 vintage, it is unlikely that any of the new
Unicode text codepaths would be hit (unless there's something odd going
on somewhere), as binary data converts to native-encoded text - unless,
of course, the data is getting into the app as Unicode strings (without
knowing the exact I/O going on I can't really see how this could happen,
but I can't rule it out).
In general, native text processing (item detection, comparison,
containment and such) is all as fast, if not faster, in the post-7
engines than in 6.7, as I spent quite a while specializing a lot of
lower-level routines to make sure it was.
I do know the word chunk has been somewhat adversely affected, however,
as that was never optimized in the same way.
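To see whether word chunks are a hotspot for a particular data set, the same text can be walked once by word and once by item. This is a rough sketch under stated assumptions: the handler name compareWordAndItemLoops and the parameter pText are hypothetical, and the two loops are not equivalent in general (word chunks collapse runs of whitespace and treat quoted strings as a single word, whereas items split on every delimiter), so it only gives a relative feel for the cost of each chunk type.

```livecode
-- Hypothetical helper: times a word-wise loop against an item-wise loop
-- (with a space delimiter) over the same text.
function compareWordAndItemLoops pText
   local tStart, tCount, tWord, tItem, tReport

   put the milliseconds into tStart
   put 0 into tCount
   repeat for each word tWord in pText
      add 1 to tCount
   end repeat
   put "word loop:" && tCount && "chunks," && (the milliseconds - tStart) && "ms" & return into tReport

   -- The itemDelimiter is local to this handler, so callers are unaffected.
   set the itemDelimiter to space
   put the milliseconds into tStart
   put 0 into tCount
   repeat for each item tItem in pText
      add 1 to tCount
   end repeat
   put "item loop:" && tCount && "chunks," && (the milliseconds - tStart) && "ms" after tReport

   return tReport
end compareWordAndItemLoops
```

Running it under both 6.7.11 and the newer engine on a representative cache file would show whether the word loop is where the time goes.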
> The coding is gnarly - the oldest parts are probably at least 15 years
> old - and I've no doubt it could be made more efficient; but we don't
> have time or budget to rewrite it all. So, are there known gotchas,
> functions which have taken a much greater hit than others, that I
> could concentrate on to get the most ROI in speeding this up?
Given that you don't have the time or budget to touch the code in any
depth, would it not be best to not have to touch it at all and keep it
in 6.7.11? i.e. do you really need to move to 9.5?
Could you split the app into the bit which does the database
communication and caching (assuming that *really* needs to be 64-bit)
and the bit which does the data processing (which could remain as 32-bit
in 6.7.11)?
Note, I should say that the reason I ask the above is not a lack of
confidence in getting your code to run as fast as it did before, but
pure business reasoning - why spend time and money on something which
isn't necessarily really needed?
There's a difference between needing to update user-facing apps and true
back-office server apps, after all - banks and insurance companies still
have software written on and running on machines which are decades old,
because they work and the cost of keeping them running is vastly less
than the cost to rewrite and replace!
Warmest Regards,
Mark.
--
Mark Waddingham ~ mark at livecode.com ~ http://www.livecode.com/
LiveCode: Everyone can create apps