OMG text processing performance 6.7 - 9.5

Thu Jan 30 09:38:16 EST 2020

Hi Mark,

Thanks for taking the time to reply!

I'm indeed currently in the process of seeing whether I can persuade the 
client's IT department to install the 32-bit drivers on the new VM. I'm 
optimistic that will buy me some time, but it won't be a complete solution 
because they outsource support to a third company, which has warned that it 
doesn't intend to support the 32-bit drivers much longer (apparently they're 
just waiting for Crystal Reports to be updated!).

And if that fails, one of my options is as you suggest to use the LC 9.5-built 
app to retrieve the data through the 64-bit drivers, and the LC 6.7-built 
app to process and (probably) build it. It will be shonky.

However, what you say certainly makes me feel more optimistic that something 
should be possible. There's really very little going on in the way of 
binary<->text conversion; there probably is a fair amount of word chunking - 
although half the work is about tracing cross-references etc, there's also a 
fair amount of processing of 'prose' and prose-like text. However, the nature 
of the text is that although 99% of it is probably ASCII, in any given table 
of text there will be just a few 'extended' characters - does that mean it all 
gets treated as four-byte data?

I'll see how the negotiations with IT get on...

Ben

On 30/01/2020 14:04, Mark Waddingham via use-livecode wrote:
> On 2020-01-30 13:20, Ben Rubinstein via use-livecode wrote:
>> The context is that I'm finally forced to replace an app that's been
>> processing data for a client for well over a decade. To date the
>> standalone has been built on LC 6.7.11; but now we need to put it on a
>> new platform with 64-bit database drivers. The performance has gone
>> through the floor, through the floors below, through the foundations,
>> and is on its way to the centre of the earth.
> 
> What's the need for 64-bit database drivers? i.e. What are you currently
> using to talk to the database and why can you not continue to use a 32-bit
> Windows standalone?
> 
>> The first stage of the app - which retrieves a load of data from
>> various databases and online sources, does minimal processing on it,
>> and dumps it to cache files - is approx 2x slower. The main core of
>> the app, which loads this data in and does a vast amount of processing
>> on it to generate various output data and reports, has gone from 12
>> minutes to over *six hours*.
> 
> I suspect it is probably a couple of things which are being done uniformly
> causing the problem rather than lots of things all over the place...
> 
> Where exactly is the data coming from? At a high level, what sorts of
> operations are being performed on it? What sort of I/O is being performed?
> 
> The main one I can think of is implicit binary<->text conversions. In 6.7
> and below binary data and text were the same thing - in 7+ they are distinct
> types which require a conversion operation. The functions which were always
> really returning/taking binary data now actually do.
> 
> e.g. textEncode / textDecode, compress / decompress, binaryEncode /
> binaryDecode, the byte chunk, repeat for each byte, numToByte
> 
> Given the app is coming from 6.7 vintage, it is unlikely that any of the new
> unicode text codepaths would be hit (unless there's something odd going on
> somewhere) as binary data converts to native encoded text - unless of course
> the means by which the data is getting into the app is being taken as unicode
> strings (without knowing the exact I/O going on I can't really see how this
> could happen, but I can't rule it out).
> 
> In general, native text processing (item detection, comparison, containment
> and such) is all as fast if not faster in the post-7 engines than 6.7 as I
> spent quite a while specializing a lot of lower level routines to make sure
> it was.
> 
> I do know the word chunk has been somewhat adversely affected, however, as
> that was never optimized in the same way.
> 
>> The coding is gnarly - the oldest parts are probably at least 15 years
>> old - and I've no doubt it could be made more efficient; but we don't
>> have time or budget to rewrite it all. So, are there known gotchas,
>> functions which have taken a much greater hit than others, that I
>> could concentrate on to get the most ROI in speeding this up?
> 
> Given that you don't have time or budget to really touch the code in any
> depth, wouldn't it be best to not have to touch it at all and keep it in
> 6.7.11? i.e. Do you really need to move to 9.5?
> 
> Could you split the app into the bit which does the database communication
> and caching (assuming that *really* needs to be 64-bit) and the bit which
> does the data processing (which could remain as 32-bit in 6.7.11)?
> 
> Note I should say that the reason I ask the above is not because of a lack
> of confidence in getting your code to run as fast as it did before but
> because of pure business reasoning - why spend time and money on something
> which isn't necessarily really needed?
> 
> There's a difference between needing to update user-facing apps and true
> back-office server apps after all - banks and insurance companies still have
> software written on and running on machines which are decades old because
> they work and the cost of keeping them running is vastly less than the cost
> to rewrite and replace!
> 
> Warmest Regards,
> 
> Mark.
>