OMG text processing performance 6.7 - 9.5
Mark Waddingham
mark at livecode.com
Thu Jan 30 10:03:05 EST 2020
On 2020-01-30 14:38, Ben Rubinstein via use-livecode wrote:
> Hi Mark,
>
> Thanks for taking the time to reply!
>
> I'm indeed currently in the process of seeing whether I can persuade
> the client's IT department to install the 32-bit drivers on the new
> VM. I'm optimistic that will buy me some time, but it won't be a
> complete solution because they outsource support to a third company,
> which has warned that it doesn't intend to support the 32-bit drivers
> much longer (apparently they're just waiting for Crystal Reports to be
> updated!).
Ah! From that I'm guessing you are using the ODBC revdb driver - which
needs a third-party ODBC connector :)
> And if that fails, one of my options is as you suggest to use the LC
> 9.5-built app to retrieve the data through the 64-bit drivers, and
> the LC 6.7-built app to process and (probably) build it. It will be
> shonky.
It doesn't have to be 'shonky' - if the fetch-from-database part is
already separated from the data-processing part through cache files
(i.e. fetch writes files to disk, data-process reads those files and
processes them), then you could build the fetch-from-database part as a
64-bit Windows standalone, which the data-process part then calls using
shell (or open process).
Of course, it would be slightly cleaner for it all to be one app :)
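A minimal sketch of the two-app arrangement, as it might look in the
6.7-built data-process app. The handler name, the command line and the
cache file are all hypothetical, and it assumes the 64-bit fetch
standalone writes exactly one file and then exits:

```livecode
-- Sketch only: pFetchCommand is the full command line of the 64-bit
-- fetch standalone (hypothetical), pCacheFile is the file it writes.
function fetchThenLoad pFetchCommand, pCacheFile
   local tOutput
   -- remove any stale cache so a failed fetch is not mistaken for success
   if there is a file pCacheFile then delete file pCacheFile
   set the hideConsoleWindows to true
   -- shell() blocks until the helper exits and returns its console output
   put shell(pFetchCommand) into tOutput
   -- empty here means "nothing was fetched"; callers should check for it
   if there is not a file pCacheFile then return empty
   return url ("file:" & pCacheFile)
end fetchThenLoad
```

If the fetch is slow and the UI needs to stay responsive, open process
would be the non-blocking alternative; shell is simply the shortest way
to show the idea.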
> However, what you say certainly makes me feel more optimistic that
> something should be possible. There's really very little going on in
> the way of binary<->text conversion; there probably is a fair amount
> of word chunking - although half the work is about tracing
> cross-references etc, there's also a fair amount of processing of
> 'prose' and prose-like text. However, the nature of the text is that
> although 99% of it is probably ASCII, in any given table of text there
> will be just a few 'extended' characters - does that mean it all gets
> treated as four-byte data?
Binary<->text can be quite subtle - as it isn't something you had to
think about before 6.7. For example, if you are fetching using *b via
revDB from the database, then *that* will now be binary data - not
text. (Indeed, what accessors are you using to get the data?)
Things like binfile and reading for binary (from a file) will also
produce binary rather than text.
You can test for binary data using 'is strictly a binary string'.
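As a rough sketch of that test in use (LiveCode 7 and later only): the
handler name is made up, and the UTF-8 encoding is an assumption -
substitute whatever encoding the database or file actually uses.

```livecode
-- Sketch only: pData is whatever came back from revDB or from a file.
function asText pData
   if pData is strictly a binary string then
      -- assumption: the bytes are UTF-8 encoded text
      return textDecode(pData, "UTF-8")
   end if
   -- already a text string, so pass it through unchanged
   return pData
end asText
```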
Native encoding means (on Windows at least) anything which fits into
Latin-1, so any text you are getting out of revDB from the database
should come through as native strings.
Native strings get converted to Unicode internally when they are
combined with a string which contains Unicode, and in two other places:
1) Using matchText / replaceText (because we use the UTF-16 variant of
   PCRE)
2) When put into a field (because all text layout APIs on all
   platforms use UTF-16)
What sort of text operations are you using for 'tracing
cross-references etc' and 'processing of 'prose' and prose-like text'?
> I'll see how the negotiations with IT get on...
Good luck!
Warmest Regards,
Mark.
--
Mark Waddingham ~ mark at livecode.com ~ http://www.livecode.com/
LiveCode: Everyone can create apps