Sorting text is *VERY* slow in LC9 on Windows (Re: Accumulating text is *VERY* slow in LC9 on Windows)
Ben Rubinstein
benr_mc at cogapp.com
Fri Sep 3 13:20:18 EDT 2021
I'm very much hoping that Mark W might magically fix this in 9.6.5.
In the meantime, FWIW: the place where this was really hurting was a single
sort command, on 70 MB of data in approx. 223,000 lines. (The script in
question took 8 minutes under LC6 and 8 hours under LC9; I've gradually tamed
it to under an hour by buffering the large accumulations.)
I've replaced this line:
sort lines of tNewTable by item iSortCol of each
which took 1 second on Mac but 2063 seconds (i.e. 34 minutes) on Windows, with
a call to this command:
command sortLinesByTabbedColumn @tTable, iSortCol
   local aTable, tSortTable, iARcounter, tARbuffer, tRow, k
   -- (appendRow is a buffering helper, not defined in this post)
   -- load table into an array for fast access by line number
   put tTable into aTable
   split aTable using return
   -- compile index of just the column to sort on, and line number
   -- (each index row is "<sort-column value> <line number>")
   set the itemDelimiter to tab
   repeat for each key k in aTable
      get (item iSortCol of aTable[k]) && k
      appendRow it, iARcounter, tARbuffer, tSortTable
   end repeat
   put tARbuffer after tSortTable
   -- sort it
   sort lines of tSortTable
   -- rebuild table out of array, in sorted order
   -- (the line number is the last word of each index row)
   put empty into tARbuffer
   put empty into tTable
   repeat for each line tRow in tSortTable
      put last word of tRow into k
      appendRow aTable[k], iARcounter, tARbuffer, tTable
   end repeat
   put tARbuffer after tTable
end sortLinesByTabbedColumn
which takes 25 seconds on Windows (to my surprise, most of that time was in
the final 'rebuild' loop).
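appendRow itself isn't included above. For anyone wanting to try the command,
here is a hypothetical stand-in (a sketch under my own assumptions, not
necessarily how the real helper works) applying the same buffering idea: rows
collect in a small buffer that is flushed into the big variable every
kFlushEvery rows. The constant name and the value 1000 are arbitrary
assumptions.

```livecode
-- Hypothetical stand-in for appendRow (the real helper is not shown).
-- Rows go into a small buffer that is flushed into the large target
-- variable every kFlushEvery rows, so the big variable grows in a few
-- large appends instead of one append per row.
constant kFlushEvery = 1000 -- assumed flush interval, not tuned

command appendRow pRow, @rCounter, @rBuffer, @rTarget
   -- rCounter = rows appended so far to the current target; this sketch
   -- assumes it starts at 0 (or empty) for each new target.
   -- Separate rows with return, but not before the very first row.
   if rCounter > 0 then put return after rBuffer
   put pRow after rBuffer
   add 1 to rCounter
   -- periodically move the buffer into the big variable and start afresh
   if rCounter mod kFlushEvery = 0 then
      put rBuffer after rTarget
      put empty into rBuffer
   end if
end appendRow
```

Because this version uses the counter to decide whether a separator is needed,
the rebuild loop would also need `put 0 into iARcounter` next to
`put empty into tARbuffer`; otherwise the rebuilt table would start with an
empty line.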
On 02/09/2021 23:53, Bob Sneidar via use-livecode wrote:
> I am going to say no, because you still have to traverse the file once to get it into SQLite, then do the sort, then write out the file when done. I might be mistaken; the subsequent SQL sort may make up for the lost time. Using an in-memory SQL database really shines when you need to make multiple passes at the data using different queries. One pass may not impress you much.
>
> For instance, I have a File Management module built into my application. A file can belong to a customer, and also to a site, and also to a device. Like so:
>
> custid  siteid  deviceid  filepath
> 123                       disk/folder/file1
> 456     098               disk/folder/file2
> 789     765     432       disk/folder/file3
>
> Note all have a custid, some have a siteid as well, and some also have a deviceid.
>
> So rather than query MySQL for the files for each site or device as I select them, I instead, upon selecting a customer, query MySQL for ALL the file records for that customer (which of course contain the file records for all the sites and devices), then store that in a memory database. Then when a different site or device belonging to that customer is selected, I query the memory database for the records belonging to that site or that device, in the respective modules.
>
> The performance enhancement is significant.
>
> Another way I apply this is to get the objects on a card, passing a list of the properties I'm interested in, then store the data in a memory database. I can then query for objects with certain properties without having to iterate through all the objects on a card in a repeat loop. For instance, 4 memory-database queries find the farthest left, top, right and bottom objects whose visible is true, giving me the total rect of all the visible objects without grouping/ungrouping and the hell that can ensue.
>
> Bob S
>
>
>> On Sep 2, 2021, at 11:22, Bernard Devlin via use-livecode <use-livecode at lists.runrev.com> wrote:
>>
>> Whilst waiting for a fix, would a temporary solution be to use sqlite to
>> create an in-memory database and let sqlite do the sorting for you?
>>
>> Regards, Bernard.
>>
>> On Mon, Aug 30, 2021 at 8:23 PM Ben Rubinstein via use-livecode <use-livecode at lists.runrev.com> wrote:
>>
>>> Thanks to Mark Waddingham's advice about using a buffer var when
>>> accumulating a large text variable in stages, I've now got a script that
>>> took 8 hours under LC9 (8 minutes under LC6) down by stages to just under
>>> 1 hour under LC9.
>>>
>>> However, I have some remaining issues not amenable to this approach, of
>>> which the most significant relates to the sort command.
>>>
>>> In all cases it seems to take much longer under LC9 than it did under LC6,
>>> although the factor is quite variable. The most dramatic is one instance,
>>> in which this statement:
>>>
>>> sort lines of tNewTable by item iSortCol of each
>>>
>>> takes 35 minutes to execute. `tNewTable` is a variable consisting of some
>>> 223,000 lines of text, approx. 70 MB. The exact same statement with the
>>> same data on the same computer in LC6 takes just 1 second.
>>>
>>> Has anyone else noticed something of this sort? As I said, the effect
>>> varies: e.g. 54 seconds versus 1 second; 22 seconds versus 1 second. So it
>>> may not be so noticeable in all cases.
>>>
>>> TIA,
>>>
>>> Ben
>>>