Sorting text is *VERY* slow in LC9 on Windows (Re: Accumulating text is *VERY* slow in LC9 on Windows)

Ben Rubinstein benr_mc at cogapp.com
Fri Sep 3 13:20:18 EDT 2021


I'm very much hoping that Mark W might magically fix this in 9.6.5.

But in the meantime FWIW, the place where this was really hurting (in a script 
that took 8 minutes under LC6 but was taking 8 hours under LC9, which I've 
gradually tamed down to under an hour by buffering the large accumulations) 
was a single sort command, on 70 MB of data in approx 223,000 lines.

I've replaced this line, which took 1 second on Mac but 2063 seconds (i.e. 34 
minutes) on Windows:

	sort lines of tNewTable by item iSortCol of each

with a call to this command:

	command sortLinesByTabbedColumn @tTable, iSortCol
	   local aTable, tSortTable, iARcounter, tARbuffer, tRow, k

	   -- load table into an array for fast access by line number
	   put tTable into aTable
	   split aTable using return

	   -- compile index of just the column to sort on, and line number
	   set the itemDelimiter to tab
	   repeat for each key k in aTable
		  get (item iSortCol of aTable[k]) && k
		  appendRow it, iARcounter, tARbuffer, tSortTable
	   end repeat
	   put tARbuffer after tSortTable

	   -- sort the index (each line is just the sort value plus a line
	   -- number, so far less data moves around than sorting the full rows)
	   sort lines of tSortTable

	   -- rebuild table out of array, in sorted order
	   put empty into tARbuffer
	   put empty into tTable
	   repeat for each line tRow in tSortTable
		  put last word of tRow into k
		  appendRow aTable[k], iARcounter, tARbuffer, tTable
	   end repeat
	   put tARbuffer after tTable

	end sortLinesByTabbedColumn

which takes 25 seconds on Windows (to my surprise, most of that time was in 
the final 'rebuild' loop).
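
In case it helps to see it: appendRow is the little buffering helper from the 
earlier "accumulating text" thread, and isn't shown above. A minimal sketch, 
assuming the accumulate-and-flush pattern Mark described (the 10,000-row flush 
threshold here is a guess):

	command appendRow pRow, @iCounter, @tBuffer, @tTarget
	   -- append to a small buffer rather than directly to the large
	   -- target, so the engine isn't repeatedly growing a huge string
	   put pRow & return after tBuffer
	   add 1 to iCounter
	   if iCounter >= 10000 then
	      -- flush the buffer into the target and start again
	      put tBuffer after tTarget
	      put empty into tBuffer
	      put 0 into iCounter
	   end if
	end appendRow

(The caller's final 'put tARbuffer after ...' picks up whatever is left in the 
buffer.)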


On 02/09/2021 23:53, Bob Sneidar via use-livecode wrote:
> I am going to say no, because you still have to traverse the file once to get it into SQLite, then do the sort, then write out the file when done. I might be mistaken, though; the subsequent SQL sort may make up for the lost time. An in-memory SQL database really shines when you need to make multiple passes at the data using different queries. One pass may not impress you much.
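> 
> That said, the in-memory route is only a few lines if you want to try it. A minimal sketch in LiveCode using the stock revDB calls (the two-column table layout is illustrative, and tNewTable/iSortCol are from Ben's example):
> 
>     -- ":memory:" as the "file" path gives a RAM-only sqlite database
>     put revOpenDatabase("sqlite", ":memory:", , , ) into tConnID
>     revExecuteSQL tConnID, "CREATE TABLE t (sortkey TEXT, rest TEXT)"
>     revExecuteSQL tConnID, "BEGIN" -- one transaction makes bulk inserts much faster
>     set the itemDelimiter to tab
>     repeat for each line tRow in tNewTable
>        put item iSortCol of tRow into tKey
>        revExecuteSQL tConnID, "INSERT INTO t VALUES (:1, :2)", "tKey", "tRow"
>     end repeat
>     revExecuteSQL tConnID, "COMMIT"
>     -- read the rows back out in sorted order
>     put revDataFromQuery(tab, return, tConnID, \
>           "SELECT rest FROM t ORDER BY sortkey") into tNewTable
>     revCloseDatabase tConnID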
> 
> For instance, I have a File Management module built into my application. A file can belong to a customer, and also to a site, and also to a device. Like so:
> 
> custid   siteid   deviceid   filepath
> 123      -        -          disk/folder/file1
> 456      098      -          disk/folder/file2
> 789      765      432        disk/folder/file3
> 
> Note all have a custid, some have a siteid as well, and some also have a deviceid.
> 
> So rather than querying MySQL for the files for each site or device as I select them, I instead, upon selecting a customer, query MySQL for ALL the file records for that customer (which of course include the file records for all its sites and devices), and store them in a memory database. Then when a different site or device belonging to that customer is selected, I query the memory database for the files belonging to that site or that device, in those modules respectively.
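> 
> In outline it looks like this (a sketch only: tCust/tSite and the connection IDs gMySQLID and gMemID are stand-ins, and the files table is simplified):
> 
>     -- once, when a customer is selected: one round trip to the server...
>     put revDataFromQuery(tab, return, gMySQLID, \
>           "SELECT custid, siteid, deviceid, filepath FROM files WHERE custid = :1", \
>           "tCust") into tRows
>     -- ...then load the result into the in-memory copy
>     revExecuteSQL gMemID, "CREATE TABLE files" && \
>           "(custid TEXT, siteid TEXT, deviceid TEXT, filepath TEXT)"
>     revExecuteSQL gMemID, "BEGIN"
>     set the itemDelimiter to tab
>     repeat for each line tRow in tRows
>        put item 1 of tRow into tCu
>        put item 2 of tRow into tSi
>        put item 3 of tRow into tDe
>        put item 4 of tRow into tFp
>        revExecuteSQL gMemID, "INSERT INTO files VALUES (:1, :2, :3, :4)", \
>              "tCu", "tSi", "tDe", "tFp"
>     end repeat
>     revExecuteSQL gMemID, "COMMIT"
>     -- afterwards, site/device selections never touch the server
>     put revDataFromQuery(tab, return, gMemID, \
>           "SELECT filepath FROM files WHERE siteid = :1", "tSite") into tFiles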
> 
> The performance enhancement is significant.
> 
> Another way I apply this is to get the objects on a card, passing a list of the properties I'm interested in, then store the data in a memory database. I can then query for objects with certain properties without having to iterate through all the objects on the card in a repeat loop. For instance, I can find the farthest left, top, right and bottom objects whose visible is true in 4 memory db queries, giving me the total rect of all the visible objects without grouping/ungrouping and the hell that can ensue.
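> 
> That last one comes down to something like this (a sketch; tConnID is an open :memory: database as above, and here it's one combined query rather than four, but the idea is the same):
> 
>     revExecuteSQL tConnID, "CREATE TABLE objs" && \
>           "(l INTEGER, t INTEGER, r INTEGER, b INTEGER, vis TEXT)"
>     -- one pass to load every control's rect and visible into the memory db
>     repeat with i = 1 to the number of controls of this card
>        put item 1 of the rect of control i into tL
>        put item 2 of the rect of control i into tT
>        put item 3 of the rect of control i into tR
>        put item 4 of the rect of control i into tB
>        put the visible of control i into tVis -- "true" or "false"
>        revExecuteSQL tConnID, "INSERT INTO objs VALUES (:1, :2, :3, :4, :5)", \
>              "tL", "tT", "tR", "tB", "tVis"
>     end repeat
>     -- the total rect of all the visible objects, in one query
>     put revDataFromQuery(comma, return, tConnID, \
>           "SELECT MIN(l), MIN(t), MAX(r), MAX(b) FROM objs WHERE vis = 'true'") into tRect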
> 
> Bob S
> 
> 
>> On Sep 2, 2021, at 11:22 , Bernard Devlin via use-livecode <use-livecode at lists.runrev.com> wrote:
>>
>> Whilst waiting for a fix, would a temporary solution be to use sqlite to
>> create an in-memory database and let sqlite do the sorting for you?
>>
>> Regards, Bernard.
>>
>> On Mon, Aug 30, 2021 at 8:23 PM Ben Rubinstein via use-livecode <
>> use-livecode at lists.runrev.com> wrote:
>>
>>> Thanks to Mark Waddingham's advice about using a buffer var when
>>> accumulating a large text variable in stages, I've now got a script that
>>> took 8 hours under LC9 (8 minutes under LC6) down by stages to just under
>>> 1 hour under LC9.
>>>
>>> However I have some remaining issues not amenable to this approach, of
>>> which the most significant relates to the sort command.
>>>
>>> In all cases it seems to take much longer under LC9 than it did under LC6,
>>> although the factor is quite variable. The most dramatic is one instance,
>>> in which this statement:
>>>         sort lines of tNewTable by item iSortCol of each
>>>
>>> takes 35 minutes to execute. `tNewTable` is a variable consisting of some
>>> 223,000 lines of text; approx 70MB. The exact same statement with the same
>>> data on the same computer in LC6 takes just 1 second.
>>>
>>> Has anyone else noticed something of this sort? As I said, the effect
>>> varies: e.g. 54 seconds versus 1 second; 22 seconds versus 1 second. So it
>>> may not be so noticeable in all cases.
>>>
>>> TIA,
>>>
>>> Ben
>>>
> 



