Array Split vs Combine

Monte Goulding monte at appisle.net
Wed Mar 10 21:44:18 EST 2021


Ideally we wouldn’t introduce an additional iteration of the array to pre-calculate, but yes, there are probably a few ways to go. It could also be possible to flag a string with a buffer allocation strategy on creation, so that the cases where the engine creates a string and then repeatedly appends to it can use a different strategy. Anyway, I’m only guessing at the problem, so it’s best to create an issue from which we can build a benchmark test.
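
To make the flag-on-creation idea concrete, here is a minimal C++ sketch, assuming a plain heap buffer. `GrowthPolicy`, `GrowableString` and its reallocation counter are hypothetical names for illustration, not LiveCode engine API. Under exact fit, every append that enlarges the string reallocates and copies; under geometric growth, a string built by repeated appends reallocates only a logarithmic number of times.

```cpp
#include <cstddef>
#include <cstring>
#include <memory>
#include <string_view>
#include <utility>

// Hypothetical policy flag fixed when the string is created (not engine API).
enum class GrowthPolicy { ExactFit, Geometric };

class GrowableString {
public:
    explicit GrowableString(GrowthPolicy policy) : policy_(policy) {}

    void append(std::string_view piece) {
        if (piece.empty()) return;  // nothing to copy, and avoids memcpy on a null buffer
        std::size_t required = size_ + piece.size();
        if (required > capacity_) grow(required);
        std::memcpy(data_.get() + size_, piece.data(), piece.size());
        size_ = required;
    }

    std::string_view view() const { return {data_.get(), size_}; }
    std::size_t reallocations() const { return reallocations_; }

private:
    // Allocates a bigger buffer and copies the existing contents into it.
    void grow(std::size_t required) {
        std::size_t new_capacity = required;  // ExactFit: just enough
        if (policy_ == GrowthPolicy::Geometric) {
            // Geometric: start at 16 bytes, then double until the data fits.
            new_capacity = capacity_ == 0 ? 16 : capacity_;
            while (new_capacity < required) new_capacity *= 2;
        }
        auto fresh = std::make_unique<char[]>(new_capacity);
        if (size_ > 0) std::memcpy(fresh.get(), data_.get(), size_);
        data_ = std::move(fresh);
        capacity_ = new_capacity;
        ++reallocations_;
    }

    GrowthPolicy policy_;
    std::unique_ptr<char[]> data_;
    std::size_t size_ = 0;
    std::size_t capacity_ = 0;
    std::size_t reallocations_ = 0;
};
```

Appending "x" 1000 times gives 1000 reallocations under ExactFit and 7 under Geometric (capacities 16, 32, ..., 1024). The price is unused capacity, which for large strings stays below double what is actually needed.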

Cheers

Monte

> On 11 Mar 2021, at 1:28 pm, Sean Cole (Pi) via use-livecode <use-livecode at lists.runrev.com> wrote:
> 
> Monte,
> Would it be possible to roughly precalculate how big a string buffer is
> required from the size (memalloc) of the array? Then, if later in the
> process it finds it doesn't have enough, it can add a bunch more memory to
> the buffer, reducing how often the buffer has to be resized. Or maybe grow
> it in blocks of n bytes. I'm just spitballing, but I guess you get my
> thinking.
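
The estimate-then-grow-in-blocks idea can be sketched as a small allocation counter. This is a hypothetical helper, not engine code: it assumes an initial buffer of `estimate` bytes that grows by the smallest whole number of `block`-byte blocks whenever an append overflows it, and it counts allocations rather than timing them.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: counts the allocations needed to append pieces of the
// given lengths to a buffer that starts at `estimate` bytes and, whenever an
// append overflows it, grows by whole blocks of `block` bytes.
std::size_t allocations_with_blocks(const std::vector<std::size_t>& lengths,
                                    std::size_t estimate, std::size_t block) {
    if (block == 0) block = 1;  // guard against division by zero below
    std::size_t capacity = estimate;
    std::size_t size = 0;
    std::size_t allocations = estimate > 0 ? 1 : 0;  // the up-front buffer
    for (std::size_t len : lengths) {
        size += len;
        if (size > capacity) {
            // Round the shortfall up to a whole number of blocks.
            std::size_t shortfall = size - capacity;
            capacity += ((shortfall + block - 1) / block) * block;
            ++allocations;
        }
    }
    return allocations;
}
```

For 1000 pieces of 10 bytes (10,000 bytes in total), an exact estimate needs 1 allocation, an estimate of 8000 with 1024-byte blocks needs 3, and exact fit (estimate 0, block 1) needs 1000. Fixed blocks still grow linearly, so a badly low estimate on a very large string can still mean many reallocations, which is one reason geometric growth is the more common choice.
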
> 
> Sean
> 
> On Thu, 11 Mar 2021 at 00:52, Monte Goulding via use-livecode <use-livecode at lists.runrev.com> wrote:
> 
>> It’s most likely to do with the string buffer needing to be constantly
>> resized as the array is iterated for combine. Some googling suggests
>> Windows may have issues with this. Our current strategy for growing string
>> buffers is to allocate just enough for the string. Mark would need to chime
>> in on whether growing the buffer exponentially would be suitable. It would
>> result in more memory being allocated than necessary but far less frequent
>> allocations, so it depends on what’s most costly as memory gets cheaper.
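
To put this hypothesis in rough numbers: with exact-fit growth, each append may copy everything written so far, so building a string from n pieces copies on the order of n² bytes, while doubling the capacity keeps the total copying linear. The sketch below only counts bytes copied under both policies. It assumes every exact-fit append reallocates and moves the data (a real allocator can sometimes extend a block in place), and it says nothing about whether this is actually what is slow on Windows.

```cpp
#include <cstddef>

// Bytes copied by reallocations while appending `count` pieces of `piece`
// bytes each, under exact-fit growth versus capacity doubling.
struct CopyCost {
    std::size_t exact_fit;
    std::size_t geometric;
};

CopyCost bytes_copied(std::size_t count, std::size_t piece) {
    CopyCost cost{0, 0};
    std::size_t size = 0, geo_capacity = 0;
    for (std::size_t i = 0; i < count; ++i) {
        std::size_t required = size + piece;
        // Exact fit: assume every append reallocates and copies the old contents.
        cost.exact_fit += size;
        // Geometric: copy only when the doubled capacity is exhausted.
        if (required > geo_capacity) {
            cost.geometric += size;
            if (geo_capacity == 0) geo_capacity = piece;
            while (geo_capacity < required) geo_capacity *= 2;
        }
        size = required;
    }
    return cost;
}
```

For 1000 pieces of 10 bytes, exact fit copies 4,995,000 bytes (10 × 1000 × 999 / 2) against 10,230 bytes for doubling.
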
>> 
>> Cheers
>> 
>> Monte
>> 
>>> On 11 Mar 2021, at 11:34 am, Sean Cole (Pi) via use-livecode <use-livecode at lists.runrev.com> wrote:
>>> 
>>> The code for 'Split':
>>> 
>>> void MCArraysExecSplit(MCExecContext& ctxt, MCStringRef p_string,
>>>                        MCStringRef p_element_delimiter, MCStringRef p_key_delimiter,
>>>                        MCArrayRef& r_array)
>>> {
>>>     if (MCStringSplit(p_string, p_element_delimiter, p_key_delimiter,
>>>                       ctxt . GetStringComparisonType(), r_array))
>>>         return;
>>>
>>>     ctxt . Throw();
>>> }
>>> 
>>> 
>>> vs
>>> 
>>> The code for 'Combine' (// comments added by me):
>>> 
>>> void MCArraysExecCombine(MCExecContext& ctxt, MCArrayRef p_array,
>>>                          MCStringRef p_element_delimiter, MCStringRef p_key_delimiter,
>>>                          MCStringRef& r_string)
>>> {
>>>     bool t_success;
>>>     t_success = true; // tracks whether every step so far has succeeded
>>>
>>>     uindex_t t_count; // (t)emp count of elements
>>>     t_count = MCArrayGetCount(p_array); // find out how many elements the array has
>>>
>>>     MCAutoStringRef t_string; // (t)emp string to store the result
>>>     if (t_success)
>>>         // t_success is true on entry; it stays true only if the mutable
>>>         // string was created, and becomes false if not
>>>         t_success = MCStringCreateMutable(0, &t_string);
>>>
>>>     combine_array_t t_lisctxt; // context that collects the array's elements
>>>     t_lisctxt . elements = nil; // initialise the element list
>>>     if (t_success)
>>>         t_success = MCMemoryNewArray(t_count, t_lisctxt . elements); // check the element list was allocated
>>>
>>>     if (t_success)
>>>     {
>>>         t_lisctxt . index = 0;
>>>         MCArrayApply(p_array, list_array_elements, &t_lisctxt);
>>>         qsort(t_lisctxt . elements, t_count, sizeof(array_element_t),
>>>               compare_array_element); // sort the elements
>>>
>>>         for (uindex_t i = 0; i < t_count; i++)
>>>         { // loop through all elements
>>>             MCAutoStringRef t_value_as_string; // (t)emp string for the element value
>>>             t_success = ctxt . ConvertToString(t_lisctxt . elements[i] . value,
>>>                                                &t_value_as_string); // convert the value to a string
>>>             if (!t_success)
>>>                 break; // stop if the value cannot be converted to a string
>>>
>>>             // t_success is true if the key, delimiters and value are all appended
>>>             t_success =
>>>                 (p_key_delimiter == nil ||
>>>                  (MCStringAppend(*t_string, MCNameGetString(t_lisctxt . elements[i] . key)) &&
>>>                   MCStringAppend(*t_string, p_key_delimiter))) &&
>>>                 MCStringAppend(*t_string, *t_value_as_string) &&
>>>                 (i == t_count - 1 ||
>>>                  MCStringAppend(*t_string, p_element_delimiter));
>>>
>>>             if (!t_success)
>>>                 break; // stop if a value could not be appended
>>>         }
>>>     }
>>>
>>>     if (t_success)
>>>         t_success = MCStringCopy(*t_string, r_string); // copy the (t)emp string into the (r)eturn string
>>>
>>>     MCMemoryDeleteArray(t_lisctxt . elements);
>>>
>>>     if (t_success)
>>>         return;
>>>
>>>     // Throw the current error code (since last library call returned false).
>>>     ctxt . Throw();
>>> }
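
For anyone not familiar with the engine types, here is a rough standard C++ equivalent of the combine loop above, with the pre-calculation idea added as a first pass so the result is allocated once. It is a sketch under assumptions: values are already strings (the engine converts each one with `ctxt . ConvertToString`), `std::map` orders keys lexicographically, which may not match `compare_array_element`, and a missing key delimiter is modelled with `std::optional` rather than `nil`.

```cpp
#include <cstddef>
#include <map>
#include <optional>
#include <string>

// Joins an ordered map into one string, mirroring the engine loop:
// [key key_delim] value, with elem_delim between (not after) elements.
std::string combine(const std::map<std::string, std::string>& array,
                    const std::string& elem_delim,
                    const std::optional<std::string>& key_delim) {
    // Pass 1: compute the exact size of the result (the extra iteration).
    std::size_t total = 0;
    for (const auto& [key, value] : array) {
        if (key_delim) total += key.size() + key_delim->size();
        total += value.size();
    }
    if (!array.empty()) total += elem_delim.size() * (array.size() - 1);

    std::string result;
    result.reserve(total);  // one allocation up front

    // Pass 2: append the pieces; capacity is already sufficient.
    bool first = true;
    for (const auto& [key, value] : array) {
        if (!first) result += elem_delim;
        first = false;
        if (key_delim) {
            result += key;
            result += *key_delim;
        }
        result += value;
    }
    return result;
}
```

With keys "a" and "b" mapped to "1" and "2", `combine(a, "\n", std::string(","))` returns "a,1\nb,2", and passing `std::nullopt` as the key delimiter returns just the values. The first pass costs one more walk over the elements but removes all intermediate reallocations.
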
>>> 
>>> 
>>> Following on from Bob's VM comment, there is a call to
>>> 'MCMemoryNewArray(t_count, t_lisctxt . elements)', which does show that
>>> the combine command needs some memory management for the arrays. It only
>>> creates a temporary copy of the array to work through, though. I can't
>>> see how this would play out differently on Windows than on Mac/Linux, or
>>> why it would increase the time needed by a factor of about 4:1.
>>> 
>>> I've tested as far back as LC7 (times to read into memory, split to an
>>> array and combine from the array):
>>>
>>> Version        Read into memory  Split to array  Combine from array
>>> LC9.5.0 Win64  0.437 s           0.516 s         3 m 1.378 s
>>> LC9.0.5 Win32  0.446 s           0.547 s         3 m 27.9 s
>>> LC8.2.0 DP2    0.543 s           0.577 s         3 m 30.208 s
>>> LC8.0.0        0.542 s           0.545 s         3 m 30.815 s
>>> LC7.0.0        0.827 s           0.460 s         3 m 37.896 s
>>> 
>>> On Mac, all times are less than 1 s, 3 s in total.
>>> 
>>> Sean
>>> 
>>> 
>>> On Wed, 10 Mar 2021 at 17:08, Bob Sneidar via use-livecode <use-livecode at lists.runrev.com> wrote:
>>> 
>>>> Now THAT is fascinating, considering the Windows performance issues with
>>>> file access reported in the past. Could it be that combine is somehow
>>>> caching data to virtual memory?
>>>> 
>>>> Bob S
>>>> 
>>>> 
>>>> On Mar 9, 2021, at 1:05 PM, Sean Cole (Pi) via use-livecode <use-livecode at lists.runrev.com> wrote:
>>>> 
>>>> It's looking to be a Windows-only issue. I need to see how far back this
>>>> goes, and then I'll post a bug report. It's making a process that should
>>>> take 30 s at most on a single thread of a 2 GHz remote Windows server take
>>>> 16 minutes to process two of these files, so it will be good to find a
>>>> solution for this.
>>>> 
>>>> Thanks everyone for confirming and providing your input.
>>>> 
>>>> Regards
>>>> Sean
>>>> 
>>>> _______________________________________________
>>>> use-livecode mailing list
>>>> use-livecode at lists.runrev.com
>>>> Please visit this url to subscribe, unsubscribe and manage your
>>>> subscription preferences:
>>>> http://lists.runrev.com/mailman/listinfo/use-livecode
>>>> 




