Array Split vs Combine
Monte Goulding
monte at appisle.net
Wed Mar 10 21:44:18 EST 2021
Ideally we wouldn’t introduce an additional iteration of the array to pre-calculate the size, but yes, there are probably a few ways to go. It could also be possible to flag a string with a buffer allocation strategy on creation, so that cases where the engine creates a string and then repeatedly appends to it can use a different strategy. Anyway, I’m only guessing at the problem, so it’s best to create an issue from which we can create a benchmark test.
Cheers
Monte
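
As an illustration of the per-string allocation strategy idea above, here is a minimal C++ sketch. It is not LiveCode engine code: AppendBuffer, GrowthPolicy and their members are hypothetical names made up for this example. The buffer is told at creation whether to grow to exactly the size needed or to at least double its capacity, and it counts how often it has to reallocate.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <string>
#include <utility>

// Hypothetical growth policies; these names are not from the LiveCode engine.
enum class GrowthPolicy
{
    ExactFit,   // grow to exactly the size needed
    Geometric   // grow to at least double the current capacity
};

// Append-only byte buffer whose growth policy is fixed at creation.
class AppendBuffer
{
public:
    explicit AppendBuffer(GrowthPolicy p_policy) : m_policy(p_policy) {}

    void Append(const std::string& p_text)
    {
        if (p_text.empty())
            return;
        const std::size_t t_needed = m_length + p_text.size();
        if (t_needed > m_capacity)
            Reallocate(t_needed);
        std::memcpy(m_data.get() + m_length, p_text.data(), p_text.size());
        m_length = t_needed;
    }

    std::string ToString() const
    {
        return m_length == 0 ? std::string() : std::string(m_data.get(), m_length);
    }

    std::size_t ReallocationCount() const { return m_reallocations; }

private:
    void Reallocate(std::size_t p_needed)
    {
        assert(p_needed > m_capacity);
        std::size_t t_new_capacity = p_needed;
        if (m_policy == GrowthPolicy::Geometric)
            t_new_capacity = std::max(p_needed, m_capacity * 2);

        // Allocate the new block and copy the existing contents across.
        std::unique_ptr<char[]> t_new_data(new char[t_new_capacity]);
        if (m_length > 0)
            std::memcpy(t_new_data.get(), m_data.get(), m_length);
        m_data = std::move(t_new_data);
        m_capacity = t_new_capacity;
        ++m_reallocations;
    }

    GrowthPolicy m_policy;
    std::unique_ptr<char[]> m_data;
    std::size_t m_length = 0;
    std::size_t m_capacity = 0;
    std::size_t m_reallocations = 0;
};
```

In this sketch, appending a two-byte string 1000 times reallocates 1000 times under ExactFit and 11 times under Geometric, at the price of some unused capacity at the end.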
> On 11 Mar 2021, at 1:28 pm, Sean Cole (Pi) via use-livecode <use-livecode at lists.runrev.com> wrote:
>
> Monte,
> Would it be possible to precalculate (ish) how big a string buffer is
> required from the size (memalloc) of the array? Then, if later down the
> process, it works out it doesn't have enough, it can add a bunch more
> memory to the buffer, thus reducing the frequency of buffer resizing. Or
> maybe do it in kind-of blocks of n bytes. I'm just spit-balling but I guess
> you get my thinking.
>
> Sean
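
The suggestion above (reserve an estimated size up front, then add memory in blocks of n bytes if the estimate falls short) could look roughly like the following C++ sketch. It uses std::string rather than the engine's string type, and ReserveInBlocks and JoinWithEstimate are hypothetical names. The estimate is simply passed in by the caller here; how it would be derived from the array's memory size is left open.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: make room for p_extra more bytes, asking for a
// capacity rounded up to a multiple of p_block. Returns true if it had to grow.
bool ReserveInBlocks(std::string& x_buffer, std::size_t p_extra, std::size_t p_block)
{
    assert(p_block > 0);
    const std::size_t t_needed = x_buffer.size() + p_extra;
    if (t_needed <= x_buffer.capacity())
        return false;
    const std::size_t t_blocks = (t_needed + p_block - 1) / p_block;
    x_buffer.reserve(t_blocks * p_block);
    return true;
}

// Joins p_parts with p_delimiter. p_estimate is the caller's guess at the
// final size; r_growths reports how often the buffer had to grow afterwards.
std::string JoinWithEstimate(const std::vector<std::string>& p_parts,
                             const std::string& p_delimiter,
                             std::size_t p_estimate, std::size_t p_block,
                             std::size_t& r_growths)
{
    r_growths = 0;
    std::string t_result;
    t_result.reserve(p_estimate);
    for (std::size_t i = 0; i < p_parts.size(); i++)
    {
        // Work out how much this step appends, and grow first if needed.
        std::size_t t_extra = p_parts[i].size();
        if (i + 1 < p_parts.size())
            t_extra += p_delimiter.size();
        if (ReserveInBlocks(t_result, t_extra, p_block))
            r_growths++;

        t_result += p_parts[i];
        if (i + 1 < p_parts.size())
            t_result += p_delimiter;
    }
    return t_result;
}
```

With an estimate at or above the final size, r_growths stays at 0. With a low estimate the result is the same, but the buffer grows a block at a time; the exact number of growths depends on how much capacity the standard library actually hands back from reserve.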
>
> On Thu, 11 Mar 2021 at 00:52, Monte Goulding via use-livecode <
> use-livecode at lists.runrev.com> wrote:
>
>> It’s most likely to do with the string buffer needing to be constantly
>> resized as the array is iterated for combine. Some googling suggests
>> Windows may have issues with this. Our strategy for growing string buffers
>> at the moment is to allocate just enough for the string. Mark would need
>> to chime in on whether growing the buffer exponentially would be suitable.
>> It would result in more memory being allocated than necessary but much
>> less frequent allocations, so it depends on what’s most costly as memory
>> gets cheaper.
>>
>> Cheers
>>
>> Monte
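
The trade-off described above can be put into rough numbers with a small cost model. This is only a sketch under stated assumptions: SimulateAppends and GrowthCost are hypothetical names, nothing here is measured on a real allocator, and it assumes the worst case where every reallocation copies the existing contents into a new block.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Totals for one simulated run.
struct GrowthCost
{
    std::uint64_t reallocations;
    std::uint64_t bytes_copied;
};

// Simulates p_appends appends of p_chunk bytes each to an empty buffer.
// p_exponential = false: grow to exactly the size needed ("just enough").
// p_exponential = true:  grow to at least double the current capacity.
GrowthCost SimulateAppends(std::size_t p_appends, std::size_t p_chunk, bool p_exponential)
{
    GrowthCost t_cost = {0, 0};
    std::uint64_t t_length = 0;
    std::uint64_t t_capacity = 0;
    for (std::size_t i = 0; i < p_appends; i++)
    {
        const std::uint64_t t_needed = t_length + p_chunk;
        if (t_needed > t_capacity)
        {
            // Assumed worst case: the block cannot be extended in place,
            // so the existing contents are copied into the new block.
            t_cost.bytes_copied += t_length;
            t_cost.reallocations++;
            t_capacity = p_exponential ? std::max(t_needed, t_capacity * 2) : t_needed;
        }
        t_length = t_needed;
    }
    return t_cost;
}
```

For 1000 appends of 10 bytes, this model gives 1000 reallocations and 4,995,000 bytes copied for exact-fit growth, against 11 reallocations and 10,230 bytes copied for doubling, with a final capacity of 10,240 bytes for 10,000 bytes of content. Whether a particular platform's allocator can extend a block in place, and so avoid some of those copies, is outside what this model captures.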
>>
>>> On 11 Mar 2021, at 11:34 am, Sean Cole (Pi) via use-livecode <use-livecode at lists.runrev.com> wrote:
>>>
>>> The code for 'Split':
>>>
>>> void MCArraysExecSplit(MCExecContext& ctxt, MCStringRef p_string,
>>>     MCStringRef p_element_delimiter, MCStringRef p_key_delimiter,
>>>     MCArrayRef& r_array)
>>> {
>>>     if (MCStringSplit(p_string, p_element_delimiter, p_key_delimiter,
>>>                       ctxt . GetStringComparisonType(), r_array))
>>>         return;
>>>
>>>     ctxt . Throw();
>>> }
>>>
>>>
>>> vs
>>>
>>> The code for 'Combine' (// comments added by me):
>>>
>>> void MCArraysExecCombine(MCExecContext& ctxt, MCArrayRef p_array,
>>>     MCStringRef p_element_delimiter, MCStringRef p_key_delimiter,
>>>     MCStringRef& r_string)
>>> {
>>>     bool t_success;
>>>     t_success = true; // Tracks whether every step so far has succeeded
>>>
>>>     uindex_t t_count; // Create a new (t)emp counter for the elements
>>>     t_count = MCArrayGetCount(p_array); // Find out how many elements there are
>>>
>>>     MCAutoStringRef t_string; // Create (t)emp string to store the result
>>>     if (t_success)
>>>         // t_success is always true here initially; it stays true if the
>>>         // mutable t_string was created, false if not
>>>         t_success = MCStringCreateMutable(0, &t_string);
>>>
>>>     combine_array_t t_lisctxt; // Context holding the element list and an index
>>>     t_lisctxt . elements = nil; // Initialise the element list
>>>     if (t_success)
>>>         // Make sure the element list was allocated
>>>         t_success = MCMemoryNewArray(t_count, t_lisctxt . elements);
>>>
>>>     if (t_success)
>>>     {
>>>         t_lisctxt . index = 0;
>>>         MCArrayApply(p_array, list_array_elements, &t_lisctxt);
>>>         qsort(t_lisctxt . elements, t_count, sizeof(array_element_t),
>>>               compare_array_element); // Sort the elements
>>>
>>>         for(uindex_t i = 0; i < t_count; i++)
>>>         { // Loop through all elements
>>>             MCAutoStringRef t_value_as_string; // (t)emp string for element value
>>>
>>>             // Convert the element value to a string
>>>             t_success = ctxt . ConvertToString(t_lisctxt . elements[i] . value,
>>>                                                &t_value_as_string);
>>>             if (!t_success)
>>>                 break; // Stop if unable to convert to string
>>>
>>>             // t_success is true if the element's key and value (and the
>>>             // delimiters) are appended correctly
>>>             t_success =
>>>                 (p_key_delimiter == nil ||
>>>                     (MCStringAppend(*t_string,
>>>                          MCNameGetString(t_lisctxt . elements[i] . key)) &&
>>>                      MCStringAppend(*t_string, p_key_delimiter))) &&
>>>                 MCStringAppend(*t_string, *t_value_as_string) &&
>>>                 (i == t_count - 1 ||
>>>                     MCStringAppend(*t_string, p_element_delimiter));
>>>             if (!t_success)
>>>                 break; // Stop if unable to add value
>>>         }
>>>     }
>>>
>>>     if (t_success)
>>>         // Copies the (t)emp string into the (r)eturn string
>>>         t_success = MCStringCopy(*t_string, r_string);
>>>
>>>     MCMemoryDeleteArray(t_lisctxt . elements);
>>>
>>>     if (t_success)
>>>         return;
>>>
>>>     // Throw the current error code (since last library call returned false).
>>>     ctxt . Throw();
>>> }
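
To make the flow of the Combine code above easier to follow outside the engine, here is a standard-library C++ sketch of the same steps: flatten the array into a list, sort it, then append key, key delimiter, value and element delimiter in turn. CombineSketch and Element are hypothetical names, the sort is assumed to be a plain byte-wise comparison of keys (the engine's compare_array_element may order differently), and a null p_key_delimiter stands in for the p_key_delimiter == nil case.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

typedef std::pair<std::string, std::string> Element; // (key, value)

// Hypothetical stand-in for the combine logic, using std types only.
std::string CombineSketch(const std::unordered_map<std::string, std::string>& p_array,
                          const std::string& p_element_delimiter,
                          const std::string* p_key_delimiter)
{
    // Flatten the array into a list of elements (a temporary copy).
    std::vector<Element> t_elements(p_array.begin(), p_array.end());

    // Sort the elements by key (assumed byte-wise ordering).
    std::sort(t_elements.begin(), t_elements.end(),
              [](const Element& p_left, const Element& p_right)
              { return p_left.first < p_right.first; });

    // Append each element to one growing string, as the loop above does.
    std::string t_string;
    for (std::size_t i = 0; i < t_elements.size(); i++)
    {
        if (p_key_delimiter != nullptr)
        {
            t_string += t_elements[i].first;
            t_string += *p_key_delimiter;
        }
        t_string += t_elements[i].second;
        if (i + 1 < t_elements.size())
            t_string += p_element_delimiter;
    }
    return t_string;
}
```

The single result string is appended to up to four times per element, which is why the way that string's buffer grows matters so much for large arrays.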
>>>
>>>
>>> Following on from Bob's VM comment, there is a reference to
>>> 'MCMemoryNewArray(t_count, t_lisctxt . elements)', which does highlight
>>> that some memory management for the arrays is necessary in the combine
>>> command. This only creates a temporary copy of the array for working
>>> through. How this plays out differently for Windows vs Mac/Linux, and why
>>> this would be increasing the time necessary by a factor of about 4:1, I
>>> can't see.
>>>
>>> I've tested as far back as LC7:
>>>
>>> Version        Read into memory  Split to array  Combine from array
>>> LC9.5.0 Win64  0.437s            0.516s          3m 1.378s
>>> LC9.0.5 Win32  0.446s            0.547s          3m 27.9s
>>> LC8.2.0 DP2    0.543s            0.577s          3m 30.208s
>>> LC8.0.0        0.542s            0.545s          3m 30.815s
>>> LC7.0.0        0.827s            0.460s          3m 37.896s
>>>
>>> On Mac all times are less than 1 sec, 3 sec total.
>>>
>>> Sean
>>>
>>>
>>> On Wed, 10 Mar 2021 at 17:08, Bob Sneidar via use-livecode <
>>> use-livecode at lists.runrev.com> wrote:
>>>
>>>> Now THAT is fascinating, considering the Windows performance issues with
>>>> file access reported in the past. Could it be that combine is somehow
>>>> caching data to virtual memory?
>>>>
>>>> Bob S
>>>>
>>>>
>>>> On Mar 9, 2021, at 1:05 PM, Sean Cole (Pi) via use-livecode <use-livecode at lists.runrev.com> wrote:
>>>>
>>>> It's looking to be a Windows only issue. I need to see how far this goes
>>>> back and then I'll post a bug report. It's making a process that should
>>>> only take 30s max on a single thread 2GHz remote Win server take 16mins to
>>>> process 2 of these files, so it will be good to find a solution for this.
>>>>
>>>> Thanks everyone for confirming and providing your input.
>>>>
>>>> Regards
>>>> Sean
>>>>
>>>> _______________________________________________
>>>> use-livecode mailing list
>>>> use-livecode at lists.runrev.com
>>>> Please visit this url to subscribe, unsubscribe and manage your
>>>> subscription preferences:
>>>> http://lists.runrev.com/mailman/listinfo/use-livecode
>>>>