Garbage collection (crashing on Windows)

Ben Rubinstein benr_mc at cogapp.com
Mon Aug 22 08:47:27 EDT 2016


Mark,

Thanks so much for this detailed and very useful response. A few quick 
follow-up questions.


1)
 > https://blogs.technet.microsoft.com/askperf/2007/03/23/memory-management-demystifying-3gb/
This looks like a great tip - before I go into the ring with the client's IT 
dept (always a tricky exercise) can I just check that LiveCode does have the 
IMAGE_FILE_LARGE_ADDRESS_AWARE flag set in the image header as described in 
that article? And do you know from which version that is true?
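
(In case it is quicker to check locally: a minimal, untested sketch that reads 
the flag straight from the executable's PE header. The handler name and 
pExePath are my own, and it assumes a little-endian host, which holds for 
Windows on Intel.)

   function isLargeAddressAware pExePath
      local tData, tPEOffset, tFlags, tCount
      put URL ("binfile:" & pExePath) into tData
      -- the 4 bytes at file offset 0x3C hold the offset of the "PE\0\0" signature
      put binaryDecode("i", byte 61 to 64 of tData, tPEOffset) into tCount
      -- Characteristics is the 16-bit field 22 bytes after the start of the signature
      put binaryDecode("S", byte (tPEOffset + 23) to (tPEOffset + 24) of tData, tFlags) into tCount
      -- 0x0020 is IMAGE_FILE_LARGE_ADDRESS_AWARE
      return (tFlags bitAnd 32) is not 0
   end isLargeAddressAware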


(not a question)
 > We actually changed this mechanism to make it less conservative in
 > 6.7.11, 7.1.4 and 8.0 onwards. Previously, deleted objects wouldn't get
 > actually freed until the root event loop runs

Aha!  Sadly I just updated the app to 6.7.11 without improving the situation 
(and, on an experimental basis only, to 8.1 as well, similarly without improvement).

2)
> LiveCode doesn't use what is generally referred to as 'garbage collection'
> as it generally frees 'things' up as soon as they are no longer referenced.

But does 'freed' literally release the memory, or just mark the object as 
available? Surely you still need to do some kind of garbage collection in 
order to collapse what may be isolated fragments of 'free' memory?

3)
In another thread (20/08/2016 19:44) you just wrote:
 > For optimization purposes the best approach is to measure the amount of
 > memory *actually* in use before and after any particular operation you perform
 > - just as you do with time when profiling for speed (rather than memory
 > footprint).

How can we do this? As noted hasMemory is defunct; heapSpace is Mac only; is 
there a method I can use to profile the memory usage?


Many thanks,

Ben

On 19/08/2016 18:42, Mark Waddingham wrote:
> Hi Ben,
>
> When I got to the end of this email I remembered something quite pertinent -
> you mentioned that the limit you were hitting was 2Gb... One thing to check is
> whether the install of Windows you are running on can be poked to actually
> raise this limit to 3Gb:
>
> https://blogs.technet.microsoft.com/askperf/2007/03/23/memory-management-demystifying-3gb/
>
>
> Perhaps others with more insider Windows knowledge can chip in there. It will
> depend on the machine, the version of Windows and probably lots of other
> factors. Given that 'hardware is cheap' compared to rewriting software - if
> the Windows install currently being used does not use that 'trick', and can
> be made to, you'll probably find you get a fair bit of mileage with a bit of
> computer configuration - rather than coding!
>
> Assuming that cannot be done then...
>
> On 2016-08-17 19:52, Ben Rubinstein wrote:
>> Please refresh my memory: is there any way to cause/allow garbage to
>> be collected without ending all script running?
>
> LiveCode doesn't use what is generally referred to as 'garbage collection' as
> it generally frees 'things' up as soon as they are no longer referenced. Now I
> say 'generally' because things fall into two classes:
>
>    1) Values (strings, arrays, data, numbers)
>
>    2) Objects (stacks, cards, buttons etc.)
>
> I'll deal with Objects first:
>
> Objects are deleted as soon as they can be relative to the requirements of the
> engine. We actually changed this mechanism to make it less conservative in
> 6.7.11, 7.1.4 and 8.0 onwards. Previously, deleted objects wouldn't get
> actually freed until the root event loop runs (i.e. when there is no script
> running); now they will generally get freed much closer to when they are
> deleted, especially if they were created 'at the same level or above' where
> the object is deleted. e.g.
>
>    on foo
>      create control bar
>      delete control bar
>    end foo
>
> Here the delete will free the object immediately (as the engine knows that it
> cannot have any internal references to it - in particular on the C
> stack).
>
> It sounds like the problem you are having (assuming you aren't creating and
> deleting lots of controls) is to do with values and so...
>
> Values are freed *as soon as* there is no longer any reference to them. In 6.7
> and before that would be whenever a variable is changed (the old value was
> released immediately), or whenever the variable goes out of scope (e.g. locals
> in a handler get released when the handler ends, script locals are released
> when the object is deleted). In 7.0+ this happens as soon as there are no
> variables referencing the same instance of the value. e.g.
>
>   (1) local tVariable1, tVariable2
>   (2) put "foo" & "bar" into tVariable1
>   (3) put tVariable1 into tVariable2
>   (4) put empty into tVariable1
>
> After step (3), tVariable1 and tVariable2 will reference the same value. At
> step (4) the reference tVariable1 holds will be removed, but the value will
> not be deleted (from memory) until tVariable2 changes, or goes out of scope.
> The general mechanism is that values are shared when copied into different
> variables, and are only copied when a variable is mutated. e.g.
>
>   (1) local tVariable1, tVariable2
>   (2) put "foo" & "bar" into tVariable1
>   (3) put tVariable1 into tVariable2
>   (4) put "baz" after tVariable2
>   (5) put empty into tVariable1
>
> Here, at step (4), the value referenced by tVariable2 will be copied (and so
> tVariable1 and tVariable2 will no longer reference the same value), and then
> changed. This means that at step (5) the value previously referenced by
> tVariable1 *will* be freed, because it is not shared with tVariable2
> (obviously - because tVariable2 is no longer the same value!).
>
> The reason I was being so paedagogic in the above is that it opens an
> opportunity for you to potentially reduce the memory footprint of your dataset
> (which sounds like it is what is causing the problem) by doing some
> pre-processing and exploiting the fact that values are not copied until they
> are modified. Of course, I don't know what the structure of the data you are
> processing is - so I'm going to assume you are loading in lots of text files
> and breaking them up into pieces, presumably storing in arrays with the
> individual array elements being numbers and strings.
>
> In this case there are a few interesting things to note about the engine's
> implementation of values...
>
> Array keys are *always* shared (up to case). When you do:
>
>    put tElement into tArray[tKey]
>
> The engine first 'uniques' tKey - this means it ensures that there is only one
> copy of tKey (up to case differences) in memory. So - for every single array
> in memory which contains a key "foo", the value representing the key "foo"
> will not be copied, just referenced from all the arrays. Note that "foo" and
> "Foo", whilst referencing the same value (unless caseSensitive is true), will
> be stored in memory as different values which leads to memory optimization tip 1:
>
>    When constructing arrays from external data, where the case of the key is
> irrelevant use:
>      put X into tArray[toLower(Y)] -- or toUpper (whichever you prefer)
>
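
[A minimal sketch of tip 1, assuming the external data is tab-delimited text 
with the key in item 1 and the value in item 2 of each line; the handler name 
and that layout are illustrative assumptions, not anything from the engine.]

   function arrayFromText pText
      local tArray, tLine
      set the itemDelimiter to tab
      repeat for each line tLine in pText
         -- fold the key to lower case so "foo" and "Foo" share one key value
         put item 2 of tLine into tArray[toLower(item 1 of tLine)]
      end repeat
      return tArray
   end arrayFromText
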
> For the values bound to by keys, the story is different. If you do:
>
>    put myString & "1" into tArray["foo"]
>    put myString & "1" into tArray["bar"]
>
> Then the two values of the keys "foo" and "bar" *will be different*. This is
> because they have been constructed differently.
>
> You can optimize this for memory size by using another array to 'index' your
> string values:
>
>    local sValueCache -- script-local cache mapping each distinct value to itself
>
>    command shareAndStoreKey @xArray, pKey, pValue
>      -- this is assuming your values are sensitive to case
>      set the caseSensitive to true
>      if pValue is not among the keys of sValueCache then
>          put pValue into sValueCache[pValue]
>      end if
>      put sValueCache[pValue] into xArray[pKey]
>    end command
>
> After you have processed all your arrays like this, and 'put empty into
> sValueCache' - all string elements in your arrays which are case-sensitively
> the same will share the same value.
>
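
[A sketch of how the handler above might be driven, assuming sValueCache is a 
script local declared in the same script and the input is tab-delimited 
key/value lines; buildSharedArray is a hypothetical name.]

   command buildSharedArray pText, @rArray
      local tLine
      set the itemDelimiter to tab
      put empty into rArray
      repeat for each line tLine in pText
         shareAndStoreKey rArray, item 1 of tLine, item 2 of tLine
      end repeat
      -- drop the cache's own references once every array has been processed
      put empty into sValueCache
   end buildSharedArray
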
> Of course, you can play the same trick with arrays - although it is a little
> more tricky, admittedly.
>
> So, anyway, before anyone asks 'why doesn't the engine just do this?'
> (particularly since it does so for array keys) then the answer is performance.
> It is costly to work out which values (which are computed dynamically, or are
> substrings of another string in different places) are actually the same - thus
> you'd end up saving memory but costing performance if the engine uniqued
> *everything*.
>
> So, the next question is probably going to be, 'why does the engine do it for
> array keys then?' and the answer here is because string comparison is slow -
> case-less string comparison more so. When you look up a key in an associative
> array, it might well take multiple string comparisons to find. By 'uniquing'
> the strings used in array keys, after the engine has processed the lookup
> request it is a constant time operation to do each of these comparisons to
> find the actual element you want. On balance, this means you save time -
> assuming that you are accessing your arrays much more frequently than building
> them - which is usually the case.
>
> Now, all the above I say with caution - the engine may change how it works in
> the future. It might become more 'clever' in some cases, and less 'clever' in
> others; thus you should only go as far as trying to optimize your code for
> memory footprint (if you can afford the cost of the pre-processing) if YOU
> REALLY NEED TO.
>
> Clearly, in your (Ben's) case you really do - you are hitting the Windows 2Gb
> process limit at the moment, and it sounds like it is a batch process running
> unattended so an initial 'memory minimization process' run on the dataset is
> probably a cost you can afford to pay.
>
> Anyway, without more details of what you are needing to do the above might be
> completely useless...
>
> Just my 2 pence.
>
> Warmest Regards,
>
> Mark.
>




