Garbage collection (crashing on Windows)

Mark Waddingham mark at livecode.com
Fri Aug 19 13:42:31 EDT 2016


Hi Ben,

When I got to the end of this email I remembered something quite 
pertinent - you mentioned that the limit you were hitting was 2GB... One 
thing to check is whether the install of Windows you are running on can 
be configured to raise this limit to 3GB:

https://blogs.technet.microsoft.com/askperf/2007/03/23/memory-management-demystifying-3gb/

Perhaps others with more insider Windows knowledge can chip in there. 
It will depend on the machine, the version of Windows and probably lots 
of other factors. Given that 'hardware is cheap' compared to rewriting 
software - if the Windows install currently in use does not use that 
'trick', but can, you'll probably get a fair bit of mileage from a bit 
of computer configuration rather than coding!

Assuming that cannot be done then...

On 2016-08-17 19:52, Ben Rubinstein wrote:
> Please refresh my memory: is there any way to cause/allow garbage to
> be collected without ending all script running?

LiveCode doesn't use what is generally referred to as 'garbage 
collection' as it generally frees 'things' up as soon as they are no 
longer referenced. Now I say 'generally' because things fall into two 
classes:

    1) Values (strings, arrays, data, numbers)

    2) Objects (stacks, cards, buttons etc.)

I'll deal with Objects first:

Objects are deleted as soon as they can be relative to the requirements 
of the engine. We actually changed this mechanism to make it less 
conservative in 6.7.11, 7.1.4 and 8.0 onwards. Previously, deleted 
objects wouldn't get actually freed until the root event loop runs (i.e. 
when there is no script running); now they will generally get freed much 
closer to when they are deleted, especially if they were created 'at the 
same level or above' where the object is deleted. e.g.

    on foo
      create button "bar"
      delete button "bar"
    end foo

Here the delete will free the object immediately (as the engine knows 
that it cannot be holding any internal references to it - in particular 
on the C stack).

It sounds like the problem you are having (assuming you aren't creating 
and deleting lots of controls) is to do with values and so...

Values are freed *as soon as* there is no longer any reference to them. 
In 6.7 and before that would be whenever a variable is changed (the old 
value was released immediately), or whenever the variable goes out of 
scope (e.g. locals in a handler get released when the handler ends, 
script locals are released when the object is deleted). In 7.0+ this 
happens as soon as there are no variables referencing the same instance 
of the value. e.g.

   (1) local tVariable1, tVariable2
   (2) put "foo" & "bar" into tVariable1
   (3) put tVariable1 into tVariable2
   (4) put empty into tVariable1

After step (3), tVariable1 and tVariable2 will reference the same value. 
At step (4) the reference tVariable1 holds will be removed, but the 
value will not be deleted (from memory) until tVariable2 changes, or 
goes out of scope. The general mechanism is that values are shared when 
copied into different variables, and are only copied when a variable is 
mutated. e.g.

   (1) local tVariable1, tVariable2
   (2) put "foo" & "bar" into tVariable1
   (3) put tVariable1 into tVariable2
   (4) put "baz" after tVariable2
   (5) put empty into tVariable1

Here, at step (4), the value referenced by tVariable2 will be copied 
(and so tVariable1 and tVariable2 will no longer reference the same 
value), and then changed. This means that at step (5) the value 
previously referenced by tVariable1 *will* be freed, because it is not 
shared with tVariable2 (obviously - because tVariable2 is no longer the 
same value!).

The reason I was being so paedagogic in the above is that it opens an 
opportunity for you to potentially reduce the memory footprint of your 
dataset (which sounds like it is what is causing the problem) by doing 
some pre-processing and exploiting the fact that values are not copied 
until they are modified. Of course, I don't know what the structure of 
the data you are processing is - so I'm going to assume you are loading 
in lots of text files and breaking them up into pieces, presumably 
storing in arrays with the individual array elements being numbers and 
strings.

In this case there are a few interesting things to note about the 
engine's implementation of values...

Array keys are *always* shared (for keys of identical case). When you do:

    put tElement into tArray[tKey]

The engine first 'uniques' tKey - this means it ensures that there is 
only one copy of each distinct spelling of tKey in memory. So - for 
every single array in memory which contains a key "foo", the value 
representing the key "foo" will not be copied, just referenced from all 
the arrays. Note that "foo" and "Foo", whilst referring to the same 
element (unless caseSensitive is true), will be stored in memory as 
different values, which leads to memory optimization tip 1:

    When constructing arrays from external data, where the case of the
    key is irrelevant, use:
      put X into tArray[toLower(Y)] -- or toUpper (whichever you prefer)
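
For example, a minimal sketch of such a load, assuming the external data 
arrives as lines of the form key<TAB>value (the handler name buildLookup 
and the parameter pData are made up for illustration):

    function buildLookup pData
      local tArray, tLine
      set the itemDelimiter to tab
      repeat for each line tLine in pData
        -- "Name", "NAME" and "name" all end up as the one key "name"
        put item 2 of tLine into tArray[toLower(item 1 of tLine)]
      end repeat
      return tArray
    end buildLookup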

For the values bound to by keys, the story is different. If you do:

    put myString & "1" into tArray["foo"]
    put myString & "1" into tArray["bar"]

Then the two values of the keys "foo" and "bar" *will be different*. 
This is because they have been constructed differently.
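
By contrast, going by the sharing rule described earlier (values are 
shared when copied between variables), building the value once and then 
copying it should leave both elements referencing a single value. A 
small sketch, where tShared is just a name introduced for illustration:

    local tShared
    put myString & "1" into tShared
    -- both elements now reference the value held by tShared
    put tShared into tArray["foo"]
    put tShared into tArray["bar"]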

You can optimize this for memory size by using another array to 'index' 
your string values:

    local sValueCache -- script local, so it persists between calls

    command shareAndStoreKey @xArray, pKey, pValue
      -- this is assuming your values are sensitive to case
      set the caseSensitive to true
      if pValue is not among the keys of sValueCache then
          put pValue into sValueCache[pValue]
      end if
      put sValueCache[pValue] into xArray[pKey]
    end shareAndStoreKey

After you have processed all your arrays like this, and 'put empty into 
sValueCache' - all string elements in your arrays which are 
case-sensitively the same will share the same value.
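
As a rough sketch of how that might be driven - assuming sValueCache is 
declared as a script local above both handlers, that this handler lives 
in the same script as shareAndStoreKey, and that the data arrives as 
lines of key<TAB>value (loadShared and pData are hypothetical names):

    command loadShared pData, @rArray
      local tLine
      set the itemDelimiter to tab
      repeat for each line tLine in pData
        shareAndStoreKey rArray, item 1 of tLine, item 2 of tLine
      end repeat
      -- drop the cache's own references once everything is loaded
      put empty into sValueCache
    end loadShared

If you are filling several arrays, you would want to clear sValueCache 
only after the last one, so that equal strings are shared across all of 
them.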

Of course, you can play the same trick with arrays - although it is a 
little more tricky, admittedly.

So, anyway, before anyone asks 'why doesn't the engine just do this?' 
(particularly since it does so for array keys) then the answer is 
performance. It is costly to work out which values (which are computed 
dynamically, or are substrings of another string in different places) 
are actually the same - thus you'd end up saving memory but costing 
performance if the engine uniqued *everything*.

So, the next question is probably going to be, 'why does the engine do 
it for array keys then?' and the answer here is because string 
comparison is slow - case-less string comparison more so. When you 
look up a key in an associative array, it might well take multiple string 
comparisons to find. By 'uniquing' the strings used in array keys, after 
the engine has processed the lookup request it is a constant time 
operation to do each of these comparisons to find the actual element you 
want. On balance, this means you save time - assuming that you are 
accessing your arrays much more frequently than building them - which is 
usually the case.

Now, all the above I say with caution - the engine may change how it 
works in the future. It might become more 'clever' in some cases, and 
less 'clever' in others; thus you should only go as far as trying to 
optimize your code for memory footprint (if you can afford the cost of 
the pre-processing) if YOU REALLY NEED TO.

Clearly, in your (Ben's) case you really do - you are hitting the 
Windows 2GB process limit at the moment, and it sounds like it is a 
batch process running unattended so an initial 'memory minimization 
process' run on the dataset is probably a cost you can afford to pay.

Anyway, without more details of what you are needing to do the above 
might be completely useless...

Just my 2 pence.

Warmest Regards,

Mark.

-- 
Mark Waddingham ~ mark at livecode.com ~ http://www.livecode.com/
LiveCode: Everyone can create apps



