Garbage collection (crashing on Windows)
Mark Waddingham
mark at livecode.com
Fri Aug 19 13:42:31 EDT 2016
Hi Ben,
When I got to the end of this email I remembered something quite
pertinent - you mentioned that the limit you were hitting was 2GB... One
thing to check is whether the install of Windows you are running on can
be configured to raise this limit to 3GB:
https://blogs.technet.microsoft.com/askperf/2007/03/23/memory-management-demystifying-3gb/
Perhaps others with more insider Windows knowledge can chip in there.
It will depend on the machine, the version of Windows and probably lots
of other factors. Given that 'hardware is cheap' compared to rewriting
software, if the Windows install currently being used does not use that
'trick' but can be made to, you'll probably get a fair bit of mileage
from a little computer configuration rather than coding!
Assuming that cannot be done then...
On 2016-08-17 19:52, Ben Rubinstein wrote:
> Please refresh my memory: is there any way to cause/allow garbage to
> be collected without ending all script running?
LiveCode doesn't use what is usually referred to as 'garbage
collection', as it generally frees 'things' as soon as they are no
longer referenced. Now, I say 'generally' because things fall into two
classes:
1) Values (strings, arrays, data, numbers)
2) Objects (stacks, cards, buttons etc.)
I'll deal with Objects first:
Objects are deleted as soon as they can be, relative to the requirements
of the engine. We actually changed this mechanism to make it less
conservative in 6.7.11, 7.1.4 and 8.0 onwards. Previously, deleted
objects wouldn't actually be freed until the root event loop ran (i.e.
when no script was running); now they will generally be freed much
closer to when they are deleted, especially if they were created 'at the
same level or above' where the object is deleted, e.g.
on foo
   create button "bar"
   delete button "bar"
end foo
Here the delete will free the object immediately (as the engine knows
that there cannot be any internal references to it, in particular on
the C stack).
It sounds like the problem you are having (assuming you aren't creating
and deleting lots of controls) is to do with values and so...
Values are freed *as soon as* there is no longer any reference to them.
In 6.7 and before, that would be whenever a variable was changed (the old
value was released immediately), or whenever the variable went out of
scope (e.g. locals in a handler are released when the handler ends;
script locals are released when the object is deleted). In 7.0+ this
happens as soon as no variables reference that particular instance of
the value, e.g.
(1) local tVariable1, tVariable2
(2) put "foo" & "bar" into tVariable1
(3) put tVariable1 into tVariable2
(4) put empty into tVariable1
After step (3), tVariable1 and tVariable2 will reference the same value.
At step (4) the reference tVariable1 holds will be removed, but the
value will not be deleted (from memory) until tVariable2 changes, or
goes out of scope. The general mechanism is that values are shared when
copied into different variables, and are only copied when a variable is
mutated. e.g.
(1) local tVariable1, tVariable2
(2) put "foo" & "bar" into tVariable1
(3) put tVariable1 into tVariable2
(4) put "baz" after tVariable2
(5) put empty into tVariable1
Here, at step (4), the value referenced by tVariable2 will be copied
(so tVariable1 and tVariable2 no longer reference the same value) and
then changed. This means that at step (5) the value previously
referenced by tVariable1 *will* be freed, because it is not shared with
tVariable2 (obviously, because tVariable2 no longer holds the same
value!).
The reason I was being so pedagogic in the above is that it opens an
opportunity for you to potentially reduce the memory footprint of your
dataset (which sounds like it is what is causing the problem) by doing
some pre-processing and exploiting the fact that values are not copied
until they are modified. Of course, I don't know the structure of the
data you are processing, so I'm going to assume you are loading in lots
of text files and breaking them up into pieces, presumably storing them
in arrays whose individual elements are numbers and strings.
In this case there are a few interesting things to note about the
engine's implementation of values...
Array keys are *always* shared (up to case). When you do:
put tElement into tArray[tKey]
The engine first 'uniques' tKey - this means it ensures that there is
only one copy of tKey (up to case differences) in memory. So - for every
single array in memory which contains a key "foo", the value
representing the key "foo" will not be copied, just referenced from all
the arrays. Note that "foo" and "Foo", whilst referencing the same array
element (unless caseSensitive is true), will be stored in memory as
different key values, which leads to memory optimization tip 1:
When constructing arrays from external data, where the case of the
key is irrelevant, use:
put X into tArray[toLower(Y)] -- or toUpper (whichever you prefer)
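As a small illustration of tip 1 (a sketch only; wordCounts is a hypothetical handler, not part of the engine), here is a word-count array built from external text with every key lower-cased. Across many arrays built this way each distinct word is stored in a single canonical case, so it needs only one uniqued key string:

```livecode
-- Hypothetical helper for illustration: counts words in pText.
-- toLower() normalises each key so "The" and "the" always produce the
-- same key string, which the engine can then share across all arrays.
function wordCounts pText
   local tCounts, tWord
   repeat for each word tWord in pText
      -- adding to a missing element treats it as 0
      add 1 to tCounts[toLower(tWord)]
   end repeat
   return tCounts
end wordCounts
```

For example, wordCounts("The cat saw the Cat") gives an array with the keys "the" and "cat" (2 each) and "saw" (1).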
For the values bound to by keys, the story is different. If you do:
put myString & "1" into tArray["foo"]
put myString & "1" into tArray["bar"]
Then the values bound to the keys "foo" and "bar" *will be different*
instances in memory, even though they are equal strings. This is
because each was constructed separately.
You can optimize this for memory size by using another array to 'index'
your string values:
local sValueCache -- script local: maps each string to its shared instance

command shareAndStoreKey @xArray, pKey, pValue
   set the caseSensitive to true -- this assumes your values are case-sensitive
   if pValue is not among the keys of sValueCache then
      put pValue into sValueCache[pValue]
   end if
   -- store the cached instance, so equal strings share one value
   put sValueCache[pValue] into xArray[pKey]
end command
After you have processed all your arrays like this, and then done 'put
empty into sValueCache', all string elements in your arrays which are
case-sensitively equal will share the same value.
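To show where shareAndStoreKey might slot into the kind of loading code guessed at above, here is a minimal sketch. It assumes tab-delimited text files, and loadSharedRecords is a hypothetical handler name; it has to live in the same script as shareAndStoreKey, which needs sValueCache declared as a script local:

```livecode
-- Hypothetical loader for illustration: reads a tab-delimited file and
-- builds xRecords[lineNumber][fieldNumber], routing every field through
-- shareAndStoreKey so that equal strings end up sharing one value.
command loadSharedRecords pFilePath, @xRecords
   local tData, tLine, tItem, tRow, tLineNumber, tFieldNumber
   put url ("file:" & pFilePath) into tData
   set the itemDelimiter to tab
   put 0 into tLineNumber
   repeat for each line tLine in tData
      add 1 to tLineNumber
      put empty into tRow -- start a fresh row so nothing shared gets mutated
      put 0 into tFieldNumber
      repeat for each item tItem in tLine
         add 1 to tFieldNumber
         shareAndStoreKey tRow, tFieldNumber, tItem
      end repeat
      -- copying the row into xRecords shares it rather than duplicating it
      put tRow into xRecords[tLineNumber]
   end repeat
end loadSharedRecords
```

You would call this once per file and only do 'put empty into sValueCache' (from the same script) after every file has been loaded, so that strings repeated across files are shared too.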
Of course, you can play the same trick with arrays - although it is a
little more tricky, admittedly.
So, before anyone asks 'why doesn't the engine just do this?'
(particularly since it does so for array keys), the answer is
performance. It is costly to work out which values (which are computed
dynamically, or are substrings of another string in different places)
are actually the same, so you'd end up saving memory but costing
performance if the engine uniqued *everything*.
So, the next question is probably going to be 'why does the engine do
it for array keys then?', and the answer here is that string
comparison is slow, and case-less string comparison even more so. When
you look up a key in an associative array, it might well take multiple
string comparisons to find. Because the strings used in array keys are
uniqued, once the engine has uniqued the key in the lookup request, each
of these comparisons is a constant-time operation (it only needs to
check whether both sides are the same value) until it finds the actual
element you want. On balance, this means you save time, assuming that
you access your arrays much more frequently than you build them, which
is usually the case.
Now, all the above I say with caution: the engine may change how it
works in the future. It might become more 'clever' in some cases, and
less 'clever' in others; thus you should only go as far as trying to
optimize your code for memory footprint (if you can afford the cost of
the pre-processing) if YOU REALLY NEED TO.
Clearly, in your (Ben's) case you really do: you are hitting the
Windows 2GB process limit at the moment, and it sounds like it is a
batch process running unattended, so an initial 'memory minimization
process' run on the dataset is probably a cost you can afford to pay.
Anyway, without more details of what you are needing to do the above
might be completely useless...
Just my 2 pence.
Warmest Regards,
Mark.
--
Mark Waddingham ~ mark at livecode.com ~ http://www.livecode.com/
LiveCode: Everyone can create apps