How does compiling work?
Richard Gaskin
ambassador at fourthworld.com
Thu Sep 8 10:00:08 EDT 2011
Julian Ohrt wrote:
> Is there any documentation how compiling of livecode works internally?
> Is it a compiler which can produce native code (for Windows, Linux,
> etc.)? Are the scripts packaged within the executable together with an
> interpreter and interpreted at run time? Or is it more like a virtual
> machine approach?
Yes, I think it could be said that LiveCode has more in common with a
virtual machine than almost any other metaphor.
My understanding of the under-the-hood mechanics is very limited, but
that won't stop me from trying. :)
There are many layers to code execution and the languages which work at
each level, which could be summarized as:
- CPU instruction set/Object code: the instructions the processor is
able to handle on its own, purely binary code; these are very primitive,
consisting largely of moving stuff from one memory location to another,
some basic math routines, etc. Most mortals never write machine code
directly, relying on assemblers or compilers to translate their more
human-readable code into machine instructions.
- Assembler: a way of working directly with the CPU instruction set,
but with the advantage of using mnemonic labels for the instructions
("MOVE" rather than "0111010"). Generally speaking, there is usually a
one-to-one relationship between Assembler instructions and machine
instructions.
- C: Designed as a substitute for Assembler, C allows you to execute
many hundreds or even thousands of machine instructions with relatively
little code, but it's still somewhat close to the CPU in terms of memory
management, data types, options for register use, etc.
- C++/C#/Objective C: languages derived from or modeled on C that
add object-oriented programming, executing many more instructions
per line of code and usually involving frameworks that handle many of
the common tasks an application will perform.
- Scripting: Instructions written in very high-level languages which
often completely automate things like memory management, type
conversion, garbage collection, etc., triggering a great many machine
instructions for each line of code, favoring developer convenience at a
small cost to efficiency and memory.
At each successive level, the number of machine instructions triggered
by a line of code is generally higher, meaning ever more of the work is
done by the system rather than the programmer.
Much of the LiveCode engine is written in C++ (with some portions in
straight C, I believe), and the LiveCode scripting language is often
compiled to an intermediary bytecode, which in the list above might be
between C++ and Scripting.
Bytecode is very different from true object code, in that object code
represents the instructions as the CPU itself expects to handle them,
while bytecode still needs an intermediary mechanism (such as the
LiveCode engine) to translate it into machine instructions.
Bytecode is much closer in form to machine instructions than script
text is, so translating it at runtime is often as simple as indexing
into a densely packed, highly optimized lookup table and jumping to the
routine found there.
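
To make the lookup-table idea a little more concrete, here is a toy
sketch in Python. It is not how the LiveCode engine is actually written;
the opcodes, handler names, and stack-machine design are all made up for
illustration. The point is only that each bytecode is a small number
used as an index into a table of precompiled routines.

```python
# Toy stack-machine interpreter. The opcodes below are hypothetical and
# are NOT LiveCode's real instruction set.
PUSH, ADD, MUL = 0, 1, 2


def op_push(stack, arg):
    # Place a literal value on the stack.
    stack.append(arg)


def op_add(stack, arg):
    # Pop two values and push their sum.
    b = stack.pop()
    a = stack.pop()
    stack.append(a + b)


def op_mul(stack, arg):
    # Pop two values and push their product.
    b = stack.pop()
    a = stack.pop()
    stack.append(a * b)


# Densely packed lookup table: the opcode number is the list index.
DISPATCH = [op_push, op_add, op_mul]


def run(bytecode):
    # Executing an instruction is one table lookup plus one call.
    stack = []
    for opcode, arg in bytecode:
        DISPATCH[opcode](stack, arg)
    return stack.pop()


# Bytecode for the expression (2 + 3) * 4
program = [(PUSH, 2), (PUSH, 3), (ADD, None), (PUSH, 4), (MUL, None)]
result = run(program)
```

In a real engine the handlers would be machine-compiled C++ routines
rather than Python functions, but the dispatch step has the same shape.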
Moreover, bytecode represents a fairly small subset of the instructions
compiled from your script; in many cases they jump directly into
compiled object code in the engine, which was written in C++ and
compiled to machine code using some of the best modern compilers. So in
effect, as Ousterhout puts it in his seminal paper on scripting (see
<http://www.stanford.edu/~ouster/cgi-bin/papers/scripting.pdf>), good
scripting languages are often just a sort of "glue" between true
machine-compiled routines. Bytecode makes that glue smaller and more
efficient.
The scripts you write in LiveCode are what gets saved with the file (at
least that's what I see when I look at a saved stack file; I can find
the scripts but if the bytecode gets saved with it it's amazingly small
because I can't find it at all).
It's my understanding that when a stack is opened, its scripts are
compiled to bytecode as the stack's object records are unpacked and the
message path is set up. This "runtime compilation" involves parsing
your script and translating that into binary tokens that execute much
more efficiently. When executing, this bytecode is translated to direct
machine instructions on the fly, but as you can see with LiveCode's
blazing performance, neither the runtime compilation to bytecode nor the
translation of the bytecode into machine instructions is particularly
costly. And by separating the tasks, the more costly parsing of the
script is done only once, which is one of the reasons why LC outperforms
fully-interpreted systems (another reason is careful pruning of the
lookup table used in that parsing and in the subsequent bytecode jumps,
but that's another story).
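
The "parse once, execute many times" split described above can be
sketched roughly like this in Python. This is an illustration under
heavy assumptions: the tiny postfix "script" syntax and the function
names compile_script and execute are invented here, and the real engine
handles a vastly richer language.

```python
import operator

# Hypothetical operator table for a tiny postfix (RPN) mini-language.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def compile_script(source):
    # Step 1 (the costlier step, done once): parse the text into
    # (handler, operand) pairs so no string work is needed later.
    compiled = []
    for word in source.split():
        if word in OPS:
            compiled.append((OPS[word], None))
        else:
            compiled.append((None, float(word)))
    return compiled


def execute(compiled):
    # Step 2 (cheap, done on every run): walk the pre-parsed tokens.
    stack = []
    for handler, operand in compiled:
        if handler is None:
            stack.append(operand)
        else:
            b = stack.pop()
            a = stack.pop()
            stack.append(handler(a, b))
    return stack.pop()


# Parse once...
compiled = compile_script("2 3 + 4 *")
# ...then execute as often as needed without re-parsing the text.
results = [execute(compiled) for _ in range(3)]
```

A fully interpreted system would, in effect, call compile_script again
on every run; keeping the two steps separate avoids that repeated cost.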
In fact, since so much of the actual execution takes place in the
engine's machine-compiled code, performance for many tasks is on par
with other systems where you have to wait for a compiler every time you
change your code. :)
There are exceptions to the general rule that script statements are
translated to bytecode in advance of execution. For example, the "do"
command and the "value" function both require parsing during execution,
since they work with strings whose values cannot be known in advance,
and therefore cannot be compiled in advance.
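
Python happens to have a close analogue, which may help illustrate the
distinction. This is an analogy only, not LiveCode code: Python's
built-in compile() and eval() stand in for precompiled script and for
"do"/"value" respectively, and the function and variable names are
hypothetical.

```python
# Expression text known in advance: it can be compiled once, up front.
precompiled = compile("price * quantity", "<script>", "eval")


def total_precompiled(price, quantity):
    # Runs the already-compiled code object; no parsing happens here.
    return eval(precompiled, {}, {"price": price, "quantity": quantity})


def total_dynamic(expression, price, quantity):
    # The expression arrives as a string at run time, so it has to be
    # parsed on every call, much like the argument to "do" or "value".
    return eval(expression, {}, {"price": price, "quantity": quantity})


# The dynamic expression is assembled at run time from pieces.
runtime_expression = "price" + " * " + "quantity"

a = total_precompiled(3, 4)
b = total_dynamic(runtime_expression, 3, 4)
```

Both calls give the same answer; the difference is only in when the
parsing work gets done.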
But those tokens also make good examples of LiveCode's efficiency:
while technically slower than alternative syntax which can be
precompiled to bytecode, the time it takes the engine to parse those
expressions and translate them into a form which can be executed is
usually measured in microseconds, sometimes fractions of microseconds.
Along those lines, compare the time it takes LiveCode to compile a
script when you push the script editor's "Compile" button to compilation
times in almost any other system. With each script compiled to bytecode
separately, and with its means of doing so being rather well tuned over
a great many years, it's almost instantaneous - you'll never wait for a
progress bar when compiling in LiveCode. :)
In summary, LiveCode attempts to find a sweet spot between raw
performance and developer convenience. You could write faster-executing
code in Assembler, but who would want to? Even using languages like C++
will often take orders of magnitude more development time to accomplish
similar goals. LiveCode's two-step compilation allows for blazing fast
performance with nearly unprecedented return on your development time.
IMO, an almost ideal sweet spot indeed.
--
Richard Gaskin
Fourth World
LiveCode training and consulting: http://www.fourthworld.com
Webzine for LiveCode developers: http://www.LiveCodeJournal.com
LiveCode Journal blog: http://LiveCodejournal.com/blog.irv
More information about the use-livecode
mailing list