How does compiling work?

Richard Gaskin ambassador at fourthworld.com
Thu Sep 8 09:00:08 CDT 2011


Julian Ohrt wrote:

> Is there any documentation how compiling of livecode works internally?
> Is it a compiler which can produce native code (for Windows, Linux,
> etc.)? Are the scripts packaged within the executable together with an
> interpreter and interpreted at run time? Or is it more like a virtual
> machine approach?

Yes, I think it could be said that LiveCode has more in common with a 
virtual machine than almost any other metaphor.

My understanding of the under-the-hood mechanics is very limited, but 
that won't stop me from trying. :)

There are many layers to code execution and the languages which work at 
each level, which could be summarized as:

- CPU instruction set/Object code:  the instructions the processor is 
able to handle on its own, purely binary code; these are very primitive, 
consisting largely of moving stuff from one memory location to another, 
some basic math routines, etc.  Most mortals never write machine code 
directly, relying on assemblers or compilers to translate their more 
human-readable code into machine instructions.

- Assembler:  a way of working directly with the CPU instruction set, 
but with the advantage of using mnemonic labels for the instructions 
("MOVE" rather than "0111010").  Generally speaking, there is a 
one-to-one relationship between Assembler instructions and machine 
instructions.

- C: Designed as a substitute for Assembler, C allows you to execute 
many hundreds or even thousands of machine instructions with relatively 
little code, but it's still somewhat close to the CPU in terms of memory 
management, data types, options for register use, etc.

- C++/C#/Objective-C:  languages in the C family that add 
object-oriented programming, executing many more instructions per line 
of code and usually involving frameworks that handle many of the common 
tasks an application will perform.

- Scripting: Instructions written in very high-level languages which 
often completely automate things like memory management, type 
conversion, garbage collection, etc., triggering a great many machine 
instructions for each line of code, favoring developer convenience at a 
small cost to efficiency and memory.

With each step up through these levels, the number of machine 
instructions triggered by a line of code generally grows, meaning ever 
more of the work is done by the system rather than the programmer.

Much of the LiveCode engine is written in C++ (with some portions in 
straight C, I believe), and the LiveCode scripting language is often 
compiled to an intermediary bytecode, which in the list above might be 
between C++ and Scripting.

Bytecode is very different from true object code, in that object code 
represents the instructions as the CPU itself expects to handle them, 
while bytecode still needs an intermediary mechanism (such as the 
LiveCode engine) to translate it into machine instructions.

Bytecode representations are much closer to machine instructions than 
scripts are, so translating them at runtime is often as simple as 
jumping to a routine found through a densely packed and highly 
optimized lookup table.
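
To make the lookup-table idea concrete, here is a toy sketch in Python. 
It is only an illustration of the general technique, not how the 
LiveCode engine is actually written: the opcode numbers, the handler 
names and the tiny stack machine are all made up for this example.

```python
# Toy stack-based bytecode interpreter. The opcodes and handlers are
# invented for this sketch; they are not LiveCode's.
PUSH, ADD, MUL, HALT = range(4)

def op_push(stack, arg):
    stack.append(arg)

def op_add(stack, arg):
    right = stack.pop()
    left = stack.pop()
    stack.append(left + right)

def op_mul(stack, arg):
    right = stack.pop()
    left = stack.pop()
    stack.append(left * right)

# The lookup table: each opcode is an index into this list, so
# dispatching an instruction is a single list access plus a call.
HANDLERS = [op_push, op_add, op_mul]

def run(bytecode):
    """Execute a list of (opcode, argument) pairs and return the result."""
    stack = []
    for opcode, arg in bytecode:
        if opcode == HALT:
            break
        HANDLERS[opcode](stack, arg)
    return stack.pop()

# Bytecode for the expression (2 + 3) * 4
program = [(PUSH, 2), (PUSH, 3), (ADD, None),
           (PUSH, 4), (MUL, None), (HALT, None)]
result = run(program)
```

The point is that no text is examined while the program runs: each 
instruction is already a small integer, and the handler for it is found 
by position rather than by searching or parsing.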

Moreover, bytecode represents a fairly small subset of the instructions 
compiled from your script; in many cases it jumps directly into 
compiled object code in the engine, which was written in C++ and 
compiled to machine code using some of the best modern compilers. So in 
effect, as Ousterhout puts it in his seminal paper on scripting (see 
<http://www.stanford.edu/~ouster/cgi-bin/papers/scripting.pdf>), good 
scripting languages are often just a sort of "glue" between true 
machine-compiled routines.  Bytecode makes that glue smaller and more 
efficient.
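
The same "glue" pattern can be seen in other bytecode-based scripting 
systems.  As an analogy only (this is CPython, not LiveCode, and says 
nothing about LiveCode's actual bytecode format), the sketch below 
compiles a one-line script and counts the bytecode instructions it 
produces, while the sorting itself is done by a routine compiled into 
the interpreter:

```python
import dis
import types

# Compile a one-line "script" to CPython bytecode.
code = compile("sorted(data)", "<script>", "eval")

# Count the bytecode instructions; the exact number varies between
# Python versions, but it is only a handful.
instruction_count = len(list(dis.get_instructions(code)))

# In CPython, sorted() is a built-in implemented in compiled C, so the
# bytecode above is just the glue that calls into it.
is_compiled_builtin = isinstance(sorted, types.BuiltinFunctionType)

data = [3, 1, 2]
result = eval(code, {"data": data})
```

However long the list is, the script side stays the same few 
instructions; the time is spent inside the machine-compiled routine.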

The scripts you write in LiveCode are what gets saved with the file (at 
least that's what I see when I look at a saved stack file; I can find 
the scripts, but if the bytecode gets saved with them it's amazingly 
small, because I can't find it at all).

It's my understanding that when a stack is opened, its scripts are 
compiled to bytecode as the stack's object records are unpacked and the 
message path is set up.  This "runtime compilation" involves parsing 
your script and translating that into binary tokens that execute much 
more efficiently.  When executing, this bytecode is translated to direct 
machine instructions on the fly, but as you can see with LiveCode's 
blazing performance, neither the runtime compilation to bytecode nor the 
translation of the bytecode into machine instructions is particularly 
costly.  And by separating the tasks, the more costly parsing of the 
script is done only once, which is one of the reasons why LC outperforms 
fully-interpreted systems (another reason is careful pruning of the 
lookup table used in that parsing and in the subsequent bytecode jumps, 
but that's another story).
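
The benefit of separating the two steps can be sketched with Python's 
built-in compile() and eval(), used here purely as an analogy for 
"parse once, run many times" (the expression and the variable names 
are made up, and nothing here reflects LiveCode internals):

```python
import timeit

source = "price * quantity"

# Step 1: parse the script text and compile it to bytecode, once.
code = compile(source, "<script>", "eval")

# Step 2: run the already-compiled bytecode as often as needed.
totals = []
for price in (10, 20, 30):
    totals.append(eval(code, {}, {"price": price, "quantity": 2}))

# Rough comparison: re-parsing the text on every call versus reusing
# the compiled form. Actual timings depend on the machine.
env = {"price": 10, "quantity": 2}
parse_each_time = timeit.timeit(lambda: eval(source, {}, env), number=10000)
parse_once = timeit.timeit(lambda: eval(code, {}, env), number=10000)
print(f"parse each time: {parse_each_time:.4f}s, parse once: {parse_once:.4f}s")
```

On most machines the second timing should come out lower, since the 
work of parsing is not repeated, though the exact figures will vary.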

In fact, since so much of the actual execution takes place in the 
engine's machine-compiled code, performance for many tasks is on par 
with other systems where you have to wait for a compiler every time you 
change your code. :)

There are exceptions to the general rule that script statements are 
translated to bytecode in advance of execution.  For example, the "do" 
command and the "value" function both require parsing during execution, 
since they work with strings whose values cannot be known in advance, 
and therefore cannot be compiled in advance.
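
Python has the same distinction, which may help as an analogy (eval() 
here plays roughly the role of LiveCode's "value" function; the two 
function names below are hypothetical and exist only for this sketch):

```python
def precompiled(a, b):
    # This expression is fixed in the source, so it is compiled to
    # bytecode once, when the function is defined.
    return a + b

def dynamic(expression, a, b):
    # The expression arrives as a string at run time, so it has to be
    # parsed and compiled during execution, on every call.
    return eval(expression, {}, {"a": a, "b": b})

fixed_result = precompiled(2, 3)
same_result = dynamic("a + b", 2, 3)
other_result = dynamic("a * b", 2, 3)
```

The dynamic form is more flexible, since the caller decides what gets 
evaluated, but that flexibility is exactly what rules out compiling it 
ahead of time.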

But those tokens also make good examples of LiveCode's efficiency: 
while technically slower than alternative syntax which can be 
precompiled to bytecode, the time it takes the engine to parse those 
expressions and translate them into a form which can be executed is 
usually measured in microseconds, sometimes fractions of microseconds.

Along those lines, compare the time it takes LiveCode to compile a 
script when you push the script editor's "Compile" button with 
compilation times in almost any other system.  With each script 
compiled to bytecode separately, and with its means of doing so being 
rather well tuned over a great many years, it's almost instantaneous - 
you'll never wait for a progress bar when compiling in LiveCode. :)


In summary, LiveCode attempts to find a sweet spot between raw 
performance and developer convenience.  You could write faster-executing 
code in Assembler, but who would want to?  Even using languages like C++ 
will often take orders of magnitude more development time to accomplish 
similar goals.  LiveCode's two-step compilation allows for blazing fast 
performance with nearly unprecedented return on your development time.

IMO, an almost ideal sweet spot indeed.

--
  Richard Gaskin
  Fourth World
  LiveCode training and consulting: http://www.fourthworld.com
  Webzine for LiveCode developers: http://www.LiveCodeJournal.com
  LiveCode Journal blog: http://LiveCodejournal.com/blog.irv



More information about the use-livecode mailing list