How does compiling work?

Devin Asay devin_asay at byu.edu
Thu Sep 8 13:30:38 EDT 2011


Great summary, Richard! This is going into my teaching notes file.

Devin

On Sep 8, 2011, at 8:00 AM, Richard Gaskin wrote:

> Julian Ohrt wrote:
> 
>> Is there any documentation how compiling of livecode works internally?
>> Is it a compiler which can produce native code (for Windows, Linux,
>> etc.)? Are the scripts packaged within the executable together with an
>> interpreter and interpreted at run time? Or is it more like a virtual
>> machine approach?
> 
> Yes, I think it could be said that LiveCode has more in common with a virtual machine than almost any other metaphor.
> 
> My understanding of the under-the-hood mechanics is very limited, but that won't stop me from trying. :)
> 
> There are many layers to code execution and the languages which work at each level, which could be summarized as:
> 
> - CPU instruction set/Object code:  the instructions the processor is able to handle on its own, purely binary code; these are very primitive, consisting largely of moving stuff from one memory location to another, some basic math routines, etc.  Most mortals never write machine code directly, relying on assemblers or compilers to translate their more human-readable code into machine instructions.
> 
> - Assembler:  a way of working directly with the CPU instruction set, but with the advantage of using mnemonic labels for the instructions ("MOVE" rather than "0111010").  Generally speaking, there is a one-to-one relationship between Assembler instructions and machine instructions.
> 
> - C: Designed as a substitute for Assembler, C allows you to execute many hundreds or even thousands of machine instructions with relatively little code, but it's still somewhat close to the CPU in terms of memory management, data types, options for register use, etc.
> 
> - C++/C#/Objective-C:  C-derived languages, with their compilers and libraries, that implement object-oriented programming, executing many more instructions per line of code and usually involving frameworks that handle many of the common tasks an application will perform.
> 
> - Scripting: Instructions written in very high-level languages which often completely automate things like memory management, type conversion, garbage collection, etc., triggering a great many machine instructions for each line of code, favoring developer convenience at a small cost to efficiency and memory.
> 
> At each successive level, the number of machine instructions triggered by a line of code is generally higher than at the level before, meaning ever more of the work is done by the system rather than the programmer.
> 
> Much of the LiveCode engine is written in C++ (with some portions in straight C, I believe), and the LiveCode scripting language is often compiled to an intermediary bytecode, which in the list above might be between C++ and Scripting.
> 
> Bytecode is very different from true object code, in that object code represents the instructions as the CPU itself expects to handle them, while bytecode still needs an intermediary mechanism (such as the LiveCode engine) to translate it into machine instructions.
> 
> Bytecode representations are much closer to machine instructions than scripts are, so translating them at runtime is often as simple as a jump through a densely packed and highly optimized lookup table.
> 
> Moreover, bytecode represents a fairly small subset of the instructions compiled from your script; in many cases the bytecodes jump directly into compiled object code in the engine, which was written in C++ and compiled to machine code using some of the best modern compilers. So in effect, as Ousterhout puts it in his seminal paper on scripting (see <http://www.stanford.edu/~ouster/cgi-bin/papers/scripting.pdf>), good scripting languages are often just a sort of "glue" between true machine-compiled routines.  Bytecode makes that glue smaller and more efficient.
> 
> The scripts you write in LiveCode are what gets saved with the file (at least that's what I see when I look at a saved stack file; I can find the scripts but if the bytecode gets saved with it it's amazingly small because I can't find it at all).
> 
> It's my understanding that when a stack is opened, its scripts are compiled to bytecode as the stack's object records are unpacked and the message path is set up.  This "runtime compilation" involves parsing your script and translating that into binary tokens that execute much more efficiently.  When executing, this bytecode is translated to direct machine instructions on the fly, but as you can see with LiveCode's blazing performance, neither the runtime compilation to bytecode nor the translation of the bytecode into machine instructions is particularly costly.  And by separating the tasks, the more costly parsing of the script is done only once, which is one of the reasons why LC outperforms fully-interpreted systems (another reason is careful pruning of the lookup table used in that parsing and in the subsequent bytecode jumps, but that's another story).
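
As a rough illustration of the compile-once idea described above, here is a toy sketch in Python. It is not LiveCode's engine code, and the names compile_script, run, DISPATCH and WORD_TO_OPCODE, as well as the tiny "plus/minus/times" language, are invented for this example. The text is parsed once into compact tokens, and each later run only walks those tokens and jumps through a lookup table into routines that are already compiled.

```python
# Toy illustration only: an assumption-laden sketch, not how the LiveCode
# engine is actually implemented.
import operator

# Lookup table: each opcode maps straight to an already-compiled routine.
# Python's C-implemented operators stand in for the engine's C++ routines.
DISPATCH = {
    "ADD": operator.add,
    "SUB": operator.sub,
    "MUL": operator.mul,
}

# Hypothetical script words and the opcodes they compile to.
WORD_TO_OPCODE = {"plus": "ADD", "minus": "SUB", "times": "MUL"}


def compile_script(source):
    """Parse the script text once and return a list of (opcode, operand) tokens."""
    words = source.split()
    program = [("PUSH", float(words[0]))]
    # The remaining words come in (operator, number) pairs: "plus 3", "times 4".
    for op_word, number in zip(words[1::2], words[2::2]):
        program.append((WORD_TO_OPCODE[op_word], float(number)))
    return program


def run(program):
    """Execute precompiled tokens; no text parsing happens here."""
    accumulator = 0.0
    for opcode, operand in program:
        if opcode == "PUSH":
            accumulator = operand
        else:
            # One table lookup, then a call into compiled code.
            accumulator = DISPATCH[opcode](accumulator, operand)
    return accumulator


# Parsed once, executed many times. Evaluation is strictly left to right
# (no operator precedence), so this computes (2 + 3) * 4.
program = compile_script("2 plus 3 times 4")
results = [run(program) for _ in range(3)]
```

The costly step (splitting and interpreting the text) happens only in compile_script; run never looks at the original string again.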
> 
> In fact, since so much of the actual execution takes place in the engine's machine-compiled code, performance for many tasks is on par with other systems where you have to wait for a compiler every time you change your code. :)
> 
> There are exceptions to the general rule that script statements are translated to bytecode in advance of execution.  For example, the "do" command and the "value" function both require parsing during execution, since they work with strings whose values cannot be known in advance, and therefore cannot be compiled in advance.
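
The same distinction can be sketched by analogy in Python (again, not LiveCode; the function names are invented for this example). Python's built-in compile() and eval() let an expression be parsed ahead of time, while a string assembled at run time has to be parsed when it is evaluated, much like the argument to "do" or "value".

```python
# Analogy only: Python's compile()/eval() built-ins stand in for the idea of
# precompiled script versus text that must be parsed during execution.

# Known in advance: parsed once, ahead of execution.
precompiled = compile("6 * 7", "<script>", "eval")


def run_precompiled():
    # Evaluates the already-parsed code object; no parsing happens here.
    return eval(precompiled)


def run_dynamic(expression_text):
    # The text is only known at call time, so it is parsed on every call.
    return eval(expression_text)


static_result = run_precompiled()
dynamic_result = run_dynamic("6 * " + str(7))  # string built at run time
```

Both calls produce the same value; the difference is only in when the parsing work is done.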
> 
> But those tokens also make good examples of LiveCode's efficiency: while technically slower than alternative syntax which can be precompiled to bytecode, the time it takes the engine to parse those expressions and translate them into a form which can be executed is usually measured in microseconds, sometimes fractions of microseconds.
> 
> Along those lines, compare the time it takes LiveCode to compile a script when you push the script editor's "Compile" button to compilation times in almost any other system.  With each script compiled to bytecode separately, and with its means of doing so being rather well tuned over a great many years, it's almost instantaneous - you'll never wait for a progress bar when compiling in LiveCode. :)
> 
> 
> In summary, LiveCode attempts to find a sweet spot between raw performance and developer convenience.  You could write faster-executing code in Assembler, but who would want to?  Even using languages like C++ will often take orders of magnitude more development time to accomplish similar goals.  LiveCode's two-step compilation allows for blazing fast performance with nearly unprecedented return on your development time.
> 
> IMO, an almost ideal sweet spot indeed.
> 
> --
> Richard Gaskin
> Fourth World
> LiveCode training and consulting: http://www.fourthworld.com
> Webzine for LiveCode developers: http://www.LiveCodeJournal.com
> LiveCode Journal blog: http://LiveCodejournal.com/blog.irv
> 
> _______________________________________________
> use-livecode mailing list
> use-livecode at lists.runrev.com
> Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
> http://lists.runrev.com/mailman/listinfo/use-livecode

Devin Asay
Humanities Technology and Research Support Center
Brigham Young University