It can be difficult to disassociate the idea that dynamically typed programming languages are tied to byte-code interpreters (e.g. YARV Ruby, CPython, V8, Zend Engine, etc.). But for many languages, a compiled implementation also exists. Cython, Chicken Scheme and SBCL are good examples.
In this post I will briefly describe how I built a compiler for my Scheme implementation using artifacts from the interpreter. In doing this, I learned a simple (not novel) technique for compiling dynamic languages. I'll introduce the Javascript to C++/V8 compiler I am developing using this technique.
BSDScheme
For the past year I've developed a Scheme implementation, BSDScheme. I started with an AST-interpreter (as opposed to a byte-code compiler and VM). A more detailed blog post on the first few steps writing BSDScheme can be found here.
As I built up support for the various objects and operations in the
language, I had a sizeable base of D code for the BSDScheme
runtime. This included an object representation for primitive types
(and support for converting to and from types in D) as well as basic
Scheme operations
(+
, -
, car
, cdr
,
etc.).
When the time came to implement a compiler backend, I only needed to do codegen since the parser already existed. Furthermore, the fundamental bits had already been written: object representation and much of the standard library. So I wrote the simplest compiler I could think of by targeting D and the objects/functions I had already written to support the interpreter.
Take, for example, the equals
function
in the standard library:
Value equals(Value arguments, void** rest) {
auto tuple = valueToList(arguments);
auto left = tuple[0];
auto right = car(tuple[1]);
bool b;
switch (tagOfValue(left)) {
case ValueTag.Integer:
b = valueIsInteger(right) && valueToInteger(left) == valueToInteger(right);
break;
case ValueTag.Char:
b = valueIsChar(right) && valueToChar(left) == valueToChar(right);
break;
case ValueTag.String:
b = valueIsString(right) && valueToString(left) == valueToString(right);
break;
case ValueTag.Symbol:
b = valueIsSymbol(right) && valueToSymbol(left) == valueToSymbol(right);
break;
case ValueTag.Function:
b = valueIsFunction(right) && valueToFunction(left)[1] == valueToFunction(right)[1];
break;
case ValueTag.Bool:
b = valueIsBool(right) && valueToBool(left) == valueToBool(right);
break;
default:
b = false;
}
return makeBoolValue(b);
}
So long as my compiler generated code that used the Value
object to represent Scheme data, I already had an equals
function and large swaths of a Scheme standard library that I could
share between the compiler and interpreter.
Ultimately I only needed to implement a few control structures to support compiling a large subset of what I supported in the interpreter. The key aspects here include: function definitions (in D), function calls (D function calls), if/else (if/else in D) and so on.
To give a concrete example of a whole program compiled, this Scheme program:
(define (exp base pow)
(if (= pow 0)
1
(* base (exp base (- pow 1)))))
(define (main)
(display (exp 2 16))
(newline))
when run through the BSDScheme compiler would become:
import std.stdio;
import lex;
import common;
import parse;
import utility;
import value;
import buffer;
Value exp(Value arguments, void** ctx) {
Value[] tmp = listToVector(arguments);
Value base = tmp[0];
Value pow = tmp[1];
Value equals_result = equals(vectorToList([pow, makeIntegerValue(0)]), null);
Value if_result;
if (truthy(equals_result)) {
makeIntegerValue(1);
if_result = makeIntegerValue(1);
} else {
Value minus_result = minus(vectorToList([pow, makeIntegerValue(1)]), null);
Value exp_result = exp(vectorToList([base, minus_result]), null);
Value times_result = times(vectorToList([base, exp_result]), null);
if_result = times_result;
};
return if_result;
}
Value BSDScheme_main(Value arguments, void** ctx) {
Value exp_result = exp(vectorToList([makeIntegerValue(2), makeIntegerValue(16)]), null);
Value display_result = display(vectorToList([exp_result]), null);
Value newline_result = newline(vectorToList([]), null);
return newline_result;
}
void main() { BSDScheme_main(nilValue, cast(void**)0); }
Where every imported function had already been written for the interpreter. I had only to translate a few lines to D and import/call these existing libraries. Now I had a small binary of compiled Scheme.
It was at this point I realized I was using the same technique used by Cython to compile Python code.
...the Cython project has approached this problem by means of a source code compiler that translates Python code to equivalent C code. This code is executed within the CPython runtime environment, but at the speed of compiled C and with the ability to call directly into C libraries. http://docs.cython.org/en/latest/src/quickstart/overview.html
jsc
I played with many PL-research-y languages over the years and wanted to do build something a little more practical. So I took what I learned writing the BSDScheme compiler and decided to write a Javascript compiler. Specifically, it would target the easiest backend I could imagine: C++ using the V8 C++ library and generating a Node addon.
There already existed well-trodden guides/means of writing Node addons
in C++ so I spent some time trying to hand-compile simple Javascript
programs to C++ and V8. A string in Javascript would become a
v8::String
type in C++. A number in Javascript would become
v8::Number
in C++ and so forth.
I decided to write this compiler in Rust given its roots in (and my familiarity with) ML and Python. I found a Javascript parser by Dave Herman and after a few lazy weeks finally got a "Hello world!" program compiling. Getting my first program to compile has by far been the hardest part of building jsc.
Let's look at a concrete example of a recursive fibonacci program (example/recursion.js in the repo):
function fib(i) {
if (i <= 1) {
return i;
}
return fib(i - 1) + fib(i - 2);
}
function main() {
console.log(fib(20));
}
Let's add a call to main()
at the end and time this with
Node to get a baseline:
$ time node example/recursion.js
6765
node example/recursion.js 0.06s user 0.02s system 97% cpu 0.083 total
Now let's install jsc to compare. We'll need Rust, Cargo, Node and Node-GYP.
$ git clone https:/github.com/eatonphil/jsc
$ cd jsc
$ make && make install
$ jsc example/recursion.js
jsc produces a Javascript entrypoint that imports our addon (build/recursion.js):
require("./build/Release/recursion").jsc_main();
And it produces a C++ file that represents the entire program (build/recursion.cc):
#include <string>
#include <node.h>
using v8::Boolean;
using v8::Context;
using v8::Exception;
using v8::Function;
using v8::FunctionTemplate;
using v8::FunctionCallbackInfo;
using v8::Isolate;
using v8::Local;
using v8::Null;
using v8::Number;
using v8::Object;
using v8::String;
using v8::False;
using v8::True;
using v8::Value;
void fib(const FunctionCallbackInfo<Value>& args) {
Isolate* isolate = args.GetIsolate();
Local<Value> i = args[0];
tail_recurse_1:
Local<Context> ctx_2 = isolate->GetCurrentContext();
Local<Object> global_3 = ctx_2->Global();
Local<Function> Boolean_4 = Local<Function>::Cast(global_3->Get(String::NewFromUtf8(isolate, "Boolean")));
String::Utf8Value utf8value_tmp_5(i);
std::string string_tmp_6(*utf8value_tmp_5);
String::Utf8Value utf8value_tmp_7(Number::New(isolate, 1));
std::string string_tmp_8(*utf8value_tmp_7);
Local<Value> argv_9[] = { (i->IsNumber() || Number::New(isolate, 1)->IsNumber()) ? Boolean::New(isolate, i->ToNumber(isolate)->Value() <= Number::New(isolate, 1)->ToNumber(isolate)->Value()) : ((i->IsString() || Number::New(isolate, 1)->IsString()) ? Boolean::New(isolate, string_tmp_6 <= string_tmp_8) : (False(isolate))) };
Local<Value> result_10 = Boolean_4->Call(Null(isolate), 1, argv_9);
if (result_10->ToBoolean()->Value()) {
args.GetReturnValue().Set(i);
return;
return;
}
Local<Value> arg_11 = (i->IsNumber() || Number::New(isolate, 1)->IsNumber()) ? (Number::New(isolate, i->ToNumber(isolate)->Value() - Number::New(isolate, 1)->ToNumber(isolate)->Value())) : Local<Number>::Cast(Null(isolate));
Local<FunctionTemplate> ftpl_13 = FunctionTemplate::New(isolate, fib);
Local<Function> fn_12 = ftpl_13->GetFunction();
fn_12->SetName(String::NewFromUtf8(isolate, "fib"));
Local<Value> argv_14[] = { arg_11 };
Local<Value> result_15 = fn_12->Call(Null(isolate), 1, argv_14);
Local<Value> arg_16 = (i->IsNumber() || Number::New(isolate, 2)->IsNumber()) ? (Number::New(isolate, i->ToNumber(isolate)->Value() - Number::New(isolate, 2)->ToNumber(isolate)->Value())) : Local<Number>::Cast(Null(isolate));
Local<FunctionTemplate> ftpl_18 = FunctionTemplate::New(isolate, fib);
Local<Function> fn_17 = ftpl_18->GetFunction();
fn_17->SetName(String::NewFromUtf8(isolate, "fib"));
Local<Value> argv_19[] = { arg_16 };
Local<Value> result_20 = fn_17->Call(Null(isolate), 1, argv_19);
args.GetReturnValue().Set((result_15->IsString() || result_20->IsString()) ? Local<Value>::Cast(String::Concat(result_15->ToString(), result_20->ToString())) : Local<Value>::Cast((result_15->IsNumber() || result_20->IsNumber()) ? (Number::New(isolate, result_15->ToNumber(isolate)->Value() + result_20->ToNumber(isolate)->Value())) : Local<Number>::Cast(Null(isolate))));
return;
}
void jsc_main(const FunctionCallbackInfo<Value>& args) {
Isolate* isolate = args.GetIsolate();
tail_recurse_21:
Local<Value> arg_22 = Number::New(isolate, 20);
Local<FunctionTemplate> ftpl_24 = FunctionTemplate::New(isolate, fib);
Local<Function> fn_23 = ftpl_24->GetFunction();
fn_23->SetName(String::NewFromUtf8(isolate, "fib"));
Local<Value> argv_25[] = { arg_22 };
Local<Value> result_26 = fn_23->Call(Null(isolate), 1, argv_25);
Local<Value> arg_27 = result_26;
Local<Function> fn_28 = Local<Function>::Cast(Local<Object>::Cast(isolate->GetCurrentContext()->Global()->Get(String::NewFromUtf8(isolate, "console")))->Get(String::NewFromUtf8(isolate, "log")));
Local<Value> argv_29[] = { arg_27 };
Local<Value> result_30 = fn_28->Call(Null(isolate), 1, argv_29);
result_30;
}
void Init(Local<Object> exports) {
NODE_SET_METHOD(exports, "jsc_main", jsc_main);
}
NODE_MODULE(NODE_GYP_MODULE_NAME, Init)
Let's time this version:
$ time node build/recursion.js
6765
node build/recursion.js 0.16s user 0.03s system 107% cpu 0.175 total
jsc, over twice as slow, is already falling behind Node. :)
As I incremented the number passed to my fibonacci function the compiled program time to completion get exponentially worse. Node stayed the same. I decided to try tail-call optimization to decrease the performance distance between Node and jsc.
I implemented tail-call optimization for the interpreter in BSDScheme
by putting all functions in a loop that would break if tail-call
elimination was not to happen. It took me a week to implement this and
I never put it in place for the compiler. This time around I was able
to add basic tail call elimination to jsc in two hours. It is done
by label
s and goto
s instead of a tail call
when applicable.
Here is a tail-call optimized version of the same program (example/tco.js):
function fib(n, a, b) {
if (n == 0) {
return a;
}
if (n == 1) {
return b;
}
return fib(n - 1, b, a + b);
}
function main() {
console.log(fib(50, 0, 1));
}
We add a call to main()
again for Node and time it:
$ time node example/tco.js
12586269025
node example/tco.js 0.06s user 0.02s system 96% cpu 0.080 total
And compile it with jsc and time it:
$ jsc example/tco.js
$ time node build/tco.js
12586269025
node build/tco.js 0.07s user 0.02s system 95% cpu 0.087 total
Well that's not bad at all. :)
Next steps with jsc
jsc has very limited support for... everything. Today I added almost
all primitive numeric operations + equality/inequality operations +
unit tests. jsc does not yet support nested functions, callbacks, or
closures. It supports while
loops but not
yet for
loops. And I'm not sure if it supports else
if
. It does not support arrays or objects let alone
constructors and prototypes. Adding support for these is low-hanging
fruit.
After the low-hanging fruit, more interesting projects for jsc include:
- generating C++ with embedded V8 rather than only targeting Node addons
- type inference or type hinting for generating unboxed functions a la Cython and SBCL