Previously in compiler basics:
1. lisp to assembly
2. user-defined functions and variables
3. LLVM
Next in compiler basics:
5. LLVM system calls
6. an x86 upgrade
In this post we'll extend the compiler's LLVM backend to support compiling conditionals such that we can support an implementation of the fibonacci algorithm.
Specifically we're aiming for the following:
$ cat tests/fib.lisp
(def fib (n)
(if (< n 2)
n
(+ (fib (- n 1)) (fib (- n 2)))))
(def main ()
(fib 8))
$ node src/ulisp.js tests/fib.lisp
$ ./build/prog
$ echo $?
21
To do this we'll have to add the <
, -
and
if
built-ins.
All source code is available on Github.
Subtraction
This is the easiest to add since we already support addition. They are
both arithmetic operations that produce an integer. We simply add a
mapping of -
to the LLVM instruction sub
so
our LLVM backend constructor (src/backends/llvm.js
) looks
like this:
...
class Compiler {
constructor() {
this.outBuffer = [];
this.primitiveFunctions = {
def: this.compileDefine.bind(this),
begin: this.compileBegin.bind(this),
'if': this.compileIf.bind(this),
'+': this.compileOp('add'),
'-': this.compileOp('sub'),
...
Less than
The <
builtin is a logical operation. These are handled
differently from arithmetic operations in LLVM IR. A logical operation
looks like this:
%3 = icmp slt i32 %1, %2
This says that we're doing an integer comparison, icmp
,
(with signed less than, slt
) on the i32
integers in variables %1
and %2
.
We can shim this into our existing compileOp
helper like
so:
...
class Compiler {
constructor() {
this.outBuffer = [];
this.primitiveFunctions = {
def: this.compileDefine.bind(this),
begin: this.compileBegin.bind(this),
'if': this.compileIf.bind(this),
'+': this.compileOp('add'),
'-': this.compileOp('sub'),
'<': this.compileOp('icmp slt'),
...
Conditionals
The last part we need to add is support for conditional execution of code at runtime. Assembly-like languages handle this with "jumps" and "labels". Jumping causes execution to continue at the address being jumped to (instead of just the line following the jump instruction). Labels give you a way of naming an address instead of having to calculate it yourself. Our code will look vaguely like this:
%test = icmp slt i32 %n, %1
br i1 %test, label %iftrue, label %iffalse
iftrue:
; do true stuff
iffalse:
; do false stuff
; do next stuff
The br
instruction can jump (or branch) conditionally or
unconditionally. This snippet demonstrates a conditional jump.
But there are a few things wrong with this pseudo-code. First off if
the condition is true, execution will just continue on into the false
section once finished. Second, LLVM IR actually requires all labels to
end with a branch instruction. So we'll add a new label after the true
and false section called ifresult
and jump to it
unconditionally after both.
%test = icmp slt i32 %n, %1
br i1 %test, label %iftrue, label %iffalse
iftrue:
; do true stuff
br label %ifresult
iffalse:
; do false stuff
br label %ifresult
ifresult:
; do next stuff
Scope
One last thing we'll need to do before implementing the code
generation for this is to update our Scope
class to
accept symbol prefixes so we can pass our labels through Scope to make
sure they are unique but still have useful names.
...
class Scope {
...
symbol(prefix = 'sym') {
const nth = Object.keys(this.locals).length + 1;
return this.register(prefix + nth);
}
...
compileIf
Now we can add a primitive function mapping if
to a new
compileIf
helper and implement the helper.
...
class Compiler {
constructor() {
this.outBuffer = [];
this.primitiveFunctions = {
def: this.compileDefine.bind(this),
begin: this.compileBegin.bind(this),
'+': this.compileOp('add'),
'-': this.compileOp('sub'),
'<': this.compileOp('icmp slt'),
'if': this.compileIf.bind(this),
...
compileIf([test, thenBlock, elseBlock], destination, scope) {
const testVariable = scope.symbol();
// Compile expression and branch
this.compileExpression(test, testVariable, scope);
const trueLabel = scope.symbol('iftrue');
const falseLabel = scope.symbol('iffalse');
this.emit(1, `br i1 %${testVariable}, label %${trueLabel}, label %${falseLabel}`);
// Compile true section
this.emit(0, trueLabel + ':');
this.compileExpression(thenBlock, destination, scope);
const endLabel = scope.symbol('ifend');
this.emit(1, 'br label %' + endLabel);
this.emit(0, falseLabel + ':');
// Compile false section
this.compileExpression(elseBlock, destination, scope);
this.emit(1, 'br label %' + endLabel);
// Compile cleanup
this.emit(0, endLabel + ':');
}
...
Note that this code generation sends the destination
variable into both the true and false sections. Let's try it out.
$ node src/ulisp.js tests/fib.lisp
llc: error: llc: build/prog.ll:19:3: error: multiple definition of local value named 'sym5'
%sym5 = add i32 %sym15, %sym16
^
child_process.js:665
throw err;
^
Error: Command failed: llc -o build/prog.s build/prog.ll
llc: error: llc: build/prog.ll:19:3: error: multiple definition of local value named 'sym5'
%sym5 = add i32 %sym15, %sym16
That's annoying. An unfortunate aspect of LLVM's required single-static assignment form is that you cannot reuse variable names within a function even if it is not possible for the variable to be actually reused.
To work around this we need to allocate memory on the stack, store the result in each true/false section in this location, and read from this location afterward to store it in the destination variable.
Stack memory instructions
LLVM IR gives us alloca
to allocate memory on the stack,
store
to store memory at a stack address, and
load
to read the value at a stack address into a
variable. Here's a simple example:
%myvar = add i32 42, 0
%stackaddress = alloca i32, align 4
store i32 %myvar, i32* %stackaddress, align 4
%newvar = load i32, i32* %stackaddress, align 4
Such that newvar
is now 42.
compileIf again
Applying this back to our compileIf
helper gives us:
...
compileIf([test, thenBlock, elseBlock], destination, scope) {
const testVariable = scope.symbol();
const result = scope.symbol('ifresult');
// Space for result
this.emit(1, `%${result} = alloca i32, align 4`);
// Compile expression and branch
this.compileExpression(test, testVariable, scope);
const trueLabel = scope.symbol('iftrue');
const falseLabel = scope.symbol('iffalse');
this.emit(1, `br i1 %${testVariable}, label %${trueLabel}, label %${falseLabel}`);
// Compile true section
this.emit(0, trueLabel + ':');
const tmp1 = scope.symbol();
this.compileExpression(thenBlock, tmp1, scope);
this.emit(1, `store i32 %${tmp1}, i32* %${result}, align 4`);
const endLabel = scope.symbol('ifend');
this.emit(1, 'br label %' + endLabel);
this.emit(0, falseLabel + ':');
// Compile false section
const tmp2 = scope.symbol();
this.compileExpression(elseBlock, tmp2, scope);
this.emit(1, `store i32 %${tmp2}, i32* %${result}, align 4`);
this.emit(1, 'br label %' + endLabel);
// Compile cleanup
this.emit(0, endLabel + ':');
this.emit(1, `%${destination} = load i32, i32* %${result}, align 4`);
}
...
Trying it out
We run our compiler one more time:
$ node src/ulisp.js tests/fib.lisp
$ ./build/prog
$ echo $?
21
And get what we expect!
Next up
- Tail call optimization
- Lists and dynamic memory
- Strings?
- Foreign function calls?
- Self-hosting?
Latest post in the compiler basics series: using LLVM conditionals in compiling a fibonacci program https://t.co/A72yEDQ8sd
— Phil Eaton (@phil_eaton) May 5, 2019