November 26, 2020

Emulating linux/AMD64 userland: interpreting an ELF binary

In this post we'll stumble toward a working emulator for a barebones C program compiled for linux/AMD64. The approach will be slightly more so based on observation than by following a spec; a great way to quickly become familiar with a topic, and a bad way to guarantee correctness.

The goal:

$ cat tests/simple.c
int main() {
  return 4;
}
$ gcc tests/simple.c
$ go build -o main
$ ./main a.out && echo $?
4

This may look ridiculously simple but when you don't know how to deal with a binary or how instructions are encoded, it will take a few hours to write an emulator that can generally handle this program!

Code for this project is available on Github.

Background

AMD64, x86_64 or x64 are different names for AMD's widely adopted 64-bit extension to Intel's x86 instruction set (i.e. the encoding and semantics of x86 binaries). AMD64 is a superset of x86 (introducing 64-bit registers and operations) and thus backwards compatible with x86 programs.

A year and a half ago I first got into emulation with an AMD64 emulator in JavaScript. The JavaScript emulator interpreted the textual representation of AMD64 programs (e.g. MOV RBP, RSP, Intel's assembly syntax). A C program had to be compiled with -S to produce an assembly file that the JavaScript emulator could read (i.e. gcc -S tests/simple.c) This was a great way to get started with emulation by ignoring the complexity of encoded instructions and executable formats.

If we dig into the binary file produced by gcc on Linux we learn that it is an ELF file.

$ gcc test/simple.c
$ file a.out
a.out: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=d0b5c742b9fbcbcca4dfa9438a8437a8478a51bb, for GNU/Linux 3.2.0, not stripped

ELF is responsible for surrounding the actual binary-encoded program instructions with metadata on exported and imported C identifiers and program entrypoint. But for simple programs like this initial emulator, we can ignore export/imports. We'll only use the ELF metadata to find out where the instructions for our main function start.

Where is main?

If we use an ELF reader+disassembler on the binary generated by gcc and search for main we can find its address.

$ objdump -d a.out | grep -A10 '<main>'
0000000000401106 <main>:
  401106:       55                      push   %rbp
  401107:       48 89 e5                mov    %rsp,%rbp
  40110a:       b8 fe 00 00 00          mov    $0xfe,%eax
  40110f:       5d                      pop    %rbp
  401110:       c3                      retq
  401111:       66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
  401118:       00 00 00
  40111b:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)

0000000000401120 <__libc_csu_init>:

This means that the function, main, starts at address 0x401106 in memory. Furthermore, this implies that the binary must be loaded into CPU memory such that the CPU can jump here to execute our program.

In truth, main is not this program's entrypoint. If we run objdump -x a.out we can see that the ELF entrypoint is 0x401020.

$ objdump -x a.out
a.out:     file format elf64-x86-64
a.out
architecture: i386:x86-64, flags 0x00000112:
EXEC_P, HAS_SYMS, D_PAGED
start address 0x0000000000401020

Program Header:
    PHDR off    0x0000000000000040 vaddr 0x0000000000400040 paddr 0x0000000000400040 align 2**3
             filesz 0x00000000000002d8 memsz 0x00000000000002d8 flags r--

This is because the actual entrypoint gcc sets up is a function called _start. The libc prelude beginning with _start is responsible for initializing the libc runtime, calling our main function and executing the exit syscall with the return value of main.

objdump -d a.out | grep -A10 '<_start>'
0000000000401020 <_start>:
  401020:       f3 0f 1e fa             endbr64
  401024:       31 ed                   xor    %ebp,%ebp
  401026:       49 89 d1                mov    %rdx,%r9
  401029:       5e                      pop    %rsi
  40102a:       48 89 e2                mov    %rsp,%rdx
  40102d:       48 83 e4 f0             and    $0xfffffffffffffff0,%rsp
  401031:       50                      push   %rax
  401032:       54                      push   %rsp
  401033:       49 c7 c0 90 11 40 00    mov    $0x401190,%r8
  40103a:       48 c7 c1 20 11 40 00    mov    $0x401120,%rcx

But because all this libc initialization is relatively complicated we're just going to skip the actual ELF entrypoint for now. Our emulator will locate main, load the binary into memory, jump to the start of main, and set the exit code of the emulator to the result of main.

As you can see, this ELF binary has its own hard-coded view of where it will be in memory. What if our CPU were to run multiple process at once? We might give each process its own virtual memory space and map back to a real memory space so each process (and by extension, compilers) doesn't have to think about how they fit into memory relative to other processes.

The last question to figure out is where to load the ELF binary into emulator memory so that addresses in memory are where the program expects.

As it turns out, there is a piece of metadata called section headers that contain an address and a offset from the start of the ELF file. By subtracting this we can get the location the file expects to be in memory.

$ objdump -x a.out
a.out:     file format elf64-x86-64
a.out
architecture: i386:x86-64, flags 0x00000112:
EXEC_P, HAS_SYMS, D_PAGED
start address 0x0000000000401020

Program Header:
    PHDR off    0x0000000000000040 vaddr 0x0000000000400040 paddr 0x0000000000400040 align 2**3
             filesz 0x00000000000002d8 memsz 0x00000000000002d8 flags r--

That is: 0x400040 (vaddr) - 0x40 (off) = 0x400000. Judging from a Google search this seems to be a pretty common address where ELF binaries are loaded into memory.

ELF and Go

Binary file formats tend to be a pain to work with because, to enable greater compression, everything ends up being a pointer to something else. So you end up jumping all around the file just to stitch information back together.

So the one third-party-ish library we'll use is Go's builtin debug/elf package. With this library we can load an ELF binary and iterate over symbols and sections to discover the location of main and the start address for the binary in memory.

Editing in main.go:

package main

import (
    "bytes"
    "debug/elf"
    "fmt"
    "io/ioutil"
    "log"
    "os"
)

type process struct {
    startAddress uint64
    entryPoint   uint64
    bin          []byte
}

func readELF(filename, entrySymbol string) (*process, error) {
    bin, err := ioutil.ReadFile(filename)
    if err != nil {
        return nil, err
    }

    elffile, err := elf.NewFile(bytes.NewReader(bin))
    if err != nil {
        return nil, err
    }

    symbols, err := elffile.Symbols()
    if err != nil {
        return nil, err
    }

    var entryPoint uint64
    for _, sym := range symbols {
        if sym.Name == entrySymbol && elf.STT_FUNC == elf.ST_TYPE(sym.Info) && elf.STB_GLOBAL == elf.ST_BIND(sym.Info) {
            entryPoint = sym.Value
        }
    }

    if entryPoint == 0 {
        return nil, fmt.Errorf("Could not find entrypoint symbol: %s", entrySymbol)
    }

    var startAddress uint64
    for _, sec := range elffile.Sections {
        if sec.Type != elf.SHT_NULL {
            startAddress = sec.Addr - sec.Offset
            break
        }
    }

    if startAddress == 0 {
        return nil, fmt.Errorf("Could not determine start address")
    }

    return &process{
        startAddress: startAddress,
        entryPoint:   entryPoint,
        bin:          bin,
    }, nil
}

func main() {
    if len(os.Args) < 2 {
        log.Fatal("Binary not provided")
    }

    proc, err := readELF(os.Args[1], "main")
    if err != nil {
        panic(err)
    }

    fmt.Printf("Start: 0x%x\nEntry: 0x%x\n", proc.startAddress, proc.entryPoint)
}

We can test on a basic compiled C program:

$ cat tests/simple.c
int main() {
  return 4;
}
$ gcc tests/simple.c
$ go build -o main
$ ./main a.out
Start: 0x400000
Entry: 0x401106

And verify against objdump:

$ objdump -d a.out | grep -A10 '<main>'
0000000000401106 <main>:
  401106:       55                      push   %rbp'>'

And that's it for dealing with ELF. Now we can sketch out a virtual CPU and how we deal with interpreting instructions starting at this address.

The CPU

AMD64 counts on being able to store values in registers and memory, sometimes through direct addressing and sometimes indirectly using stack operations (push and pop). And userland processes count on being loaded into CPU memory so the CPU can jump to the process entrypoint and process.

type cpu struct {
    proc    *process
    mem     []byte
    regfile *registerFile
    tick    chan bool
}

func newCPU(memory uint64) cpu {
    return cpu{
        mem:     make([]byte, memory),
        regfile: &registerFile{},
        tick:    make(chan bool, 1),
    }
}

The tick channel is so that later on we can wrap the emulator in a terminal debugger. But by default we'll just set up a goroutine to tick forever.

func repl(c *cpu) {
  // TODO
}

func main() {
    if len(os.Args) < 2 {
        log.Fatal("Binary not provided")
    }

    proc, err := readELF(os.Args[1], "main")
    if err != nil {
        panic(err)
    }

    debug := false
    for _, arg := range os.Args[1:] {
        switch arg {
        case "--debug":
            fallthrough
        case "-d":
            debug = true
        }
    }

    // 10 MB
    cpu := newCPU(0x400000 * 10)

    go cpu.run(proc)
    if debug {
        repl(&cpu)
    } else {
        for {
            cpu.tick <- true
        }
    }
}

Registers

To emulate a simple program like our tests/simple.c, we'll only need to support a few common registers. The order is important so that we can use the Go identifiers when we want to refer to the encoded integer value of the register.

type register int

const (
    // These are in order of encoding value (i.e. rbp is 5)
    rax register = iota
    rcx
    rdx
    rbx
    rsp
    rbp
    rsi
    rdi
    r8
    r9
    r10
    r11
    r12
    r13
    r14
    r15
    rip
    rflags
)

var registerMap = map[register]string{
    rax:    "rax",
    rcx:    "rcx",
    rdx:    "rdx",
    rbx:    "rbx",
    rsp:    "rsp",
    rbp:    "rbp",
    rsi:    "rsi",
    rdi:    "rdi",
    r8:     "r8",
    r9:     "r9",
    r10:    "r10",
    r11:    "r11",
    r12:    "r12",
    r13:    "r13",
    r14:    "r14",
    r15:    "r15",
    rip:    "rip",
    rflags: "rflags",
}

type registerFile [18]uint64

func (regfile *registerFile) get(r register) uint64 {
    return regfile[r]
}

func (regfile *registerFile) set(r register, v uint64) {
    regfile[r] = v
}

Of immediate importance will be rip, rsp, and rax registers. rip is used to track the current instruction to process. It will generally be incremented except for when dealing with function calls and returns. rsp is used as a pointer to the top of a stack in memory. It is incremented and decremented as values are pushed and popped on this stack. Finally, rax is used to pass function return values.

Loading a program

Running a program is a matter of loading the program into memory, setting the stack pointer to the last address of memory (in x86 the stack grows down), pointing rip at the entrypoint, and looping until the entrypoint function returns.

func writeBytes(to []byte, start uint64, bytes int, val uint64) {
    for i := 0; i < bytes; i++ {
        to[start+uint64(i)] = byte(val >> (8 * i) & 0xFF)
    }
}

func (c *cpu) loop(entryReturnAddress uint64) {
    for {
        <-c.tick

        ip := c.regfile.get(rip)
        if ip == entryReturnAddress {
            break
        }

        inb1 := c.mem[ip]

        // TODO: deal with instructions

        // move to next instruction
        c.regfile.set(rip, ip+1)
    }
}

func (c *cpu) run(proc *process) {
    copy(c.mem[proc.startAddress:proc.startAddress+uint64(len(proc.bin))], proc.bin)
    c.regfile.set(rip, proc.entryPoint)
    initialStackPointer := uint64(len(c.mem)-8)
    writeBytes(c.mem, initialStackPointer, 8, initialStackPointer)
    c.regfile.set(rsp, initialStackPointer)
    c.loop(initialStackPointer)
    os.Exit(int(c.regfile.get(rax)))
}

We write the initial stack pointer address into the stack so that when the program final returns, it will return to this address at which pointer we can exit the program.

And now we're ready to start interpreting instructions.

Instruction decoding

Using objdump we get a sense for what the program decodes to.

$ objdump -d a.out | grep -A10 '<main>'
0000000000401106 <main>:
  401106:       55                      push   %rbp
  401107:       48 89 e5                mov    %rsp,%rbp
  40110a:       b8 fe 00 00 00          mov    $0xfe,%eax
  40110f:       5d                      pop    %rbp
  401110:       c3                      retq
  401111:       66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
  401118:       00 00 00
  40111b:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)

0000000000401120 <__libc_csu_init>:

We see that 0x55 means push %rbp. And we also see that instructions aren't a fixed number of bytes. Some are one byte, some are seven. Some (not shown) are even longer.

Thankfully instructions follow some fairly simple patterns. There are a set of prefix instructions and a set of real instructions. So far we should be able to tell on the first byte whether the instruction is a prefix instruction and, if not, how many bytes the instruction will take up on the whole.

push

To support a new instruction, we'll look up 0x55 in an opcode table like this. Clicking on 55 in the opcode index we see that this is indeed a push instruction. 50+r means that we have to subtract 0x50 from the opcode to determine the register we should push.

The register will be 0x55 - 0x50 = 5 which if we look up in a register table is rbp. Since we set up our register enum in code in this order, we'll be able to just use the constant rbp in Go code.

Finally, since the next instruction numerically is 0x58 we know that this instruction is identified by being between 0x50 and 0x57 inclusive. This is all the info we need to handle this instruction.

// helper for dumping byte arrays as hex
func hbdebug(msg string, bs []byte) {
    str := "%s:"
    args := []interface{}{msg}
    for _, b := range bs {
        str = str + " %x"
        args = append(args, b)
    }
    fmt.Printf(str+"\n", args...)
}

func (c *cpu) loop(entryReturnAddress uint64) {
    for {
        <-c.tick

        ip := c.regfile.get(rip)
        if ip == entryReturnAddress {
            break
        }

        inb1 := c.mem[ip]

        if inb1 >= 0x50 && inb1 < 0x58 { // push
            regvalue := c.regfile.get(register(inb1 - 0x50))
            sp := c.regfile.get(rsp)
            writeBytes(c.mem, sp-8, 8, regvalue)
            c.regfile.set(rsp, uint64(sp-8))
        } else {
            hbdebug("prog", c.mem[ip:ip+10])
            panic("Unknown instruction")
        }

        c.regfile.set(rip, ip+1)
    }
}

If we try this out now we should expect it to panic on the second byte, 0x48.

$ go build -o main
$ ./main a.out
prog: 48 89 e5 b8 4 0 0 0 5d c3
panic: Unknown instruction

goroutine 19 [running]:
main.(*cpu).loop(0xc000086c30, 0x2800000)
        /home/phil/tmp/goamd/main.go:168 +0x16d
main.(*cpu).run(0xc000086c30, 0xc000086c00)
        /home/phil/tmp/goamd/main.go:180 +0xac
created by main.main
        /home/phil/tmp/goamd/main.go:211 +0x286

Looking good.

mov

Taking a look at the next two instructions with objdump we see mov encoded two different ways.

$ objdump -d a.out | grep -A4 '<main>'
0000000000401106 <main>:
  401106:       55                      push   %rbp
  401107:       48 89 e5                mov    %rsp,%rbp
  40110a:       b8 fe 00 00 00          mov    $0xfe,%eax

Looking up 0x48 we see that this is a prefix instruction that turns on 64-bit mode for the instruction. Some instructions like pop and push don't need this prefix to be in 64-bit mode. In any case, this just means we'll have to have a size flag that switches from 32-bit to 64-bit mode on seeing this instruction. This flag will be reset each time we start reading an instruction.

To deal with prefixes in general we'll loop through bytes when processing an instruction until we no longer see a prefix bytes. As we see prefix bytes we'll handle them accordingly.

var prefixBytes = []byte{0x48}

func (c *cpu) loop(entryReturnAddress uint64) {
    for {
        <-c.tick

        ip := c.regfile.get(rip)
        if ip == entryReturnAddress {
            break
        }

        inb1 := c.mem[ip]

        widthPrefix := 32
        for {
            isPrefixByte := false
            for _, prefixByte := range prefixBytes {
                if prefixByte == inb1 {
                    isPrefixByte = true
                    break
                }
            }

            if !isPrefixByte {
                break
            }

            // 64 bit prefix signifier
            if inb1 == 0x48 {
                widthPrefix = 64
            } else {
                hbdebug("prog", c.mem[ip:ip+10])
                panic("Unknown prefix instruction")
            }

            ip++
            inb1 = c.mem[ip]
        }

        if inb1 >= 0x50 && inb1 < 0x58 { // push

...

Moving past this prefix we get to 0x89. This instruction is for copying one register into another. The register operands are encoded in the second byte, 0xe5, called the ModR/M byte. Pulling out the two registers is just a matter of shifting and bitmasking the right 3 bits for each.

With this knowledge we can expand the instruction handling code.

        if inb1 >= 0x50 && inb1 < 0x58 { // push
            regvalue := c.regfile.get(register(inb1 - 0x50))
            sp := c.regfile.get(rsp)
            writeBytes(c.mem, sp-8, 8, regvalue)
            c.regfile.set(rsp, uint64(sp-8))
        }  else if inb1 == 0x89 { // mov r/m16/32/64, r/m16/32/64
            ip++
            inb2 := c.mem[ip]
            rhs := register((inb2 & 0b00111000) >> 3)
            lhs := register(inb2 & 0b111)
            c.regfile.set(lhs, c.regfile.get(rhs))
        } else {
            hbdebug("prog", c.mem[ip:ip+10])
            panic("Unknown instruction")
        }

Try emulating a.out again now. It will panic on the next unknown instruction, 0xb8. From objdump disassembly we see this is another mov instruction.

Hurray! There are apparently multiple ways the same instruction can be encoded. Looking it up in the opcode table, we see 0xB8 is for when the value to be copied is a literal number. The operand will be 32-bits, or four bytes, presumably because it doesn't have the 0x48 prefix.

// helper for converting up to 8 bytes into a single integer
func readBytes(from []byte, start uint64, bytes int) uint64 {
    val := uint64(0)
    for i := 0; i < bytes; i++ {
        val |= uint64(from[start+uint64(i)]) << (8 * i)
    }

    return val
}

func (c *cpu) loop(entryReturnAddress uint64) {

    ...

    for {

        ...

        } else if inb1 >= 0xB8 && inb1 < 0xC0 { // mov r16/32/64, imm16/32/64
            lreg := register(inb1 - 0xB8)
            val := readBytes(c.mem, ip+uint64(1), widthPrefix/8)
            ip += uint64(widthPrefix / 8)
            c.regfile.set(lreg, val)
        }

        ...

Two more instructions to go: pop and ret.

A terminal debugger

Taking a break for a moment, our system is already too complex to understand. It would be helpful to have a REPL so we can step through instructions and print register and memory values.

func (c *cpu) resolveDebuggerValue(dval string) (uint64, error) {
    for reg, val := range registerMap {
        if val == dval {
            return c.regfile.get(reg), nil
        }
    }

    if len(dval) > 2 && (dval[:2] == "0x" || dval[:2] == "0X") {
        return strconv.ParseUint(dval[2:], 16, 64)
    }

    return strconv.ParseUint(dval, 10, 64)
}

func repl(c *cpu) {
    fmt.Println("go-amd64-emulator REPL")
    help := `commands:
    s/step:             continue to next instruction
    r/registers [$reg]:     print all register values or just $reg
    d/decimal:          toggle hex/decimal printing
    m/memory $from $count:      print memory values starting at $from until $from+$count
    h/help:             print this`
    fmt.Println(help)
    scanner := bufio.NewScanner(os.Stdin)

    intFormat := "%d"
    for {
        fmt.Printf("> ")
        if !scanner.Scan() {
            break
        }
        input := scanner.Text()
        parts := strings.Split(input, " ")

        switch parts[0] {
        case "h":
            fallthrough
        case "help":
            fmt.Println(help)

        case "m":
            fallthrough
        case "memory":
            msg := "Invalid arguments: m/memory $from $to; use hex (0x10), decimal (10), or register name (rsp)"
            if len(parts) != 3 {
                fmt.Println(msg)
                continue
            }

            from, err := c.resolveDebuggerValue(parts[1])
            if err != nil {
                fmt.Println(msg)
                continue
            }

            to, err := c.resolveDebuggerValue(parts[2])
            if err != nil {
                fmt.Println(msg)
                continue
            }

            hbdebug(fmt.Sprintf("memory["+intFormat+":"+intFormat+"]", from, from+to), c.mem[from:from+to])

        case "d":
            fallthrough
        case "decimal":
            if intFormat == "%d" {
                intFormat = "0x%x"
                fmt.Println("Numbers displayed as hex")
            } else {
                intFormat = "%d"
                fmt.Println("Numbers displayed as decimal")
            }

        case "r":
            fallthrough
        case "registers":
            filter := ""
            if len(parts) > 1 {
                filter = parts[1]
            }

            for i := 0; i < len(registerMap); i++ {
                reg := register(i)
                name := registerMap[reg]
                if filter != "" && filter != name {
                    continue
                }

                fmt.Printf("%s:\t"+intFormat+"\n", name, c.regfile.get(reg))
            }

        case "s":
            fallthrough
        case "step":
            c.tick <- true
        }
    }
}

Let's try it out:

$ go build -o main
$ ./main a.out --debug
go-amd64-emulator REPL
commands:
        s/step:                         continue to next instruction
        r/registers [$reg]:             print all register values or just $reg
        d/decimal:                      toggle hex/decimal printing
        m/memory $from $count:          print memory values starting at $from until $from+$count
        h/help:                         print this
> r
rax:    0
rcx:    0
rdx:    0
rbx:    0
rsp:    41943040
rbp:    0
rsi:    0
rdi:    0
r8:     0
r9:     0
r10:    0
r11:    0
r12:    0
r13:    0
r14:    0
r15:    0
rip:    4198662
rflags: 0
> m rip 10
memory[4198662:4198672]: 55 48 89 e5 b8 4 0 0 0 5d
> s
> m rip 10
memory[4198663:4198673]: 48 89 e5 b8 4 0 0 0 5d c3
> r
rax:    0
rcx:    0
rdx:    0
rbx:    0
rsp:    41943032
rbp:    0
rsi:    0
rdi:    0
r8:     0
r9:     0
r10:    0
r11:    0
r12:    0
r13:    0
r14:    0
r15:    0
rip:    4198663
rflags: 0
> ^D

Now we can inspect the system interactively.

pop

Reemersing in the state of things, we now panic on 0x5D.

./main a.out
prog: 5d c3 66 2e f 1f 84 0 0 0
panic: Unknown instruction

goroutine 5 [running]:
main.(*cpu).loop(0xc000098ae0, 0x2800000)
        /home/phil/tmp/goamd/main.go:219 +0x2c5
main.(*cpu).run(0xc000098ae0, 0xc000098ab0)
        /home/phil/tmp/goamd/main.go:231 +0xac
created by main.main
        /home/phil/tmp/goamd/main.go:358 +0x286

Looking this up we see this is part of 58+r, pop. Similar to push we subtract 0x58 from the byte to get the register to pop onto. The stack operation is the reverse of push.

func (c *cpu) loop(entryReturnAddress uint64) {

    ...

    for {

        ...

        } else if inb1 >= 0x58 && inb1 < 0x60 { // pop
            lhs := register(inb1 - 0x58)
            sp := c.regfile.get(rsp)
            c.regfile.set(lhs, readBytes(c.mem, sp, 8))
            c.regfile.set(rsp, uint64(sp+8))
        }
        ...

Build and run for the final panic:

$ go build -o main
$ ./main a.out
prog: c3 66 2e f 1f 84 0 0 0 0
panic: Unknown instruction

goroutine 8 [running]:
main.(*cpu).loop(0xc000060c30, 0x2800000)
        /home/phil/tmp/goamd/main.go:224 +0x345
main.(*cpu).run(0xc000060c30, 0xc000060c00)
        /home/phil/tmp/goamd/main.go:236 +0xac
created by main.main
        /home/phil/tmp/goamd/main.go:363 +0x286

ret

Looking up 0xC3 we see that it is indeed ret. This function's responsibilty is to pop the stack onto rip, jumping back to caller.

        } else if inb1 == 0xC3 { // ret
            sp := c.regfile.get(rsp)
            retAddress := readBytes(c.mem, sp, 8)
            c.regfile.set(rsp, uint64(sp+8))
            c.regfile.set(rip, retAddress)
            continue
        }

Build and run:

$ go build -o main
$ ./main a.out
$ echo $?
4

If we modify tests/simple.c?

$ cat tests/simple.c
int main() {
  return 254;
}
$ gcc tests/simple.c
$ ./main a.out && echo $?
254

Not bad!

Process and next steps

Getting this far took a lot of trial and error, much of it hidden in this post. Setting up the REPL was critical to debugging mistakes. But aggressively unit testing would probably have been similarly fruitful. In the end, the most bug-prone aspects are basic arithmetic (off by one errors and converting bytes to/from integers). The part that's not terribly hard is actually interpreting instructions! But it's made easier by greatly simplifying the problem and ignoring legion cases.

Along the way it would have been helpful to also disassemble so that instead of just dumping memory at the instruction pointer we print the instructions we thought we were going to process. That may be a next goal.

Otherwise the typical goals are around getting syscall support, function call support, and porting these simple examples to Windows and macOS for the experience.