Notes

Emulator basics (2): system calls

July 20, 2019

emulatoramd64javascriptx86syscalls

Previously in emulator basics:
Emulator basics (1): a stack and register machine

In this post we'll extend x86e to support the exit and write Linux system calls, or syscalls. A syscall is a function handled by the kernel that allows the process to interact with data outside of its memory. The SYSCALL instruction takes arguments in the same order that the regular CALL instruction does. But SYSCALL additionally requires the RAX register to contain the integer number of the syscall.

Historically, there have been a number of different ways to make syscalls. All methods perform variations on a software interrupt. Before AMD64, on x86 processors, there was the SYSENTER instruction. And before that there there was only INT 80h to trigger the interrupt with the syscall handler (since interrupts can be used for more than just syscalls). The various instructions around interrupts have been added for efficiency as the processors and use by operating systems evolved.

Since this is a general need and AMD64 processors are among the most common today, you'll see similar code in every modern operating system such as FreeBSD, OpenBSD, NetBSD, macOS, and Linux. (I have no background in Windows.) The calling convention may differ (e.g. which arguments are in which registers) and the syscall numbers differ. Even within Linux both the calling convention and the syscall numbers differ between x86 (32-bit) and AMD64/x86_64 (64-bit) versions.

See this StackOverflow post for some more detail.

Code for this post in full is available as a Gist.

Exit

The exit syscall is how a child process communicates with the process that spawned it (its parent) when the child is finished running. Exit takes one argument, called the exit code or status code. It is an arbitrary signed 8-bit integer. If the high bit is set (i.e. the number is negative), this is interpreted to mean the process exited abnormally such as due to a segfault. Shells additionally interpret any non-zero exit code as a "failure". Otherwise, and ignoring these two common conventions, it can be used to mean anything the programmer wants.

The wait syscall is how the parent process can block until exit is called by the child and receive its exit code.

On AMD64 Linux the syscall number is 60. For example:

  MOV RDI, 0
  MOV RAX, 60
  SYSCALL

This calls exit with a status code of 0.

Write

The write syscall is how a process can send data to file descriptors, which are integers representing some file-like object. By default, a Linux process is given access to three file descriptors with consistent integer values: stdin is 0, stdout is 1, and stderr is 2. Write takes three arguments: the file descriptor integer to write to, a starting address to memory that is interpreted as a byte array, and the number of bytes to write to the file descriptor beginning at the start address.

On AMD64 Linux the syscall number is 1. For example:

  MOV RDI, 1   ; stdout
  MOV RSI, R12 ; address of string
  MOV RDX, 8   ; 8 bytes to write
  MOV RAX, 1   ; write
  SYSCALL

This writes 8 bytes to stdout starting from the string whose address is in R12.

Implementing syscalls

Our emulator is simplistic and is currently only implementing process emulation, not full CPU emulation. So the syscalls themselves will be handled in JavaScript. First we'll write out stubs for the two syscalls we are adding. And we'll provide a map from syscall id to the syscall.

const SYSCALLS_BY_ID = {
  1: function sys_write(process) {},
  60: function sys_exit(process) {},
};

We need to add an instruction handler to our instruction switch. In doing so we must convert the value in RAX from a BigInt to a regular Number so we can look it up in the syscall map.

      case 'syscall': {
        const idNumber = Number(process.registers.RAX);
        SYSCALLS_BY_ID[idNumber](process);
        process.registers.RIP++;
        break;
      }

Exit

Exit is really simple. It will be implemented by calling Node's global.process.exit(). Again we'll need to convert the register's BigInt value to a Number.

const SYSCALLS_BY_ID = {
  1: function sys_write(process) {},
  60: function sys_exit(process) {
    global.process.exit(Number(process.registers.RDI));
  },
};

Write

Write will be implemented by iterating over the process memory as bytes and by calling write() on the relevant file descriptor. We'll store a map of these on the process object and supply stdout, stderr, and stdin proxies on startup.

function main(file) {
  ...

  const process = {
    registers,
    memory,
    instructions,
    labels,
    fd: {
      // stdout
      1: global.process.stdout,
    }
  };

  ...
}

The base address is stored in RSI, the number of bytes to write are stored in RDX. And the file descriptor to write to is stored in RDI.

const SYSCALLS_BY_ID = {
  1: function sys_write(process) {
    const msg = BigInt(process.registers.RSI);
    const bytes = Number(process.registers.RDX);
    for (let i = 0; i < bytes; i++) {
      const byte = readMemoryBytes(process, msg + BigInt(i), 1);
      const char = String.fromCharCode(Number(byte));
      process.fd[Number(process.registers.RDI)].write(char);
    }
  },
  ...

All together

$ cat exit3.asm
main:
  MOV RDI, 1
  MOV RSI, 2
  ADD RDI, RSI

  MOV RAX, 60 ; exit
  SYSCALL
$ node emulator.js exit3.asm
$ echo $?
3
$ cat hello.asm
main:
  PUSH 10  ; \n
  PUSH 33  ; !
  PUSH 111 ; o
  PUSH 108 ; l
  PUSH 108 ; l
  PUSH 101 ; e
  PUSH 72  ; H

  MOV RDI, 1   ; stdout
  MOV RSI, RSP ; address of string
  MOV RDX, 56  ; 7 8-bit characters in the string but PUSH acts on 64-bit integers
  MOV RAX, 1   ; write
  SYSCALL

  MOV RDI, 0
  MOV RAX, 60
  SYSCALL
$ node emulator.js hello.asm
Hello!
$

Next steps

We still aren't setting flags appropriately to support conditionals, so that's low-hanging fruit. There are some other fun syscalls to implement that would also give us access to an emulated VGA card so we could render graphics. Syntactic support for string constants would also be convenient and more efficient.

Share