April 14, 2019

Interpreting TypeScript

In addition to providing a static type system and compiler for a superset of JavaScript, TypeScript makes much of its functionality available programmatically. In this post we'll use the TypeScript compiler API to build an interpreter. We'll build off of a TypeScript wiki article and cover a few areas that were confusing to me as I built out a separate project.

The end result we're building will look like this:

$ cat test.ts # A program we can interpret
print(1 + 5);
$ tsc interpreter.ts # Build the source code for the interpreter
$ node interpreter.js test.ts # Run the interpreter against test program
6

All code is available on Github.

Setup

To begin with, we need Node.js and some dependencies:

$ yarn add typescript @types/node

Then we can begin the first stage of an interpreter: parsing the code.

Parsing

Parsing a fixed set of files is simple enough. We pass a list of files to createProgram along with compiler options. But, as a user, we don't want to keep track of all files used by a program (i.e. everything we import). The most ideal situation is to pass a single-file entrypoint (something like a main.js) and have our interpreter figure out all the imports and handle them recursively. More on this later, for now we'll just parse the single-file entrypoint.

import * as ts from 'typescript';

const TS_COMPILER_OPTIONS = {
  allowNonTsExtensions: true,
};

function parse(fileName: string): ts.Program {
  return ts.createProgram([fileName], TS_COMPILER_OPTIONS);
}

function interpret(program: ts.Program) { // TODO }

function main(entrypoint: string) {
  const program = parse(entrypoint);
  interpret(program);
}

main(process.argv[2]);

interpret and ts.Program

A program contains all source files as well as any implicitly needed TypeScript definition files (for us it will just be the TypeScript definitions for the Node.js standard library).

The program also gives us access to a type checker that we can use to query the type of any node in the program tree. We'll get into this in another post.

Our interpret program will iterate over all the source files, ignoring the TypeScript definition files, and call interpretNode on all the elements of the source file.

function interpretNode(node: ts.Node) { // TODO }

function interpret(program: ts.Program) {
  return program.getSourceFiles().map((source) => {
    const { fileName } = source;
    if (fileName.endsWith('.d.ts')) {
      return;
    }

    const results = [];
    ts.forEachChild(source, (node) => {
      results.push(interpretNode(node));
    });
    return results;
  });
}

interpretNode and ts.Node

A Node is a wrapper for most elements of what we consider a program to be, such as a binary expression (2 + 3), a literal expression (2), a function call expression (a(c)), and so forth. When exploring a parser, it takes time to become familiar with the particular way that a parser breaks out a program into a tree of nodes.

As a concrete example, the following program:

print(a);

Will be built into ts.Node tree along these lines:

Node: ExpressionStatement: print(a);
  Node: CallExpression: print, a
    Node: Identifier: print
    Node: Identifier: a
Node: EndOfFileToken

And another example:

1 + 3;

Will be built into a ts.Node tree along these lines:

Node: Expression: 1 + 3
  Node: BinaryExpression: 1, 3, +
    Node: NumericLiteral: 1
    Node: NumericLiteral: 3
    Node: PlusToken
Node: EndOfFileToken

But how would one come to know this?

Exploring the ts.Node tree

The easiest thing to do is throw an error on every Node type we don't yet know about and fill in support for each program we throw at the interpreter.

For example:

function interpretNode(node: ts.Node) {
  switch (node.kind) {
    default:
      throw new Error('Unsupported node type: ' + ts.SyntaxKind[node.kind]);
  }
}

Now let's run our interpreter against an input file, test.ts, that combines these two to make a semi-interesting program:

$ cat test.ts
print(1 + 2);
$ tsc interpreter.ts
$ node interpreter.js test.ts
...
Error: Unsupported node type: ExpressionStatement
...

And we see an outer wrapper, an ExpressionStatement. To proceed we look up the definition of an ExpressionStatement in TypeScript source code, src/compiler/types.ts to be specific. This file will become our best friend. Hit ctrl-f and look for "interface ExpressionStatement ". We see that it has only one child, expression, so we call interpretNode on this recursively:

function interpretNode(node: ts.Node) {
  switch (node.kind) {
    case ts.SyntaxKind.ExpressionStatement: {
      const es = node as ts.ExpressionStatement;
      return interpretNode(es.expression);
    }
    default:
      throw new Error('Unsupported node type: ' + ts.SyntaxKind[node.kind]);
  }
}

Thankfully TypeScript will be very quick to call us out if we misunderstand this structure.

It's pretty weird to me that the ts.Node tree is organized such that I must cast at each ts.Node but that's what they do even in the TypeScript source so I don't think I'm misunderstanding.

Now we recompile and run the interpreter against the program to discover the next ts.Node type.

$ tsc interpreter.ts
$ node interpreter.js test.ts
...
Error: Unsupported node type: CallExpression
...

Cool! Back to src/compiler/types.ts. Call expressions are complex enough that we'll break out handling them into a separate function.

interpretCall and ts.CallExpression

From our reading of types.ts we need to handle the expression that evaluates to a function, and we need to handle its parameters. We'll just call interpretNode on each of these to get their real value. And finally we'll call the function with the arguments.

function interpretCall(ce: ts.CallExpression) {
  const fn = interpretNode(ce.expression);
  const args = ce.arguments.map(interpretNode);
  return fn(...args);
}

function interpretNode() {
  switch (node.kind) {
    ...
    case ts.SyntaxKind.CallExpression: {
      const ce = node as ts.CallExpression;
      return interpretCall(ce);
    }
    ...
  }
}

Please ignore the fact that we are not correctly setting this here.

Recompile and let's see what's next!

$ tsc interpreter.ts
$ node interpreter.js test.ts
...
Error: Unsupported node type: Identifier
...

And back to types.ts.

ts.Identifier

In order to support identifiers in general we'd need to have a context we could use to look up the value of an identifier. But we don't have a context like this right now so we'll add builtin support for a print function so we can get some output!

function interpretNode() {
  switch (node.kind) {
    ...
    case ts.SyntaxKind.Identifier: {
      const id = (node as ts.Identifier).escapedText as string;
      if (id === 'print') {
        return function (...args) { console.log(...args); };
      }

      throw new Error('Unsupported identifier: ' + id);
    }
    ...
  }
}

Recompile and let's see what's next!

$ tsc interpreter.ts
$ node interpreter.js test.ts
...
Error: Unsupported node type: BinaryExpression
...

And we're finally into the parameters.

interpretBinaryExpression and ts.BinaryExpression

Looking into types.ts for this Node type suggests we may want to break this out into its own function as well; there are a ton of operator types. Within the interpretBinaryExpression helper we'll interpret each operand and then switch on the operator type. We'll throw an error on operators we don't know about -- all of them at first:

function interpretBinaryExpression(be: ts.BinaryExpression) {
  const left = interpretNode(be.left);
  const right = interpretNode(be.right);
  switch (be.operatorToken.kind) {
    default:
      throw new Error('Unsupported operator: ' + ts.SyntaxKind[be.operatorToken.kind]);
  }
}

function interpretNode() {
  switch (node.kind) {
    ...
    case ts.SyntaxKind.BinaryExpression: {
      const be = node as ts.BinaryExpression;
      return interpretBinaryExpression(be);
    }
    ...
  }
}

We know the drill.

$ tsc interpreter.ts
$ node interpreter.js test.ts
...
Error: Unsupported node type: FirstLiteralToken
...

At this point we're actually failing first on an unknown node type rather than an operator. This is because we interpret the operands (which are numeric literals) before we look up the operator. Time to revisit types.ts!

ts.FirstLiteralToken, ts.NumericLiteral

Looking at types.ts shows us that FirstLiteralToken is a synonym for NumericLiteral. The latter name is more obvious, so let's add that to our supported Node list:

function interpretNode() {
  switch (node.kind) {
    ...
      case ts.SyntaxKind.NumericLiteral: {
      const nl = node as ts.NumericLiteral;
      return Number(nl.text);
    }
    ...
  }
}

And we keep going!

$ tsc interpreter.ts
$ node interpreter.js test.ts
...
Error: Unsupported operator: PlusToken
...

And we're into unknown operator territory!

interpretBinaryExpression and ts.PlusToken

A simple extension to our existing interpretBinaryExpression, we return the sum of the left and right values:

function interpretBinaryExpression(be: ts.BinaryExpression) {
  const left = interpretNode(be.left);
  const right = interpretNode(be.right);
  switch (be.operatorToken.kind) {
    case ts.SyntaxKind.PlusToken:
      return left + right;
    default:
      throw new Error('Unsupported operator: ' + ts.SyntaxKind[be.operatorToken.kind]);
  }
}

And we give it another shot.

$ tsc interpreter.ts
$ node interpreter.js test.ts
...
Error: Unsupported node type: EndOfFileToken
...

ts.SyntaxKind.EndOfFileToken

Our final Node type before a working program, we simply do nothing:

function interpretNode() {
  switch (node.kind) {
    ...
    case ts.SyntaxKind.EndOfFileToken:
      break;
    ...
  }
}

One more time:

$ tsc interpreter.ts
$ node interpreter.js test.ts
3

A working program! And if we jiggle the test?

$ cat test.ts
print(1 + 5);
$ node interpreter.js test.ts
6

We're well on our way to interpreting TypeScript, and have gained some familiarity with the TypeScript Compiler API.

All code is available on Github.