October 30, 2022

A minimal RocksDB example with Zig

I mostly programmed in Go the last few years. So every time I wanted an embedded key-value database, I reached for Cockroach's Pebble.

Pebble is great for Go programming but Go does not embed well into other languages. Pebble was inspired by RocksDB (and its predecessor, LevelDB). Both were written in C++ which can more easily be embedded into any language with a C foreign function interface. Pebble also has some interesting limitations that RocksDB does not, transactions for example.

So I've been wanting to get familiar with RocksDB. And I've been learning Zig, so I set out to write a simple Zig program that embeds RocksDB. (If you see weird things in my Zig code and have suggestions, send me a note!)

This post is going to be a mix of RocksDB explanations and Zig explanations. By the end we'll have a simple CLI over a durable store that is able to set keys, get keys, and list all key-value pairs (optionally filtered on a key prefix).

$ ./kv set x 1
$ ./kv get x
1
$ ./kv set y 22
$ ./kv list x
x = 1
$ ./kv list y
y = 22
$ ./kv list
x = 1
y = 22

Basic stuff!

You can find the code for this post in the rocksdb.zig file on Github. To simplify things, this code is only going to work on Linux. And it will require Zig 0.10.x.

RocksDB

RocksDB is written in C++. But most languages cannot interface with C++. (Zig cannot either, as far as I understand). So most C++ libraries expose a C API that is easier for other programming languages to interact with. RocksDB does this. Great!

Now RocksDB's C++ documentation is phenomenal, especially among C++ libraries. But if there is documentation for the C API, I couldn't find it. Instead you must trawl through the C header file, the C wrapper implementation, and the C tests.

There was also a great gist showing a minimal RocksDB C example. But it didn't cover the iterator API for fetching a range of keys with a prefix. But with the C tests file I was able to figure it out, I think.

Let's dig in!

Creating, opening and closing a RocksDB database

First we need to import the C header so that Zig can compile-time verify the foreign functions we call. We'll also import the standard library that we'll use later.

Aside from build.zig below, all code should be in main.zig.

const std = @import("std");

const rdb = @cImport(@cInclude("rocksdb/c.h"));

Don't read anything into the `@` other than that this is a compiler builtin. It's used for imports, casting, and other metaprogramming.

Now we can build our wrapper. It will be a Zig struct that contains a pointer to the RocksDB instance.

const RocksDB = struct {
    db: *rdb.rocksdb_t,

To open a database we'll call rocksdb_open() with a directory name for RocksDB to store data. And we'll tell RocksDB to create the database if it doesn't already exist.

    fn open(dir: []const u8) struct { val: ?RocksDB, err: ?[]u8 } {
        var options: ?*rdb.rocksdb_options_t = rdb.rocksdb_options_create();
        rdb.rocksdb_options_set_create_if_missing(options, 1);
        var err: ?[*:0]u8 = null;
        var db: ?*rdb.rocksdb_t = rdb.rocksdb_open(options, dir.ptr, &err);
        if (err != null) {
            return .{ .val = null, .err = std.mem.span(err) };
        }
        return .{ .val = RocksDB{ .db = db.? }, .err = null };
    }

Finally, we close with rocksdb_close():

    fn close(self: RocksDB) void {
        rdb.rocksdb_close(self.db);
    }

The RocksDB aspect of this is easy. But there's a bunch of Zig-specific details I should (try to) explain.

Return types

Zig has a cool error type. try/catch in Zig work only with this error type and subsets of it you can create. error is an enum. But Zig errors are not ML-style tagged unions (yet?). That is, you cannot both return an error and some dynamic information about the error. So the usefulness of error is limited. It mostly only works if the errors are a finite set without dynamic aspects.

Zig also doesn't have multiple return values. But it does have optional types (denoted with ?) and it has anonymous structs.

So we can do a slightly less safe, but more informational, error type by returning a struct with an optional success value and an optional error.

That's how we get the return type struct { val: ?RocksDB, err: ?[]u8 }.

This is not very different from Go, certainly no less safe, and I'm probably biased to use this as a Go programmer.

Felix Queißner points out to me that there are tagged unions in Zig that would be more safe here. Instead of struct { val: ?RocksDB, err: ?[]u8 } I could do union(enum) { val: RocksDB, err: []u8 }. When I get a chance to play with that syntax I'll modify this post.

Optional pointers

The next thing you may notice is ?*rdb.rocksdb_options_t and ?*rdb.rocksdb_t. This is to work with Zig's type system. Zig expects that pointers are not null. By adding ? we are telling Zig that this value can be null. That way the Zig type system will force us to handle the null condition if we try to access fields on the value.

In the options case, it doesn't really matter if the result is null or not. In the database case, we handle null-ness it by checking the error value if (err) |errStr|. If this condition is not met, we know the database is not null. So we use db.? to assert and return a value that, in the type system, is not null.

Zig strings, C strings

Another thing you may notice is var err: ?[*:0]u8 = null;. Zig strings are expressed as byte arrays or byte slices. []u8 and []const u8 are slices that keep track of the number of items. [*:0]u8 is not a byte slice. It has no length and is only null-delimited. To go from the null-delimited array that the C API returns to the []u8 (slice that contains length) in our function's return signature we use std.mem.span.

This StackOverflow post was useful for understanding this.

Structs

Anonymous structs in Zig are prefixed with a .. And all struct fields, anonymous or not, are prefixed with ..

So .{.x = 1} instantiates an anonymous struct that has one field x.

Struct fields in Zig cannot not be instantiated, even if they are nullable. And when you initialize a nullable value you don't need to wrap it in a Some() like you might do in an ML.

One thing I found surprising about Zig anonymous structs is that instances of the anonymous type are created per function and two anonymous structs that are structurally identical but referenced in different functions are not actually type-equal.

So this doesn't compile:

$ cat test.zig
fn doA() struct { y: u8 } {
  return .{.y = 1};
}

fn doB() struct { y: u8 } {
  return doA();
}

pub fn main() !void {
  _ = doB();
}
$ zig build-exe test.zig
test.zig:5:15: error: expected type 'test.doB__struct_2890', found 'test.doA__struct_3878'
    return doA();
           ~~~^~
test.zig:1:10: note: struct declared here
fn doA() struct { y: u8 } {
         ^~~~~~~~~~~~~~~~
test.zig:4:10: note: struct declared here
fn doB() struct { y: u8 } {
         ^~~~~~~~~~~~~~~~
test.zig:4:10: note: function return type declared here
fn doB() struct { y: u8 } {
         ^~~~~~~~~~~~~~~~
referenced by:
    main: test.zig:8:9
    callMain: /whatever/lib/std/start.zig:606:32
    remaining reference traces hidden; use '-freference-trace' to see all reference traces

You would need to instantiate a new anonymous struct in the second function.

$ cat test.zig
fn doA() struct { y: u8 } {
  return .{.y = 1};
}

fn doB() struct { y: u8 } {
  return .{ .y = doA().y };
}

pub fn main() !void {
  _ = doB();
}

Uniform function call syntax

Zig seems to support something like uniform function call syntax where you can either call a function with arguments or you can omit the first argument by prefixing the function call with firstargument.. I.e. x.add(y) and add(x, y).

In the case of this code it would be RocksDB.close(db) vs db.close() assuming db is an instance of the RocksDB struct.

Like Python, the use of self as the name of this first parameter of a struct's methods is purely convention. You can call it whatever.

The point is that we always expect the user to var db = RocksDB.open() for open() and allow the user to do db.close() for close().

Let's move on!

Setting a key-value pair

We set a pair by calling rocksdb_put with the database instance, some options (we'll leave to defaults), and the key and value strings as C strings.

    fn set(self: RocksDB, key: [:0]const u8, value: [:0]const u8) ?[]u8 {
        var writeOptions = rdb.rocksdb_writeoptions_create();
        var err: ?[*:0]u8 = null;
        rdb.rocksdb_put(
            self.db,
            writeOptions,
            key.ptr,
            key.len,
            value.ptr,
            value.len,
            &err,
        );
        if (err) |errStr| {
            return std.mem.span(errStr);
        }

        return null;
    }

The only special Zig thing is there is key.ptr to satisfy the Zig / C type system. The type signature key: [:0]const u8 and value: [:0]const u8 makes sure that the user passes in a null-delimited byte slice, which is what the RocksDB API expects.

Getting a value from a key

We set a pair by calling rocksdb_get with the database instance, some options (we'll again leave to defaults), and the key as a C string.

    fn get(self: RocksDB, key: [:0]const u8) struct { val: ?[]u8, err: ?[]u8 } {
        var readOptions = rdb.rocksdb_readoptions_create();
        var valueLength: usize = 0;
        var err: ?[*:0]u8 = null;
        var v = rdb.rocksdb_get(
            self.db,
            readOptions,
            key.ptr,
            key.len,
            &valueLength,
            &err,
        );
        if (err) |errStr| {
            return .{ .val = null, .err = std.mem.span(errStr) };
        }
        if (v == 0) {
            return .{ .val = null, .err = null };
        }

        return .{ .val = v[0..valueLength], .err = null };
    }

One thing in there to call out is that we can go from a null-delimited value v to a standard Zig slice []u8 by slicing from 0 to the length of the value returned by the C API.

Also, rocksdb_get is only used for getting a single key-value pair. We'll handle key-value pair iteration next.

Iterating over key-value pairs

The basic structure of RocksDB's iterator API is that you first create an iterator instance with rocksdb_create_iterator(). Then you either rocksdb_iter_seek_to_first() or rocksdb_iter_seek() (with a prefix) to get the iterator ready. Then you get the current iterator entry's key with rocksdb_iter_key() and value with rocksdb_iter_value(). You move on to the next entry in the iterator with rocksdb_iter_next() and check that the current iterator value is valid with rocksdb_iter_valid(). When the iterator is no longer valid, or if you want to stop iterating early, you call rocksdb_iter_destroy().

But we'd like to present a Zig-only interface to users of the RocksDB Zig struct. So we'll create a RocksDB.iter() function that returns a RocksDB.Iter with an RocksDB.Iter.next() function that will return an optional RocksDB.IterEntry.

We'll start backwards with that RocksDB.Iter struct.

RocksDB.Iter

Each iterator instance will store a pointer to a RocksDB iterator instance. It will store the prefix requested (which is allowed to be an empty string). If the prefix is set though, we'll only iterate while the iterator key has the requested prefix.

    const IterEntry = struct {
        key: []const u8,
        value: []const u8,
    };

    const Iter = struct {
        iter: *rdb.rocksdb_iterator_t,
        first: bool,
        prefix: []const u8,

        fn next(self: *Iter) ?IterEntry {
            if (!self.first) {
                rdb.rocksdb_iter_next(self.iter);
            }

            self.first = false;
            if (rdb.rocksdb_iter_valid(self.iter) != 1) {
                return null;
            }

            var keySize: usize = 0;
            var key = rdb.rocksdb_iter_key(self.iter, &keySize);

            // Make sure key is still within the prefix
            if (self.prefix.len > 0) {
                if (self.prefix.len > keySize or
                    !std.mem.eql(u8, key[0..self.prefix.len], self.prefix))
                {
                    return null;
                }
            }

            var valueSize: usize = 0;
            var value = rdb.rocksdb_iter_value(self.iter, &valueSize);

            return IterEntry{
                .key = key[0..keySize],
                .value = value[0..valueSize],
            };
        }

Finally we'll wrap the rocksdb_iter_destroy() method:

        fn close(self: Iter) void {
            rdb.rocksdb_iter_destroy(self.iter);
        }
    };

RocksDB.iter()

Now we can write the function that creates the RocksDB.Iter. As previously mentioned we must first instantiate the RocksDB iterator and then seek to either the first entry if the user doesn't request a prefix. Or if the user requests a prefix, we seek until that prefix.

fn iter(self: RocksDB, prefix: [:0]const u8) struct { val: ?Iter, err: ?[]const u8 } {
        var readOptions = rdb.rocksdb_readoptions_create();
        var it = Iter{
            .iter = undefined,
            .first = true,
            .prefix = prefix,
        };
        if (rdb.rocksdb_create_iterator(self.db, readOptions)) |i| {
            it.iter = i;
        } else {
            return .{ .val = null, .err = "Could not create iterator" };
        }

        if (prefix.len > 0) {
            rdb.rocksdb_iter_seek(
                it.iter,
                prefix.ptr,
                prefix.len,
            );
        } else {
            rdb.rocksdb_iter_seek_to_first(it.iter);
        }
        return .{ .val = it, .err = null };
    }
};

And now we're done a basic Zig wrapper for the RocksDB API!

main

Next we write a simple command-line entrypoint that uses the RocksDB wrapper we built. This is not the prettiest code but it gets the job done.

pub fn main() !void {
    var openRes = RocksDB.open("/tmp/db");
    if (openRes.err) |err| {
        std.debug.print("Failed to open: {s}.\n", .{err});
    }
    var db = openRes.val.?;
    defer db.close();

    var args = std.process.args();
    _ = args.next();
    var key: [:0]const u8 = "";
    var value: [:0]const u8 = "";
    var command = "get";
    while (args.next()) |arg| {
        if (std.mem.eql(u8, arg, "set")) {
            command = "set";
            key = args.next().?;
            value = args.next().?;
        } else if (std.mem.eql(u8, arg, "get")) {
            command = "get";
            key = args.next().?;
        } else if (std.mem.eql(u8, arg, "list")) {
            command = "lst";
            if (args.next()) |argNext| {
                key = argNext;
            }
        } else {
            std.debug.print("Must specify command (get, set, or list). Got: '{s}'.\n", .{arg});
            return;
        }
    }

    if (std.mem.eql(u8, command, "set")) {
        var setErr = db.set(key, value);
        if (setErr) |err| {
            std.debug.print("Error setting key: {s}.\n", .{err});
            return;
        }
    } else if (std.mem.eql(u8, command, "get")) {
        var getRes = db.get(key);
        if (getRes.err) |err| {
            std.debug.print("Error getting key: {s}.\n", .{err});
            return;
        }

        if (getRes.val) |v| {
            std.debug.print("{s}\n", .{v});
        } else {
            std.debug.print("Key not found.\n", .{});
        }
    } else {
        var prefix = key;
        var iterRes = db.iter(prefix);
        if (iterRes.err) |err| {
            std.debug.print("Error getting iterator: {s}.\n", .{err});
        }
        var iter = iterRes.val.?;
        defer iter.close();
        while (iter.next()) |entry| {
            std.debug.print("{s} = {s}\n", .{ entry.key, entry.value });
        }
    }
}

Notably, the main function must be marked pub. The struct and struct methods we wrote would need to be marked pub if we wanted them accessible from other files. But since this is a single file, pub doesn't matter. Except for main.

Now we can get into building.

Building

First we need to compile the RocksDB library. To do this we simply git clone RocksDB and run make shared_libs.

Compiling RocksDB

$ git clone https://github.com/facebook/rocksdb
$ ( cd rocksdb && make shared_lib -j8 )

This may take a while, sorry.

build.zig

Next we need to write a build.zig script that tells Zig about this external library. This was one of the harder parts of the process, but building and linking against foreign libraries is almost always hard.

$ cat build.zig
const version = @import("builtin").zig_version;
const std = @import("std");

pub fn build(b: *std.build.Builder) void {
    const exe = b.addExecutable("main", "main.zig");
    exe.linkLibC();
    exe.linkSystemLibraryName("rocksdb");

    exe.addLibraryPath("./rocksdb");
    exe.addIncludePath("./rocksdb/include");

    exe.setOutputDir(".");
    exe.install();
}

Felix Queißner's zig build explained series was quite helpful.

Now we just:

$ zig build

And run!

$ ./main list
$ ./main set x 12
$ ./main set xy 300
$ ./main list
x = 12
xy = 300
$ ./main get xy
300
$ ./main list xy
xy = 300

Not bad!