I mostly programmed in Go the last few years. So every time I wanted an embedded key-value database, I reached for Cockroach's Pebble.
Pebble is great for Go programming but Go does not embed well into other languages. Pebble was inspired by RocksDB (and its predecessor, LevelDB). Both were written in C++ which can more easily be embedded into any language with a C foreign function interface. Pebble also has some interesting limitations that RocksDB does not, transactions for example.
So I've been wanting to get familiar with RocksDB. And I've been learning Zig, so I set out to write a simple Zig program that embeds RocksDB. (If you see weird things in my Zig code and have suggestions, send me a note!)
This post is going to be a mix of RocksDB explanations and Zig explanations. By the end we'll have a simple CLI over a durable store that is able to set keys, get keys, and list all key-value pairs (optionally filtered on a key prefix).
$ ./kv set x 1
$ ./kv get x
1
$ ./kv set y 22
$ ./kv list x
x = 1
$ ./kv list y
y = 22
$ ./kv list
x = 1
y = 22
Basic stuff!
You can find the code for this post in the rocksdb.zig file on Github. To simplify things, this code is only going to work on Linux. And it will require Zig 0.10.x.
RocksDB
RocksDB is written in C++. But most languages cannot interface with C++. (Zig cannot either, as far as I understand). So most C++ libraries expose a C API that is easier for other programming languages to interact with. RocksDB does this. Great!
Now RocksDB's C++ documentation is phenomenal, especially among C++ libraries. But if there is documentation for the C API, I couldn't find it. Instead you must trawl through the C header file, the C wrapper implementation, and the C tests.
There was also a great gist showing a minimal RocksDB C example. But it didn't cover the iterator API for fetching a range of keys with a prefix. But with the C tests file I was able to figure it out, I think.
Let's dig in!
Creating, opening and closing a RocksDB database
First we need to import the C header so that Zig can compile-time verify the foreign functions we call. We'll also import the standard library that we'll use later.
Aside from build.zig
below, all code should be in main.zig
.
const std = @import("std");
const rdb = @cImport(@cInclude("rocksdb/c.h"));
Don't read anything into the `@` other than that this is a compiler builtin. It's used for imports, casting, and other metaprogramming.
Now we can build our wrapper. It will be a Zig struct that contains a pointer to the RocksDB instance.
const RocksDB = struct {
db: *rdb.rocksdb_t,
To open a database we'll call rocksdb_open()
with a directory name
for RocksDB to store data. And we'll tell RocksDB to create the
database if it doesn't already exist.
fn open(dir: []const u8) struct { val: ?RocksDB, err: ?[]u8 } {
var options: ?*rdb.rocksdb_options_t = rdb.rocksdb_options_create();
rdb.rocksdb_options_set_create_if_missing(options, 1);
var err: ?[*:0]u8 = null;
var db: ?*rdb.rocksdb_t = rdb.rocksdb_open(options, dir.ptr, &err);
if (err != null) {
return .{ .val = null, .err = std.mem.span(err) };
}
return .{ .val = RocksDB{ .db = db.? }, .err = null };
}
Finally, we close with rocksdb_close()
:
fn close(self: RocksDB) void {
rdb.rocksdb_close(self.db);
}
The RocksDB aspect of this is easy. But there's a bunch of Zig-specific details I should (try to) explain.
Return types
Zig has a cool
error
type. try
/catch
in Zig work only with this error
type and
subsets of it you can create. error
is an enum. But Zig error
s are not
ML-style tagged unions (yet?). That is, you cannot both return an
error and some dynamic information about the error. So the usefulness
of error
is limited. It mostly only works if the errors are a finite
set without dynamic aspects.
Zig also doesn't have multiple return values. But it does have
optional types (denoted with ?
) and it has anonymous structs.
So we can do a slightly less safe, but more informational, error type by returning a struct with an optional success value and an optional error.
That's how we get the return type struct { val: ?RocksDB, err: ?[]u8 }
.
This is not very different from Go, certainly no less safe, and I'm probably biased to use this as a Go programmer.
Felix Queißner points out to me that there are tagged unions in Zig
that would be more safe here. Instead of struct { val:
?RocksDB, err: ?[]u8 }
I could do union(enum) { val:
RocksDB, err: []u8 }
. When I get a chance to play with that
syntax I'll modify this post.
Optional pointers
The next thing you may notice is ?*rdb.rocksdb_options_t
and
?*rdb.rocksdb_t
. This is to work with Zig's type system. Zig expects
that pointers are not null. By adding ?
we are telling Zig that this
value can be null. That way the Zig type system will force us to
handle the null condition if we try to access fields on the value.
In the options case, it doesn't really matter if the result is null
or not. In the database case, we handle null-ness it by checking the
error value if (err) |errStr|
. If this condition is not met, we
know the database is not null. So we use db.?
to assert and return a
value that, in the type system, is not null.
Zig strings, C strings
Another thing you may notice is var err:
?[*:0]u8 = null;
. Zig strings are expressed as byte arrays or byte
slices. []u8
and []const u8
are slices that keep track of the
number of items. [*:0]u8
is not a byte slice. It has no length and
is only null-delimited. To go from the null-delimited array that the C
API returns to the []u8
(slice that contains length) in our
function's return signature we use
std.mem.span
.
This StackOverflow post was useful for understanding this.
Structs
Anonymous structs in Zig are prefixed with a .
. And all struct
fields, anonymous or not, are prefixed with .
.
So .{.x = 1}
instantiates an anonymous struct that has one field
x
.
Struct fields in Zig cannot not be instantiated, even if they are
nullable. And when you initialize a nullable value you don't need to
wrap it in a Some()
like you might do in an ML.
One thing I found surprising about Zig anonymous structs is that instances of the anonymous type are created per function and two anonymous structs that are structurally identical but referenced in different functions are not actually type-equal.
So this doesn't compile:
$ cat test.zig
fn doA() struct { y: u8 } {
return .{.y = 1};
}
fn doB() struct { y: u8 } {
return doA();
}
pub fn main() !void {
_ = doB();
}
$ zig build-exe test.zig
test.zig:5:15: error: expected type 'test.doB__struct_2890', found 'test.doA__struct_3878'
return doA();
~~~^~
test.zig:1:10: note: struct declared here
fn doA() struct { y: u8 } {
^~~~~~~~~~~~~~~~
test.zig:4:10: note: struct declared here
fn doB() struct { y: u8 } {
^~~~~~~~~~~~~~~~
test.zig:4:10: note: function return type declared here
fn doB() struct { y: u8 } {
^~~~~~~~~~~~~~~~
referenced by:
main: test.zig:8:9
callMain: /whatever/lib/std/start.zig:606:32
remaining reference traces hidden; use '-freference-trace' to see all reference traces
You would need to instantiate a new anonymous struct in the second function.
$ cat test.zig
fn doA() struct { y: u8 } {
return .{.y = 1};
}
fn doB() struct { y: u8 } {
return .{ .y = doA().y };
}
pub fn main() !void {
_ = doB();
}
Uniform function call syntax
Zig seems to support something like uniform function call
syntax
where you can either call a function with arguments or you can omit
the first argument by prefixing the function call with
firstargument.
. I.e. x.add(y)
and add(x, y)
.
In the case of this code it would be RocksDB.close(db)
vs
db.close()
assuming db
is an instance of the RocksDB
struct.
Like Python, the use of self
as the name of this first parameter of
a struct's methods is purely convention. You can call it whatever.
The point is that we always expect the user to var db = RocksDB.open()
for
open()
and allow the user to do db.close()
for close()
.
Let's move on!
Setting a key-value pair
We set a pair by calling rocksdb_put
with the database instance,
some options (we'll leave to defaults), and the key and value strings
as C strings.
fn set(self: RocksDB, key: [:0]const u8, value: [:0]const u8) ?[]u8 {
var writeOptions = rdb.rocksdb_writeoptions_create();
var err: ?[*:0]u8 = null;
rdb.rocksdb_put(
self.db,
writeOptions,
key.ptr,
key.len,
value.ptr,
value.len,
&err,
);
if (err) |errStr| {
return std.mem.span(errStr);
}
return null;
}
The only special Zig thing is there is key.ptr
to satisfy the Zig /
C type system. The type signature key: [:0]const u8
and value:
[:0]const u8
makes sure that the user passes in a null-delimited
byte slice, which is what the RocksDB API expects.
Getting a value from a key
We set a pair by calling rocksdb_get
with the database instance,
some options (we'll again leave to defaults), and the key as a C
string.
fn get(self: RocksDB, key: [:0]const u8) struct { val: ?[]u8, err: ?[]u8 } {
var readOptions = rdb.rocksdb_readoptions_create();
var valueLength: usize = 0;
var err: ?[*:0]u8 = null;
var v = rdb.rocksdb_get(
self.db,
readOptions,
key.ptr,
key.len,
&valueLength,
&err,
);
if (err) |errStr| {
return .{ .val = null, .err = std.mem.span(errStr) };
}
if (v == 0) {
return .{ .val = null, .err = null };
}
return .{ .val = v[0..valueLength], .err = null };
}
One thing in there to call out is that we can go from a null-delimited
value v
to a standard Zig slice []u8
by slicing from 0
to the
length of the value returned by the C API.
Also, rocksdb_get
is only used for getting a single key-value
pair. We'll handle key-value pair iteration next.
Iterating over key-value pairs
The basic structure of RocksDB's iterator API is that you first create
an iterator instance with rocksdb_create_iterator()
. Then you either
rocksdb_iter_seek_to_first()
or rocksdb_iter_seek()
(with a
prefix) to get the iterator ready. Then you get the current iterator
entry's key with rocksdb_iter_key()
and value with
rocksdb_iter_value()
. You move on to the next entry in the iterator
with rocksdb_iter_next()
and check that the current iterator value
is valid with rocksdb_iter_valid()
. When the iterator is no longer
valid, or if you want to stop iterating early, you call
rocksdb_iter_destroy()
.
But we'd like to present a Zig-only interface to users of the
RocksDB
Zig struct. So we'll create a RocksDB.iter()
function that
returns a RocksDB.Iter
with an RocksDB.Iter.next()
function that
will return an optional RocksDB.IterEntry
.
We'll start backwards with that RocksDB.Iter
struct.
RocksDB.Iter
Each iterator instance will store a pointer to a RocksDB iterator instance. It will store the prefix requested (which is allowed to be an empty string). If the prefix is set though, we'll only iterate while the iterator key has the requested prefix.
const IterEntry = struct {
key: []const u8,
value: []const u8,
};
const Iter = struct {
iter: *rdb.rocksdb_iterator_t,
first: bool,
prefix: []const u8,
fn next(self: *Iter) ?IterEntry {
if (!self.first) {
rdb.rocksdb_iter_next(self.iter);
}
self.first = false;
if (rdb.rocksdb_iter_valid(self.iter) != 1) {
return null;
}
var keySize: usize = 0;
var key = rdb.rocksdb_iter_key(self.iter, &keySize);
// Make sure key is still within the prefix
if (self.prefix.len > 0) {
if (self.prefix.len > keySize or
!std.mem.eql(u8, key[0..self.prefix.len], self.prefix))
{
return null;
}
}
var valueSize: usize = 0;
var value = rdb.rocksdb_iter_value(self.iter, &valueSize);
return IterEntry{
.key = key[0..keySize],
.value = value[0..valueSize],
};
}
Finally we'll wrap the rocksdb_iter_destroy()
method:
fn close(self: Iter) void {
rdb.rocksdb_iter_destroy(self.iter);
}
};
RocksDB.iter()
Now we can write the function that creates the RocksDB.Iter
. As
previously mentioned we must first instantiate the RocksDB iterator
and then seek
to either the first entry if the user doesn't request
a prefix. Or if the user requests a prefix, we seek
until that
prefix.
fn iter(self: RocksDB, prefix: [:0]const u8) struct { val: ?Iter, err: ?[]const u8 } {
var readOptions = rdb.rocksdb_readoptions_create();
var it = Iter{
.iter = undefined,
.first = true,
.prefix = prefix,
};
if (rdb.rocksdb_create_iterator(self.db, readOptions)) |i| {
it.iter = i;
} else {
return .{ .val = null, .err = "Could not create iterator" };
}
if (prefix.len > 0) {
rdb.rocksdb_iter_seek(
it.iter,
prefix.ptr,
prefix.len,
);
} else {
rdb.rocksdb_iter_seek_to_first(it.iter);
}
return .{ .val = it, .err = null };
}
};
And now we're done a basic Zig wrapper for the RocksDB API!
main
Next we write a simple command-line entrypoint that uses the RocksDB wrapper we built. This is not the prettiest code but it gets the job done.
pub fn main() !void {
var openRes = RocksDB.open("/tmp/db");
if (openRes.err) |err| {
std.debug.print("Failed to open: {s}.\n", .{err});
}
var db = openRes.val.?;
defer db.close();
var args = std.process.args();
_ = args.next();
var key: [:0]const u8 = "";
var value: [:0]const u8 = "";
var command = "get";
while (args.next()) |arg| {
if (std.mem.eql(u8, arg, "set")) {
command = "set";
key = args.next().?;
value = args.next().?;
} else if (std.mem.eql(u8, arg, "get")) {
command = "get";
key = args.next().?;
} else if (std.mem.eql(u8, arg, "list")) {
command = "lst";
if (args.next()) |argNext| {
key = argNext;
}
} else {
std.debug.print("Must specify command (get, set, or list). Got: '{s}'.\n", .{arg});
return;
}
}
if (std.mem.eql(u8, command, "set")) {
var setErr = db.set(key, value);
if (setErr) |err| {
std.debug.print("Error setting key: {s}.\n", .{err});
return;
}
} else if (std.mem.eql(u8, command, "get")) {
var getRes = db.get(key);
if (getRes.err) |err| {
std.debug.print("Error getting key: {s}.\n", .{err});
return;
}
if (getRes.val) |v| {
std.debug.print("{s}\n", .{v});
} else {
std.debug.print("Key not found.\n", .{});
}
} else {
var prefix = key;
var iterRes = db.iter(prefix);
if (iterRes.err) |err| {
std.debug.print("Error getting iterator: {s}.\n", .{err});
}
var iter = iterRes.val.?;
defer iter.close();
while (iter.next()) |entry| {
std.debug.print("{s} = {s}\n", .{ entry.key, entry.value });
}
}
}
Notably, the main
function must be marked pub
. The struct and
struct methods we wrote would need to be marked pub
if we wanted
them accessible from other files. But since this is a single file,
pub
doesn't matter. Except for main
.
Now we can get into building.
Building
First we need to compile the RocksDB library. To do this we simply
git clone
RocksDB and run make shared_libs
.
Compiling RocksDB
$ git clone https://github.com/facebook/rocksdb
$ ( cd rocksdb && make shared_lib -j8 )
This may take a while, sorry.
build.zig
Next we need to write a build.zig
script that tells Zig about this
external library. This was one of the harder parts of the process, but
building and linking against foreign libraries is almost always hard.
$ cat build.zig
const version = @import("builtin").zig_version;
const std = @import("std");
pub fn build(b: *std.build.Builder) void {
const exe = b.addExecutable("main", "main.zig");
exe.linkLibC();
exe.linkSystemLibraryName("rocksdb");
exe.addLibraryPath("./rocksdb");
exe.addIncludePath("./rocksdb/include");
exe.setOutputDir(".");
exe.install();
}
Felix Queißner's zig build explained series was quite helpful.
Now we just:
$ zig build
And run!
$ ./main list
$ ./main set x 12
$ ./main set xy 300
$ ./main list
x = 12
xy = 300
$ ./main get xy
300
$ ./main list xy
xy = 300
Not bad!
I wrote a new post on using RocksDB with Zig! There weren't a lot of good examples of the C API and it was good practice for learning Zig.
— Phil Eaton (@phil_eaton) October 31, 2022
Also sets me up for integrating it in a (WIP) port of my toy SQL database from Go to Zig. (This time with storage!)https://t.co/zquojV974G pic.twitter.com/gtAsB6Wrhi