March 15, 2024

Zig, Rust, and other languages

Having worked a bit in Zig, Rust, Go and now C, I think there are a few common topics worth having a fresh conversation on: automatic memory management, the standard library, and explicit allocation.

Zig is not a mature language. But it has made enough useful choices for a number of companies to invest in it and run it in production. The useful choices make Zig worth talking about.

Go and Rust are mature languages. But they have both made questionable choices that seem worth talking about.

All of these languages are developed by highly intelligent folks I personally look up to. And your choice to use any one of these is certainly fine, whichever it is.

The positive and negative choices particular languages made, though, are worth talking about as we consider what a systems programming language 10 years from now would look like. Or how these languages themselves might evolve in the next 10 years.

My perspective is mostly building distributed databases. So the points that I bring up may have no relevance to the kind of work you do, and that's alright. Moreover, I'm already aware most of these opinions are not shared by the language maintainers, and that's ok too. I am not writing to convince anyone.

Automatic memory management

One of my bigger issues with Zig is that it doesn't support RAII. You can defer cleanup to the end of a block, and this solves half of the problem. But only RAII will allow for smart pointers and automatic (not manual) reference counting. RAII is an excellent default to have, but Zig doesn't give you the option. In contrast, even C "supports" automatic cleanup (via compiler extensions).
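To make the distinction concrete, here is a minimal sketch in Rust of what RAII buys you beyond block-scoped cleanup. Handle and raii_demo are hypothetical names made up for illustration; Drop and Rc are the real standard library pieces doing the work.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Hypothetical resource that records its own cleanup in a shared log.
struct Handle {
    name: &'static str,
    log: Rc<RefCell<Vec<String>>>,
}

impl Drop for Handle {
    // Runs automatically when the last owner of the value goes away.
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("closed {}", self.name));
    }
}

fn raii_demo() -> Vec<String> {
    let log = Rc::new(RefCell::new(Vec::new()));

    {
        // Block-scoped cleanup: roughly what defer gives you in Zig.
        let _scoped = Handle { name: "scoped", log: Rc::clone(&log) };
    }

    // Automatic reference counting: cleanup follows the last owner,
    // not a particular block.
    let shared = Rc::new(Handle { name: "shared", log: Rc::clone(&log) });
    let second_owner = Rc::clone(&shared);
    drop(shared); // one owner left, so nothing is closed yet
    log.borrow_mut().push("still open".to_string());
    drop(second_owner); // last owner gone, so Drop runs here

    let entries = log.borrow().clone();
    entries
}
```

The first Handle is the case defer already covers. The second is the half it doesn't: nobody writes a cleanup call, and the cleanup still happens exactly when the last reference disappears.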

But most of the time, arenas are fine. Postgres is written in C and memory is almost entirely managed through nested arenas (called "memory contexts") that get cleaned up when some subset of a task finishes, recursively. Zig has builtin support for arenas, which is great.
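As a rough illustration of the nested-arena idea (not how Postgres implements memory contexts), here is a small sketch in Rust. The Arena type and its methods are hypothetical; it hands out byte ranges into one growing buffer instead of raw pointers so that it can stay in safe code.

```rust
use std::ops::Range;

// Hypothetical arena: one buffer for its own data, plus child arenas
// that live and die with it.
struct Arena {
    buf: Vec<u8>,
    children: Vec<Arena>,
}

impl Arena {
    fn with_capacity(bytes: usize) -> Self {
        Arena { buf: Vec::with_capacity(bytes), children: Vec::new() }
    }

    // Copies data to the end of the buffer and returns a handle to it.
    fn alloc(&mut self, data: &[u8]) -> Range<usize> {
        let start = self.buf.len();
        self.buf.extend_from_slice(data);
        start..self.buf.len()
    }

    fn get(&self, handle: Range<usize>) -> &[u8] {
        &self.buf[handle]
    }

    // Creates a nested arena for a sub-task.
    fn child(&mut self, bytes: usize) -> &mut Arena {
        self.children.push(Arena::with_capacity(bytes));
        self.children.last_mut().expect("child was just pushed")
    }

    // Bytes in use by this arena and all of its descendants.
    fn used(&self) -> usize {
        self.buf.len() + self.children.iter().map(Arena::used).sum::<usize>()
    }

    // Releases everything at once, recursively, with no per-object frees.
    fn reset(&mut self) {
        self.buf.clear();
        self.children.clear();
    }
}
```

The point is the shape of the cleanup: individual allocations are never freed one by one, and resetting the arena for a task takes every sub-task's arena with it.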

Standard library

It seems regrettable that some languages have been shipping smaller standard libraries. Smaller standard libraries seem to encourage users of the language to install more transitively-unvetted third-party libraries, which increases build time and build flakiness, and which increases bitrot over time as unnecessary breaking changes occur.

People have been making jokes about node_modules for a decade now, but this problem is just as bad in Rust codebases I've seen. And to a degree it happens in Java and Go as well, though their larger standard libraries allow you to get further without dependencies.

Zig has a good standard library, which may be Go and Java tier in a few years. But one goal of their package manager seemed to be to allow the standard library to be broken up and made smaller. For example, JSON support moving out of the standard library into a package. I don't know if that is actually the planned direction. I hope not.

Having a large standard library doesn't mean that the programmer shouldn't be able to swap out implementations easily as needed. But all that is required is for the standard library to define an interface along with the standard library implementation.
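Rust's HashMap is one existing example of this pattern: the standard library defines the Hasher interface and ships a default implementation, and you can swap in your own. The sketch below plugs in a small FNV-1a hasher; Fnv1a, FnvMap and count_words are hypothetical names for illustration, while Hasher and BuildHasherDefault are the real standard library interface.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

// Hypothetical replacement hasher: 64-bit FNV-1a.
struct Fnv1a(u64);

impl Default for Fnv1a {
    fn default() -> Self {
        Fnv1a(0xcbf29ce484222325) // FNV-1a 64-bit offset basis
    }
}

impl Hasher for Fnv1a {
    fn write(&mut self, bytes: &[u8]) {
        for byte in bytes {
            self.0 ^= u64::from(*byte);
            self.0 = self.0.wrapping_mul(0x100000001b3); // FNV 64-bit prime
        }
    }

    fn finish(&self) -> u64 {
        self.0
    }
}

// Same standard library HashMap, different hashing implementation.
type FnvMap<K, V> = HashMap<K, V, BuildHasherDefault<Fnv1a>>;

fn count_words(text: &str) -> FnvMap<&str, usize> {
    let mut counts = FnvMap::default();
    for word in text.split_whitespace() {
        *counts.entry(word).or_insert(0) += 1;
    }
    counts
}
```

Code that uses the map doesn't change. Only the type alias knows that the implementation was swapped.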

The small size of the standard library doesn't just affect developers using the language, it even encourages developers of the language itself to depend on libraries owned by individuals.

Take a look at the transitive dependencies of an official Node.js package like node-gyp. Is it really the ideal outcome of a small standard library to encourage dependence in official libraries on libraries owned by individuals, like env-paths, that haven't been modified in 3 years? 68 lines of code. Is it not safer at this point to vendor that code? i.e. copy the env-paths code into node-gyp.

Similarly, if you go looking for compression support in Rust, there's none in the standard library. But you may notice the flate2-rs repo under the official rust-lang GitHub namespace. If you look at its transitive dependencies: flate2-rs depends on (an individual's) miniz_oxide which depends on (an individual's) adler that hasn't been updated in 4 years. 300 lines of code including tests. Why not vendor this code? It's the habits a small standard library builds that seem to encourage everyone not to.

I don't mean these necessarily constitute a supply-chain risk. I'm not talking about left-pad. But the pattern is sort of clear. Even official packages may end up depending on third-party packages, because the commitment to a small standard library meant omitting stuff like compression, checksums, and common OS paths.

It's a tradeoff and maybe makes the job of the standard library maintainer easier. But I don't think this is the ideal situation. Dependencies are useful but should be kept to a reasonable minimum.

Hopefully languages end up more like Go than like Rust in this regard.

Explicit allocation

When folks discuss the Zig standard library's pattern of requiring an allocator argument for every method that allocates, they often talk about the benefit of swapping out allocators or the benefit of being able to handle OOM failures.

Both of these seem pretty niche to me. For example, in Zig tests you are encouraged to pass around a debug allocator that tells you about memory leaks. But this doesn't seem too different from compiling a C project with a debug allocator or compiling with different sanitizers on and running tests against the binary produced. In both cases you mostly deal with allocators at a global level depending on the environment you're running the code in (production or tests).

The real benefit of explicit allocations to me is much more trivial. You basically can't code a method in Zig without acknowledging allocations.

This is particularly useful for hotpath code. Take an iterator for example. It has a new() method, a next() method, and a done() method. In most languages, it's basically impossible at the syntax or compiler-level to know if you are allocating in the next() method. You may know because you know the behavior of all the code in next() by heart. But that won't happen all the time.

Zig is practically alone in that if you write the next() method and don't pass an allocator to any method in the next() body, nothing in that next() method will allocate.

In any other language it might not be until you run a profiler that you notice an allocation that should have been done once in new() accidentally ended up in next() instead.
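Here is a small sketch in Rust of how that goes unnoticed. RowsAllocating and RowsReusing are hypothetical iterators made up for illustration, and both use a plain next() method rather than the Iterator trait, because the second one lends out its own buffer.

```rust
use std::str::Split;

// Version 1: allocates a fresh String on every call to next().
struct RowsAllocating<'a> {
    parts: Split<'a, char>,
}

impl<'a> RowsAllocating<'a> {
    fn new(input: &'a str) -> Self {
        RowsAllocating { parts: input.split(',') }
    }

    fn next(&mut self) -> Option<String> {
        let part = self.parts.next()?;
        // Nothing in the syntax flags this, but format! allocates.
        Some(format!("row:{part}"))
    }
}

// Version 2: allocates once in new() and reuses the buffer in next().
struct RowsReusing<'a> {
    parts: Split<'a, char>,
    buf: String,
}

impl<'a> RowsReusing<'a> {
    fn new(input: &'a str) -> Self {
        RowsReusing {
            parts: input.split(','),
            // "row:" plus any single part always fits in this capacity.
            buf: String::with_capacity(input.len() + 4),
        }
    }

    fn next(&mut self) -> Option<&str> {
        let part = self.parts.next()?;
        self.buf.clear();
        self.buf.push_str("row:");
        self.buf.push_str(part);
        Some(&self.buf)
    }
}
```

The two next() bodies look equally innocent. With the Zig convention, the first version would need an allocator from somewhere, and that is the moment you'd notice.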

On the other hand, for all the same reasons, writing Zig is kind of a pain because everything takes an allocator!

Explicit allocation is not intrinsic to Zig, the language. It is a convention that is prevalent in the standard library. There is still a global allocator and any user of Zig could decide to use the global allocator. At which point you've got implicit allocation. So explicit allocation as a convention isn't a perfect solution.

But by default it gives you a level of awareness of allocations you just can't get from typical Go, Rust or C code, depending on the project's practices. Perhaps it's possible to switch off the Go, Rust and C standard libraries and use ones where all functions that allocate do require an allocator.

But explicitly passing allocators is still sort of a visual hack.

I think the ideal situation in the future will be that every language supports annotating blocks of code as must-not-allocate or something along those lines. Either the compiler will enforce this and fail if you seem to allocate in a block marked must-not-allocate, or it will panic during runtime so you can catch this in tests.
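Rust has no such annotation today, but the runtime flavor of the idea can be approximated with a counting global allocator. This is only a sketch under that assumption: CountingAllocator, allocations_in and must_not_allocate are hypothetical names, and the check fires after the block has run instead of at the offending allocation.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::cell::Cell;

thread_local! {
    // Per-thread count, so tests running in parallel don't interfere.
    static ALLOCATIONS: Cell<usize> = const { Cell::new(0) };
}

// Hypothetical wrapper around the system allocator that counts calls.
// realloc and alloc_zeroed use the trait's defaults, which call alloc.
struct CountingAllocator;

unsafe impl GlobalAlloc for CountingAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        // try_with avoids a panic if the thread-local is already gone.
        let _ = ALLOCATIONS.try_with(|n| n.set(n.get() + 1));
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static GLOBAL: CountingAllocator = CountingAllocator;

// Runs block and reports how many allocations it made on this thread.
fn allocations_in<R>(block: impl FnOnce() -> R) -> (usize, R) {
    let before = ALLOCATIONS.with(|n| n.get());
    let result = block();
    let after = ALLOCATIONS.with(|n| n.get());
    (after - before, result)
}

// Panics (after the fact) if block allocated anything.
fn must_not_allocate<R>(block: impl FnOnce() -> R) -> R {
    let (count, result) = allocations_in(block);
    assert_eq!(count, 0, "block marked must-not-allocate allocated");
    result
}
```

In a test, wrapping the body of a hot loop in must_not_allocate turns an accidental allocation into a failure rather than something you find later in a profiler. It is still a global allocator swap underneath, which is exactly the kind of workaround a real block-level annotation would make unnecessary.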

This would be useful beyond static programming languages. It would be as interesting to annotate blocks in JavaScript or Python as must-not-allocate too.

Otherwise the current state of things is that you'd normally configure this sort of thing at the global level. Saying "there must not be any allocations in this entire program" just doesn't seem as useful in general as being able to say "there must not be any allocations in this one block".

Optional, not required, allocator arguments

Rust has nascent support for passing an allocator to methods that allocate. But it's optional. From what I understand, C++ STL is like this too.

These are both super useful for programming extensions. And it's one of the reasons I think Zig makes a ton of sense for Postgres extensions specifically, because Zig was always built for running in an environment with someone else's allocator.

Praise for Zig, Rust, and Go tooling

All three of these have really great first-party tooling including build system, package management, test runners and formatters. The idea that the language should provide a great environment to code in (end-to-end) makes things simpler and nicer for programmers.

Meandering non-conclusion

Use the language you want to use. Zig and Rust are both nice alternatives to writing vanilla C.

On the other hand, I've been pleasantly surprised by how high level writing Postgres C is. It's almost a separate language since you're often dealing with user-facing constructs, like Postgres's Datum objects which represent what you might think of as a cell in a Postgres database. And you can use all the same functions provided for Postgres SQL for working with Datums, but from C.

I've also been able to work a bit on Postgres extensions in Rust with pgrx lately, which I hope to write about soon. And when I saw pgzx for writing Postgres extensions in Zig I was excited to spend some time with that too.