October 5, 2023

Go database driver overhead on insert-heavy workloads

The most popular SQLite and PostgreSQL database drivers in Go are (roughly) 20-76% slower than alternative Go drivers on insert-heavy benchmarks of mine. So if you are bulk-inserting data with Go (and potentially also bulk-retrieving data with Go), you may want to consider the driver carefully. And you may want to consider avoiding database/sql.

Some driver authors have noted and benchmarked issues with database/sql.

So it may be the case that database/sql is responsible for some of this overhead. And indeed the variations between drivers in this post will be demonstrated by using database/sql and avoiding it. This post won't specifically prove that the variation is due to the database/sql interface. But that doesn't change the premise.

Not covered in this post but something to consider: JetBrains has suggested that other frontends like sqlc, sqlx, and GORM do worse than database/sql.

This post is built on the workload, environment, libraries, and methodology in my databases-intuition repo on GitHub. See the repo for details that will help you reproduce or correct me.

INSERT workload

In this workload, the data is random and there are no indexes. Neither of these aspects matter for this post though because we're comparing behavior within the same database among different drivers. This was just a workload I already had.

Two different data sizes are tested:

  1. 10M rows with 16 columns, each column is 32 bytes
  2. 10M rows with 3 columns, each column is 8 bytes

Each test is run 10 times and we record median, standard deviation, min, max and throughput.

SQLite

Both variations presented here load 10M rows using a single prepared statement called for each row within a single transaction.

The most popular driver is mattn/go-sqlite3.

It is roughly 20-40% slower than another driver that avoids database/sql.

10M Rows, 16 columns, each column 32 bytes:

Timing: 56.53 ± 1.26s, Min: 55.05s, Max: 59.62s
Throughput: 176,893.65 ± 3,853.90 rows/s, Min: 167,719.97 rows/s, Max: 181,646.02 rows/s

10M Rows, 3 columns, each column 8 bytes:

Timing: 15.92 ± 0.25s, Min: 15.69s, Max: 16.67s
Throughput: 628,044.37 ± 9,703.92 rows/s, Min: 599,852.91 rows/s, Max: 637,435.60 rows/s

The other driver I tested is my own fork of bvinc/go-sqlite-lite called eatonphil/gosqlite. I forked it because it is unmaintained and I wanted to bring it up-to-date for tests like this.

10M Rows, 16 columns, each column 32 bytes:

Timing: 45.51 ± 0.70s, Min: 43.72s, Max: 45.93s
Throughput: 219,729.65 ± 3,447.56 rows/s, Min: 217,742.98 rows/s, Max: 228,711.51 rows/s

10M Rows, 3 columns, each column 8 bytes:

Timing: 10.44 ± 0.20s, Min: 10.02s, Max: 10.68s
Throughput: 957,939.60 ± 18,879.43 rows/s, Min: 936,114.60 rows/s, Max: 998,426.62 rows/s

PostgreSQL

Both variations presented use PostgreSQL's COPY FROM support. This is significantly faster for PostgreSQL than doing the prepared statement we do in SQLite. (Here are my results for doing prepared statement INSERTs in PostgreSQL if you are curious.)

The most popular PostgreSQL driver is lib/pq. The performance issues with lib/pq are well-known, and the repo itself is marked as no longer developed.

It is roughly 44-76% slower than an alternative driver that avoids database/sql.

10M Rows, 16 columns, each column 32 bytes:

Timing: 104.53 ± 2.40s, Min: 102.57s, Max: 110.08s
Throughput: 95,665.37 ± 2,129.25 rows/s, Min: 90,847.08 rows/s, Max: 97,490.96 rows/s

10M Rows, 3 columns, each column 8 bytes:

Timing: 8.16 ± 0.43s, Min: 7.44s, Max: 8.80s
Throughput: 1,225,986.47 ± 66,631.53 rows/s, Min: 1,136,581.82 rows/s, Max: 1,343,441.37 rows

The other driver I tested is jackc/pgx, without database/sql.

10M Rows, 16 columns, each column 32 bytes:

Timing: 46.54 ± 1.60s, Min: 44.09s, Max: 49.51s
Throughput: 214,869.42 ± 7,265.10 rows/s, Min: 201,991.37 rows/s, Max: 226,801.07 rows/s

10M Rows, 3 columns, each column 8 bytes:

Timing: 5.20 ± 0.44s, Min: 4.71s, Max: 5.96s
Throughput: 1,923,722.79 ± 156,820.46 rows/s, Min: 1,676,894.32 rows/s, Max: 2,124,966.60 rows/

The discrepancies here are even greater than with the different SQLite drivers.

Workloads with small resultset

I won't go into as much detail but if you're doing queries that don't return many rows, the difference between drivers is negligible.

See here for details.

Conclusion

If you are doing INSERT-heavy workloads, or you are processing large number of rows returned from your SQL database, you might want to try benchmarking the same workload with different drivers.

And specifically, there is likely no good reason to use lib/pq anymore for accessing PostgreSQL from Go. Just use jackc/pgx.