July 10, 2022

Implementing a simple jq clone in Go, and basics of Go memory profiling

In this post we'll build a basic jq clone in Go. It will only be able to pull a single path out of each object it reads. It won't be able to do filters, mapping, etc.

$ cat large-file.json | head -n2 | ./jqgo '.repo.url'
"https://api.github.com/repos/petroav/6.828"
"https://api.github.com/repos/rspt/rspt-theme"

We'll start by building a "control" implementation that uses Go's builtin JSON library with a JSON path tool on top.

Then we'll implement a basic path-aware JSON parser in 600 lines of Go. It's going to use a technique (which may have a better name, but which) I call "partial parsing" or "fuzzy parsing", where we fully parse what we care about and only sort of parse the rest.

Why partial parsing? There are two general reasons. One is to use less memory than parsers that must always turn the whole text into an object in your language. The other is for when the language has complexities you don't want or need to deal with. We'll basically have to deal with all the complexities of JSON, so this post is about the former reason: using less memory. I've written about a case for the second reason, though, in building a simple, fast SCSS implementation.

This partial parser is more complex than a typical handwritten parser. If you are unfamiliar with handwritten JSON parsers, you may want to take a look at previous articles I've written about parsing JSON.

Once we get this partial parser working we'll turn to Go's builtin profiler to find what we can do to make it faster.

All code for this post is available on GitHub.

Machine specs, versions

Since we're going to be doing some rudimentary comparisons of performance, here are my details. I am running everything on a dedicated server, OVH Rise-1.

  • RAM: 64 GB DDR4 ECC 2,133 MHz
  • Disk: 2x450 GB SSD NVMe in Soft RAID
  • Processor: Intel Xeon E3-1230v6 - 4c/8t - 3.5 GHz/3.9 GHz

And relevant versions:

$ jq --version
jq-1.6
$ go version
go version go1.18 linux/amd64
$ uname -a
Linux phil 5.18.10-100.fc35.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Jul 7 17:41:37 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Now buckle up!

jq using Go's builtin JSON library

This is a very simple program. We just parse JSON data from stdin in a loop. After each successful parse we'll call an extractValueAtPath function to grab the value at the path the user asks for.

To keep our path "parser" very simple we'll treat array access the same as object access. So we'll look for x.0 instead of x[0], unlike jq.

package main

import (
    "encoding/json"
    "io"
    "log"
    "os"
    "strconv"
    "strings"
)

func extractValueAtPath(a map[string]any, path []string) (any, error) {
    // TODO
}

func main() {
    path := strings.Split(os.Args[1], ".")
    if path[0] == "" {
        path = path[1:]
    }

    dec := json.NewDecoder(os.Stdin)
    var a map[string]any

    enc := json.NewEncoder(os.Stdout)

    for {
        err := dec.Decode(&a)
        if err == io.EOF {
            break
        }

        if err != nil {
            log.Fatal(err)
        }

        v, err := extractValueAtPath(a, path)
        if err != nil {
            log.Fatal(err)
        }

        err = enc.Encode(v)
        if err != nil {
            log.Fatal(err)
        }
    }
}

Then we implement the extractValueAtPath function itself, entering into JSON arrays and objects until we reach the end of the path.

func extractValueAtPath(a map[string]any, path []string) (any, error) {
    if len(path) == 0 {
        return nil, nil
    }

    var v any = a

    for _, part := range path {
        if arr, ok := v.([]any); ok {
            n, err := strconv.Atoi(part)
            if err != nil {
                return nil, err
            }

            v = arr[n]
            continue
        }

        m, ok := v.(map[string]any)
        if !ok {
            // Path into a non-map
            return nil, nil
        }

        v, ok = m[part]
        if !ok {
            // Path does not exist
            return nil, nil
        }
    }

    return v, nil
}

Alright, let's give it a go module and build and run it!

$ go mod init control
$ go mod tidy
$ go build
# Grab a test file
$ curl https://raw.githubusercontent.com/json-iterator/test-data/master/large-file.json | jq -c '.[]' > large-file.json
$ cat large-file.json | head -n2 | ./control '.repo.url'
"https://api.github.com/repos/petroav/6.828"
"https://api.github.com/repos/rspt/rspt-theme"

Sweet. Now let's make sure it produces the same thing as jq.

$ cat large-file.json | ./control '.repo.url' > control.test
$ cat large-file.json | jq '.repo.url' > jq.test
$ diff jq.test control.test
$ echo $?
0

Great! It's working for a basic query. Let's see how it performs.

$ hyperfine \
  "cat large-file.json | ./control '.repo.url' > control.test" \
  "cat large-file.json | jq '.repo.url' > jq.test"
Benchmark 1: cat large-file.json | ./control '.repo.url' > control.test
  Time (mean ± σ):     310.0 ms ±  14.4 ms    [User: 296.2 ms, System: 49.3 ms]
  Range (min … max):   296.1 ms … 344.9 ms    10 runs

Benchmark 2: cat large-file.json | jq '.repo.url' > jq.test
  Time (mean ± σ):     355.8 ms ±   1.1 ms    [User: 348.8 ms, System: 27.7 ms]
  Range (min … max):   354.8 ms … 358.5 ms    10 runs

Summary
  'cat large-file.json | ./control '.repo.url' > control.test' ran
    1.15 ± 0.05 times faster than 'cat large-file.json | jq '.repo.url' > jq.test'

Now that's surprising! This naive implementation in Go is a bit faster than standard jq. But our implementation supports a heck of a lot less than jq. So this benchmark on its own isn't incredibly meaningful.

However, it's a good base for comparing to our next implementation.

Astute readers may notice that this version doesn't use a buffered reader from stdin, while the next version will. I tried this version with and without wrapping stdin in a buffered reader but it didn't make a meaningful difference. It might be because Go's JSON decoder does its own buffering. I'm not sure.

Let's do the fun implementation.

Partial parsing

Unlike a typical handwritten parser this partial parser is going to contain almost two parsers. One parser will care exactly about the structure of JSON. The other parser will only care about reading past the current value (whether it be a number or string or array or object, etc.) The path we pass to the parser will be used to decide whether each value should be fully parsed or partially parsed.

I'll reiterate: this partial parser is more complex than a typical handwritten parser. If you are unfamiliar with handwritten JSON parsers, you may want to take a look at previous articles I've written about parsing JSON.

The shell of this partial parser is going to look similar to the shell of the first parser.

package main

import (
    "bufio"
    "encoding/json"
    "fmt"
    "io"
    "log"
    "os"
    "strconv"
    "strings"
)

type jsonReader struct {
    read []byte
}

... TO IMPLEMENT ...

func main() {
    path := strings.Split(os.Args[1], ".")
    if path[0] == "" {
        path = path[1:]
    }

    b := bufio.NewReader(os.Stdin)
    enc := json.NewEncoder(os.Stdout)

    var jr jsonReader
    var val any
    var err error

    for {
        jr.reset()

        val, err = jr.extractDataFromJsonPath(b, path)
        if err == io.EOF {
            break
        }
        if err != nil {
            log.Println("Read", string(jr.read))
            log.Fatalln(err)
        }

        err = enc.Encode(val)
        if err != nil {
            log.Fatalln(err)
        }
    }
}

Except instead of using the builtin JSON parser we'll call our own extractDataFromJsonPath function that handles parsing and extraction all at once.

Before doing that we'll add a few helper functions. The first one grabs a byte from a reader and stores the read byte locally (so we can print out all read bytes if the program fails).

type jsonReader struct {
    read []byte
}

func (jr *jsonReader) readByte(r *bufio.Reader) (byte, error) {
    c, err := r.ReadByte()
    if err != nil {
        return byte(0), err
    }

    jr.read = append(jr.read, c)

    return c, nil
}

The reset member zeroes out the read bytes; it gets called before each object is parsed in the main loop.

func (jr *jsonReader) reset() {
    jr.read = nil
}

Now let's get into extractDataFromJsonPath.

extractDataFromJsonPath

This is the real parser. It expects a JSON object and (almost) fully parses it.

func (jr *jsonReader) extractDataFromJsonPath(r *bufio.Reader, path []string) (any, error) {
    if len(path) == 0 {
        return nil, nil
    }

    err := jr.eatWhitespace(r)
    if err != nil {
        return nil, err
    }

    b, err := jr.readByte(r)
    if err != nil {
        return nil, err
    }

    // Make sure we're actually going into an object
    if b != '{' {
        return nil, fmt.Errorf("Expected opening curly brace, got: '%s'", string(b))
    }

    i := -1

    var result any
    for {
        i++

        err = jr.eatWhitespace(r)
        if err != nil {
            return nil, err
        }

        bs, err := r.Peek(1)
        if err != nil {
            return nil, err
        }
        b := bs[0]

        // We found the end of the object
        if b == '}' {
            r.Discard(1)
            break
        }

        // Key-value pairs must be separated by commas
        if i > 0 {
            if b == ',' {
                r.Discard(1)
            } else {
                return nil, fmt.Errorf("Expected comma between key-value pairs, got: '%s'", string(b))
            }
        }

        err = jr.eatWhitespace(r)
        if err != nil {
            return nil, err
        }

        // Grab the key
        s, err := jr.expectString(r)
        if err != nil {
            return nil, err
        }

        err = jr.eatWhitespace(r)
        if err != nil {
            return nil, err
        }

        // Find a colon separating key from value
        b, err = jr.readByte(r)
        if err != nil {
            return nil, err
        }

        if b != ':' {
            return nil, fmt.Errorf("Expected colon, got: '%s'", string(b))
        }

        err = jr.eatWhitespace(r)
        if err != nil {
            return nil, err
        }

Up to this point it looks like any old handwritten parser. There are a few helpers in there (eatWhitespace, expectString) we'll implement shortly.

But once we see each key and are ready to look for a value we can decide if we need to fully parse the value (if the path goes into this key) or if we can partially parse the value (because the path does not go into this key).

        // If the key is not the start of this path, skip past this value
        if path[0] != s {
            err = jr.eatValue(r)
            if err != nil {
                return nil, err
            }

            continue
        }

        // Otherwise this is a path we want, grab the value
        result, err = jr.expectValue(r, path[1:])
        if err != nil {
            return nil, err
        }
    }

    return result, nil
}

And that's it! The core parsing loop is done. The meat now becomes 1) the eatValue function that partially parses JSON and 2) the expectValue function that either encounters a scalar value and returns it or recursively calls extractDataFromJsonPath to enter some new object.

Notes on helper naming

There are three main kinds of helpers you'll see. expectX helpers like expectString will return early with an error if they fail to find what they're looking for. eatX helpers like eatWhitespace will not return any value and will only move the read cursor forward. And tryX helpers like tryNumber will do the same thing as the expectX helpers but return an additional boolean result, so the caller can decide whether or not to make other attempts at parsing.

But first let's fill in the two helpers we skipped. First off, eatWhitespace.

eatWhitespace

This function peeks and reads bytes while the bytes are whitespace.

func (jr *jsonReader) eatWhitespace(r *bufio.Reader) error {
    for {
        bs, err := r.Peek(1)
        if err != nil {
            return err
        }
        b := bs[0]

        isWhitespace := b == ' ' ||
            b == '\n' ||
            b == '\t' ||
            b == '\r'
        if !isWhitespace {
            return nil
        }

        r.Discard(1)
    }
}

That's it! Next we need to fill in expectString.

expectString

This is a standard handwritten parser helper that looks for a double quote and keeps collecting bytes until it finds an ending double quote that is not escaped.

func (jr *jsonReader) expectString(r *bufio.Reader) (string, error) {
    var s []byte

    err := jr.eatWhitespace(r)
    if err != nil {
        return "", err
    }

    // Look for opening quote
    b, err := jr.readByte(r)
    if err != nil {
        return "", err
    }

    if b != '"' {
        return "", fmt.Errorf("Expected double quote to start string, got: '%s'", string(b))
    }

    var prev byte
    for {
        b, err := jr.readByte(r)
        if err != nil {
            return "", err
        }

        if b == '\\' && prev == '\\' {
            // Just skip
            prev = byte(0)
            continue
        } else if b == '"' {
            // Overwrite the escaped double quote
            if prev == '\\' {
                s[len(s)-1] = '"'
            } else {
                // Otherwise it's the actual end
                break
            }
        }

        s = append(s, b)
        prev = b
    }

    return string(s), nil
}

Standard stuff! Now let's get back to those meaty functions we introduced before, starting with expectValue.

expectValue

This function is called by extractDataFromJsonPath when it wants to fully parse a value.

If we see a left curly brace, we recurse into extractDataFromJsonPath.

func (jr *jsonReader) expectValue(r *bufio.Reader, path []string) (any, error) {
    bs, err := r.Peek(1)
    if err != nil {
        return nil, err
    }
    c := bs[0]

    if c == '{' {
        return jr.extractDataFromJsonPath(r, path)

Otherwise, if we see a left bracket, we call a new helper, extractArrayDataFromJsonPath, which will be almost identical to extractDataFromJsonPath but for parsing array syntax.

    } else if c == '[' {
        return jr.extractArrayDataFromJsonPath(r, path)
    }

If the value we're trying to parse isn't an array or object and there's more of the path left, then we have to return null because we can't enter into a scalar value.

    // Can't go any further into a path

    if len(path) != 0 {
        // Reached the end of this object but more of
        // the path remains. So this object doesn't
        // contain this path.
        return nil, nil
    }

Then we try to parse a scalar (numbers, strings, booleans, null) and ultimately return an error if nothing worked.

    ok, val, err := jr.tryScalar(r)
    if err != nil {
        return nil, err
    }
    if !ok {
        return nil, fmt.Errorf("Expected scalar, got: '%s'", string(c))
    }

    return val, err
}

Let's implement tryScalar and its dependencies now. And we'll come back to extractArrayDataFromJsonPath afterward.

tryScalar

tryScalar is similar to expectValue. It's called tryScalar rather than expectScalar because it's allowed to fail.

We peek at the first byte and switch on a dedicated parsing helper based on it.

func (jr *jsonReader) tryScalar(r *bufio.Reader) (bool, any, error) {
    bs, err := r.Peek(1)
    if err != nil {
        return false, nil, err
    }
    c := bs[0]

    if c == '"' {
        val, err := jr.expectString(r)
        return true, val, err
    } else if c == 't' {
        val, err := jr.expectIdentifier(r, "true", true)
        return true, val, err
    } else if c == 'f' {
        val, err := jr.expectIdentifier(r, "false", false)
        return true, val, err
    } else if c == 'n' {
        val, err := jr.expectIdentifier(r, "null", nil)
        return true, val, err
    }

    return jr.tryNumber(r)
}

This passes control flow to two new functions, expectIdentifier and tryNumber. Let's do expectIdentifier next.

expectIdentifier

This function tries to match the reader on a string passed to it.

func (jr *jsonReader) expectIdentifier(r *bufio.Reader, ident string, value any) (any, error) {
    var s []byte

    for i := 0; i < len(ident); i++ {
        b, err := r.ReadByte()
        if err != nil {
            return nil, err
        }

        s = append(s, b)
    }

    if string(s) == ident {
        return value, nil
    }

    return nil, fmt.Errorf("Unknown value: '%s'", string(s))
}

Thanks Michael Lynch for pointing out in an earlier version that expectIdentifier does not need to Peek/Discard but can just ReadByte instead.

tryNumber

This function tries to parse a number. We'll do a very lazy number scanner that accepts a superset of valid JSON numbers. Internally we'll call json.Unmarshal on the bytes we build up to do the conversion itself.

func (jr *jsonReader) tryNumber(r *bufio.Reader) (bool, any, error) {
    var number []byte

    // Loop trying to find all number-like characters in a row
    for {
        bs, err := r.Peek(1)
        if err != nil {
            return false, nil, err
        }
        c := bs[0]

        isNumberCharacter := (c >= '0' && c <= '9') || c == '.' || c == 'e' || c == 'E' || c == '+' || c == '-'
        if !isNumberCharacter {
            break
        }

        number = append(number, c)

        r.Discard(1)
    }

    if len(number) == 0 {
        return false, nil, nil
    }

    var n float64
    err := json.Unmarshal(number, &n)
    return true, n, err
}

If we can't find a number, that's ok. We'll just say so by returning false in the first return value.

Outstanding functions

OK, we've spent a while building out helper functions. The last two remaining are extractArrayDataFromJsonPath and eatValue. Let's finish up the full parser functions with extractArrayDataFromJsonPath before getting to eatValue, the primary partial parsing function.

extractArrayDataFromJsonPath

This function is almost identical to extractDataFromJsonPath but rather than parsing key-value pairs inside curly braces it parses values inside brackets.

func (jr *jsonReader) extractArrayDataFromJsonPath(r *bufio.Reader, path []string) (any, error) {
    // Path inside an array must be an integer
    n, err := strconv.Atoi(path[0])
    if err != nil {
        return nil, err
    }

    // Look for opening bracket. Make sure we're in an array
    b, err := jr.readByte(r)
    if err != nil {
        return nil, err
    }

    if b != '[' {
        return nil, fmt.Errorf("Expected opening bracket, got: '%s'", string(b))
    }

    var result any
    i := -1
    for {
        i++

        err = jr.eatWhitespace(r)
        if err != nil {
            return nil, err
        }

        bs, err := r.Peek(1)
        if err != nil {
            return nil, err
        }
        b := bs[0]

        // Found closing bracket, exit the array
        if b == ']' {
            r.Discard(1)
            break
        }

        // Array values must be separated by a comma
        if i > 0 {
            if b == ',' {
                r.Discard(1)
            } else {
                return nil, fmt.Errorf("Expected comma between array values, got: '%s'", string(b))
            }
        }

        err = jr.eatWhitespace(r)
        if err != nil {
            return nil, err
        }

Just like extractDataFromJsonPath it either calls eatValue or expectValue depending on whether the current index matches the requested path.

        // If the key is not the start of this path, skip past this value
        if i != n {
            err = jr.eatValue(r)
            if err != nil {
                return nil, err
            }

            continue
        }

        result, err = jr.expectValue(r, path[1:])
        if err != nil {
            return nil, err
        }
    }

    return result, nil
}

That's it for full parser functions! Let's do the partial parser, eatValue.

eatValue

This function is simpler than the full parser functions we wrote before.

First off it looks for the simple case where the value is a scalar.

func (jr *jsonReader) eatValue(r *bufio.Reader) error {
    var stack []byte

    inString := false
    var prev byte

    err := jr.eatWhitespace(r)
    if err != nil {
        return err
    }

    ok, _, err := jr.tryScalar(r)
    if err != nil {
        return err
    }

    // It was a scalar, we're done!
    if ok {
        return nil
    }

All it does is read until the value ends.

If the value is not a scalar, though, we need to read past complete JSON arrays and/or objects.

To do this we'll simply read through bytes, maintaining a stack of opening braces and brackets. If we enter a string we'll skip all bytes inside it until the string ends.

    // Otherwise it's an array or object
    first := true

    for first || len(stack) > 0 {
        first = false

        bs, err := r.Peek(1)
        if err != nil {
            return err
        }
        b := bs[0]

        if inString {
            if b == '"' && prev != '\\' {
                inString = false
            }

            // Two \\-es cancel each other out
            if b == '\\' && prev == '\\' {
                prev = byte(0)
            } else {
                prev = b
            }

            r.Discard(1)
            continue
        }

        switch b {
        case '[':
            stack = append(stack, b)
        case ']':
            c := stack[len(stack)-1]
            stack = stack[:len(stack)-1]
            if c != '[' {
                return fmt.Errorf("Unexpected end of array: '%s'", string(c))
            }
        case '{':
            stack = append(stack, b)
        case '}':
            c := stack[len(stack)-1]
            stack = stack[:len(stack)-1]
            if c != '{' {
                return fmt.Errorf("Unexpected end of object: '%s'", string(c))
            }
        case '"':
            inString = true
            // Closing quote case handled elsewhere, above
        }

        r.Discard(1)
        prev = b
    }

    return nil
}

And with that we're done with the first pass of the path-aware jq implementation.

Build, test, benchmark

Let's give it a go module, build and test it.

$ go mod init jqgo
$ go build
$ curl https://raw.githubusercontent.com/json-iterator/test-data/master/large-file.json | jq -c '.[]' > large-file.json
$ cat large-file.json | ./jqgo '.repo.url' > jqgo.test
$ cat large-file.json | jq '.repo.url' > jq.test
$ diff jq.test jqgo.test
$ echo $?
0

Great! :) Let's benchmark it against jq and the control implementation.

$ hyperfine --warmup 2 \
  "cat large-file.json | ./control/control '.repo.url' > control.test" \
  "cat large-file.json | ./jqgo '.repo.url' > jqgo.test" \
  "cat large-file.json | jq '.repo.url' > jq.test"
Benchmark 1: cat large-file.json | ./control/control '.repo.url' > control.test
  Time (mean ± σ):     302.0 ms ±   3.4 ms    [User: 283.7 ms, System: 53.1 ms]
  Range (min … max):   297.4 ms … 309.0 ms    10 runs

Benchmark 2: cat large-file.json | ./jqgo '.repo.url' > jqgo.test
  Time (mean ± σ):     258.8 ms ±   2.2 ms    [User: 230.3 ms, System: 47.6 ms]
  Range (min … max):   256.3 ms … 262.6 ms    11 runs

Benchmark 3: cat large-file.json | jq '.repo.url' > jq.test
  Time (mean ± σ):     357.6 ms ±   2.9 ms    [User: 350.0 ms, System: 28.3 ms]
  Range (min … max):   355.0 ms … 362.9 ms    10 runs

Summary
  'cat large-file.json | ./jqgo '.repo.url' > jqgo.test' ran
    1.17 ± 0.02 times faster than 'cat large-file.json | ./control/control '.repo.url' > control.test'
    1.38 ± 0.02 times faster than 'cat large-file.json | jq '.repo.url' > jq.test'

Now to my surprise we're already beating the non-path-aware control implementation! When I first wrote the path-aware version, it was slower than the control. So I had to start performance profiling. For this blog post I tried to remake the slowest variation I could remember but I couldn't get it slower than this.

That said, the best version was faster than this so I can demonstrate the process of profiling to improve performance.

Let's dig in. :)

Profiling in Go

There are various ways to enable profiling in Go. One way some people recommend is through the builtin benchmark support in go test. I don't really like this method though. I prefer to use pkg/profile manually in main.go.

@@ -9,6 +9,8 @@
        "os"
        "strconv"
        "strings"
+
+       "github.com/pkg/profile"
 )

 type jsonReader struct {
@@ -450,6 +452,7 @@
 }

 func main() {
+       defer profile.Start().Stop()
        path := strings.Split(os.Args[1], ".")
        if path[0] == "" {
                path = path[1:]

Build and run:

$ go mod tidy
$ go build
$ cat large-file.json | ./jqgo '.repo.url' > /dev/null
2022/07/11 02:38:57 profile: cpu profiling enabled, /tmp/profile3691177944/cpu.pprof
2022/07/11 02:38:58 profile: cpu profiling disabled, /tmp/profile3691177944/cpu.pprof

Go can run a web server to visualize the pprof results, but (after literally a few years of trying to figure that UI out) I find the CLI makes more sense to me.

$ go tool pprof /tmp/profile3691177944/cpu.pprof
File: jqgo
Type: cpu
Time: Jul 11, 2022 at 2:38am (UTC)
Duration: 401.63ms, Total samples = 270ms (67.23%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof)

Now we run top10 to see where we spend the bulk of time.

(pprof) top10
Showing nodes accounting for 260ms, 100% of 260ms total
Showing top 10 nodes out of 31
      flat  flat%   sum%        cum   cum%
      90ms 34.62% 34.62%      230ms 88.46%  main.(*jsonReader).eatValue
      60ms 23.08% 57.69%       70ms 26.92%  bufio.(*Reader).Peek
      50ms 19.23% 76.92%       60ms 23.08%  bufio.(*Reader).Discard
      20ms  7.69% 84.62%       20ms  7.69%  syscall.Syscall
      10ms  3.85% 88.46%       10ms  3.85%  bufio.(*Reader).Buffered (inline)
      10ms  3.85% 92.31%       10ms  3.85%  main.(*jsonReader).readByte
      10ms  3.85% 96.15%       10ms  3.85%  runtime.slicebytetostring
      10ms  3.85%   100%       10ms  3.85%  runtime.stkbucket
         0     0%   100%       10ms  3.85%  bufio.(*Reader).fill
         0     0%   100%       10ms  3.85%  encoding/json.(*Encoder).Encode

Now this is weird. Why are Peek and Discard so expensive? And why are we spending so much time in syscall.Syscall? The entire point of buffered I/O is to avoid hitting syscalls too frequently.

But since 88% of time is spent in eatValue, let's verify where in eatValue we are spending that time.

Within the pprof REPL we can enter list X where X is a regexp of a function name.

(pprof) list eatValue
Total: 260ms
ROUTINE ======================== main.(*jsonReader).eatValue in /home/phil/tmp/jqgo/mainprof.go
      90ms      230ms (flat, cum) 88.46% of Total
         .          .    159:   err := jr.eatWhitespace(r)
         .          .    160:   if err != nil {
         .          .    161:           return err
         .          .    162:   }
         .          .    163:
         .       20ms    164:   ok, _, err := jr.tryScalar(r)
         .          .    165:   if err != nil {
         .          .    166:           return err
         .          .    167:   }
         .          .    168:
         .          .    169:   // It was a scalar, we're done!
         .          .    170:   if ok {
         .          .    171:           return nil
         .          .    172:   }
         .          .    173:
         .          .    174:   // Otherwise it's an array or object
         .          .    175:   first := true
         .          .    176:
         .          .    177:   for first || len(stack) > 0 {
         .          .    178:           first = false
         .          .    179:
         .       60ms    180:           bs, err := r.Peek(1)
         .          .    181:           if err != nil {
         .          .    182:                   return err
         .          .    183:           }
      10ms       10ms    184:           b := bs[0]
         .          .    185:
      10ms       10ms    186:           if inString {
         .          .    187:                   if b == '"' && prev != '\\' {
         .          .    188:                           inString = false
         .          .    189:                   }
         .          .    190:
         .          .    191:                   // Two \\-es cancel eachother out
      10ms       10ms    192:                   if b == '\\' && prev == '\\' {
         .          .    193:                           prev = byte(0)
         .          .    194:                   } else {
         .          .    195:                           prev = b
         .          .    196:                   }
         .          .    197:
         .       60ms    198:                   r.Discard(1)
         .          .    199:                   continue
         .          .    200:           }
         .          .    201:
         .          .    202:           switch b {
         .          .    203:           case '[':
         .          .    204:                   stack = append(stack, b)
      10ms       10ms    205:           case ']':
         .          .    206:                   c := stack[len(stack)-1]
         .          .    207:                   stack = stack[:len(stack)-1]
         .          .    208:                   if c != '[' {
         .          .    209:                           return fmt.Errorf("Unexpected end of array: '%s'", string(c))
         .          .    210:                   }
         .          .    211:           case '{':
         .          .    212:                   stack = append(stack, b)
         .          .    213:           case '}':
      50ms       50ms    214:                   c := stack[len(stack)-1]
         .          .    215:                   stack = stack[:len(stack)-1]
         .          .    216:                   if c != '{' {
         .          .    217:                           return fmt.Errorf("Unexpected end of object: '%s'", string(c))
         .          .    218:                   }
         .          .    219:           case '"':

So by rank we can see we do spend the most time in Peek and Discard. Then in pulling the last item out of the stack??? That's weird. Let's ignore that.

Peek and Discard

Let's look at Peek in the pprof REPL:

(pprof) list Peek
Total: 260ms
ROUTINE ======================== bufio.(*Reader).Peek in /usr/local/go/src/bufio/bufio.go
      60ms       70ms (flat, cum) 26.92% of Total
         .          .    130:// also returns an error explaining why the read is short. The error is
         .          .    131:// ErrBufferFull if n is larger than b's buffer size.
         .          .    132://
         .          .    133:// Calling Peek prevents a UnreadByte or UnreadRune call from succeeding
         .          .    134:// until the next read operation.
      10ms       10ms    135:func (b *Reader) Peek(n int) ([]byte, error) {
         .          .    136:   if n < 0 {
         .          .    137:           return nil, ErrNegativeCount
         .          .    138:   }
         .          .    139:
      10ms       10ms    140:   b.lastByte = -1
         .          .    141:   b.lastRuneSize = -1
         .          .    142:
      20ms       20ms    143:   for b.w-b.r < n && b.w-b.r < len(b.buf) && b.err == nil {
         .       10ms    144:           b.fill() // b.w-b.r < len(b.buf) => buffer is not full
         .          .    145:   }
         .          .    146:
         .          .    147:   if n > len(b.buf) {
         .          .    148:           return b.buf[b.r:b.w], ErrBufferFull
         .          .    149:   }
         .          .    150:
         .          .    151:   // 0 <= n <= len(b.buf)
         .          .    152:   var err error
         .          .    153:   if avail := b.w - b.r; avail < n {
         .          .    154:           // not enough data in buffer
         .          .    155:           n = avail
         .          .    156:           err = b.readErr()
         .          .    157:           if err == nil {
      10ms       10ms    158:                   err = ErrBufferFull
         .          .    159:           }
         .          .    160:   }
      10ms       10ms    161:   return b.buf[b.r : b.r+n], err
         .          .    162:}
         .          .    163:
         .          .    164:// Discard skips the next n bytes, returning the number of bytes discarded.
         .          .    165://
         .          .    166:// If Discard skips fewer than n bytes, it also returns an error.

The bulk of the time here is spent refilling the buffer (the fill method). So it seems like, while bufio.Reader buffers reads, it basically doesn't buffer peeks.

But hey, we were peeking and discarding one byte at a time anyway. Peeking and discarding cost about the same in eatValue. So let's set peeking aside for a second and think about discarding.

We could avoid most of those discards if we keep track of how far we've peeked in the loop and discard only once at the end of the loop. (As an implementation detail, since bufio.Reader has a maximum internal buffer size, we'll still need to periodically discard whenever a peek fails with a "buffer full" error.)

And based on that top10 result above, we need to do this in eatValue.

@@ -170,16 +170,31 @@
        }

        // Otherwise it's an array or object
+       length := 0
        first := true
-
+       var bs []byte
        for first || len(stack) > 0 {
+               length++
                first = false

-               bs, err := r.Peek(1)
-               if err != nil {
-                       return err
+               for {
+                       bs, err = r.Peek(length)
+                       if err == bufio.ErrBufferFull {
+                               _, err = r.Discard(length - 1)
+                               if err != nil {
+                                       return err
+                               }
+
+                               length = 1
+                               continue
+                       }
+                       if err != nil {
+                               return err
+                       }
+
+                       break
                }
-               b := bs[0]
+               b := bs[length-1]

                if inString {
                        if b == '"' && prev != '\\' {
@@ -193,7 +208,6 @@
                                prev = b
                        }

-                       r.Discard(1)
                        continue
                }

@@ -219,11 +233,11 @@
                        // Closing quote case handled elsewhere, above
                }

-               r.Discard(1)
                prev = b
        }

-       return nil
+       _, err = r.Discard(length)
+       return err
 }

 func (jr *jsonReader) tryScalar(r *bufio.Reader) (bool, any, error) {

Comment out the pkg/profile bits (profiling slows the whole thing down), rebuild, and rerun:

$ hyperfine --warmup 2 \
  "cat large-file.json | ./control/control '.repo.url' > control.test" \
  "cat large-file.json | ./jqgo '.repo.url' > jqgo.test" \
  "cat large-file.json | jq '.repo.url' > jq.test"
Benchmark 1: cat large-file.json | ./control/control '.repo.url' > control.test
  Time (mean ± σ):     302.0 ms ±   4.2 ms    [User: 287.7 ms, System: 49.7 ms]
  Range (min … max):   296.6 ms … 308.2 ms    10 runs

Benchmark 2: cat large-file.json | ./jqgo '.repo.url' > jqgo.test
  Time (mean ± σ):     215.0 ms ±   1.6 ms    [User: 189.1 ms, System: 46.9 ms]
  Range (min … max):   213.5 ms … 218.7 ms    13 runs

Benchmark 3: cat large-file.json | jq '.repo.url' > jq.test
  Time (mean ± σ):     355.7 ms ±   1.4 ms    [User: 349.9 ms, System: 26.4 ms]
  Range (min … max):   354.3 ms … 359.1 ms    10 runs

Summary
  'cat large-file.json | ./jqgo '.repo.url' > jqgo.test' ran
    1.40 ± 0.02 times faster than 'cat large-file.json | ./control/control '.repo.url' > control.test'
    1.65 ± 0.01 times faster than 'cat large-file.json | jq '.repo.url' > jq.test'

Great! We've shaved off another 40ms. Let's re-enable profiling, rerun the program, and go back into the pprof REPL.

$ cat large-file.json | ./jqgo '.repo.url' > /dev/null
2022/07/11 03:12:07 profile: cpu profiling enabled, /tmp/profile2229743747/cpu.pprof
2022/07/11 03:12:07 profile: cpu profiling disabled, /tmp/profile2229743747/cpu.pprof
$ go tool pprof /tmp/profile2229743747/cpu.pprof
File: jqgo
Type: cpu
Time: Jul 11, 2022 at 3:12am (UTC)
Duration: 401.33ms, Total samples = 210ms (52.33%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top10
Showing nodes accounting for 210ms, 100% of 210ms total
Showing top 10 nodes out of 20
      flat  flat%   sum%        cum   cum%
     100ms 47.62% 47.62%      180ms 85.71%  main.(*jsonReader).eatValue
      70ms 33.33% 80.95%       70ms 33.33%  bufio.(*Reader).Peek
      10ms  4.76% 85.71%       10ms  4.76%  encoding/json.(*encodeState).string
      10ms  4.76% 90.48%       20ms  9.52%  main.(*jsonReader).expectString
      10ms  4.76% 95.24%       10ms  4.76%  main.(*jsonReader).readByte
      10ms  4.76%   100%       10ms  4.76%  reflect.Value.Type
         0     0%   100%       10ms  4.76%  encoding/json.(*Encoder).Encode
         0     0%   100%       10ms  4.76%  encoding/json.(*decodeState).literalStore
         0     0%   100%       10ms  4.76%  encoding/json.(*decodeState).unmarshal
         0     0%   100%       10ms  4.76%  encoding/json.(*decodeState).value

Nice, syscall.Syscall is no longer in the top 10. But eatValue is, and we're still spending a bunch of time in Peek. That makes sense: we didn't try to call Peek any less, we only cut down on Discard calls.

Let's list eatValue.

(pprof) list eatValue
Total: 210ms
ROUTINE ======================== main.(*jsonReader).eatValue in /home/phil/tmp/jqgo/mainpeek.go
     100ms      180ms (flat, cum) 85.71% of Total
         .          .    159:   err := jr.eatWhitespace(r)
         .          .    160:   if err != nil {
         .          .    161:           return err
         .          .    162:   }
         .          .    163:
         .       20ms    164:   ok, _, err := jr.tryScalar(r)
         .          .    165:   if err != nil {
         .          .    166:           return err
         .          .    167:   }
         .          .    168:
         .          .    169:   // It was a scalar, we're done!
         .          .    170:   if ok {
         .          .    171:           return nil
         .          .    172:   }
         .          .    173:
         .          .    174:   // Otherwise it's an array or object
         .          .    175:   length := 0
         .          .    176:   first := true
         .          .    177:   var bs []byte
      20ms       20ms    178:   for first || len(stack) > 0 {
      10ms       10ms    179:           length++
         .          .    180:           first = false
         .          .    181:
         .          .    182:           for {
      20ms       80ms    183:                   bs, err = r.Peek(length)
         .          .    184:                   if err == bufio.ErrBufferFull {
         .          .    185:                           _, err = r.Discard(length - 1)
         .          .    186:                           if err != nil {
         .          .    187:                                   return err
         .          .    188:                           }
         .          .    189:
         .          .    190:                           length = 1
         .          .    191:                           continue
         .          .    192:                   }
         .          .    193:                   if err != nil {
         .          .    194:                           return err
         .          .    195:                   }
         .          .    196:
         .          .    197:                   break
         .          .    198:           }
      10ms       10ms    199:           b := bs[length-1]
         .          .    200:
      10ms       10ms    201:           if inString {
      10ms       10ms    202:                   if b == '"' && prev != '\\' {
         .          .    203:                           inString = false
         .          .    204:                   }
         .          .    205:
         .          .    206:                   // Two \\-es cancel eachother out
      20ms       20ms    207:                   if b == '\\' && prev == '\\' {
         .          .    208:                           prev = byte(0)
         .          .    209:                   } else {
         .          .    210:                           prev = b
         .          .    211:                   }
         .

The bulk of the time is spent in Peek. Let's list Peek again.

(pprof) list Peek
Total: 210ms
ROUTINE ======================== bufio.(*Reader).Peek in /usr/local/go/src/bufio/bufio.go
      70ms       70ms (flat, cum) 33.33% of Total
         .          .    130:// also returns an error explaining why the read is short. The error is
         .          .    131:// ErrBufferFull if n is larger than b's buffer size.
         .          .    132://
         .          .    133:// Calling Peek prevents a UnreadByte or UnreadRune call from succeeding
         .          .    134:// until the next read operation.
      10ms       10ms    135:func (b *Reader) Peek(n int) ([]byte, error) {
         .          .    136:   if n < 0 {
         .          .    137:           return nil, ErrNegativeCount
         .          .    138:   }
         .          .    139:
         .          .    140:   b.lastByte = -1
         .          .    141:   b.lastRuneSize = -1
         .          .    142:
      10ms       10ms    143:   for b.w-b.r < n && b.w-b.r < len(b.buf) && b.err == nil {
         .          .    144:           b.fill() // b.w-b.r < len(b.buf) => buffer is not full
         .          .    145:   }
         .          .    146:
         .          .    147:   if n > len(b.buf) {
         .          .    148:           return b.buf[b.r:b.w], ErrBufferFull
         .          .    149:   }
         .          .    150:
         .          .    151:   // 0 <= n <= len(b.buf)
         .          .    152:   var err error
         .          .    153:   if avail := b.w - b.r; avail < n {
         .          .    154:           // not enough data in buffer
         .          .    155:           n = avail
         .          .    156:           err = b.readErr()
         .          .    157:           if err == nil {
         .          .    158:                   err = ErrBufferFull
         .          .    159:           }
         .          .    160:   }
      50ms       50ms    161:   return b.buf[b.r : b.r+n], err
         .          .    162:}
         .          .    163:
         .          .    164:// Discard skips the next n bytes, returning the number of bytes discarded.
         .          .    165://
         .          .    166:// If Discard skips fewer than n bytes, it also returns an error.

Well, it's not really clear to me from this why we spend so much time in that final slicing line.

We might be able to call Peek much less if we kept our own FIFO queue of peeked-at bytes. But I don't feel like writing a correct, efficient FIFO queue (basically a ring buffer), and maybe there are other aspects of this program worth looking at. So let's give this train of thought a break.

Memory profiling

Let's change tactics entirely. Memory allocation tends to be expensive, and allocating inside a loop is generally a bad idea. This entire program is one big loop. So let's try a memory profile instead of a CPU profile.

Instead of defer profile.Start().Stop() we'll write defer profile.Start(profile.MemProfile).Stop().

Build, rerun, and enter pprof with the -alloc_objects flag. We want to see where objects are being allocated.

$ go build
$ cat large-file.json | ./jqgo '.repo.url' > /dev/null
2022/07/11 03:24:55 profile: memory profiling enabled (rate 4096), /tmp/profile1407859643/mem.pprof
2022/07/11 03:24:56 profile: memory profiling disabled, /tmp/profile1407859643/mem.pprof
$ go tool pprof -alloc_objects /tmp/profile1407859643/mem.pprof
File: jqgo
Type: alloc_objects
Time: Jul 11, 2022 at 3:24am (UTC)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top10
Showing nodes accounting for 365899, 99.95% of 366086 total
Dropped 24 nodes (cum <= 1830)
Showing top 10 nodes out of 14
      flat  flat%   sum%        cum   cum%
    227585 62.17% 62.17%     262708 71.76%  main.(*jsonReader).expectString
     40945 11.18% 73.35%      40945 11.18%  main.(*jsonReader).readByte
     39500 10.79% 84.14%     252585 69.00%  main.(*jsonReader).tryScalar
     30009  8.20% 92.34%      41924 11.45%  main.(*jsonReader).tryNumber
     12055  3.29% 95.63%     215416 58.84%  main.(*jsonReader).eatValue
      7555  2.06% 97.70%      11915  3.25%  encoding/json.Unmarshal
      4360  1.19% 98.89%       4360  1.19%  encoding/json.(*decodeState).literalStore
      3847  1.05% 99.94%       3847  1.05%  main.(*jsonReader).expectIdentifier
        43 0.012% 99.95%     365931   100%  runtime.main
         0     0% 99.95%       4360  1.19%  encoding/json.(*decodeState).unmarshal

And just like with the CPU profile, we can list functions to see where the allocations happen in code. Let's list the biggest allocator here, expectString.

(pprof) list expectString
Total: 366086
ROUTINE ======================== main.(*jsonReader).expectString in /home/phil/tmp/jqgo/mainpeek.go
    227585     262708 (flat, cum) 71.76% of Total
         .          .     58:   err := jr.eatWhitespace(r)
         .          .     59:   if err != nil {
         .          .     60:           return "", err
         .          .     61:   }
         .          .     62:
         .       4941     63:   b, err := jr.readByte(r)
         .          .     64:   if err != nil {
         .          .     65:           return "", err
         .          .     66:   }
         .          .     67:
         .          .     68:   if b != '"' {
         .          .     69:           return "", fmt.Errorf("Expected double quote to start string, got: '%s'", string(b))
         .          .     70:   }
         .          .     71:
         .          .     72:   var prev byte
         .          .     73:   for {
         .      30182     74:           b, err := jr.readByte(r)
         .          .     75:           if err != nil {
         .          .     76:                   return "", err
         .          .     77:           }
         .          .     78:
         .          .     79:           if b == '\\' && prev == '\\' {
         .          .     80:                   // Just skip
         .          .     81:                   prev = byte(0)
         .          .     82:                   continue
         .          .     83:           } else if b == '"' {
         .          .     84:                   // Overwrite the escaped double quote
         .          .     85:                   if prev == '\\' {
         .          .     86:                           s[len(s)-1] = '"'
         .          .     87:                   } else {
         .          .     88:                           // Otherwise it's the actual end
         .          .     89:                           break
         .          .     90:                   }
         .          .     91:           }
         .          .     92:
    146302     146302     93:           s = append(s, b)
         .          .     94:           prev = b
         .          .     95:   }
         .          .     96:
     81283      81283     97:   return string(s), nil
         .          .     98:}
         .          .     99:
         .          .    100:func (jr *jsonReader) expectIdentifier(r *bufio.Reader, ident string, value any) (any, error) {
         .          .    101:   var s []byte
         .          .    102:

And the biggest offender is growing the string! The good news is that this growth can be amortized: if we share the underlying buffer across calls by storing it on the jsonReader struct, expectString only has to grow the buffer when it sees a string bigger than any it has seen before.

The standard library's bytes.Buffer type does exactly this. We can put a bytes.Buffer on the jsonReader struct because this code isn't multithreaded and because expectString doesn't call itself, so there's never more than one string in flight.

@@ -2,6 +2,7 @@

 import (
        "bufio"
+       "bytes"
        "encoding/json"
        "fmt"
        "io"
@@ -13,6 +14,8 @@

 type jsonReader struct {
        read []byte
+
+       expectString_buffer bytes.Buffer
 }

 func (jr *jsonReader) reset() {
@@ -51,7 +54,7 @@
 }

 func (jr *jsonReader) expectString(r *bufio.Reader) (string, error) {
-       var s []byte
+       jr.expectString_buffer.Reset()

        err := jr.eatWhitespace(r)
        if err != nil {
@@ -81,18 +84,18 @@
                } else if b == '"' {
                        // Overwrite the escaped double quote
                        if prev == '\\' {
-                               s[len(s)-1] = '"'
+                               jr.expectString_buffer.Bytes()[jr.expectString_buffer.Len()-1] = '"'
                        } else {
                                // Otherwise it's the actual end
                                break
                        }
                }

-               s = append(s, b)
+               jr.expectString_buffer.WriteByte(b)
                prev = b
        }

-       return string(s), nil
+       return jr.expectString_buffer.String(), nil
 }

Or instead of sharing memory on the struct, maybe this would be a good place to use sync.Pool?

Disable pkg/profile, rebuild, and rerun with hyperfine.

$ hyperfine --warmup 2 \
  "cat large-file.json | ./control/control '.repo.url' > control.test" \
  "cat large-file.json | ./jqgo '.repo.url' > jqgo.test" \
  "cat large-file.json | jq '.repo.url' > jq.test"
Benchmark 1: cat large-file.json | ./control/control '.repo.url' > control.test
  Time (mean ± σ):     307.2 ms ±  10.8 ms    [User: 292.8 ms, System: 49.4 ms]
  Range (min … max):   296.5 ms … 326.2 ms    10 runs

Benchmark 2: cat large-file.json | ./jqgo '.repo.url' > jqgo.test
  Time (mean ± σ):     210.8 ms ±   2.2 ms    [User: 185.4 ms, System: 44.9 ms]
  Range (min … max):   209.1 ms … 216.8 ms    14 runs

Benchmark 3: cat large-file.json | jq '.repo.url' > jq.test
  Time (mean ± σ):     356.1 ms ±   2.6 ms    [User: 349.1 ms, System: 26.9 ms]
  Range (min … max):   354.1 ms … 362.9 ms    10 runs

Summary
  'cat large-file.json | ./jqgo '.repo.url' > jqgo.test' ran
    1.46 ± 0.05 times faster than 'cat large-file.json | ./control/control '.repo.url' > control.test'
    1.69 ± 0.02 times faster than 'cat large-file.json | jq '.repo.url' > jq.test'

And we've shaved off another few milliseconds. That's not bad!

Coming to a close

There is more we could do but this is a long post already.

For example, in the project repo I also built a generic vector type with a pop operation that is used for the stack in the eatValue function. It is shared on the jsonReader instance like the expectString buffer. That shaved off another 20ms. I also got rid of most conversions from []byte to string, an expensive allocation you may notice listed as bytes.(*Buffer).String in the top10 of -alloc_objects if you run the profiler again now.

But hopefully you're getting the gist of how you might investigate CPU and memory usage. For me it's still a lot of poking around and trying different things. But after a few years of trying to get better at profiling Go programs I think I'm starting to get the hang of it.