In this post we'll build a basic jq clone in Go. It will only be able to pull a single path out of each object it reads. It won't be able to do filters, mapping, etc.
$ cat large-file.json | head -n2 | ./jqgo '.repo.url'
"https://api.github.com/repos/petroav/6.828"
"https://api.github.com/repos/rspt/rspt-theme"
We'll start by building a "control" implementation that uses Go's builtin JSON library with a JSON path tool on top.
Then we'll implement a basic path-aware JSON parser in 600 lines of Go. It's going to use a technique (which may have a better name) that I call "partial parsing" or "fuzzy parsing", where we fully parse what we care about and only sort of parse the rest.
Why partial parsing? There are two general reasons. One is to use less memory than parsers that must always turn all of a text into an object in your language. The other is for when the language has complexities you don't want or need to deal with. We'll basically have to deal with all the complexities of JSON so this post is about the former reason: using less memory. I've written about a case for the second reason though in building a simple, fast SCSS implementation.
This partial parser is more complex than a typical handwritten parser. If you are unfamiliar with handwritten JSON parsers, you may want to take a look at previous articles I've written about parsing JSON.
Once we get this partial parser working we'll turn to Go's builtin profiler to find what we can do to make it faster.
All code for this post is available on GitHub.
Machine specs, versions
Since we're going to be doing some rudimentary comparisons of performance, here are my details. I am running everything on a dedicated server, OVH Rise-1.
- RAM: 64 GB DDR4 ECC 2,133 MHz
- Disk: 2x450 GB SSD NVMe in Soft RAID
- Processor: Intel Xeon E3-1230v6 - 4c/8t - 3.5 GHz/3.9 GHz
And relevant versions:
$ jq --version
jq-1.6
$ go version
go version go1.18 linux/amd64
$ uname -a
Linux phil 5.18.10-100.fc35.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Jul 7 17:41:37 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Now buckle up!
jq using Go's builtin JSON library
This is a very simple program. We just parse JSON data from stdin in a loop. After each parse we'll call an extractValueAtPath function to grab the value at the path the user asks for.
To keep our path "parser" very simple we'll treat array access the same as object access. So we'll look for x.0 instead of x[0], unlike jq.
package main
import (
"encoding/json"
"io"
"log"
"os"
"strconv"
"strings"
)
func extractValueAtPath(a map[string]any, path []string) (any, error) {
// TODO: implemented below
return nil, nil
}
func main() {
path := strings.Split(os.Args[1], ".")
if path[0] == "" {
path = path[1:]
}
dec := json.NewDecoder(os.Stdin)
var a map[string]any
enc := json.NewEncoder(os.Stdout)
for {
err := dec.Decode(&a)
if err == io.EOF {
break
}
if err != nil {
log.Fatal(err)
}
v, err := extractValueAtPath(a, path)
if err != nil {
log.Fatal(err)
}
err = enc.Encode(v)
if err != nil {
log.Fatal(err)
}
}
}
Then we implement the extractValueAtPath function itself, entering into JSON arrays and objects until we reach the end of the path.
func extractValueAtPath(a map[string]any, path []string) (any, error) {
if len(path) == 0 {
return nil, nil
}
var v any = a
for _, part := range path {
if arr, ok := v.([]any); ok {
n, err := strconv.Atoi(part)
if err != nil {
return nil, err
}
v = arr[n]
continue
}
m, ok := v.(map[string]any)
if !ok {
// Path into a non-map
return nil, nil
}
v, ok = m[part]
if !ok {
// Path does not exist
return nil, nil
}
}
return v, nil
}
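As a quick spot check (a hypothetical snippet, not part of the post's repo), calling extractValueAtPath directly on a small document behaves like this. The document and path here are made up for illustration:

// Hypothetical spot check; the document and path are made up.
doc := map[string]any{
    "repo": map[string]any{
        "url":  "https://api.github.com/repos/petroav/6.828",
        "tags": []any{"go", "json"},
    },
}
v, err := extractValueAtPath(doc, []string{"repo", "tags", "1"})
if err != nil {
    log.Fatal(err)
}
log.Println(v) // json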
Alright, let's give it a go module and build and run it!
$ go mod init control
$ go mod tidy
$ go build
# Grab a test file
$ curl https://raw.githubusercontent.com/json-iterator/test-data/master/large-file.json | jq -c '.[]' > large-file.json
$ cat large-file.json | head -n2 | ./control '.repo.url'
"https://api.github.com/repos/petroav/6.828"
"https://api.github.com/repos/rspt/rspt-theme"
Sweet. Now let's make sure it produces the same thing as jq.
$ cat large-file.json | ./control '.repo.url' > control.test
$ cat large-file.json | jq '.repo.url' > jq.test
$ diff jq.test control.test
$ echo $?
0
Great! It's working for a basic query. Let's see how it performs.
$ hyperfine \
"cat large-file.json | ./control '.repo.url' > control.test" \
"cat large-file.json | jq '.repo.url' > jq.test"
Benchmark 1: cat large-file.json | ./control '.repo.url' > control.test
Time (mean ± σ): 310.0 ms ± 14.4 ms [User: 296.2 ms, System: 49.3 ms]
Range (min … max): 296.1 ms … 344.9 ms 10 runs
Benchmark 2: cat large-file.json | jq '.repo.url' > jq.test
Time (mean ± σ): 355.8 ms ± 1.1 ms [User: 348.8 ms, System: 27.7 ms]
Range (min … max): 354.8 ms … 358.5 ms 10 runs
Summary
'cat large-file.json | ./control '.repo.url' > control.test' ran
1.15 ± 0.05 times faster than 'cat large-file.json | jq '.repo.url' > jq.test'
Now that's surprising! This naive implementation in Go is a bit faster than standard jq. But our implementation supports a heck of a lot less than jq. So this benchmark on its own isn't incredibly meaningful.
However, it's a good base for comparing to our next implementation.
Astute readers may notice that this version doesn't use a buffered reader from stdin, while the next version will. I tried this version with and without wrapping stdin in a buffered reader but it didn't make a meaningful difference. It might be because Go's JSON decoder does its own buffering. I'm not sure.
Let's do the fun implementation.
Partial parsing
Unlike a typical handwritten parser this partial parser is going to contain almost two parsers. One parser will care exactly about the structure of JSON. The other parser will only care about reading past the current value (whether it be a number or string or array or object, etc.) The path we pass to the parser will be used to decide whether each value should be fully parsed or partially parsed.
I'll reiterate: this partial parser is more complex than a typical handwritten parser. If you are unfamiliar with handwritten JSON parsers, you may want to take a look at previous articles I've written about parsing JSON.
The shell of this partial parser is going to look similar to the shell of the first parser.
package main
import (
"bufio"
"encoding/json"
"fmt"
"io"
"log"
"os"
"strconv"
"strings"
)
type jsonReader struct {
read []byte
}
... TO IMPLEMENT ...
func main() {
path := strings.Split(os.Args[1], ".")
if path[0] == "" {
path = path[1:]
}
b := bufio.NewReader(os.Stdin)
enc := json.NewEncoder(os.Stdout)
var jr jsonReader
var val any
var err error
for {
jr.reset()
val, err = jr.extractDataFromJsonPath(b, path)
if err == io.EOF {
break
}
if err != nil {
log.Println("Read", string(jr.read))
log.Fatalln(err)
}
err = enc.Encode(val)
if err != nil {
log.Fatalln(err)
}
}
}
Except instead of using the builtin JSON parser we'll call our own extractDataFromJsonPath function that handles parsing and extraction all at once.
Before doing that we'll add a few helper functions. The first one grabs a byte from a reader and stores the read byte locally (so we can print out all read bytes if the program fails).
type jsonReader struct {
read []byte
}
func (jr *jsonReader) readByte(r *bufio.Reader) (byte, error) {
c, err := r.ReadByte()
if err != nil {
return byte(0), err
}
jr.read = append(jr.read, c)
return c, nil
}
The reset member zeroes out the read bytes and gets called before each object is parsed in the main loop.
func (jr *jsonReader) reset() {
jr.read = nil
}
Now let's get into extractDataFromJsonPath.
extractDataFromJsonPath
This is the real parser. It expects a JSON object and fully parses it. Almost.
func (jr *jsonReader) extractDataFromJsonPath(r *bufio.Reader, path []string) (any, error) {
if len(path) == 0 {
return nil, nil
}
err := jr.eatWhitespace(r)
if err != nil {
return nil, err
}
b, err := jr.readByte(r)
if err != nil {
return nil, err
}
// Make sure we're actually going into an object
if b != '{' {
return nil, fmt.Errorf("Expected opening curly brace, got: '%s'", string(b))
}
i := -1
var result any
for {
i++
err = jr.eatWhitespace(r)
if err != nil {
return nil, err
}
bs, err := r.Peek(1)
if err != nil {
return nil, err
}
b := bs[0]
// We found the end of the object
if b == '}' {
r.Discard(1)
break
}
// Key-value pairs must be separated by commas
if i > 0 {
if b == ',' {
r.Discard(1)
} else {
return nil, fmt.Errorf("Expected comma between key-value pairs, got: '%s'", string(b))
}
}
err = jr.eatWhitespace(r)
if err != nil {
return nil, err
}
// Grab the key
s, err := jr.expectString(r)
if err != nil {
return nil, err
}
err = jr.eatWhitespace(r)
if err != nil {
return nil, err
}
// Find a colon separating key from value
b, err = jr.readByte(r)
if err != nil {
return nil, err
}
if b != ':' {
return nil, fmt.Errorf("Expected colon, got: '%s'", string(b))
}
err = jr.eatWhitespace(r)
if err != nil {
return nil, err
}
Up to this point it looks like any old handwritten parser. There are a few helpers in there (eatWhitespace, expectString) we'll implement shortly.
But once we see each key and are ready to look for a value we can decide if we need to fully parse the value (if the path goes into this key) or if we can partially parse the value (because the path does not go into this key).
// If the key is not the start of this path, skip past this value
if path[0] != s {
err = jr.eatValue(r)
if err != nil {
return nil, err
}
continue
}
// Otherwise this is a path we want, grab the value
result, err = jr.expectValue(r, path[1:])
if err != nil {
return nil, err
}
}
return result, nil
}
And that's it! The core parsing loop is done. The meat now becomes 1) the eatValue function that partially parses JSON and 2) the expectValue function that either encounters a scalar value and returns it or recursively calls extractDataFromJsonPath to enter some new object.
Notes on helper naming
There are three main kinds of helpers you'll see. expectX helpers like expectString return early with an error if they fail to find what they're looking for. eatX helpers like eatWhitespace don't return any value and only move the read cursor forward. And tryX helpers like tryNumber do the same thing as expectX helpers but return an additional boolean, so the caller can decide whether or not to make other attempts at parsing.
But first let's fill in the two helpers we skipped. First off, eatWhitespace.
eatWhitespace
This function peeks and reads bytes while the bytes are whitespace.
func (jr *jsonReader) eatWhitespace(r *bufio.Reader) error {
for {
bs, err := r.Peek(1)
if err != nil {
return err
}
b := bs[0]
isWhitespace := b == ' ' ||
b == '\n' ||
b == '\t' ||
b == '\r'
if !isWhitespace {
return nil
}
r.Discard(1)
}
}
That's it! Next we need to fill in expectString.
expectString
This is a standard handwritten parser helper that looks for a double quote and keeps collecting bytes until it finds an ending double quote that is not escaped.
func (jr *jsonReader) expectString(r *bufio.Reader) (string, error) {
var s []byte
err := jr.eatWhitespace(r)
if err != nil {
return "", err
}
// Look for opening quote
b, err := jr.readByte(r)
if err != nil {
return "", err
}
if b != '"' {
return "", fmt.Errorf("Expected double quote to start string, got: '%s'", string(b))
}
var prev byte
for {
b, err := jr.readByte(r)
if err != nil {
return "", err
}
if b == '\\' && prev == '\\' {
// Two backslashes cancel each other out; just skip
prev = byte(0)
continue
} else if b == '"' {
if prev == '\\' {
// Escaped double quote: overwrite the
// backslash we already appended
s[len(s)-1] = '"'
prev = b
continue
}
// Otherwise it's the actual end of the string
break
}
s = append(s, b)
prev = b
}
return string(s), nil
}
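Before moving on, here's a quick hypothetical check (not in the post's repo) of the escaped-quote handling; the input is made up:

// Hypothetical check of expectString on a standalone reader.
var jr jsonReader
r := bufio.NewReader(strings.NewReader(`"say \"hi\""`))
s, err := jr.expectString(r)
if err != nil {
    log.Fatal(err)
}
log.Println(s) // say "hi"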
Standard stuff! Now let's get back to those meaty functions we introduced before, starting with expectValue.
expectValue
This function is called by extractDataFromJsonPath when it wants to fully parse a value.
If we see a left curly brace, we call extractDataFromJsonPath with it.
func (jr *jsonReader) expectValue(r *bufio.Reader, path []string) (any, error) {
bs, err := r.Peek(1)
if err != nil {
return nil, err
}
c := bs[0]
if c == '{' {
return jr.extractDataFromJsonPath(r, path)
Otherwise if we see a left bracket we call a new helper, extractArrayDataFromJsonPath, which will be almost identical to extractDataFromJsonPath but for parsing array syntax.
} else if c == '[' {
return jr.extractArrayDataFromJsonPath(r, path)
}
If the value we're trying to parse isn't an array or object, and there's more of a path, then we have to return null because we can't enter into a scalar value.
// Can't go any further into a path
if len(path) != 0 {
// Reached the end of this object but more of
// the path remains. So this object doesn't
// contain this path.
return nil, nil
}
Then we try to parse a scalar (numbers, strings, booleans, null) and ultimately return an error if nothing worked.
ok, val, err := jr.tryScalar(r)
if err != nil {
return nil, err
}
if !ok {
return nil, fmt.Errorf("Expected scalar, got: '%s'", string(c))
}
return val, err
}
Let's implement tryScalar and its dependencies now. And we'll come back to extractArrayDataFromJsonPath afterward.
tryScalar
The tryScalar function is similar to expectValue. It's called tryScalar because it's allowed to fail.
We peek at the first byte and dispatch to a dedicated parsing helper based on it.
func (jr *jsonReader) tryScalar(r *bufio.Reader) (bool, any, error) {
bs, err := r.Peek(1)
if err != nil {
return false, nil, err
}
c := bs[0]
if c == '"' {
val, err := jr.expectString(r)
return true, val, err
} else if c == 't' {
val, err := jr.expectIdentifier(r, "true", true)
return true, val, err
} else if c == 'f' {
val, err := jr.expectIdentifier(r, "false", false)
return true, val, err
} else if c == 'n' {
val, err := jr.expectIdentifier(r, "null", nil)
return true, val, err
}
return jr.tryNumber(r)
}
This passes control flow to two new functions, expectIdentifier and tryNumber. Let's do expectIdentifier next.
expectIdentifier
This function tries to match the next bytes from the reader against the identifier string passed to it.
func (jr *jsonReader) expectIdentifier(r *bufio.Reader, ident string, value any) (any, error) {
var s []byte
for i := 0; i < len(ident); i++ {
b, err := r.ReadByte()
if err != nil {
return nil, err
}
s = append(s, b)
}
if string(s) == ident {
return value, nil
}
return nil, fmt.Errorf("Unknown value: '%s'", string(s))
}
Thanks to Michael Lynch for pointing out in an earlier version that expectIdentifier does not need to Peek/Discard but can just ReadByte instead.
tryNumber
This function tries to parse a number. We'll do a very lazy number parser that will most likely allow all valid numbers. Internally we'll call json.Unmarshal on the bytes we build up to do the conversion itself.
func (jr *jsonReader) tryNumber(r *bufio.Reader) (bool, any, error) {
var number []byte
// Loop trying to find all number-like characters in a row
for {
bs, err := r.Peek(1)
if err != nil {
return false, nil, err
}
c := bs[0]
// Accept digits plus sign, decimal point, and exponent characters
isNumberCharacter := (c >= '0' && c <= '9') || c == '.' || c == 'e' || c == 'E' || c == '-' || c == '+'
if !isNumberCharacter {
break
}
number = append(number, c)
r.Discard(1)
}
if len(number) == 0 {
return false, nil, nil
}
var n float64
err := json.Unmarshal(number, &n)
return true, n, err
}
If we can't find a number, that's ok. We'll just say so in the first return value by returning false.
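For instance, a hypothetical spot check (the input is made up) would look like this. Note that tryNumber stops at, and does not consume, the byte after the number:

// Hypothetical check of tryNumber; the trailing comma stays unread.
var jr jsonReader
r := bufio.NewReader(strings.NewReader("-1.5e3,"))
ok, val, err := jr.tryNumber(r)
if err != nil {
    log.Fatal(err)
}
log.Println(ok, val) // true -1500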
Outstanding functions
Ok, we've come a long way building out helper functions. The last two remaining functions are extractArrayDataFromJsonPath and eatValue. Let's finish up the last real parser function, extractArrayDataFromJsonPath, before getting to eatValue, the primary partial parsing function.
extractArrayDataFromJsonPath
This function is almost identical to extractDataFromJsonPath, but rather than parsing key-value pairs inside curly braces it parses values inside brackets.
func (jr *jsonReader) extractArrayDataFromJsonPath(r *bufio.Reader, path []string) (any, error) {
// Path inside an array must be an integer
n, err := strconv.Atoi(path[0])
if err != nil {
return nil, err
}
// Look for opening bracket. Make sure we're in an array
b, err := jr.readByte(r)
if err != nil {
return nil, err
}
if b != '[' {
return nil, fmt.Errorf("Expected opening bracket, got: '%s'", string(b))
}
var result any
i := -1
for {
i++
err = jr.eatWhitespace(r)
if err != nil {
return nil, err
}
bs, err := r.Peek(1)
if err != nil {
return nil, err
}
b := bs[0]
// Found closing bracket, exit the array
if b == ']' {
r.Discard(1)
break
}
// Array values must be separated by a comma
if i > 0 {
if b == ',' {
r.Discard(1)
} else {
return nil, fmt.Errorf("Expected comma between array values, got: '%s'", string(b))
}
}
err = jr.eatWhitespace(r)
if err != nil {
return nil, err
}
Just like extractDataFromJsonPath it either calls eatValue or expectValue depending on whether the current index matches the requested path.
// If the key is not the start of this path, skip past this value
if i != n {
err = jr.eatValue(r)
if err != nil {
return nil, err
}
continue
}
result, err = jr.expectValue(r, path[1:])
if err != nil {
return nil, err
}
}
return result, nil
}
That's it for the full parser functions! Let's do the partial parser, eatValue.
eatValue
This function is simpler than the full parser functions we wrote before.
First off it looks for the simple case where the value is a scalar.
func (jr *jsonReader) eatValue(r *bufio.Reader) error {
var stack []byte
inString := false
var prev byte
err := jr.eatWhitespace(r)
if err != nil {
return err
}
ok, _, err := jr.tryScalar(r)
if err != nil {
return err
}
// It was a scalar, we're done!
if ok {
return nil
}
All it does is read until the value ends. If the value is not a scalar, we need to read past a complete JSON array and/or object. To do this we'll simply read through bytes, maintaining a stack of open braces and brackets. If we enter a string we'll skip all bytes inside the string until the string ends.
// Otherwise it's an array or object
first := true
for first || len(stack) > 0 {
first = false
bs, err := r.Peek(1)
if err != nil {
return err
}
b := bs[0]
if inString {
if b == '"' && prev != '\\' {
inString = false
}
// Two \\-es cancel each other out
if b == '\\' && prev == '\\' {
prev = byte(0)
} else {
prev = b
}
r.Discard(1)
continue
}
switch b {
case '[':
stack = append(stack, b)
case ']':
c := stack[len(stack)-1]
stack = stack[:len(stack)-1]
if c != '[' {
return fmt.Errorf("Unexpected end of array: '%s'", string(c))
}
case '{':
stack = append(stack, b)
case '}':
c := stack[len(stack)-1]
stack = stack[:len(stack)-1]
if c != '{' {
return fmt.Errorf("Unexpected end of object: '%s'", string(c))
}
case '"':
inString = true
// Closing quote case handled elsewhere, above
}
r.Discard(1)
prev = b
}
return nil
}
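To make the behavior concrete, here's a hypothetical check (not in the post's repo) that eatValue consumes exactly one value, even with a tricky bracket inside a string, and leaves the rest of the stream alone:

// Hypothetical check; the "]" inside the string must not
// confuse the brace/bracket stack.
var jr jsonReader
r := bufio.NewReader(strings.NewReader(`{"a": [1, "]", {"b": 2}]}, rest`))
if err := jr.eatValue(r); err != nil {
    log.Fatal(err)
}
rest, _ := io.ReadAll(r)
log.Printf("left over: %s", rest) // left over: , rest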
And we're finally done with the first pass of the path-aware jq implementation.
Build, test, benchmark
Let's give it a go module, build and test it.
$ go mod init jqgo
$ go build
$ curl https://raw.githubusercontent.com/json-iterator/test-data/master/large-file.json | jq -c '.[]' > large-file.json
$ cat large-file.json | ./jqgo '.repo.url' > jqgo.test
$ cat large-file.json | jq '.repo.url' > jq.test
$ diff jq.test jqgo.test
$ echo $?
0
Great! :) Let's benchmark it against jq and the control implementation.
$ hyperfine --warmup 2 \
"cat large-file.json | ./control/control '.repo.url' > control.test" \
"cat large-file.json | ./jqgo '.repo.url' > jqgo.test" \
"cat large-file.json | jq '.repo.url' > jq.test"
Benchmark 1: cat large-file.json | ./control/control '.repo.url' > control.test
Time (mean ± σ): 302.0 ms ± 3.4 ms [User: 283.7 ms, System: 53.1 ms]
Range (min … max): 297.4 ms … 309.0 ms 10 runs
Benchmark 2: cat large-file.json | ./jqgo '.repo.url' > jqgo.test
Time (mean ± σ): 258.8 ms ± 2.2 ms [User: 230.3 ms, System: 47.6 ms]
Range (min … max): 256.3 ms … 262.6 ms 11 runs
Benchmark 3: cat large-file.json | jq '.repo.url' > jq.test
Time (mean ± σ): 357.6 ms ± 2.9 ms [User: 350.0 ms, System: 28.3 ms]
Range (min … max): 355.0 ms … 362.9 ms 10 runs
Summary
'cat large-file.json | ./jqgo '.repo.url' > jqgo.test' ran
1.17 ± 0.02 times faster than 'cat large-file.json | ./control/control '.repo.url' > control.test'
1.38 ± 0.02 times faster than 'cat large-file.json | jq '.repo.url' > jq.test'
Now to my surprise we're already beating the non-path-aware control implementation! When I first wrote the path-aware version, it was slower than the control. So I had to start performance profiling. For this blog post I tried to remake the slowest variation I could remember but I couldn't get it slower than this.
That said, the best version was faster than this so I can demonstrate the process of profiling to improve performance.
Let's dig in. :)
Profiling in Go
There are various ways to enable profiling in Go. One way some people recommend is through the builtin benchmark support in go test. I don't really like this method though. I prefer to use pkg/profile manually in main.go.
@@ -9,6 +9,8 @@
"os"
"strconv"
"strings"
+
+ "github.com/pkg/profile"
)
type jsonReader struct {
@@ -450,6 +452,7 @@
}
func main() {
+ defer profile.Start().Stop()
path := strings.Split(os.Args[1], ".")
if path[0] == "" {
path = path[1:]
Build and run:
$ go mod tidy
$ go build
$ cat large-file.json | ./jqgo '.repo.url' > /dev/null
2022/07/11 02:38:57 profile: cpu profiling enabled, /tmp/profile3691177944/cpu.pprof
2022/07/11 02:38:58 profile: cpu profiling disabled, /tmp/profile3691177944/cpu.pprof
Go can run a web server to visualize the pprof results but I find (after literally a few years of trying to figure it out) the CLI makes more sense to me.
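(If you do want the web UI, it's one flag away; the port here is arbitrary.)
$ go tool pprof -http=:8080 /tmp/profile3691177944/cpu.pprof
But sticking with the CLI: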
$ go tool pprof /tmp/profile3691177944/cpu.pprof
File: jqgo
Type: cpu
Time: Jul 11, 2022 at 2:38am (UTC)
Duration: 401.63ms, Total samples = 270ms (67.23%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof)
Now we run top10 to see where we spend the bulk of time.
(pprof) top10
Showing nodes accounting for 260ms, 100% of 260ms total
Showing top 10 nodes out of 31
flat flat% sum% cum cum%
90ms 34.62% 34.62% 230ms 88.46% main.(*jsonReader).eatValue
60ms 23.08% 57.69% 70ms 26.92% bufio.(*Reader).Peek
50ms 19.23% 76.92% 60ms 23.08% bufio.(*Reader).Discard
20ms 7.69% 84.62% 20ms 7.69% syscall.Syscall
10ms 3.85% 88.46% 10ms 3.85% bufio.(*Reader).Buffered (inline)
10ms 3.85% 92.31% 10ms 3.85% main.(*jsonReader).readByte
10ms 3.85% 96.15% 10ms 3.85% runtime.slicebytetostring
10ms 3.85% 100% 10ms 3.85% runtime.stkbucket
0 0% 100% 10ms 3.85% bufio.(*Reader).fill
0 0% 100% 10ms 3.85% encoding/json.(*Encoder).Encode
Now this is weird. Why are Peek and Discard so expensive? And why are we spending so much time in syscall.Syscall? The entire point of buffered I/O is to avoid hitting syscalls too frequently.
But since 88% of time is spent in eatValue, let's verify where in eatValue we are spending that time.
Within the pprof REPL we can enter list X where X is a regexp of a function name.
(pprof) list eatValue
Total: 260ms
ROUTINE ======================== main.(*jsonReader).eatValue in /home/phil/tmp/jqgo/mainprof.go
90ms 230ms (flat, cum) 88.46% of Total
. . 159: err := jr.eatWhitespace(r)
. . 160: if err != nil {
. . 161: return err
. . 162: }
. . 163:
. 20ms 164: ok, _, err := jr.tryScalar(r)
. . 165: if err != nil {
. . 166: return err
. . 167: }
. . 168:
. . 169: // It was a scalar, we're done!
. . 170: if ok {
. . 171: return nil
. . 172: }
. . 173:
. . 174: // Otherwise it's an array or object
. . 175: first := true
. . 176:
. . 177: for first || len(stack) > 0 {
. . 178: first = false
. . 179:
. 60ms 180: bs, err := r.Peek(1)
. . 181: if err != nil {
. . 182: return err
. . 183: }
10ms 10ms 184: b := bs[0]
. . 185:
10ms 10ms 186: if inString {
. . 187: if b == '"' && prev != '\\' {
. . 188: inString = false
. . 189: }
. . 190:
. . 191: // Two \\-es cancel eachother out
10ms 10ms 192: if b == '\\' && prev == '\\' {
. . 193: prev = byte(0)
. . 194: } else {
. . 195: prev = b
. . 196: }
. . 197:
. 60ms 198: r.Discard(1)
. . 199: continue
. . 200: }
. . 201:
. . 202: switch b {
. . 203: case '[':
. . 204: stack = append(stack, b)
10ms 10ms 205: case ']':
. . 206: c := stack[len(stack)-1]
. . 207: stack = stack[:len(stack)-1]
. . 208: if c != '[' {
. . 209: return fmt.Errorf("Unexpected end of array: '%s'", string(c))
. . 210: }
. . 211: case '{':
. . 212: stack = append(stack, b)
. . 213: case '}':
50ms 50ms 214: c := stack[len(stack)-1]
. . 215: stack = stack[:len(stack)-1]
. . 216: if c != '{' {
. . 217: return fmt.Errorf("Unexpected end of object: '%s'", string(c))
. . 218: }
. . 219: case '"':
So by rank we can see we do spend the most time in Peek and Discard. Then in pulling the last item out of the stack??? That's weird. Let's ignore that.
Peek and Discard
Let's look at Peek in the pprof REPL:
(pprof) list Peek
Total: 260ms
ROUTINE ======================== bufio.(*Reader).Peek in /usr/local/go/src/bufio/bufio.go
60ms 70ms (flat, cum) 26.92% of Total
. . 130:// also returns an error explaining why the read is short. The error is
. . 131:// ErrBufferFull if n is larger than b's buffer size.
. . 132://
. . 133:// Calling Peek prevents a UnreadByte or UnreadRune call from succeeding
. . 134:// until the next read operation.
10ms 10ms 135:func (b *Reader) Peek(n int) ([]byte, error) {
. . 136: if n < 0 {
. . 137: return nil, ErrNegativeCount
. . 138: }
. . 139:
10ms 10ms 140: b.lastByte = -1
. . 141: b.lastRuneSize = -1
. . 142:
20ms 20ms 143: for b.w-b.r < n && b.w-b.r < len(b.buf) && b.err == nil {
. 10ms 144: b.fill() // b.w-b.r < len(b.buf) => buffer is not full
. . 145: }
. . 146:
. . 147: if n > len(b.buf) {
. . 148: return b.buf[b.r:b.w], ErrBufferFull
. . 149: }
. . 150:
. . 151: // 0 <= n <= len(b.buf)
. . 152: var err error
. . 153: if avail := b.w - b.r; avail < n {
. . 154: // not enough data in buffer
. . 155: n = avail
. . 156: err = b.readErr()
. . 157: if err == nil {
10ms 10ms 158: err = ErrBufferFull
. . 159: }
. . 160: }
10ms 10ms 161: return b.buf[b.r : b.r+n], err
. . 162:}
. . 163:
. . 164:// Discard skips the next n bytes, returning the number of bytes discarded.
. . 165://
. . 166:// If Discard skips fewer than n bytes, it also returns an error.
The bulk of time here is spent in refilling the buffer (the fill method). So it seems like while bufio.Reader buffers reads, it basically doesn't buffer peeks.
But hey, we were peeking and discarding one byte at a time anyway. Peeking and discarding were the same cost in eatValue. So let's ignore peeking for a second and think about discarding.
We could avoid doing so many discards if we just keep track of how much we are peeking at in the loop and only discard once at the end of the loop. (As an implementation detail, since there's a max internal buffer size we'll need to actually periodically discard when we try to peek and get a "buffer full" error.)
And based on that top10 result above, we need to do this in eatValue.
@@ -170,16 +170,31 @@
}
// Otherwise it's an array or object
+ length := 0
first := true
-
+ var bs []byte
for first || len(stack) > 0 {
+ length++
first = false
- bs, err := r.Peek(1)
- if err != nil {
- return err
+ for {
+ bs, err = r.Peek(length)
+ if err == bufio.ErrBufferFull {
+ _, err = r.Discard(length - 1)
+ if err != nil {
+ return err
+ }
+
+ length = 1
+ continue
+ }
+ if err != nil {
+ return err
+ }
+
+ break
}
- b := bs[0]
+ b := bs[length-1]
if inString {
if b == '"' && prev != '\\' {
@@ -193,7 +208,6 @@
prev = b
}
- r.Discard(1)
continue
}
@@ -219,11 +233,11 @@
// Closing quote case handled elsewhere, above
}
- r.Discard(1)
prev = b
}
- return nil
+ _, err = r.Discard(length)
+ return err
}
func (jr *jsonReader) tryScalar(r *bufio.Reader) (bool, any, error) {
Comment out the pkg/profile bits (profiling slows the whole thing down), rebuild, and rerun:
$ hyperfine --warmup 2 \
"cat large-file.json | ./control/control '.repo.url' > control.test" \
"cat large-file.json | ./jqgo '.repo.url' > jqgo.test" \
"cat large-file.json | jq '.repo.url' > jq.test"
Benchmark 1: cat large-file.json | ./control/control '.repo.url' > control.test
Time (mean ± σ): 302.0 ms ± 4.2 ms [User: 287.7 ms, System: 49.7 ms]
Range (min … max): 296.6 ms … 308.2 ms 10 runs
Benchmark 2: cat large-file.json | ./jqgo '.repo.url' > jqgo.test
Time (mean ± σ): 215.0 ms ± 1.6 ms [User: 189.1 ms, System: 46.9 ms]
Range (min … max): 213.5 ms … 218.7 ms 13 runs
Benchmark 3: cat large-file.json | jq '.repo.url' > jq.test
Time (mean ± σ): 355.7 ms ± 1.4 ms [User: 349.9 ms, System: 26.4 ms]
Range (min … max): 354.3 ms … 359.1 ms 10 runs
Summary
'cat large-file.json | ./jqgo '.repo.url' > jqgo.test' ran
1.40 ± 0.02 times faster than 'cat large-file.json | ./control/control '.repo.url' > control.test'
1.65 ± 0.01 times faster than 'cat large-file.json | jq '.repo.url' > jq.test'
Great! We've shaved off another 40ms. Let's enable profiling, re-run the program and go back into the pprof REPL.
$ cat large-file.json | ./jqgo '.repo.url' > /dev/null
2022/07/11 03:12:07 profile: cpu profiling enabled, /tmp/profile2229743747/cpu.pprof
2022/07/11 03:12:07 profile: cpu profiling disabled, /tmp/profile2229743747/cpu.pprof
$ go tool pprof /tmp/profile2229743747/cpu.pprof
File: jqgo
Type: cpu
Time: Jul 11, 2022 at 3:12am (UTC)
Duration: 401.33ms, Total samples = 210ms (52.33%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top10
Showing nodes accounting for 210ms, 100% of 210ms total
Showing top 10 nodes out of 20
flat flat% sum% cum cum%
100ms 47.62% 47.62% 180ms 85.71% main.(*jsonReader).eatValue
70ms 33.33% 80.95% 70ms 33.33% bufio.(*Reader).Peek
10ms 4.76% 85.71% 10ms 4.76% encoding/json.(*encodeState).string
10ms 4.76% 90.48% 20ms 9.52% main.(*jsonReader).expectString
10ms 4.76% 95.24% 10ms 4.76% main.(*jsonReader).readByte
10ms 4.76% 100% 10ms 4.76% reflect.Value.Type
0 0% 100% 10ms 4.76% encoding/json.(*Encoder).Encode
0 0% 100% 10ms 4.76% encoding/json.(*decodeState).literalStore
0 0% 100% 10ms 4.76% encoding/json.(*decodeState).unmarshal
0 0% 100% 10ms 4.76% encoding/json.(*decodeState).value
Nice, syscall.Syscall is no longer in the top 10. But eatValue is, and we're still spending a bunch of time in Peek. We didn't try to stop calling Peek so much, we just cut down on calling Discard.
Let's list eatValue again.
(pprof) list eatValue
Total: 210ms
ROUTINE ======================== main.(*jsonReader).eatValue in /home/phil/tmp/jqgo/mainpeek.go
100ms 180ms (flat, cum) 85.71% of Total
. . 159: err := jr.eatWhitespace(r)
. . 160: if err != nil {
. . 161: return err
. . 162: }
. . 163:
. 20ms 164: ok, _, err := jr.tryScalar(r)
. . 165: if err != nil {
. . 166: return err
. . 167: }
. . 168:
. . 169: // It was a scalar, we're done!
. . 170: if ok {
. . 171: return nil
. . 172: }
. . 173:
. . 174: // Otherwise it's an array or object
. . 175: length := 0
. . 176: first := true
. . 177: var bs []byte
20ms 20ms 178: for first || len(stack) > 0 {
10ms 10ms 179: length++
. . 180: first = false
. . 181:
. . 182: for {
20ms 80ms 183: bs, err = r.Peek(length)
. . 184: if err == bufio.ErrBufferFull {
. . 185: _, err = r.Discard(length - 1)
. . 186: if err != nil {
. . 187: return err
. . 188: }
. . 189:
. . 190: length = 1
. . 191: continue
. . 192: }
. . 193: if err != nil {
. . 194: return err
. . 195: }
. . 196:
. . 197: break
. . 198: }
10ms 10ms 199: b := bs[length-1]
. . 200:
10ms 10ms 201: if inString {
10ms 10ms 202: if b == '"' && prev != '\\' {
203: inString = false
. . 204: }
. . 205:
. . 206: // Two \\-es cancel eachother out
20ms 20ms 207: if b == '\\' && prev == '\\' {
. . 208: prev = byte(0)
. . 209: } else {
. . 210: prev = b
. . 211: }
.
The bulk of time is spent in Peek. Let's list Peek again.
(pprof) list Peek
Total: 210ms
ROUTINE ======================== bufio.(*Reader).Peek in /usr/local/go/src/bufio/bufio.go
70ms 70ms (flat, cum) 33.33% of Total
. . 130:// also returns an error explaining why the read is short. The error is
. . 131:// ErrBufferFull if n is larger than b's buffer size.
. . 132://
. . 133:// Calling Peek prevents a UnreadByte or UnreadRune call from succeeding
. . 134:// until the next read operation.
10ms 10ms 135:func (b *Reader) Peek(n int) ([]byte, error) {
. . 136: if n < 0 {
. . 137: return nil, ErrNegativeCount
. . 138: }
. . 139:
. . 140: b.lastByte = -1
. . 141: b.lastRuneSize = -1
. . 142:
10ms 10ms 143: for b.w-b.r < n && b.w-b.r < len(b.buf) && b.err == nil {
. . 144: b.fill() // b.w-b.r < len(b.buf) => buffer is not full
. . 145: }
. . 146:
. . 147: if n > len(b.buf) {
. . 148: return b.buf[b.r:b.w], ErrBufferFull
. . 149: }
. . 150:
. . 151: // 0 <= n <= len(b.buf)
. . 152: var err error
. . 153: if avail := b.w - b.r; avail < n {
. . 154: // not enough data in buffer
. . 155: n = avail
. . 156: err = b.readErr()
. . 157: if err == nil {
. . 158: err = ErrBufferFull
. . 159: }
. . 160: }
50ms 50ms 161: return b.buf[b.r : b.r+n], err
. . 162:}
. . 163:
. . 164:// Discard skips the next n bytes, returning the number of bytes discarded.
. . 165://
. . 166:// If Discard skips fewer than n bytes, it also returns an error.
Well it's not really clear to me from this why we spend so much time slicing here.
We might be able to use Peek much less if we kept our own FIFO queue of peeked-at bytes. But I don't feel like writing a correct, efficient FIFO queue (a ring buffer, basically), and maybe there are other aspects of this program we can look at. So let's give this train of thought a break.
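For reference though, here is a minimal sketch of the shape such a byte ring buffer might take. This is hypothetical and untested, not what the repo uses, and a real replacement for bufio.Reader would also need logic to refill from the underlying reader:

// Hypothetical fixed-capacity byte ring buffer sketch.
type ringBuffer struct {
    buf        []byte
    head, tail int // head: next read index, tail: next write index
    size       int // number of buffered bytes
}

func newRingBuffer(capacity int) *ringBuffer {
    return &ringBuffer{buf: make([]byte, capacity)}
}

// push appends a byte, reporting false when the buffer is full.
func (rb *ringBuffer) push(b byte) bool {
    if rb.size == len(rb.buf) {
        return false
    }
    rb.buf[rb.tail] = b
    rb.tail = (rb.tail + 1) % len(rb.buf)
    rb.size++
    return true
}

// pop removes and returns the oldest byte, reporting false when empty.
func (rb *ringBuffer) pop() (byte, bool) {
    if rb.size == 0 {
        return 0, false
    }
    b := rb.buf[rb.head]
    rb.head = (rb.head + 1) % len(rb.buf)
    rb.size--
    return b, true
}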
Memory profiling
Let's change tactics entirely. Memory allocation tends to be expensive. Allocating in a loop is generally a bad idea. And this entire program is a loop. So let's try doing a memory profile instead of a CPU profile.
Instead of defer profile.Start().Stop() we'll set defer profile.Start(profile.MemProfile).Stop().
Build, rerun, and enter pprof with the -alloc_objects flag. We want to see where memory is being allocated.
$ go build
$ cat large-file.json | ./jqgo '.repo.url' > /dev/null
2022/07/11 03:24:55 profile: memory profiling enabled (rate 4096), /tmp/profile1407859643/mem.pprof
2022/07/11 03:24:56 profile: memory profiling disabled, /tmp/profile1407859643/mem.pprof
$ go tool pprof -alloc_objects /tmp/profile1407859643/mem.pprof
File: jqgo
Type: alloc_objects
Time: Jul 11, 2022 at 3:24am (UTC)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top10
Showing nodes accounting for 365899, 99.95% of 366086 total
Dropped 24 nodes (cum <= 1830)
Showing top 10 nodes out of 14
flat flat% sum% cum cum%
227585 62.17% 62.17% 262708 71.76% main.(*jsonReader).expectString
40945 11.18% 73.35% 40945 11.18% main.(*jsonReader).readByte
39500 10.79% 84.14% 252585 69.00% main.(*jsonReader).tryScalar
30009 8.20% 92.34% 41924 11.45% main.(*jsonReader).tryNumber
12055 3.29% 95.63% 215416 58.84% main.(*jsonReader).eatValue
7555 2.06% 97.70% 11915 3.25% encoding/json.Unmarshal
4360 1.19% 98.89% 4360 1.19% encoding/json.(*decodeState).literalStore
3847 1.05% 99.94% 3847 1.05% main.(*jsonReader).expectIdentifier
43 0.012% 99.95% 365931 100% runtime.main
0 0% 99.95% 4360 1.19% encoding/json.(*decodeState).unmarshal
And just like in the CPU profile we can list functions to see where the allocations happen in code. Let's list the biggest memory user here, expectString.
(pprof) list expectString
Total: 366086
ROUTINE ======================== main.(*jsonReader).expectString in /home/phil/tmp/jqgo/mainpeek.go
227585 262708 (flat, cum) 71.76% of Total
. . 58: err := jr.eatWhitespace(r)
. . 59: if err != nil {
. . 60: return "", err
. . 61: }
. . 62:
. 4941 63: b, err := jr.readByte(r)
. . 64: if err != nil {
. . 65: return "", err
. . 66: }
. . 67:
. . 68: if b != '"' {
. . 69: return "", fmt.Errorf("Expected double quote to start string, got: '%s'", string(b))
. . 70: }
. . 71:
. . 72: var prev byte
. . 73: for {
. 30182 74: b, err := jr.readByte(r)
. . 75: if err != nil {
. . 76: return "", err
. . 77: }
. . 78:
. . 79: if b == '\\' && prev == '\\' {
. . 80: // Just skip
. . 81: prev = byte(0)
. . 82: continue
. . 83: } else if b == '"' {
. . 84: // Overwrite the escaped double quote
. . 85: if prev == '\\' {
. . 86: s[len(s)-1] = '"'
. . 87: } else {
. . 88: // Otherwise it's the actual end
. . 89: break
. . 90: }
. . 91: }
. . 92:
146302 146302 93: s = append(s, b)
. . 94: prev = b
. . 95: }
. . 96:
81283 81283 97: return string(s), nil
. . 98:}
. . 99:
. . 100:func (jr *jsonReader) expectIdentifier(r *bufio.Reader, ident string, value any) (any, error) {
. . 101: var s []byte
. . 102:
And the biggest offender is growing the string! The good thing is that growing this string can be amortized: we can share the underlying string memory across calls on the jsonReader struct. This way, expectString only needs to grow the string when it actually sees a bigger string than any we've already seen.
The builtin bytes.Buffer type does exactly this. We can put a bytes.Buffer on the jsonReader struct because this code isn't multithreaded and because expectString doesn't call itself.
@@ -2,6 +2,7 @@
import (
"bufio"
+ "bytes"
"encoding/json"
"fmt"
"io"
@@ -13,6 +14,8 @@
type jsonReader struct {
read []byte
+
+ expectString_buffer bytes.Buffer
}
func (jr *jsonReader) reset() {
@@ -51,7 +54,7 @@
}
func (jr *jsonReader) expectString(r *bufio.Reader) (string, error) {
- var s []byte
+ jr.expectString_buffer.Reset()
err := jr.eatWhitespace(r)
if err != nil {
@@ -81,18 +84,18 @@
} else if b == '"' {
// Overwrite the escaped double quote
if prev == '\\' {
- s[len(s)-1] = '"'
+ jr.expectString_buffer.Bytes()[jr.expectString_buffer.Len()-1] = '"'
} else {
// Otherwise it's the actual end
break
}
}
- s = append(s, b)
+ jr.expectString_buffer.WriteByte(b)
prev = b
}
- return string(s), nil
+ return jr.expectString_buffer.String(), nil
}
Or instead of sharing memory on the struct, maybe this would be a good place to use sync.Pool?
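I didn't try it, but a minimal, standalone sketch of that pattern might look like this (hypothetical; buildString stands in for expectString):

package main

import (
    "bytes"
    "fmt"
    "sync"
)

// A pool of reusable buffers. Get hands back a previously-Put buffer
// when one is available, so steady-state use rarely allocates.
var bufPool = sync.Pool{
    New: func() any { return new(bytes.Buffer) },
}

func buildString(parts []string) string {
    buf := bufPool.Get().(*bytes.Buffer)
    buf.Reset()
    defer bufPool.Put(buf)
    for _, p := range parts {
        buf.WriteString(p)
    }
    // String() copies the buffer's bytes, so the result stays
    // valid after the deferred Put.
    return buf.String()
}

func main() {
    fmt.Println(buildString([]string{"a", "b", "c"})) // abc
}

The main appeal of sync.Pool over a field on the struct would be safety if multiple goroutines ever parsed concurrently; for this single-goroutine loop the struct field is simpler.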
Disable pkg/profile, build, and rerun with hyperfine.
$ hyperfine --warmup 2 \
"cat large-file.json | ./control/control '.repo.url' > control.test" \
"cat large-file.json | ./jqgo '.repo.url' > jqgo.test" \
"cat large-file.json | jq '.repo.url' > jq.test"
Benchmark 1: cat large-file.json | ./control/control '.repo.url' > control.test
Time (mean ± σ): 307.2 ms ± 10.8 ms [User: 292.8 ms, System: 49.4 ms]
Range (min … max): 296.5 ms … 326.2 ms 10 runs
Benchmark 2: cat large-file.json | ./jqgo '.repo.url' > jqgo.test
Time (mean ± σ): 210.8 ms ± 2.2 ms [User: 185.4 ms, System: 44.9 ms]
Range (min … max): 209.1 ms … 216.8 ms 14 runs
Benchmark 3: cat large-file.json | jq '.repo.url' > jq.test
Time (mean ± σ): 356.1 ms ± 2.6 ms [User: 349.1 ms, System: 26.9 ms]
Range (min … max): 354.1 ms … 362.9 ms 10 runs
Summary
'cat large-file.json | ./jqgo '.repo.url' > jqgo.test' ran
1.46 ± 0.05 times faster than 'cat large-file.json | ./control/control '.repo.url' > control.test'
1.69 ± 0.02 times faster than 'cat large-file.json | jq '.repo.url' > jq.test'
And we've shaved another 20ms off. That's not bad!
Coming to a close
There is more we could do but this is a long post already.
For example, in the project repo I also built a generic vector type with a pop operation that is used for the stack in the eatValue function. It is shared on the jsonReader instance like the expectString buffer. This ended up shaving another 20ms. And I also got rid of most conversions from []byte to string (an expensive allocation you may notice listed as bytes.String() in the top10 of -alloc_objects if you run the profiler again now).
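The repo's actual type may differ, but with Go 1.18 generics a minimal vector-with-pop sketch could look like this (hypothetical):

type stack[T any] struct {
    items []T
}

func (s *stack[T]) push(v T) {
    s.items = append(s.items, v)
}

// pop removes and returns the top item; ok is false when empty.
func (s *stack[T]) pop() (v T, ok bool) {
    if len(s.items) == 0 {
        return v, false
    }
    v = s.items[len(s.items)-1]
    s.items = s.items[:len(s.items)-1]
    return v, true
}

// reset keeps the underlying array around so pushes after a reset
// don't reallocate, which is the point of sharing it on jsonReader.
func (s *stack[T]) reset() {
    s.items = s.items[:0]
}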
But hopefully you're getting the gist of how you might investigate CPU and memory usage. For me it's still a lot of poking around and trying different things. But after a few years of trying to get better at profiling Go programs I think I'm starting to get the hang of it.