Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce heap memory usage #31

Merged
merged 16 commits into from
Apr 6, 2023
Merged

Reduce heap memory usage #31

merged 16 commits into from
Apr 6, 2023

Conversation

wojas
Copy link
Member

@wojas wojas commented Mar 17, 2023

Several changes to reduce the maximum heap memory usage of LS.

The biggest change is to switch from a generated gogo-protobuf parser to a custom parser. This new parser uses a light low-level abstraction of the protobuf wire format to avoid allocating a large []KV slice with all items. Every KV item in this slice used 64 bytes, plus reallocation overhead as items are appended, plus (in 0.3.0) the copied key and value data as individual allocs, plus GC overhead. Instead, we now directly parse the protobuf wire data as we loop over the items one by one. We also take snapshots directly to protobuf wire format, without an intermediate slice representation.

We now limit the number of snapshots that can concurrently be held in memory using the new memory_downloaded_snapshots and memory_decompressed_snapshots configuration options. In the past this number could grow up to the number of unique instance names.

We now preallocate byte slices that are large enough to hold all the data that we will generate. This avoids many subsequent slice reallocations and copies to grow a slice, doubling the size every time.

Finally, we immediately force a garbage collection cycle as soon as we have released a large block of memory.

In local testing with 1M test domains and 6M records and snapshots from 3 instances, this reduced the peak RSS from over 10 GB (in v0.3.0) to under 3 GB. With more instances, the savings should be even higher with the new limits in place.

To verify that we are writing and reading the protobuf format correctly, this PR adds tests that perform roundtrips with the old gogo-protobuf code, and verify that the implementation is correct. The proto file will contrinue to be updated to reflect any new keys, and gogo can automatically generate code to fill fields with tests data, and methods that can compare all fields before and after a roundtrip.

Several other tests and benchmarks have been added.

@wojas wojas added this to the v0.4.0 milestone Mar 21, 2023
@wojas wojas force-pushed the experimental-protobuf-dbi-scanner branch from bc765f7 to b2ecdcf Compare March 24, 2023 06:34
@wojas wojas changed the title [WIP] Reduce heap memory usage Reduce heap memory usage Mar 24, 2023
@wojas wojas requested a review from nvaatstra March 24, 2023 06:38
Copy link
Contributor

@nvaatstra nvaatstra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM; Excellent optimisations. Only a few minor comments.

@@ -32,7 +33,7 @@ func init() {

snapshotsCmd.AddCommand(snapshotsDumpCmd)
snapshotsDumpCmd.Flags().StringP("format", "f", "debug",
"Output format, one of: 'debug' (default), 'text'")
"Output format, one of: 'debug' (default), 'text' (same)")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems to cater to a future expansion to allow for multiple formats, perhaps have debug as its own flag so you can specify the level of detail (normal, debug) and the format (text, some other format)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason for this switch was a bit different.

In 0.3.0, you could use two output formats:

  • 'text' is a google protobuf encoding that is reversible to binary, but it makes all the key-values unreadable
  • 'debug' shows the all the key-value entries in a more friendly way

Now that we drop the generated protobuf code, we also loose the 'text' encoder'. I decided to just make this the same as 'debug', instead of dropping the 'text' option, because I figured it would be less likely to break anything.

utils/climit/metrics.go Outdated Show resolved Hide resolved
utils/climit/metrics.go Show resolved Hide resolved
d.r.snapshotsByInstance[d.instance] = snapshot.Update{
Snapshot: msg,
NameInfo: ni,
OnClose: func(u *snapshot.Update) {
if u.Snapshot == nil {
return // already called?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like it will not be reached, but if it ever does it looks like a potential risk of keeping a token without release()-ing it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are two ways we can handle calling Close() twice:

  1. Ignore a second call.
  2. panic()

The advantage of the first is that you can always defer u.Close(), even if you explicitly call it somewhere. Having this option may reduce the chance that you forget to call it in the first place.

The second one us strict about you calling it only once. This may detect incorrect logic in some cases, but actually increases the chance that you forget to call it in some code paths.

In this case I picked option 1, because it seemed like the least bad option.

The bigger issue happens when Close is not called. Perhaps we should add a runtime.SetFinalizer to detect that scenario, and then panic?

wojas added 12 commits April 6, 2023 12:27
Limit number of downloaded and decompressed snapshots waiting to be
loaded, to reduce the memory usage.

To download or decompress a snapshot, you need to hold a token that is
in limited supply, configurable with two new `memory_*` options.
The test was failing with the new memory limit fallback to 1.

This also removes the dependency on magic sleep times, and reduces the
new snapshots check interval for a faster test run.
We do not need this index, it was already commented out.
@wojas wojas force-pushed the experimental-protobuf-dbi-scanner branch from 3413bda to b51c6b8 Compare April 6, 2023 04:32
wojas added 4 commits April 6, 2023 12:36
It should not spell check variable names, this is just too annoying.

(...Spell check...)
Error: snapshot/dbi.go:8:26 ... 33, Error - `csproto` is not a recognized word. (unrecognized-spelling)
Error: snapshot/dbi.go:28:25 ... 33, Error - `reallocs` is not a recognized word. (unrecognized-spelling)
Error: snapshot/dbi.go:117:44 ... 51, Error - `varints` is not a recognized word. (unrecognized-spelling)
Error: snapshot/dbi_test.go:36:2 ... 8, Error - `pbdata` is not a recognized word. (unrecognized-spelling)
Error: snapshot/dbi_test.go:88:55 ... 63, Error - `Mentries` is not a recognized word. (unrecognized-spelling)
Error: snapshot/doc.go:4:40 ... 55, Error - `deserialisation` is not a recognized word. (unrecognized-spelling)
Error: snapshot/gogosnapshot/compat_test.go:1:9 ... 21, Error - `gogosnapshot` is not a recognized word. (unrecognized-spelling)
Error: snapshot/gogosnapshot/compat_test.go:15:2 ... 6, Error - `popr` is not a recognized word. (unrecognized-spelling)
Error: snapshot/gogosnapshot/snapshot.pb.go:1005:2 ... 6, Error - `tmps` is not a recognized word. (unrecognized-spelling)
Error: syncer/receiver/receiver.go:11:47 ... 53, Error - `climit` is not a recognized word. (unrecognized-spelling)
Error: syncer/sync_test.go:163:38 ... 44, Error - `dbimsg` is not a recognized word. (unrecognized-spelling)
Error: syncer/utils.go:100:28 ... 40, Error - `optimisation` is not a recognized word. (unrecognized-spelling)
Error: snapshot/dbi.go:122:13 ... 20, Error - `csproto` is not a recognized word. (unrecognized-spelling)
Error: snapshot/dbi.go:122:57 ... 64, Error - `csproto` is not a recognized word. (unrecognized-spelling)
Error: snapshot/dbi.go:123:13 ... 20, Error - `csproto` is not a recognized word. (unrecognized-spelling)
Error: snapshot/dbi.go:127:13 ... 20, Error - `csproto` is not a recognized word. (unrecognized-spelling)
Error: snapshot/dbi_test.go:38:27 ... 33, Error - `pbdata` is not a recognized word. (unrecognized-spelling)
Error: snapshot/dbi_test.go:43:24 ... 30, Error - `pbdata` is not a recognized word. (unrecognized-spelling)
Error: snapshot/dbi_test.go:106:55 ... 63, Error - `Mentries` is not a recognized word. (unrecognized-spelling)
Error: snapshot/gogosnapshot/compat_test.go:16:31 ... 35, Error - `popr` is not a recognized word. (unrecognized-spelling)
Error: snapshot/gogosnapshot/regenerate.go:1:9 ... 21, Error - `gogosnapshot` is not a recognized word. (unrecognized-spelling)
Error: snapshot/gogosnapshot/snapshot.pb.go:4:9 ... 21, Error - `gogosnapshot` is not a recognized word. (unrecognized-spelling)
Error: snapshot/gogosnapshot/snapshot.pb.go:321:34 ... 46, Error - `gogosnapshot` is not a recognized word. (unrecognized-spelling)
Error: snapshot/gogosnapshot/snapshot.pb.go:322:35 ... 47, Error - `gogosnapshot` is not a recognized word. (unrecognized-spelling)
Error: snapshot/gogosnapshot/snapshot.pb.go:1007:3 ... 7, Error - `tmps` is not a recognized word. (unrecognized-spelling)
Error: snapshot/gogosnapshot/snapshot.pb.go:1009:16 ... 20, Error - `tmps` is not a recognized word. (unrecognized-spelling)
Error: snapshot/load.go:21:52 ... 60, Error - `reallocs` is not a recognized word. (unrecognized-spelling)
Error: snapshot/load.go:49:52 ... 60, Error - `reallocs` is not a recognized word. (unrecognized-spelling)
Error: snapshot/load_test.go:33:45 ... 53, Error - `Mentries` is not a recognized word. (unrecognized-spelling)
Error: syncer/receiver/receiver.go:36:30 ... 36, Error - `climit` is not a recognized word. (unrecognized-spelling)
Error: syncer/receiver/receiver.go:41:26 ... 32, Error - `climit` is not a recognized word. (unrecognized-spelling)
Error: syncer/receiver/receiver.go:79:29 ... 35, Error - `climit` is not a recognized word. (unrecognized-spelling)
Error: syncer/receiver/receiver.go:80:29 ... 35, Error - `climit` is not a recognized word. (unrecognized-spelling)
Error: syncer/utils.go:116:37 ... 45, Error - `reallocs` is not a recognized word. (unrecognized-spelling)
(...Compare expect with new output...)
@wojas wojas merged commit bd67d1b into main Apr 6, 2023
@wojas wojas deleted the experimental-protobuf-dbi-scanner branch April 6, 2023 04:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants