Backport of snapshot restore-from-archive streaming and filtering into release/1.3.x #14243
Backport
This PR is auto-generated from #13658 so that it can be assessed for backporting, due to the inclusion of the backport/1.3.x label.
The text below is copied from the body of the original PR.
This changeset implements two improvements to restoring FSM snapshots from archives:

- Snapshot contents are streamed from the archive as they are restored, instead of first being written out to disk and read back in (see the sketch after this list).
- Filters can be passed to the RestoreFromArchive helper. The operator can pass these as -filter arguments to nomad operator snapshot state (and other commands in the future) to include only desired data when reading the snapshot.

Deferred for this PR: the nomad operator snapshot state command still has to load everything that's been filtered into the FSM before writing it out to a large JSON blob. We should provide a tool that streams the decoded objects directly to an encoder without loading into the FSM, so that we can emit NDJSON, write out to a SQLite DB, etc.
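A minimal sketch of what streaming plus filtering can look like. This is not the actual Nomad code: the names (SnapshotObject, RestoreFilter, restoreFromArchive) and the JSON stand-in for the real snapshot encoding are illustrative assumptions only.

```go
// A self-contained sketch, NOT the actual Nomad implementation: it only
// illustrates the shape of streaming a gzipped snapshot archive and filtering
// objects while decoding, instead of writing the decompressed contents to
// disk first. The types and the JSON wire format are stand-ins.
package snapshotsketch

import (
	"compress/gzip"
	"encoding/json"
	"errors"
	"io"
)

// SnapshotObject stands in for a decoded FSM object (job, node, alloc, ...).
type SnapshotObject struct {
	JobID  string `json:"job_id,omitempty"`
	NodeID string `json:"node_id,omitempty"`
	// payload fields elided
}

// RestoreFilter keeps objects associated with the given jobs or nodes.
// An empty filter keeps everything.
type RestoreFilter struct {
	JobIDs  map[string]bool
	NodeIDs map[string]bool
}

func (f *RestoreFilter) keep(o SnapshotObject) bool {
	if len(f.JobIDs) == 0 && len(f.NodeIDs) == 0 {
		return true
	}
	return f.JobIDs[o.JobID] || f.NodeIDs[o.NodeID]
}

// restoreFromArchive decompresses and decodes the archive one object at a
// time, handing only the objects that pass the filter to apply(). Nothing is
// buffered to disk, and only one object is held in memory at a time.
func restoreFromArchive(archive io.Reader, filter *RestoreFilter, apply func(SnapshotObject) error) error {
	gz, err := gzip.NewReader(archive)
	if err != nil {
		return err
	}
	defer gz.Close()

	dec := json.NewDecoder(gz)
	for {
		var obj SnapshotObject
		if err := dec.Decode(&obj); err != nil {
			if errors.Is(err, io.EOF) {
				return nil // clean end of stream
			}
			return err
		}
		if filter.keep(obj) {
			if err := apply(obj); err != nil {
				return err
			}
		}
	}
}
```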
Example:

Starting with a 439MB snapshot (~13GiB uncompressed), I want to filter for all objects associated with 3 different jobs and 3 different nodes.
Previously this would write ~13GiB to disk, read 14GiB from disk, and saturate 1 core for over an hour before running out of memory on my machine (16GiB) and crashing.
With this change, the command reads only ~450MiB from disk, writes just the 197MiB JSON blob to disk, and uses about 150% CPU, with memory usage peaking at around 330MB.
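For the deferred follow-up mentioned above, a sketch of streaming the decoded objects straight to an NDJSON encoder instead of loading them into the FSM, reusing the hypothetical restoreFromArchive and RestoreFilter from the earlier sketch:

```go
// Continues the hypothetical snapshotsketch package above; again an
// illustrative sketch, not the Nomad API.
package snapshotsketch

import (
	"encoding/json"
	"io"
)

// writeNDJSON writes each object that passes the filter as one JSON document
// per line (json.Encoder appends a newline after every value), so the output
// can be piped to jq, bulk-loaded into a SQLite DB, and so on, without ever
// holding the whole snapshot in memory.
func writeNDJSON(archive io.Reader, filter *RestoreFilter, out io.Writer) error {
	enc := json.NewEncoder(out)
	return restoreFromArchive(archive, filter, func(obj SnapshotObject) error {
		return enc.Encode(obj)
	})
}
```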