feat(rdb_load): add support for loading huge streams #3855
Conversation
Force-pushed from d22851f to 4dd427d
src/server/rdb_load.cc (Outdated)
    ec_ = RdbError(errc::rdb_file_corrupted);
    return;
  }
  // We only load the stream_trace on the final read, so if not read we
I did not understand this comment. Can you explain, please?
I've updated the comment. What I mean is that ReadStreams is split into two sections:

- Reading the stream entries (ltrace->arr)
- Reading the stream metadata and consumer groups (ltrace->stream_trace)

Loading the stream metadata and consumer groups in partial reads would be quite complex, and I'm guessing they aren't expected to be large enough to require partial reads, so I wasn't sure it was worth it. The simplest option seems to be to load the stream entries (ltrace->arr) in partial reads, then on the final read also load the stream metadata and consumer groups (ltrace->stream_trace). A rough sketch of that split is below.
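To make the split concrete, here is a minimal C++ sketch of the shape described above. The type definitions and the helper name are illustrative stand-ins, not the actual LoadTrace/StreamTrace code in rdb_load:

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Illustrative stand-ins; the real trace types hold more state.
struct StreamTrace {};  // stream metadata + consumer groups

struct LoadTrace {
  std::vector<std::string> arr;               // stream entries read so far
  std::unique_ptr<StreamTrace> stream_trace;  // set only on the final read
};

// Entries accumulate across partial reads; metadata and consumer groups
// are read in one piece at the end.
void AppendStreamSegment(LoadTrace* ltrace, std::vector<std::string> segment,
                         bool final_read) {
  for (auto& entry : segment)
    ltrace->arr.push_back(std::move(entry));  // phase 1: entries, per segment

  if (final_read)  // phase 2: happens once, on the last partial read
    ltrace->stream_trace = std::make_unique<StreamTrace>();
}
```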
Force-pushed from 4dd427d to 3bb5db7
@@ -124,7 +124,15 @@ tuple<const CommandId*, absl::InlinedVector<string, 5>> GeneratePopulateCommand(
  }
  json[json.size() - 1] = '}';  // Replace last ',' with '}'
  args.push_back(json);
} else if (type == "STREAM") {
Added this as it's useful for testing, though it is a bit different from the other populate commands: XADD adds a single stream entry with multiple elements in that entry, so the key still has only one entry, which is why the test calls populate 2000 times. I can remove this if preferred and just move the logic into the test, though it sped up the test and is useful for manual testing. A hedged sketch of the idea follows.
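As a rough illustration of the point above (not the PR's actual code), this sketch builds the argument list for one XADD; BuildStreamPopulateArgs and the field/value naming are hypothetical:

```cpp
#include <cstddef>
#include <string>
#include <vector>

#include "absl/strings/str_cat.h"

// Hypothetical helper: args for a single XADD writing one stream entry with
// `elements` field/value pairs. Each call yields exactly one entry, which is
// why a test must invoke populate repeatedly to grow the entry count.
std::vector<std::string> BuildStreamPopulateArgs(const std::string& key,
                                                 size_t elements) {
  std::vector<std::string> args{key, "*"};  // "*" = auto-generated entry ID
  for (size_t i = 0; i < elements; ++i) {
    args.push_back(absl::StrCat("field", i));
    args.push_back(absl::StrCat("v", i));
  }
  return args;  // dispatched as: XADD key * field0 v0 field1 v1 ...
}
```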
* chore: remove RdbLoad Ltrace::arr nested vector
* feat(rdb_load): add support for loading huge streams
Follows #3850 to add support for loading huge streams (#3760).
This loads the stream entries in partial reads, but loads the stream metadata and consumer groups in a single read (assuming consumer groups will be relatively small, so they don't need partial reads).
As with lists, streams are loaded in segments of 512, since each stream node can contain 4KB of elements (sketched below).
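A minimal sketch of that segmentation, assuming a fixed segment size of 512 entries; kSegmentSize, LoadEntriesInSegments, and the loop body are illustrative names rather than the PR's identifiers:

```cpp
#include <algorithm>
#include <cstddef>

constexpr size_t kSegmentSize = 512;  // entries handed over per partial read

void LoadEntriesInSegments(size_t total_entries) {
  for (size_t done = 0; done < total_entries;) {
    size_t n = std::min(kSegmentSize, total_entries - done);
    // ... read the next n entries and append them to the object under
    // construction, so a multi-GB stream never sits fully in the buffer ...
    done += n;
  }
}
```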
It also removes the outer Ltrace::arr, as we now only use a single array. This means YieldIfNeeded is also redundant, so it has been removed.

Comparing a 5GB stream:

- main: 4.8s / ~13GB RSS
- load-huge-streams: 2.6s / ~7GB RSS