feat(rdb_load): add support for loading huge sets #3807
Conversation
src/server/rdb_load.cc
Outdated
/*
 * TODO
 *
 * I'm quite confused by the existing logic here, where we limit
If I remember correctly, it was more efficient to allocate smaller arrays than a single huge array.
And I agree with you - let's start with a single "static" rule of up to kMaxBlobLen for streaming.
Ok, sure.
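
For illustration, a minimal sketch of that streaming rule. Only kMaxBlobLen comes from the discussion (its value of 4096 is settled later in this thread); ReadElement and EmitChunk are hypothetical stand-ins for the loader's real read and flush plumbing:

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the loader's real read/flush plumbing.
std::string ReadElement();                       // reads one member from the RDB stream
void EmitChunk(std::vector<std::string> chunk);  // hands a chunk off for flushing

constexpr size_t kMaxBlobLen = 4096;  // per-chunk cap; value chosen later in this thread

// Read a set of set_len members in chunks of at most kMaxBlobLen elements,
// instead of materializing one array sized to the whole set.
void LoadSetStreaming(size_t set_len) {
  std::vector<std::string> chunk;
  chunk.reserve(std::min(set_len, kMaxBlobLen));
  for (size_t i = 0; i < set_len; ++i) {
    chunk.push_back(ReadElement());
    if (chunk.size() == kMaxBlobLen) {
      EmitChunk(std::move(chunk));  // downstream flush may block, bounding memory
      chunk.clear();
    }
  }
  if (!chunk.empty())
    EmitChunk(std::move(chunk));
}
```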
src/server/rdb_load.cc
Outdated
  // Despite being async, this function can block if the shard queue is full.
  FlushShardAsync(sid);
}
out_buf.emplace_back(item);
nit: move(item)
Thanks, updated.
looks good :)
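
For context, a sketch of the buffer-then-flush pattern the quoted lines use, with the reviewer's nit applied. FlushShardAsync and out_buf appear in the diff; Item, ShardId, kMaxBufSize, and AddItem are assumed stand-ins:

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

struct Item { std::string key, value; };  // assumption: stand-in payload
using ShardId = uint16_t;                 // assumption

constexpr size_t kMaxBufSize = 128;  // buffered items per flush, per this thread
std::vector<Item> out_buf;           // per-loader buffer, as in the diff
void FlushShardAsync(ShardId sid);   // defined elsewhere in rdb_load.cc

void AddItem(Item item, ShardId sid) {
  if (out_buf.size() >= kMaxBufSize) {
    // Despite being async, this can block if the shard queue is full,
    // which is what bounds memory while loading huge sets.
    FlushShardAsync(sid);
  }
  out_buf.emplace_back(std::move(item));  // move, per the review nit above
}
```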
(force-pushed from 5abb498 to 1b37a0f)
you can use
(force-pushed from 1b37a0f to 8cb986b)
Sure, I've added a unit test.

Testing a 5GB set manually with 5m entries: RSS is still 10GB (5GB used), as it still only does a single flush (since kMaxBufSize x kMaxBlobLen > 8m). It's also now ~20% slower, which seems to be due to not reserving the full set up front anymore, though we could include the full set size in

Repeating the test where we flush if
I think it's worth reserving capacity if it's easy to add. Also, we can reduce kMaxBlobLen now.
Ok, sure. I've reduced kMaxBlobLen to 4096, meaning the 5m set is split into ~10 flushes (128 x 4096 ~= 500k), and reserved the full set up front. With that, loading the 5GB set takes 2.8s (compared to 3.8s before) with RSS ~7GiB (~10GiB before).
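
Spelling out the arithmetic: each flush covers kMaxBufSize x kMaxBlobLen = 128 x 4096 = 524,288 elements, so the 5m-entry set needs ceil(5,000,000 / 524,288) = 10 flushes. A tiny check:

```cpp
#include <cstdio>

int main() {
  constexpr long kMaxBufSize = 128;   // buffered blobs per flush
  constexpr long kMaxBlobLen = 4096;  // elements per blob
  constexpr long kSetLen = 5'000'000;

  long per_flush = kMaxBufSize * kMaxBlobLen;            // 524288 (~500k)
  long flushes = (kSetLen + per_flush - 1) / per_flush;  // ceil(5m / 524288) = 10
  std::printf("%ld elements per flush -> %ld flushes\n", per_flush, flushes);
  return 0;
}
```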
@@ -2505,6 +2557,11 @@ void RdbLoader::LoadItemsBuffer(DbIndex db_ind, const ItemsBuf& ib) {
      stop_early_ = true;
      break;
    }
    if (item->load_config.append) {
why did you add this code?
A bit of a guess, given it already does ts->TryStash below in the non-append case - should I remove it? (Or, if you mean why it does continue early in the append case: it's because we don't want to check expiry, add to db_slice, or update flags in the append case, right?)
Let's remove this code:
- Data tiering does not work with non-strings.
- We do not want to stash values that are not yet finalized.
Sure, removed.
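
For reference, a sketch of the shape this thread converges on (assumed, not the exact diff): in LoadItemsBuffer, an append item is a continuation of an already-created entry, so it skips expiry checks, db_slice insertion, flag updates, and any tiered-storage stash. AppendToExistingValue and CreateNewEntry are hypothetical helpers:

```cpp
#include <vector>

struct LoadConfig { bool append = false; };  // assumption
struct Item { LoadConfig load_config; };     // assumption: payload fields omitted

void AppendToExistingValue(const Item& item);  // hypothetical: merges the chunk in place
void CreateNewEntry(const Item& item);         // hypothetical: normal insert path

void LoadItems(const std::vector<Item*>& ib) {
  for (const Item* item : ib) {
    if (item->load_config.append) {
      // Continuation of an existing entry: skip expiry checks, db_slice
      // insertion, flag updates, and (per the review) any tiered-storage
      // stash, since the value is not finalized yet.
      AppendToExistingValue(*item);
      continue;
    }
    CreateNewEntry(*item);  // normal path for a freshly created entry
  }
}
```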
lgtm, plus minor comments
(force-pushed from b34c951 to 5696d58)
Good work @andydunstall. If you want, you can follow up with a hash-map + zset PR as well.
Thanks, will do.
* feat(rdb_load): add support for loading huge sets
Adds support for loading huge sets (#3760)