-
Notifications
You must be signed in to change notification settings - Fork 998
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(server): rdb loader catch bad alloc #1748
Conversation
While loading rdb snapshot, if oom is reached a bad alloc exception is thrown. Now we catch it and write warning to log. Signed-off-by: adi_holden <[email protected]>
if (!added) { | ||
LOG(WARNING) << "RDB has duplicated key '" << item->key << "' in DB " << db_ind; | ||
try { | ||
auto [it, added] = db_slice.AddOrUpdate(db_cntx, item->key, std::move(pv), item->expire_ms); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@romange I did not fail the loading if we are OOM, just writing the warning to log. Do you think we should fail loading?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally I prefer to fail
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think, we are missing a bigger picture here. And I am not saying we should not fix this specific point but the big issue is that both master and replica are configured the same way but the replication fails because we are near the capacity at the master. The master does not fail so why replica fails?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And to @royjacobson point I agree we should fail here: unrolling the operation is better than proceeding with an inconsistent state, but we also should not get to this point at all in this specific scenario.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated the changes so that rdb loader load will return error on oom. Note that today if we have error on replica we fail the replica load and on master when using loading from disc we dont fail to start the service if we have errors in rdb loader. If you think we should fail to start the master service on load fail I can do this in a separate PR.
Also I will continue to investigate why replica fails assuming master and replica has the same memory limit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what do you mean we do not fail when master has errors during load? what happens then? Do we load partial data?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes
Signed-off-by: adi_holden <[email protected]>
While loading rdb snapshot, if oom is reached a bad alloc exception is thrown. Now we catch it and write warning to log and fali loader. Signed-off-by: adi_holden <[email protected]>
While loading rdb snapshot, if oom is reached a bad alloc exception is thrown. Now we catch it and write warning to log.
fixes #1708