fix(server): rdb loader catch bad alloc #1748

adiholden · 2023-08-27T06:36:30Z

While loading an RDB snapshot, a bad_alloc exception is thrown if OOM is reached. Now we catch it and write a warning to the log.
fixes #1708


adiholden · 2023-08-27T06:37:57Z

src/server/rdb_load.cc

-    if (!added) {
-      LOG(WARNING) << "RDB has duplicated key '" << item->key << "' in DB " << db_ind;
+    try {
+      auto [it, added] = db_slice.AddOrUpdate(db_cntx, item->key, std::move(pv), item->expire_ms);


@romange I did not fail the load when we hit OOM; I only write a warning to the log. Do you think we should fail the load?

Personally I prefer to fail

I think we are missing the bigger picture here. I am not saying we should not fix this specific point, but the big issue is that master and replica are configured the same way, yet replication fails because we are near capacity on the master. The master does not fail, so why does the replica fail?

And to @royjacobson's point, I agree we should fail here: unrolling the operation is better than proceeding with an inconsistent state. But we also should not get to this point at all in this specific scenario.

I updated the changes so that the rdb loader's load returns an error on OOM. Note that today, if there is an error on the replica, we fail the replica load; on the master, when loading from disk, we don't fail to start the service if the rdb loader reports errors. If you think we should fail to start the master service when the load fails, I can do this in a separate PR.
Also, I will continue to investigate why the replica fails, assuming master and replica have the same memory limit.

What do you mean we do not fail when the master has errors during load? What happens then? Do we load partial data?


While loading rdb snapshot, if oom is reached a bad alloc exception is thrown. Now we catch it and write warning to log and fail loader. Signed-off-by: adi_holden <[email protected]>
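The behavior this commit message describes (catch the exception, log a warning, fail the load) can be illustrated with a small standalone sketch. This is not the code from src/server/rdb_load.cc: `TinyStore`, `LoadItem`, and the item-count limit standing in for a memory limit are all hypothetical names invented for the example, and logging goes to `std::clog` rather than the project's logger.

```cpp
#include <cstddef>
#include <iostream>
#include <new>
#include <string>
#include <system_error>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for a memory-limited store (not Dragonfly's DbSlice).
// An item-count cap simulates running out of memory.
class TinyStore {
 public:
  explicit TinyStore(std::size_t max_items) : max_items_(max_items) {}

  // Returns true if the key was newly added, false if an existing key was updated.
  // Throws std::bad_alloc when the simulated limit is reached.
  bool AddOrUpdate(const std::string& key, std::string value) {
    auto it = items_.find(key);
    if (it != items_.end()) {
      it->second = std::move(value);
      return false;
    }
    if (items_.size() >= max_items_) throw std::bad_alloc();
    items_.emplace(key, std::move(value));
    return true;
  }

  std::size_t size() const { return items_.size(); }

 private:
  std::size_t max_items_;
  std::unordered_map<std::string, std::string> items_;
};

// Hypothetical loader step: turns the exception into an error code
// so the caller can fail the whole load instead of continuing.
std::error_code LoadItem(TinyStore& store, const std::string& key, std::string value) {
  try {
    bool added = store.AddOrUpdate(key, std::move(value));
    if (!added) {
      // A duplicate key is only worth a warning; loading continues.
      std::clog << "WARNING: RDB has duplicated key '" << key << "'\n";
    }
  } catch (const std::bad_alloc&) {
    // Out of memory: warn and report an error to the caller.
    std::clog << "WARNING: out of memory while loading key '" << key << "'\n";
    return std::make_error_code(std::errc::not_enough_memory);
  }
  return {};
}
```

In this sketch a caller would stop loading as soon as `LoadItem` returns a non-empty error code, which mirrors the "fail" option preferred in the discussion over continuing with partially loaded data.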

fix(server): rdb loader catch bad alloc

112d8d3

While loading rdb snapshot, if oom is reached a bad alloc exception is thrown. Now we catch it and write warning to log. Signed-off-by: adi_holden <[email protected]>


fail load on oom

0a85b97

Signed-off-by: adi_holden <[email protected]>

romange approved these changes Aug 27, 2023


adiholden merged commit 901d3ff into main Aug 27, 2023

adiholden deleted the rdb_load_catch_bad_alloc branch August 27, 2023 10:48
