fix: Use `MOVED` error type for moved replies #4125

chakaz · 2024-11-13T08:48:42Z

The problem:

In cluster mode, `MOVED` replies (which are arguably not even errors) are aggregated per slot ID and remote host, and displayed under `# Errorstats` as such. For example, a server that does _not_ own 8k slots aggregates 8k distinct errors, along with their counts, in memory.

This slows down all `INFO` replies, consumes a lot of memory, and makes `INFO` replies very long.

The fix:

Use the `MOVED` error type for moved replies, so that they all share a single entry under `# Errorstats`.
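
To illustrate why the key matters, here is a minimal sketch that assumes a simplified counter table. The names `ErrorStats`, `MovedMessage`, `RecordError`, and `DistinctEntries`, as well as the host address and the message layout, are hypothetical and are not taken from the codebase.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <string_view>

// Hypothetical stand-in for the table behind "# Errorstats": key -> count.
using ErrorStats = std::map<std::string, std::uint64_t>;

// Hypothetical helper: builds the text of a MOVED reply for one slot.
std::string MovedMessage(unsigned slot, std::string_view host) {
  return "MOVED " + std::to_string(slot) + " " + std::string(host);
}

// Bumps the counter stored under `key`, creating the entry if needed.
void RecordError(ErrorStats& stats, const std::string& key) {
  ++stats[key];
}

// Simulates one MOVED reply per foreign slot and returns the number of
// distinct entries the table ends up holding.
std::size_t DistinctEntries(unsigned foreign_slots, bool key_by_type) {
  ErrorStats stats;
  for (unsigned slot = 0; slot < foreign_slots; ++slot) {
    std::string message = MovedMessage(slot, "10.0.0.2:6379");
    // Keying by the full message yields one entry per slot and host;
    // keying by an explicit type collapses all of them into one entry.
    std::string key = key_by_type ? std::string("MOVED") : message;
    RecordError(stats, key);
  }
  return stats.size();
}
```

With 8192 foreign slots, keying by message leaves 8192 entries in the table, while keying by type leaves a single `MOVED` entry whose counter absorbs every reply.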

Fixes #4118

romange · 2024-11-13T09:52:12Z

src/server/zset_family.cc

@@ -1279,7 +1279,7 @@ void BZPopMinMax(CmdArgList args, Transaction* tx, SinkReplyBuilder* builder,
    case OpStatus::TIMED_OUT:
      return rb->SendNullArray();
    case OpStatus::KEY_MOVED: {
-      auto error = cluster::SlotOwnershipErrorStr(*tx->GetUniqueSlotId());
+      auto error = cluster::SlotOwnershipError(*tx->GetUniqueSlotId());


(Unrelated to this change.)
I must say I do not understand the optional return type here. Why do we check for cluster config inside `SlotOwnershipError` and return an optional, if `KEY_MOVED` can only be caused by cluster config? It would be better to `CHECK` for cluster config there, remove the optional, and remove CHECKs like `CHECK(error.has_value());`.

I think the reason is to keep both the if-cluster-enabled and the if-slot-owned checks in the same place, and to reuse them in several call sites (currently 3).

When I wrote this logic, I didn't want to include additional headers in the `cluster_defs` file, because it is included quite often, so I created a string instead of an `ErrorReply`. But if you want to return an `ErrorReply`, we can remove the optional, because `ErrorReply` can carry an OK status. I also think Roman is right that we can remove the check for cluster config.

I've changed the return type to ErrorReply, feel free to follow up on the cluster config check :)
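
The idea discussed in this thread, a reply type that can itself say "OK" so no optional wrapper is needed, could look roughly like the sketch below. Everything in it is a simplified, hypothetical stand-in (`SketchStatus`, `SketchErrorReply`, `OwnsSlot`, `SlotOwnershipErrorSketch`, the slot range, and the host address); it is not the project's actual `ErrorReply` or `SlotOwnershipError`.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <string_view>

// Hypothetical, simplified status enum for this sketch only.
enum class SketchStatus { OK, KEY_MOVED };

// Hypothetical reply type: an OK status means "no error", so callers do
// not need std::optional to express the absence of an error.
struct SketchErrorReply {
  SketchStatus status = SketchStatus::OK;
  std::string message;     // full reply text sent to the client
  std::string_view kind;   // type under which the error is counted

  bool ok() const { return status == SketchStatus::OK; }
};

// Assumption for illustration: this node owns slots [0, 8192).
bool OwnsSlot(std::uint16_t slot) { return slot < 8192; }

// Returns an OK reply for owned slots, otherwise a reply of kind "MOVED".
SketchErrorReply SlotOwnershipErrorSketch(std::uint16_t slot) {
  if (OwnsSlot(slot)) {
    return SketchErrorReply{};
  }
  return SketchErrorReply{SketchStatus::KEY_MOVED,
                          "MOVED " + std::to_string(slot) + " 10.0.0.2:6379",
                          "MOVED"};
}
```

A caller can then branch on `ok()` instead of `has_value()`, and the statistics code can count the reply under `kind` rather than under the full message.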

chakaz added 3 commits November 13, 2024 09:20

Use MOVED error type for moved replies

c9dedfe

refactor

ed541ae

Merge branch 'main' into chakaz/moved-error-stats

8dd09a7

adiholden requested a review from BorysTheDev November 13, 2024 08:50

romange reviewed Nov 13, 2024

chakaz added 2 commits November 14, 2024 10:56

fixes

7e35d9b

oops

e4b3f93

BorysTheDev approved these changes Nov 14, 2024

Merge branch 'main' into chakaz/moved-error-stats

cb38c0a

chakaz requested a review from BorysTheDev November 14, 2024 10:07

chakaz merged commit 1513134 into main Nov 14, 2024
12 checks passed

chakaz deleted the chakaz/moved-error-stats branch November 14, 2024 11:12


chakaz commented Nov 13, 2024

romange Nov 13, 2024

chakaz Nov 13, 2024

BorysTheDev Nov 13, 2024

chakaz Nov 14, 2024
