Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Use MOVED error type for moved replies #4125

Merged
merged 6 commits into from
Nov 14, 2024
Merged

Conversation

chakaz
Copy link
Collaborator

@chakaz chakaz commented Nov 13, 2024

The problem:

When in cluster mode, MOVED replies (which are arguably not even errors) are aggregated per slot-id + remote host, and displayed in # Errorstats as such. For example, in a server that does not own 8k slots, we will aggregate 8k different errors, and their counts (in memory).

This slows down all INFO replies, takes a lot of memory, and also makes INFO replies very long.

The fix:

Use type MOVED for moved replies, making them all the same under # Errorstats

Fixes #4118

@@ -1279,7 +1279,7 @@ void BZPopMinMax(CmdArgList args, Transaction* tx, SinkReplyBuilder* builder,
case OpStatus::TIMED_OUT:
return rb->SendNullArray();
case OpStatus::KEY_MOVED: {
auto error = cluster::SlotOwnershipErrorStr(*tx->GetUniqueSlotId());
auto error = cluster::SlotOwnershipError(*tx->GetUniqueSlotId());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(unrelated to this change)
must say I do not understand the optional return type here. Why do we check for cluster config inside SlotOwnershipError and return optional if KEY_MOVED can only be due to cluster config. better CHECK for cluster config there and remove optional and remove CHECKs like CHECK(error.has_value());

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that the reason is to have the if-cluster-enabled and also if-slot-owned in the same place, and reuse it many times (currently 3 times)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I wrote this logic I didn't want to include additional headers into cluster_defs file, because it is included quite often, so created a string instead of ErrorReply. But if you want to return Reply we can remove the optional, because ErrorReply contains OK status. Also I think Roman is right and we can remove checking for cluster_config

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed the return type to ErrorReply, feel free to follow up on the cluster config check :)

@chakaz chakaz requested a review from BorysTheDev November 14, 2024 10:07
@chakaz chakaz merged commit 1513134 into main Nov 14, 2024
12 checks passed
@chakaz chakaz deleted the chakaz/moved-error-stats branch November 14, 2024 11:12
romange pushed a commit that referenced this pull request Nov 20, 2024
**The problem:**

When in cluster mode, `MOVED` replies (which are arguably not even errors) are aggregated per slot-id + remote host, and displayed in `# Errorstats` as such. For example, in a server that does _not_ own 8k slots, we will aggregate 8k different errors, and their counts (in memory).

This slows down all `INFO` replies, takes a lot of memory, and also makes `INFO` replies very long.

**The fix:**

Use `type` `MOVED` for moved replies, making them all the same under `# Errorstats`

Fixes #4118
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Too many MOVED error in # Errorstats section of INFO ALL
3 participants