Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: broken memcached error reporting #1741

Merged
merged 13 commits into from
Aug 28, 2023

Conversation

kostasrim
Copy link
Contributor

Partially solves #1730 (because I suspect there is some other issue as well -- either the user it doing something illegal with the replica or there is some other error which is not reported properly because of this bug).

ConnectionContext overloads operator-> by returning a RedisReplyBuilder. This is extremely error prone, because there are execution paths that work both with memcached and redis protocol. If used improperly in a memcached execution path it will trigger an internal assertion Protocol::REDIS == protocol() instead of properly report the error to the client.

This PR addresses this, by adding the SendError family of functions in ConnectionContext. I replace the (*cntx)-> with cntx->SendError which uses the internal SinkReplyBuilder and dispatching to the right function based on the underline protocol.

All in all this PR fixes:

  • DispatchCommand error reporting (one example is when we use SET command on the replica -- previously we crashed now we properly report an error)
  • INCRBY command error reporting (this would crash)
  • DECRBY command error reporting (this would crash)

Basically, I went over all of the memcached commands just to make sure that we report errors properly via the common base client SinkReplyBuilder and not via the specific RedisReplyBuilder class via the overloaded arrow operator.

I really don't like the operator-> as it's extremely error prone and it's really hard to assert that an execution path is redis only (especially when both -> and reply_builder are used within the same function -- it makes you scratch your head if it's a bug or an actual feature).

Generally speaking a long term solution is to replace all of the common functions to strictly use a common interface via SinkReplyBuilder such that are protocol agnostic (since anyway they should be used by both memcached and redis)

@kostasrim kostasrim requested a review from romange August 25, 2023 09:18
@kostasrim kostasrim self-assigned this Aug 25, 2023
@kostasrim kostasrim added the bug Something isn't working label Aug 25, 2023
@kostasrim kostasrim enabled auto-merge (squash) August 25, 2023 09:24
@@ -277,7 +289,7 @@ void RedisReplyBuilder::SendBulkString(std::string_view str) {
return Send(v, ABSL_ARRAYSIZE(v));
}

std::string_view RedisReplyBuilder::StatusToMsg(OpStatus status) {
std::string_view SinkReplyBuilder::StatusToMsg(OpStatus status) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why did you change it? that's not good because these errors are not part of memcached protocol.

Copy link
Contributor Author

@kostasrim kostasrim Aug 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not true:

   949     if (cid->IsTransactional()) {       
   950       dist_trans.reset(new Transaction{cid});                                                     
   951                
   952       if (!dist_trans->IsMulti()) {  // Multi command initialize themself based on their mode.
   953         if (auto st = dist_trans->InitByArgs(dfly_cntx->conn_state.db_index, args_no_cmd);        
   954             st != OpStatus::OK)                
   955           return cntx->SendError(st);
   956       }                                                                                           
   957                                                                    
   958       dfly_cntx->transaction = dist_trans.get();
   959       dfly_cntx->last_command_debug.shards_count = dfly_cntx->transaction->GetUniqueShardCnt();   
   960     } else {                                                                                      
   961       dfly_cntx->transaction = nullptr;                                                           
   962     }                                                                        
   963   }  

Andddd

<< CI{"SET", CO::WRITE | CO::DENYOOM | CO::NO_AUTOJOURNAL

So memcached commands are transactional. Same applies for get incr decr etc. If the weren't, our transactional system would be broken, because memcached commands would not be a part of a transaction whereas redis commands would causing all sorts of broken behavior when interleaved.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

InitByArgs always succeeds for these commands.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We basically validate everything for memcached already at the parser level

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

really bug prone tbh, some of the error reporting is already handled while some other doesn't(in case of this bug). Anyway, beyond the scope of this PR. I will see at some point how we can simplify this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed this for now

Copy link
Contributor Author

@kostasrim kostasrim Aug 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually no, we kinda need this because we need SendError(ErrorReply) which uses status.

to be fair, it won't ever arrive to that block that uses status so I ended up removing this and added a dcheck for our sanity shake. You never know if we hit that so at least let's report this to be sure.

@romange
Copy link
Collaborator

romange commented Aug 25, 2023

I think we should add a pytest that reproduces the problem.

Comment on lines 63 to 69
void SendError(OpStatus status) {
rbuilder_->SendError(status);
}

void SendSimpleString(std::string_view str) {
rbuilder_->SendSimpleString(str);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One day I'll make ErrorReply's constructor implicit and remove all the overloads 😆

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it could be nice.

@kostasrim kostasrim requested a review from romange August 25, 2023 12:24
@romange
Copy link
Collaborator

romange commented Aug 25, 2023

@kostasrim I added this test to the main branch (locally on my machine) and nothing has crashed.

@kostasrim
Copy link
Contributor Author

@romange Yes because I forgot to check in the changes in __init__. We have another bug there:

105     @property
106     def mc_port(self) -> int:
107         if self.params.existi get(key: _KT) -> Optional[_VT_co]                              
108             return self.param get(key: _KT, default: Union[_VT_co, _T]) -> Union[_VT_co, _T] 
109         return int(self.args.get("mc_port", "11211"))

It works because you are not connecting to the replica. Will push in 5 and you should now get a nice error in main:

pymemcache.exceptions.MemcacheUnexpectedCloseError which is basically the replica crashing....

@@ -106,7 +106,7 @@ def admin_port(self) -> int:
def mc_port(self) -> int:
if self.params.existing_mc_port:
return self.params.existing_mc_port
return int(self.args.get("mc_port", "11211"))
return int(self.args.get("memcached_port", "11211"))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@romange this is why it was passing....

@@ -976,7 +976,7 @@ void StringFamily::IncrBy(CmdArgList args, ConnectionContext* cntx) {
int64_t val;

if (!absl::SimpleAtoi(sval, &val)) {
return (*cntx)->SendError(kInvalidIntErr);
return cntx->SendError(kInvalidIntErr);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i know it looks fragile but memcache parser already does all the parsing checks because those conditions should not trigger with memcache.

@@ -57,6 +58,10 @@ class SinkReplyBuilder {

virtual void SendProtocolError(std::string_view str) = 0;

// You normally should not call this - maps the status
// into the string that would be sent
static std::string_view StatusToMsg(OpStatus status);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not your fault but those local fixes make the whole thing more confusing.
Let's do the following:

  1. Move StatusToMsg to op_status.h as std::string_view StastusMessage(OpStatus status);
  2. let's make SendError(ErrorReply error) non virtual method in SinkReplyBuilder.

I also changed how SendError is implemented for MC reply builder, I will send you a PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's make SendError(ErrorReply error) non virtual method in SinkReplyBuilder.

We can't really do that because of void CapturingReplyBuilder::SendError(ErrorReply error) actually overrides it with a different implementation.

Copy link
Contributor

@dranikpg dranikpg Aug 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It overwrites the others as well (string_view), it just becomes a little less efficient because it can't move out of the string IIRC

romange and others added 3 commits August 28, 2023 08:18
* feat: implement CONFIG GET command

The command returns all the matched arguments and their current values.
In addition, this PR adds mutability semantics to each config - whether it can be
changed at runtime.

Fixes #1700

Signed-off-by: Roman Gershman <[email protected]>

---------

Signed-off-by: Roman Gershman <[email protected]>
@@ -251,14 +251,39 @@ TEST_F(RedisReplyBuilderTest, ErrorBuiltInMessage) {
}
}

TEST_F(RedisReplyBuilderTest, ErrorReplyBuiltInMessage) {
Copy link
Contributor Author

@kostasrim kostasrim Aug 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@romange This is a little "raw" but I wanted to verify that both of the execution paths are covered with ErrorReply without the extra nuances of printing extra error information through the << operator. Moreover, it doesn't cover all sorts possible OpStatus values and the e1, e2 are just placeholders such that I can verify that the right execution path gets triggered

@kostasrim kostasrim requested a review from romange August 28, 2023 17:14
@kostasrim kostasrim enabled auto-merge (squash) August 28, 2023 17:17
@kostasrim kostasrim merged commit 1855c1c into main Aug 28, 2023
@kostasrim kostasrim deleted the fix_broken_memcached_error_reporting branch August 28, 2023 17:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants