opt(server): Short-circuit ExecuteAsync(). #1601
Conversation
src/server/transaction.cc
Outdated
EngineShard* shard = EngineShard::tlocal();
if (unique_shard_cnt_ == 1 && shard != nullptr && shard->shard_id() == unique_shard_id_) {
  DVLOG(1) << "Short-circuit ExecuteAsync " << DebugId();
  shard->PollExecution("exec_cb", this);
  return;
}
You should move the use_count_ increment below your fast-path (see lines 833 and 890)
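A minimal, self-contained sketch of the ordering being suggested, using toy types rather than the actual Transaction code: take the extra reference only when the asynchronous hop is actually dispatched, so the inline fast path never acquires a count that nothing will release.

```cpp
#include <atomic>
#include <cstdio>

struct Task {
  std::atomic<int> use_count{1};  // lifetime refcount; starts owned by the caller
};

void Release(Task* t) {
  if (t->use_count.fetch_sub(1, std::memory_order_acq_rel) == 1)
    delete t;
}

void ExecuteAsync(Task* t, bool can_run_inline) {
  if (can_run_inline) {
    // Fast path: run the callback right here. No extra reference was taken,
    // so there is nothing left over to leak or to release.
    std::puts("running inline");
    return;
  }
  // Slow path: the thread that finishes the hop owns one reference.
  t->use_count.fetch_add(1, std::memory_order_relaxed);
  // ... dispatch to that thread; it calls Release(t) when it is done ...
  Release(t);  // stand-in for that completion
}

int main() {
  Task* t = new Task;
  ExecuteAsync(t, /*can_run_inline=*/true);
  ExecuteAsync(t, /*can_run_inline=*/false);
  Release(t);  // drop the caller's own reference; the count reaches 0 here
}
```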
This causes DCHECK(trans != head); (engine_shard_set.cc:386) to fail in many tests.
Hmm, @dranikpg has a point. I am afraid you have a leak when you increase use_count_ like this.
But now I understand the complexity of this change. I think that run_count_ should also move. We need to dive deep into the logic there, and it cannot be done during these hours :)
Yes, we should carefully look into it because the use_count_ mechanism is a little fragile. But I'm more than sure that just decrementing it (instead of moving the increment below) will work 😄
Shouldn't I be calling intrusive_ptr_release(this); instead of just decrementing, to make sure that it's deleted?
BTW, I don't understand how this mechanism works with multi/lua: how are transactions not destroyed after every single operation? We don't use run_count_.fetch_add() but .store() instead.
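For reference, a self-contained sketch of the generic boost::intrusive_ptr idiom (a toy Work class, not Dragonfly's Transaction): intrusive_ptr_release() both decrements the counter and deletes the object once it reaches zero, which a bare decrement of the counter would never do.

```cpp
#include <atomic>
#include <cstdint>
#include <iostream>

#include <boost/intrusive_ptr.hpp>

class Work {
 public:
  friend void intrusive_ptr_add_ref(Work* w) {
    w->use_count_.fetch_add(1, std::memory_order_relaxed);
  }
  friend void intrusive_ptr_release(Work* w) {
    // Decrement, and destroy the object when the last reference is gone.
    if (w->use_count_.fetch_sub(1, std::memory_order_acq_rel) == 1)
      delete w;
  }
  ~Work() { std::cout << "Work destroyed\n"; }

 private:
  std::atomic<uint32_t> use_count_{0};
};

int main() {
  boost::intrusive_ptr<Work> p(new Work);  // add_ref -> count == 1
  intrusive_ptr_add_ref(p.get());          // manual extra reference, count == 2
  intrusive_ptr_release(p.get());          // count back to 1, not deleted yet
  // p's destructor releases the final reference and deletes the object.
}
```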
There is run_count_ and there is use_count_. #jobsecurity
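A rough sketch of the two counters as they are described in this thread (TxCounters, ArmHop and FinishShard are made-up names for illustration, not the real implementation): use_count_ is the intrusive_ptr lifetime refcount, while run_count_ only tracks the shard callbacks still pending for the current hop and is re-armed with store() at the start of each hop, which is why a multi/lua transaction is not destroyed after every single operation.

```cpp
#include <atomic>
#include <cstdint>
#include <cstdio>

struct TxCounters {
  std::atomic<uint32_t> use_count{0};  // owners of the transaction object (lifetime)
  std::atomic<uint32_t> run_count{0};  // shard callbacks still pending in the current hop

  void ArmHop(uint32_t unique_shard_cnt) {
    // Each hop overwrites the counter with store(); nothing accumulates across hops.
    run_count.store(unique_shard_cnt, std::memory_order_release);
  }

  // Called from a shard thread when its callback for this hop finishes.
  // Returns true for the last shard, letting the coordinator schedule the
  // next hop while use_count keeps the object alive in between.
  bool FinishShard() {
    return run_count.fetch_sub(1, std::memory_order_acq_rel) == 1;
  }
};

int main() {
  TxCounters tx;
  for (int hop = 0; hop < 2; ++hop) {  // two hops of the same transaction
    tx.ArmHop(3);                      // e.g. the hop touches 3 shards
    for (int shard = 0; shard < 3; ++shard) {
      if (tx.FinishShard())
        std::printf("hop %d done\n", hop);
    }
  }
}
```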
Please describe the reasoning in the commit message (what's impacted, why the change, etc.). The same is true for any other non-trivial PR.
src/server/transaction.cc
Outdated
@@ -844,6 +844,14 @@ void Transaction::ExecuteAsync() {
  // IsArmedInShard in other threads.
  run_count_.store(unique_shard_cnt_, memory_order_release);

  EngineShard* shard = EngineShard::tlocal();
Please mention that we cannot use coordinator_index_, because we may offload this function to run in a different thread.
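One way the requested source comment might read (the wording is illustrative, not taken from the PR):

```cpp
// Note: we cannot rely on coordinator_index_ here, because ExecuteAsync()
// may be offloaded to run on a thread other than the coordinator's; only
// the identity of the local shard tells us whether the inline fast path applies.
EngineShard* shard = EngineShard::tlocal();
```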
nit: using ServerState::tlocal()->thread_index() is slightly better because you do not need to check for nullptr.
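A hedged sketch of how the fast-path condition might look with that suggestion; this is a fragment of ExecuteAsync(), not standalone code, and it assumes shard k is hosted on proactor thread k so the thread index can stand in for the shard id.

```cpp
// Per the nit, ServerState::tlocal() is available on any thread, so there is
// no nullptr check; EngineShard::tlocal() is only dereferenced once we know
// we are on the owning shard's thread (assumed to be thread unique_shard_id_).
uint32_t tid = ServerState::tlocal()->thread_index();
if (unique_shard_cnt_ == 1 && tid == unique_shard_id_) {
  DVLOG(1) << "Short-circuit ExecuteAsync " << DebugId();
  EngineShard::tlocal()->PollExecution("exec_cb", this);
  return;
}
```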
lgtm
With this change, an attempt to execute a hop will short-circuit (i.e. run the callback inline instead of dispatching an async hop) when it can, that is, when the transaction targets a single shard and that shard happens to be the current one.
This change has the potential to reduce command execution latency, and it can be further utilized in the future for multi/eval if we run all commands on the target shard (when there is just one).