-
Notifications
You must be signed in to change notification settings - Fork 998
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(replica): Support FlushDB command for replication #580 #591
Conversation
Signed-off-by: adi_holden <[email protected]>
Signed-off-by: adi_holden <[email protected]>
Signed-off-by: adi_holden <[email protected]>
src/server/replica.cc
Outdated
} | ||
|
||
// Multi shard command flow: | ||
// step 1: Fiber wait untill all the fibers that should execute this tranaction got |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
untill
-> until
here and below
src/server/replica.cc
Outdated
multi_shard_exe_->map_mu.lock(); | ||
auto [it, was_insert] = multi_shard_exe_->tx_sync_execution.emplace(entry.txid, entry.shard_cnt); | ||
|
||
// Note: we must release the mutex befor calling wait on berrier |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
berrier -> barrier
src/server/replica.cc
Outdated
<< " was_insert: " << was_insert; | ||
|
||
// step 1 | ||
it->second.berrier.wait(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also here
// Note: erase from map can be done only after all fibers returned from wait. | ||
// The last fiber which will decrease the counter to 0 will be the one to erase the data from map | ||
auto val = --it->second.counter; | ||
VLOG(2) << "txid: " << entry.txid << " unique_shard_cnt_: " << entry.shard_cnt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- since you move the entry above, reading its fields can technically show wrong values.
- You do not use
was_inserted
to check that only a single fiber called flushdb. why? - Please use
fetch_sub
when decrementing atomics. you will need to compareval
with 1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- We can relax the interface on Execution to accept a mut-ref. We need mut because of the mutable CmdArgList in Dispatch (which can be changed with ToUpper etc inside the command)
- Because we can't tell now what kind of command that is - sharded part or global. We can, theoretically, check with the number of arguments it has or pass additional info. For now, the other executions on FLUSHDB will be no-ops (but redundant transactions)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding number 2, I will but not in this PR.
There are 2 TODO notes which I added.
I will work on another PR to support the execution of the commands by a single fiber, which will require different flow for global commands and non global commands.
I will work on another PR to support the cancellation flow, once the PR I created for heilo with barrier will be merged
src/server/replica.h
Outdated
|
||
std::unordered_map<TxId, TxExecutionSync> tx_sync_execution; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets not interleave members and struct declarations, so all fields at the bottom and all decls at the top - easier to read
src/server/replica.h
Outdated
@@ -80,12 +84,28 @@ class Replica { | |||
void DefaultErrorHandler(const GenericError& err); | |||
|
|||
private: /* Main dlfly flow mode functions */ | |||
struct MultiShardExecution { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets move this struct declaration to the top to the other private section
src/server/replica.h
Outdated
std::shared_ptr<MultiShardExecution> multi_shard_exe_; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This to the bottom
src/server/replica.h
Outdated
|
||
void ExecuteEntry(JournalExecutor* executor, journal::ParsedEntry&& entry); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This belongs to the flow section (below StableSyncDflyFb)
Having a correctly structured header makes the code a lot easier to understand
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did all the other changes in header apart from this one. The flow section is public and this function is private
// Note: erase from map can be done only after all fibers returned from wait. | ||
// The last fiber which will decrease the counter to 0 will be the one to erase the data from map | ||
auto val = --it->second.counter; | ||
VLOG(2) << "txid: " << entry.txid << " unique_shard_cnt_: " << entry.shard_cnt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- We can relax the interface on Execution to accept a mut-ref. We need mut because of the mutable CmdArgList in Dispatch (which can be changed with ToUpper etc inside the command)
- Because we can't tell now what kind of command that is - sharded part or global. We can, theoretically, check with the number of arguments it has or pass additional info. For now, the other executions on FLUSHDB will be no-ops (but redundant transactions)
src/server/replica.cc
Outdated
// By step 1 we enforce that replica will execute multi shard commands that finished on master | ||
// Step 3 ensures the correctness of flushall/flushdb commands | ||
// TODO: this implemantaion does not support atomicity in replica |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets divide those parts and indent the steps, its easier to read
Signed-off-by: adi_holden <[email protected]>
No description provided.