
feat(replica): Support FlushDB command for replication #580 #591

Merged
merged 4 commits into from
Dec 25, 2022

Conversation

adiholden (Collaborator)

No description provided.

Signed-off-by: adi_holden <[email protected]>
Signed-off-by: adi_holden <[email protected]>
}

// Multi shard command flow:
// step 1: Fiber wait untill all the fibers that should execute this tranaction got
Collaborator:

untill -> until here and below

multi_shard_exe_->map_mu.lock();
auto [it, was_insert] = multi_shard_exe_->tx_sync_execution.emplace(entry.txid, entry.shard_cnt);

// Note: we must release the mutex befor calling wait on berrier
Collaborator:

berrier -> barrier

<< " was_insert: " << was_insert;

// step 1
it->second.berrier.wait();
Collaborator:

also here

// Note: erase from map can be done only after all fibers returned from wait.
// The last fiber which will decrease the counter to 0 will be the one to erase the data from map
auto val = --it->second.counter;
VLOG(2) << "txid: " << entry.txid << " unique_shard_cnt_: " << entry.shard_cnt
Collaborator:

  1. Since you moved the entry above, reading its fields afterwards can technically show wrong values.
  2. You do not use was_insert to check that only a single fiber called FLUSHDB. Why?
  3. Please use fetch_sub when decrementing atomics; you will then need to compare val with 1.

Contributor:

  1. We can relax the interface on Execution to accept a mut-ref. We need it mutable because of the mutable CmdArgList in Dispatch (which can be changed with ToUpper etc. inside the command).
  2. Because we can't currently tell what kind of command it is: a sharded part or a global command. We could, theoretically, check the number of arguments it has or pass additional info. For now, the other executions of FLUSHDB will be no-ops (but redundant transactions).

Collaborator (Author):

Regarding number 2: I will, but not in this PR. I added two TODO notes.
I will work on another PR to support executing the commands by a single fiber, which will require a different flow for global and non-global commands.
I will work on another PR to support the cancellation flow, once the PR I created for helio with the barrier is merged.

Comment on lines 96 to 97

std::unordered_map<TxId, TxExecutionSync> tx_sync_execution;
Contributor:

Let's not interleave data members and struct declarations: put all fields at the bottom and all declarations at the top. It's easier to read.

@@ -80,12 +84,28 @@ class Replica {
void DefaultErrorHandler(const GenericError& err);

private: /* Main dlfly flow mode functions */
struct MultiShardExecution {
Contributor:

Let's move this struct declaration to the top, into the other private section.

Comment on lines 107 to 108
std::shared_ptr<MultiShardExecution> multi_shard_exe_;

Contributor:

And this goes to the bottom.

Comment on lines 144 to 146

void ExecuteEntry(JournalExecutor* executor, journal::ParsedEntry&& entry);

Contributor:

This belongs to the flow section (below StableSyncDflyFb)

Having a correctly structured header makes the code a lot easier to understand

Collaborator (Author):

I made all the other header changes apart from this one: the flow section is public, and this function is private.

Comment on lines 727 to 729
// By step 1 we enforce that replica will execute multi shard commands that finished on master
// Step 3 ensures the correctness of flushall/flushdb commands
// TODO: this implementation does not support atomicity in replica
Contributor:

Let's divide those parts and indent the steps; it's easier to read.

Signed-off-by: adi_holden <[email protected]>
@adiholden adiholden merged commit 5d39521 into main Dec 25, 2022
@romange romange deleted the support_flush_db_for_replication branch December 25, 2022 12:08