Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid config change to be lost. #36

Merged
merged 2 commits into from
Aug 30, 2019
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
11 changes: 11 additions & 0 deletions include/libnuraft/raft_params.hxx
Original file line number Diff line number Diff line change
Expand Up @@ -58,6 +58,7 @@ struct raft_params {
, custom_election_quorum_size_(0)
, leadership_expiry_(0)
, auto_forwarding_(false)
, interval_to_wait_previous_config_change_finish_(100)
, return_method_(blocking)
{}

Expand Down Expand Up @@ -363,6 +364,16 @@ public:
// Otherwise, it will return error to client immediately.
bool auto_forwarding_;

// Although we have config_changing_ to try to prevent
// config changing happen simultaneously, it is still
// possible that two add/remove server actions trigger
// to change the config based on the same original config.
// These two actions will append two conf log entry. When
// committing the second conf log entry, the configure
// change in the first conf log entry will lost. So we
// should prevent config change when config_changing_ is true.
int32 interval_to_wait_previous_config_change_finish_;

// To choose blocking call or asynchronous call.
return_method_type return_method_;
};
Expand Down
22 changes: 22 additions & 0 deletions src/handle_join_leave.cxx
Original file line number Diff line number Diff line change
Expand Up @@ -210,6 +210,17 @@ void raft_server::sync_log_to_new_srv(ulong start_idx) {
gap, quick_commit_index_.load(), start_idx,
params->log_sync_stop_gap_ );

// See comment for raft_params::interval_to_wait_previous_config_change_finish_.
int32 count = 0;
while (config_changing_) {
p_in( "Try to add server %d. But now there is a "
"un-committed config change. Should wait. "
"Count: %d.", srv_to_join_->get_id(), count++);
std::this_thread::sleep_for
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @shigui1989 , sleeping inside Asio worker threads looks to me not good.

How about maintaining intermediate_config_ separate to config_? That contains the latest config that is not committed yet. And we make new_conf based on that intermediate_config_, not config_. Also we need to update intermediate_config_ properly here, and clear it in reconfigure() function maybe. Then we can avoid the situation that multiple uncommitted configs overwrite each other.

Please let me know if you want to make this change by yourself, or I can do it on this PR. Thanks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. You can help. Please go ahead! Thanks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

( std::chrono::milliseconds
(params->interval_to_wait_previous_config_change_finish_) );
}

ptr<cluster_config> cur_conf = get_config();
ptr<cluster_config> new_conf = cs_new<cluster_config>
( log_store_->next_slot(),
Expand Down Expand Up @@ -405,6 +416,17 @@ void raft_server::rm_srv_from_cluster(int32 srv_id) {
pp->step_down();
}

// See comment for raft_params::interval_to_wait_previous_config_change_finish_.
int32 count = 0;
while (config_changing_) {
p_in( "Try to remove server %d. But now there is a "
"un-committed config change. Should wait. "
"Count: %d.", srv_id, count++);
std::this_thread::sleep_for
( std::chrono::milliseconds
(ctx_->get_params()->interval_to_wait_previous_config_change_finish_) );
}

ptr<cluster_config> cur_conf = get_config();
ptr<cluster_config> new_conf = cs_new<cluster_config>
( log_store_->next_slot(),
Expand Down