Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remove runit from supervisor #316

Merged
merged 1 commit into from
Mar 25, 2016
Merged

remove runit from supervisor #316

merged 1 commit into from
Mar 25, 2016

Conversation

bookshelfdave
Copy link
Contributor

This PR removes the use of runit as a process supervisor.

Note, a separate PR will remove references to runit in plans + packaging scripts.

Processes are started via the /opt/bldr/srvc/<package>/run script instead of via runsv. Additionally, the child PID is stored in /opt/bldr/srvc/<package>/PID. The PIDfile allows code outside of the worker (specifically, the sidecar) to send signals and check status of a package without knowing it's pid. The tradeoff here is that you have to open and read the PIDfile. Note that the worker doesn't need to check the pidfile on every iteration of topology::run_internal, as the pid is stashed in the worker state.

The topology::run_internal loop changes slightly, as we have to restart a child if we detect a failure instead of letting runit do that for us.

Signals are passed through bldr via the kill syscall. The two exceptions are SIGINT and SIGTERM, which when caught, kill the child (permanently) and exit the supervisor. Note, the child could still directly receive SIGINT or SIGTERM, in which case, it will exit and be restarted by the supervisor.

The create time of the PIDfile is used to determine uptime for a child outside the scope of the worker, instead of creating a second file to store this timestamp. Status is mainly called from the sidecar, which doesn't have access to the worker state, thus the use of the PIDfile create time. It's a simple but effective hack.

Note: there are 2 test failures that appear to be unrelated to this PR (they fail on master as well):

test output::tests::format_verbose ... FAILED
test output::tests::format_verbose_color ... FAILED

TODO:

  • wait for child during supervisor shutdown, then kill -9 after a timeout
  • test pkg upgrade restarts
  • test gossip change restarts

@chef-delivery
Copy link
Contributor

This PR has passed 'Verify' and is ready for review and approval!
Use: '@delivery approve' when code review is complete.

@bookshelfdave bookshelfdave force-pushed the dp_replace_runit branch 2 times, most recently from 3d9b855 to 03f1ea5 Compare March 24, 2016 21:26
@bookshelfdave
Copy link
Contributor Author

refactored and squashed, anyone up for a review?

@chef-delivery
Copy link
Contributor

This PR has passed 'Verify' and is ready for review and approval!
Use: '@delivery approve' when code review is complete.

1 similar comment
@chef-delivery
Copy link
Contributor

This PR has passed 'Verify' and is ready for review and approval!
Use: '@delivery approve' when code review is complete.

@chef-delivery
Copy link
Contributor

This PR has passed 'Verify' and is ready for review and approval!
Use: '@delivery approve' when code review is complete.

/// Create a pid file for a package
/// The existence of this file does not guarantee that a
/// process exists at the PID contained within.
pub fn create_pidfile(&self, pid: u32) -> BldrResult<()> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eventually this would be interesting to see as a Struct that implements Drop - that way you would never have to worry about cleaning it up in the normal case, since the cleanup logic would always get called. I don't think we have to do this to accept the PR, as it may literally never be an issue.

@adamhjk
Copy link
Contributor

adamhjk commented Mar 25, 2016

gif-keyboard-12634074679764794806

This looks good to me.

match fs::remove_file(pid_file) {
Ok(_) => {
debug!("Removed pid file");
()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need to explicitly return a nul type here because ending a statement with ; does the same thing. In Rust when you leave ; off the last statement you are saying "return the value of the statement". When you terminate the last statement with ; you are saying "return null"

@chef-delivery
Copy link
Contributor

This PR has passed 'Verify' and is ready for review and approval!
Use: '@delivery approve' when code review is complete.

@@ -469,7 +473,7 @@ impl Package {
}
}

/// Run reconfigure hook if present
/// Run reconfigure hook if present, DOES NOT restart the package
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we not restart the package anymore becuase it's done somewhere else on reconfigure?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the state machine is changed outside of this function via worker.return_state = Some(State::Starting);

@bookshelfdave
Copy link
Contributor Author

review updates + squash, will merge when Delivery is green

@chef-delivery
Copy link
Contributor

This PR has passed 'Verify' and is ready for review and approval!
Use: '@delivery approve' when code review is complete.

@bookshelfdave
Copy link
Contributor Author

@delivery approve

@chef-delivery chef-delivery merged commit 021bec8 into master Mar 25, 2016
@chef-delivery
Copy link
Contributor

Change: 124a8648-9a8a-48fe-93a4-325ee28f8a5c approved by: @metadave

@chef-delivery chef-delivery deleted the dp_replace_runit branch March 25, 2016 21:22
@chef-delivery
Copy link
Contributor

This PR has passed 'Acceptance' and is ready to be delivered!
Use: '@delivery deliver' when validated in acceptance.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants