-
Notifications
You must be signed in to change notification settings - Fork 315
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
remove runit from supervisor #316
Conversation
This PR has passed 'Verify' and is ready for review and approval! |
3d9b855
to
03f1ea5
Compare
refactored and squashed, anyone up for a review? |
This PR has passed 'Verify' and is ready for review and approval! |
1 similar comment
This PR has passed 'Verify' and is ready for review and approval! |
6cb51da
to
4ef73a5
Compare
This PR has passed 'Verify' and is ready for review and approval! |
/// Create a pid file for a package | ||
/// The existence of this file does not guarantee that a | ||
/// process exists at the PID contained within. | ||
pub fn create_pidfile(&self, pid: u32) -> BldrResult<()> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eventually this would be interesting to see as a Struct that implements Drop
- that way you would never have to worry about cleaning it up in the normal case, since the cleanup logic would always get called. I don't think we have to do this to accept the PR, as it may literally never be an issue.
match fs::remove_file(pid_file) { | ||
Ok(_) => { | ||
debug!("Removed pid file"); | ||
() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't need to explicitly return a nul type here because ending a statement with ;
does the same thing. In Rust when you leave ;
off the last statement you are saying "return the value of the statement". When you terminate the last statement with ;
you are saying "return null"
This PR has passed 'Verify' and is ready for review and approval! |
@@ -469,7 +473,7 @@ impl Package { | |||
} | |||
} | |||
|
|||
/// Run reconfigure hook if present | |||
/// Run reconfigure hook if present, DOES NOT restart the package |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we not restart the package anymore becuase it's done somewhere else on reconfigure?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the state machine is changed outside of this function via worker.return_state = Some(State::Starting);
3525f9f
to
7784be6
Compare
review updates + squash, will merge when Delivery is green |
This PR has passed 'Verify' and is ready for review and approval! |
@delivery approve |
Change: 124a8648-9a8a-48fe-93a4-325ee28f8a5c approved by: @metadave |
This PR has passed 'Acceptance' and is ready to be delivered! |
This PR removes the use of runit as a process supervisor.
Note, a separate PR will remove references to runit in plans + packaging scripts.
Processes are started via the
/opt/bldr/srvc/<package>/run
script instead of viarunsv
. Additionally, the child PID is stored in/opt/bldr/srvc/<package>/PID
. The PIDfile allows code outside of the worker (specifically, the sidecar) to send signals and check status of a package without knowing it's pid. The tradeoff here is that you have to open and read the PIDfile. Note that the worker doesn't need to check the pidfile on every iteration oftopology::run_internal
, as the pid is stashed in the worker state.The
topology::run_internal
loop changes slightly, as we have to restart a child if we detect a failure instead of letting runit do that for us.Signals are passed through bldr via the
kill
syscall. The two exceptions areSIGINT
andSIGTERM
, which when caught, kill the child (permanently) and exit the supervisor. Note, the child could still directly receiveSIGINT
orSIGTERM
, in which case, it will exit and be restarted by the supervisor.The create time of the PIDfile is used to determine uptime for a child outside the scope of the worker, instead of creating a second file to store this timestamp. Status is mainly called from the sidecar, which doesn't have access to the worker state, thus the use of the PIDfile create time. It's a simple but effective hack.
Note: there are 2 test failures that appear to be unrelated to this PR (they fail on master as well):
TODO: