kube sdnotify: run proxies for the lifespan of the service #16709
Conversation
@alexlarsson PTAL
Generally lgtm though
@containers/podman-maintainers PTAL
/hold This needs more work. The barrier is sent in a following message:
The flake in containers#16076 is likely related to the notify message
not being delivered/read correctly. Move sending the message into an
exec session such that flakes will reveal an error message.

Signed-off-by: Valentin Rothberg <[email protected]>
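For context on the BARRIER mechanism discussed above, here is a minimal sketch, assuming a plain unixgram listener, of how a notify proxy can honor BARRIER=1: sd_notify_barrier() attaches a pipe file descriptor to the message via SCM_RIGHTS and then polls its end of the pipe, so closing the received descriptor is what unblocks the sender. The socket path and function layout are illustrative, not Podman's actual code.

```go
// Minimal sketch (not Podman's implementation) of a NOTIFY_SOCKET
// proxy that handles BARRIER=1.
package main

import (
	"log"
	"net"
	"strings"
	"syscall"
)

func main() {
	const sockPath = "/tmp/notify-proxy.sock" // illustrative path
	addr, err := net.ResolveUnixAddr("unixgram", sockPath)
	if err != nil {
		log.Fatal(err)
	}
	conn, err := net.ListenUnixgram("unixgram", addr)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	buf := make([]byte, 4096)                     // cf. _notifyBufferMax
	oob := make([]byte, syscall.CmsgSpace(4*768)) // cf. _notifyFdMax

	for {
		n, oobn, _, _, err := conn.ReadMsgUnix(buf, oob)
		if err != nil {
			log.Fatal(err)
		}
		// Close any fds passed with the datagram; for BARRIER=1 this
		// closes the pipe end the client is polling on, which is the
		// acknowledgment the client waits for.
		if cmsgs, err := syscall.ParseSocketControlMessage(oob[:oobn]); err == nil {
			for _, cmsg := range cmsgs {
				if fds, err := syscall.ParseUnixRights(&cmsg); err == nil {
					for _, fd := range fds {
						syscall.Close(fd)
					}
				}
			}
		}
		msg := string(buf[:n])
		if strings.Contains(msg, "BARRIER=1") {
			continue // nothing to forward; closing the fds was the ack
		}
		// A real proxy would forward READY=1 etc. to the host's
		// NOTIFY_SOCKET here.
		log.Printf("received: %q", msg)
	}
}
```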
Force-pushed from 992f6b4 to 236b58f
@alexlarsson @umohnani8 @edsantiago PTAL

@umohnani8 using --service-container will now implicitly wait for the container to exit. You could use that in the --wait PR and have a

select {
case err := <-errorChannelFromKubePlay:
    return err
case <-signalHandlerChannel:
    return teardown(...)
}
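A fuller version of that suggestion might look like the following sketch; playKube and teardown are hypothetical stand-ins for the real kube play entry point and cleanup logic, used here only to illustrate the select pattern, not Podman's actual API.

```go
// Hedged sketch of the suggested --wait pattern: race the kube-play
// error channel against a termination signal.
package main

import (
	"log"
	"os"
	"os/signal"
	"syscall"
)

func waitForKube(playKube func() error, teardown func() error) error {
	errCh := make(chan error, 1)
	go func() { errCh <- playKube() }() // finishes when the service container exits

	sigCh := make(chan os.Signal, 1)
	signal.Notify(sigCh, syscall.SIGINT, syscall.SIGTERM)

	select {
	case err := <-errCh:
		return err // workload finished (or failed) on its own
	case <-sigCh:
		return teardown() // interrupted: tear everything down
	}
}

func main() {
	// Stub callbacks so the sketch compiles; the real ones would come
	// from kube play and its cleanup path.
	err := waitForKube(
		func() error { return nil },
		func() error { return nil },
	)
	if err != nil {
		log.Fatal(err)
	}
}
```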
@rhatdan PTAL

Note: I would love to create a new container image with systemd-notify installed for CI. The one we currently use is based on Fedora 31, which is pretty old and does NOT send a BARRIER message.
Way over my head, just a few surgical comments. Thanks for addressing this.
Tests LGTM
main_pid=$(awk -F= '{print $2}' <<< ${lines[0]})
is "$(</proc/$main_pid/comm)" "podman" "podman is the service mainPID"
nice. thank you.
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: edsantiago, vrothberg

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.
Done. I don't want to submit it as a PR, because this is a bug week, but I'll submit it over the weekend. (Unless you need the magic BARRIER functionality right now; if you do, LMK and we'll coordinate.)

Thank you, Ed! It's not that urgent. I can prepare a follow-up PR next week.
@containers/podman-maintainers PTAL
_notifyRcvbufSize = 8 * 1024 * 1024
_notifyBufferMax  = 4096
_notifyFdMax      = 768
_notifyBarrierMsg = "BARRIER=1"
_notifyRdyMsg     = daemon.SdNotifyReady
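For readers wondering how constants like these come into play, a minimal sketch (assuming the golang.org/x/sys/unix package; not Podman's actual setup code) of sizing a notify socket's kernel receive buffer so bursts of sd_notify datagrams are not dropped:

```go
// Sketch: enlarge the receive buffer of a notify socket. The constant
// mirrors _notifyRcvbufSize above; everything else is illustrative.
package main

import (
	"log"
	"net"

	"golang.org/x/sys/unix"
)

const _notifyRcvbufSize = 8 * 1024 * 1024

func main() {
	addr, err := net.ResolveUnixAddr("unixgram", "/tmp/notify-proxy.sock")
	if err != nil {
		log.Fatal(err)
	}
	conn, err := net.ListenUnixgram("unixgram", addr)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	raw, err := conn.SyscallConn()
	if err != nil {
		log.Fatal(err)
	}
	// Values above net.core.rmem_max are silently capped unless
	// SO_RCVBUFFORCE (which needs CAP_NET_ADMIN) is used instead.
	if err := raw.Control(func(fd uintptr) {
		_ = unix.SetsockoptInt(int(fd), unix.SOL_SOCKET, unix.SO_RCVBUF, _notifyRcvbufSize)
	}); err != nil {
		log.Fatal(err)
	}
}
```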
Just a drive-by comment: why do the vars start with an underscore? I feel like this makes reading/writing them harder for no reason?
They are consts, and I refrained from capitalizing them (to avoid exporting them) but wanted to somehow distinguish them from ordinary variables.
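To illustrate the convention being described (a hypothetical example, not code from this PR): Go has no visibility keyword, so a lowercase first letter alone keeps an identifier unexported, and the leading underscore is purely a visual cue that it is a constant rather than a variable.

```go
package notifyproxy // hypothetical package, for illustration only

const (
	// Unexported (lowercase) and underscore-prefixed to signal
	// "package-private constant" at a glance:
	_notifyBufferMax = 4096

	// Capitalizing the name would export it across package
	// boundaries, which these values should not be:
	// NotifyBufferMax = 4096
)
```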
I have some worries with this making the "podman play kube" process stay around for the lifetime of the pod. I think it would be better if this could be handled by conmon or the service container. On the other hand, this is better than the current state, so let's get this in first and work from here. So, LGTM.
I agree, this should be moved to conmon.
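To make the lifecycle under discussion concrete, here is a rough sketch (hypothetical names and stubs, not Podman's or conmon's actual API) of tying the notify proxies to the service container: they stay alive until it exits, then all of them are closed.

```go
// Sketch: keep all notify proxies alive until the service container
// exits, then close them. waitForExit and Proxy are stand-ins.
package main

import "log"

type Proxy struct{ path string }

func (p *Proxy) Close() error {
	log.Printf("closing proxy %s", p.path)
	return nil
}

// waitForExit blocks until the service container exits; stubbed here.
func waitForExit(serviceCtrID string) error { return nil }

func main() {
	proxies := []*Proxy{
		{path: "/run/notify/ctr1.sock"},
		{path: "/run/notify/ctr2.sock"},
	}

	// The service container outlives all pods and containers of the
	// workload, so its exit marks the end of the workload.
	if err := waitForExit("service-container-id"); err != nil {
		log.Fatal(err)
	}
	for _, p := range proxies {
		_ = p.Close() // all notify proxies shut down with the workload
	}
}
```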
Race introduced in containers#16709, which changed 'top' to 'true', so
there was only a narrow window in which '.State.ConmonPod' would be
valid. Remove the race.

Fixes: containers#17882

Signed-off-by: Ed Santiago <[email protected]>
As outlined in #16076, a subsequent BARRIER may follow the READY
message sent by a container. To correctly imitate the behavior of
systemd's NOTIFY_SOCKET, the notify proxies spun up by `kube play`
must hence process messages for the entirety of the workload.

We know that the workload is done and that all containers and pods have
exited when the service container exits. Hence, all proxies are closed
at that time.

The above changes imply that Podman runs for the entirety of the
workload and will henceforth act as the MAINPID when running inside of
systemd. Prior to this change, the service container acted as the
MAINPID; that is no longer possible, as Podman would otherwise be
killed immediately on exit of the service container and could not
clean up. The kube template now correctly transitions to inactive
instead of failed in systemd.

Fixes: #16076
Fixes: #16515

Signed-off-by: Valentin Rothberg [email protected]
Does this PR introduce a user-facing change?