kube sdnotify: run proxies for the lifespan of the service #16709
Conversation
@alexlarsson PTAL
Generally lgtm though
@containers/podman-maintainers PTAL
/hold This needs more work. The barrier is sent in a following message:
The flake in containers#16076 is likely related to the notify message
not being delivered/read correctly. Move sending the message into an
exec session such that flakes will reveal an error message.

Signed-off-by: Valentin Rothberg <[email protected]>
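For context on the BARRIER mechanism discussed above, here is a minimal sketch, assuming a plain unixgram listener, of how a notify proxy can honor BARRIER=1: sd_notify_barrier() attaches a pipe file descriptor to the message via SCM_RIGHTS and then polls its end of the pipe, so closing the received descriptor is what unblocks the sender. The socket path and function layout are illustrative, not Podman's actual code.

```go
// Minimal sketch (not Podman's implementation) of a NOTIFY_SOCKET
// proxy that handles BARRIER=1.
package main

import (
	"log"
	"net"
	"strings"
	"syscall"
)

func main() {
	const sockPath = "/tmp/notify-proxy.sock" // illustrative path
	addr, err := net.ResolveUnixAddr("unixgram", sockPath)
	if err != nil {
		log.Fatal(err)
	}
	conn, err := net.ListenUnixgram("unixgram", addr)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	buf := make([]byte, 4096)                     // cf. _notifyBufferMax
	oob := make([]byte, syscall.CmsgSpace(4*768)) // cf. _notifyFdMax

	for {
		n, oobn, _, _, err := conn.ReadMsgUnix(buf, oob)
		if err != nil {
			log.Fatal(err)
		}
		// Close any fds passed with the datagram; for BARRIER=1 this
		// closes the pipe end the client is polling on, which is the
		// acknowledgment the client waits for.
		if cmsgs, err := syscall.ParseSocketControlMessage(oob[:oobn]); err == nil {
			for _, cmsg := range cmsgs {
				if fds, err := syscall.ParseUnixRights(&cmsg); err == nil {
					for _, fd := range fds {
						syscall.Close(fd)
					}
				}
			}
		}
		msg := string(buf[:n])
		if strings.Contains(msg, "BARRIER=1") {
			continue // nothing to forward; closing the fds was the ack
		}
		// A real proxy would forward READY=1 etc. to the host's
		// NOTIFY_SOCKET here.
		log.Printf("received: %q", msg)
	}
}
```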
Force-pushed from 992f6b4 to 236b58f
@alexlarsson @umohnani8 @edsantiago PTAL

@umohnani8 using --service-container will now implicitly wait for the container to exit. You could use that in the --wait PR and have a

select {
case err := <-errorChannelFromKubePlay:
    return err
case <-signalHandlerChannel:
    return teardown(...)
}
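A fuller version of that suggestion might look like the following sketch; playKube and teardown are hypothetical stand-ins for the real kube play entry point and cleanup logic, used here only to illustrate the select pattern, not Podman's actual API.

```go
// Hedged sketch of the suggested --wait pattern: race the kube-play
// error channel against a termination signal.
package main

import (
	"log"
	"os"
	"os/signal"
	"syscall"
)

func waitForKube(playKube func() error, teardown func() error) error {
	errCh := make(chan error, 1)
	go func() { errCh <- playKube() }() // finishes when the service container exits

	sigCh := make(chan os.Signal, 1)
	signal.Notify(sigCh, syscall.SIGINT, syscall.SIGTERM)

	select {
	case err := <-errCh:
		return err // workload finished (or failed) on its own
	case <-sigCh:
		return teardown() // interrupted: tear everything down
	}
}

func main() {
	// Stub callbacks so the sketch compiles; the real ones would come
	// from kube play and its cleanup path.
	err := waitForKube(
		func() error { return nil },
		func() error { return nil },
	)
	if err != nil {
		log.Fatal(err)
	}
}
```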
@rhatdan PTAL

Note: I would love to create a new container image with systemd-notify installed for CI. The one we currently use is based on Fedora 31, which is pretty old and does NOT send a BARRIER message.
Way over my head, just a few surgical comments. Thanks for addressing this.
Tests LGTM
main_pid=$(awk -F= '{print $2}' <<< ${lines[0]})
is "$(</proc/$main_pid/comm)" "podman" "podman is the service mainPID"
nice. thank you.
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: edsantiago, vrothberg

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.
Done. I don't want to submit it as a PR, because this is a bug week, but I'll submit it over the weekend. (Unless you need the magic BARRIER functionality right now; if you do, LMK and we'll coordinate.)

Thank you, Ed! It's not that urgent. I can prepare a follow-up PR next week.
@containers/podman-maintainers PTAL
_notifyRcvbufSize = 8 * 1024 * 1024
_notifyBufferMax  = 4096
_notifyFdMax      = 768
_notifyBarrierMsg = "BARRIER=1"
_notifyRdyMsg     = daemon.SdNotifyReady
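For readers wondering how constants like these come into play, a minimal sketch (assuming the golang.org/x/sys/unix package; not Podman's actual setup code) of sizing a notify socket's kernel receive buffer so bursts of sd_notify datagrams are not dropped:

```go
// Sketch: enlarge the receive buffer of a notify socket. The constant
// mirrors _notifyRcvbufSize above; everything else is illustrative.
package main

import (
	"log"
	"net"

	"golang.org/x/sys/unix"
)

const _notifyRcvbufSize = 8 * 1024 * 1024

func main() {
	addr, err := net.ResolveUnixAddr("unixgram", "/tmp/notify-proxy.sock")
	if err != nil {
		log.Fatal(err)
	}
	conn, err := net.ListenUnixgram("unixgram", addr)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	raw, err := conn.SyscallConn()
	if err != nil {
		log.Fatal(err)
	}
	// Values above net.core.rmem_max are silently capped unless
	// SO_RCVBUFFORCE (which needs CAP_NET_ADMIN) is used instead.
	if err := raw.Control(func(fd uintptr) {
		_ = unix.SetsockoptInt(int(fd), unix.SOL_SOCKET, unix.SO_RCVBUF, _notifyRcvbufSize)
	}); err != nil {
		log.Fatal(err)
	}
}
```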
Just a drive-by comment: why do the vars start with an underscore? I feel like this makes reading/writing them harder for no reason?
They are consts, and I refrained from capitalizing them (to avoid exporting them) but wanted to somehow distinguish them from ordinary variables.
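To illustrate the convention being described (a hypothetical example, not code from this PR): Go has no visibility keyword, so a lowercase first letter alone keeps an identifier unexported, and the leading underscore is purely a visual cue that it is a constant rather than a variable.

```go
package notifyproxy // hypothetical package, for illustration only

const (
	// Unexported (lowercase) and underscore-prefixed to signal
	// "package-private constant" at a glance:
	_notifyBufferMax = 4096

	// Capitalizing the name would export it across package
	// boundaries, which these values should not be:
	// NotifyBufferMax = 4096
)
```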
I have some worries with this making the "podman play kube" process stay around for the lifetime of the pod. I think it would be better if this could be handled by conmon or the service container. On the other hand, this is better than the current state, so let's get this in first and work from here. So, LGTM.
I agree, this should be moved to conmon.
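To make the lifecycle under discussion concrete, here is a rough sketch (hypothetical names and stubs, not Podman's or conmon's actual API) of tying the notify proxies to the service container: they stay alive until it exits, then all of them are closed.

```go
// Sketch: keep all notify proxies alive until the service container
// exits, then close them. waitForExit and Proxy are stand-ins.
package main

import "log"

type Proxy struct{ path string }

func (p *Proxy) Close() error {
	log.Printf("closing proxy %s", p.path)
	return nil
}

// waitForExit blocks until the service container exits; stubbed here.
func waitForExit(serviceCtrID string) error { return nil }

func main() {
	proxies := []*Proxy{
		{path: "/run/notify/ctr1.sock"},
		{path: "/run/notify/ctr2.sock"},
	}

	// The service container outlives all pods and containers of the
	// workload, so its exit marks the end of the workload.
	if err := waitForExit("service-container-id"); err != nil {
		log.Fatal(err)
	}
	for _, p := range proxies {
		_ = p.Close() // all notify proxies shut down with the workload
	}
}
```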
Race introduced in containers#16709, which changed 'top' to 'true', so
there was only a narrow window in which '.State.ConmonPod' would be
valid. Remove the race.

Fixes: containers#17882

Signed-off-by: Ed Santiago <[email protected]>
As outlined in #16076, a subsequent BARRIER may follow the READY
message sent by a container. To correctly imitate the behavior of
systemd's NOTIFY_SOCKET, the notify proxies spun up by `kube play`
must hence process messages for the entirety of the workload.

We know that the workload is done and that all containers and pods have
exited when the service container exits. Hence, all proxies are closed
at that time.

The above changes imply that Podman runs for the entirety of the
workload and will henceforth act as the MAINPID when running inside of
systemd. Prior to this change, the service container acted as the
MAINPID; that is no longer possible, as Podman would otherwise be
killed immediately on exit of the service container and could not
clean up. The kube template now correctly transitions to inactive
instead of failed in systemd.

Fixes: #16076
Fixes: #16515

Signed-off-by: Valentin Rothberg [email protected]
Does this PR introduce a user-facing change?