daemon: Add socket activation via /run/rpm-ostreed.socket #3874
base: main
This blocks on SELinux policy updates: https://bugzilla.redhat.com/show_bug.cgi?id=2110012
OK, I've now made this into a build-time option. (Though looking at it, I also need to make installation of the socket unit a build-time option.) This will allow us to ship this in F37 now (with a matching specfile conditional) and gain experience with it. I'm leaving it as a draft for now, though, since this is a nontrivial change and could use some more design thought before merging.
/test all
Dang it, I did a whole port to https://crates.io/crates/uds before discovering tormol/uds#6. For now... I switched back to using worker threads. That said... hmmm. What would probably make sense here is to factor out an
I guess one can tell that I like
If we want to support a non-DBus API, one thing we'll likely need to do for now is bridge requests from the tokio thread to the glib/DBus daemon thread. We can do this by acquiring that worker thread's main context and then invoking https://docs.rs/glib/latest/glib/struct.MainContext.html#method.spawn
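The bridging pattern can be illustrated without glib. The sketch below is a std-only analogue, not the glib API: a plain thread draining an `mpsc` queue stands in for the daemon thread's main context, and `DaemonThread` / `call` are hypothetical names. Other threads (e.g. a tokio worker) submit closures that run on the owning thread and get the result back over a one-shot reply channel.

```rust
use std::sync::mpsc::{self, Sender};
use std::thread;

// A unit of work queued to the thread that owns the (glib/DBus-like) state.
type Job = Box<dyn FnOnce() + Send + 'static>;

/// Hypothetical handle used by other threads to submit work to the
/// "daemon" thread; it plays the role glib's MainContext plays in the daemon.
#[derive(Clone)]
struct DaemonThread {
    tx: Sender<Job>,
}

impl DaemonThread {
    /// Spawn the daemon thread. It runs queued jobs in FIFO order and exits
    /// once every `DaemonThread` handle has been dropped.
    fn spawn() -> (DaemonThread, thread::JoinHandle<()>) {
        let (tx, rx) = mpsc::channel::<Job>();
        let join = thread::spawn(move || {
            for job in rx {
                job();
            }
        });
        (DaemonThread { tx }, join)
    }

    /// Run `f` on the daemon thread and wait for its result.
    /// Blocking keeps this sketch std-only; inside a tokio task one would
    /// instead await a `tokio::sync::oneshot` receiver.
    fn call<R, F>(&self, f: F) -> R
    where
        R: Send + 'static,
        F: FnOnce() -> R + Send + 'static,
    {
        let (reply_tx, reply_rx) = mpsc::sync_channel::<R>(1);
        let job: Job = Box::new(move || {
            // Ignore send errors: the caller may have stopped waiting.
            let _ = reply_tx.send(f());
        });
        self.tx.send(job).expect("daemon thread has exited");
        reply_rx.recv().expect("daemon thread dropped the request")
    }
}
```

The point of the design is that the glib/DBus-owned objects never leave their thread; only `Send` closures and results cross the boundary.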
OK, C9S and RHEL 9.1 should include the necessary SELinux policy changes for this, so I think this can be considered for review and landing. It looks like there's a CI failure in the Jenkins/vmcheck bits that I need to dig into. More problematically right now, though, the socket path isn't being tested by CI at all; we need to add an F37 build or extend our C9S build to enable this feature and actually run tests too.
lgtm
I've been looking in the background at https://github.com/mitsuhiko/tokio-unix-ipc/ and have two issues open, though neither is blocking.
OK I made a few more changes here just to clean things up:
The Jenkins pipeline shows
I think what's going on here is this: we include the .socket unit in the RPM, but it's not enabled by default because it's not in the preset list. This has burned us before. In theory, what we'd need to do is add it to that list. However, doing so would dramatically slow down our ability to ship this. So what I'd propose is:
Hmm, I think we may also need to add
OK, I'll pick this back up in the background; I made a few more tweaks. Let's see what CI says.
Thanks for the help!
For historical reasons, the daemon ends up doing a lot of initialization before even claiming the DBus name. For example, it calls `ostree_sysroot_load()`. We also end up scanning the RPM database, and actually parse all the GPG keys in `/etc/pki/rpm-gpg` etc.

This causes two related problems:

- By doing all this work before claiming the bus name, we race against the (pretty low) DBus service timeout of 25s.
- If something hard fails at initialization, the client can't easily see the error because it just appears in the systemd journal. The client will just see a service timeout.

The solution to this is to adopt systemd socket activation. There's a new `rpm-ostreed.socket` and the daemon can be activated that way.

The client (when run as root; the socket is only accessible to root right now) connects, which will activate the daemon and attempt initialization, without claiming the DBus name yet.

If something goes wrong here, the daemon will reply to the client that activated it with the error, and then also exit with failure.

On success, everything continues as normal, including claiming the DBus name.

Note that this also logically replaces the code that does explicit `systemctl start rpm-ostreed` invocations.

After this patch:

```
$ systemctl stop rpm-ostreed
$ umount /boot
$ rpm-ostree status
error: Couldn't start daemon: Error setting up sysroot: loading sysroot: Unexpected state: /run/ostree-booted found, but no /boot/loader directory
```

Co-authored-by: Colin Walters <[email protected]>
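The activation handshake described in the commit message might look roughly like the following std-only Rust sketch. It is not rpm-ostree's code: `listen_fd_count`, `activation_listener`, `reply_init_result`, `activate` and the one-line `ok` / `error: <message>` reply format are all hypothetical. Only the `LISTEN_PID`/`LISTEN_FDS` environment variables and the fact that passed fds start at 3 come from systemd's documented `sd_listen_fds(3)` protocol, and it assumes the socket unit uses `ListenStream=` on a filesystem path.

```rust
use std::io::{self, Write};
use std::os::unix::io::FromRawFd;
use std::os::unix::net::{UnixListener, UnixStream};

/// First fd passed by systemd socket activation (SD_LISTEN_FDS_START).
const LISTEN_FDS_START: i32 = 3;

/// How many activation fds are meant for process `my_pid`, per sd_listen_fds(3):
/// LISTEN_PID must name us, and LISTEN_FDS is the count.
fn listen_fd_count(listen_pid: Option<&str>, listen_fds: Option<&str>, my_pid: u32) -> usize {
    if listen_pid.and_then(|p| p.parse::<u32>().ok()) != Some(my_pid) {
        return 0;
    }
    listen_fds.and_then(|n| n.parse::<usize>().ok()).unwrap_or(0)
}

/// The listener systemd handed us, if we were socket-activated.
fn activation_listener() -> Option<UnixListener> {
    let pid = std::env::var("LISTEN_PID").ok();
    let fds = std::env::var("LISTEN_FDS").ok();
    if listen_fd_count(pid.as_deref(), fds.as_deref(), std::process::id()) == 0 {
        return None;
    }
    // SAFETY (assumption): the .socket unit uses ListenStream= on a path, so
    // fd 3 is an open, listening AF_UNIX stream socket that we now own.
    Some(unsafe { UnixListener::from_raw_fd(LISTEN_FDS_START) })
}

/// Hypothetical one-line reply: "ok" or "error: <message>".
fn reply_init_result(client: &mut UnixStream, result: &Result<(), String>) -> io::Result<()> {
    match result {
        Ok(()) => writeln!(client, "ok"),
        Err(e) => writeln!(client, "error: {}", e),
    }
}

/// Accept the client that triggered activation, run initialization (sysroot
/// load, rpmdb scan, ...) and report the outcome to that client before
/// anything happens on DBus. The caller claims the bus name only on Ok(()),
/// and otherwise exits with failure.
fn activate<F>(listener: &UnixListener, init: F) -> io::Result<Result<(), String>>
where
    F: FnOnce() -> Result<(), String>,
{
    let (mut client, _addr) = listener.accept()?;
    let result = init();
    reply_init_result(&mut client, &result)?;
    Ok(result)
}
```

This only covers the first handshake; a real daemon would presumably also clear `LISTEN_PID`/`LISTEN_FDS` so child processes don't inherit them, and keep serving later socket clients.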
```rust
    Ok(())
}

static SHUTDOWN_SIGNAL: Lazy<Mutex<Option<Sender<()>>>> = Lazy::new(|| Mutex::new(None));
```
I was looking at this again and remembered that there's a race condition this was intended to fix, but we never actually fixed it: the socket clients aren't accounted for in our current exit-on-idle logic.
What we need to do is:
- If we have a socket client, then disable the idle exit
- When we choose to idle exit, first shut down the `accept()` handler
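The two rules above can be sketched as a single piece of shared state guarded by one lock, so that "are there socket clients?" and "stop accepting" happen atomically. This is a minimal sketch under assumptions: `IdleExit` and its methods are hypothetical names, and actually waking the accept loop (e.g. through something like the `SHUTDOWN_SIGNAL` sender in the diff) is left out.

```rust
use std::sync::Mutex;

#[derive(Default)]
struct IdleState {
    socket_clients: usize,
    accept_closed: bool,
}

/// Hypothetical state shared by the socket accept() handler and the
/// idle-exit timer. One lock covers both the client count and the
/// "accept closed" flag, so the idle decision cannot race a new client.
#[derive(Default)]
struct IdleExit {
    state: Mutex<IdleState>,
}

impl IdleExit {
    /// Called by the accept() handler for each new socket client. Returns
    /// false once idle exit has begun; the connection must then not be served.
    fn client_connected(&self) -> bool {
        let mut s = self.state.lock().unwrap();
        if s.accept_closed {
            return false;
        }
        // A live socket client disables idle exit.
        s.socket_clients += 1;
        true
    }

    /// Called when a socket client's request has been fully handled.
    fn client_finished(&self) {
        let mut s = self.state.lock().unwrap();
        s.socket_clients = s.socket_clients.saturating_sub(1);
    }

    /// Called by the idle timer. Exits only when no socket client is active,
    /// and marks the accept() side closed in the same critical section.
    fn try_idle_exit(&self) -> bool {
        let mut s = self.state.lock().unwrap();
        if s.socket_clients > 0 {
            return false;
        }
        s.accept_closed = true;
        true
    }
}
```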
Mmm, and a further complication: to make this race-free, we probably need to use the glib main loop (aka the main thread) rather than the tokio thread, at least for accepting clients on the socket. We could still do the actual processing on the tokio thread with appropriate synchronization.
This is a re-submit of #2932