flaky multipathd service #803
Comments
Failed in the
Can someone dig into this?
I had a look at this today and I think this is a race between the

The multipath codebase is rough C, with concurrency built on bare pthread synchronization, so I'll just postulate there is a "socket I/O race on exit" hiding somewhere in it.
PR at dracutdevs/dracut#1606.
This removes the 'ExecStop=' field from `multipathd.service`.

Sometimes CI runs do encounter a failure related to this service in initrd, which seems to be stemming from a socket I/O race between the client and the server on shutdown. It looks like the client (`multipathd shutdown`) can lose the race, hit an I/O error, and cause the whole unit to fail (even if the server managed to shut down properly already).

Notably, the upstream unit does not have such a stop command, as the daemon can already perform a graceful exit through its signal handler. As such, this commit partially re-aligns the two units, trying to sidestep any of the existing races.

Refs:
* coreos/fedora-coreos-tracker#803
* https://github.com/opensvc/multipath-tools/blob/0.8.7/multipathd/multipathd.service
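To make the suspected failure mode a bit more concrete, here is a toy Python model of it. This is only a sketch under assumptions: it does not use multipathd's real control protocol, and the `shutdown`/`ok` wire format and the `simulate_shutdown_race` name are made up for illustration. It shows how a stop client that waits for a reply can report failure even though the daemon received the request and exited normally.

```python
import socket


def simulate_shutdown_race():
    """Toy model: a client asks a daemon to stop, then waits for a reply.

    Returns (server_exited_cleanly, client_exit_status).
    """
    # A connected pair of Unix stream sockets stands in for the control socket.
    client, server = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        # Client side: send the stop request (hypothetical wire format).
        client.sendall(b"shutdown\n")

        # Server side: read the request, then exit right away without replying.
        request = server.recv(64)
        server_exited_cleanly = request == b"shutdown\n"
        server.close()

        # Client side: the peer is gone, so the read yields EOF, not a reply.
        reply = client.recv(64)
        client_exit_status = 0 if reply == b"ok\n" else 1
    finally:
        # Closing an already-closed socket is harmless in Python.
        client.close()
        server.close()
    return server_exited_cleanly, client_exit_status


if __name__ == "__main__":
    print(simulate_shutdown_race())
```

In this model the call returns `(True, 1)`: the server side handled the request, yet the client side ends with a non-zero status. If a stop command behaved like this client, its failure would be enough to mark the unit as failed, which is consistent with dropping the stop command and relying on the daemon's signal handler instead.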
The PR got merged upstream. Since it will take some time to land in a packaged dracut release, coreos/fedora-coreos-config#1233 carries an equivalent workaround through our config overrides.
The fix for this went into testing stream release
The fix for this went into stable stream release
Sometimes, the generic kola check for failed systemd services trips on multipathd:
This one was hit during this week's next release:
coreos/fedora-coreos-streams#296
kola.zip