Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

multipath-tools 0.9.8 #81

Merged
merged 69 commits into from
Feb 23, 2024
Merged

multipath-tools 0.9.8 #81

merged 69 commits into from
Feb 23, 2024

Conversation

mwilck
Copy link
Contributor

@mwilck mwilck commented Feb 10, 2024

Hi Christophe,

here's a PR for 0.9.8. As usual, all patches have been reviewed, except the ones marked with (*) below, which are some GitHub action changes and 2 minor documentation fixes.

(Side note: we should start thinking about what the next version after 0.9.9 should be).

News

  • Socket activation via multipathd.socket has been disabled by default because it has undesirable side effects (fixes Socket activation doesn't play well with udev rule #76, at least partially).
  • The restorequeueing CLI command now only enables queueing if disablequeueing had been sent before.
  • multipathd now tracks the queueing mode of maps in its internal features string. This is helpful to ensure that maps have the desired queuing status. Without this, it can happen that a map remains in queueing state even after the no_path_retry timeout has expired.
  • multipathd's map flushing code has been reworked to avoid hangs if there are no paths but outstanding IO. This allows "multipath -F" to retry a flush using the daemon rather than retrying locally.
  • Error messages sent from multipathd to the command line client have been improved. The user will now see messages like "map or partition in use" or "device not found" instead of just "fail".
  • Enabled -D_FILE_OFFSET_BITS=64 to fix issues with emulated 32-bit environments in the GitHub CI, so that we can now run our CI in arm/v7.

Bug fixes

  • A segfault in the 0.9.7 autoresize code has been fixed.
  • Fixed a bug introduced in 0.9.6 that had caused map reloads being ommitted when path priorities changed.
  • Fixed compilation with gcc 14. (Fixes 0.9.7: test suite build fails with gcc 14.x #80)
  • Minor fixes for issues detected by coverity.
  • Spelling fixes and other minor fixes.

CI

  • Added the check-spelling GitHub action to avoid spelling errors in user-facing files in the future.
  • Various improvements and updates for GitHub actions

Patch breakdown

@bmarzins (26):
multipathd: Make sure to disable queueing if recovery has failed.
multipathd: don't modify the multipath device on show commands
libmultipath: keep track of queueing state in features
multipathd: only restore queueing if it has been disabled first
multipathd: remove nopath flushing code from flush_map()
multipathd: make flush_map() delete maps like the multipath command
multipathd: disable queueing when removing unknown maps
multipathd: simplify cli_del_map()
libmultipath: make _dm_flush_map() return an enum
libmultipath: make dm_remove_partmaps() a static function
multipathd: always start failure replies with "fail\n"
multipathd: print extra default reply msg for busy devices
multipathd: handle busy devices in cli_del_maps()
libmultipath: print error when find_mp_by_str() fails.
multipathd: don't open code find_mp_by_str() in client handers
multipathd: make cli_handlers check for paths by devt and dev
multipathd: add cli_handler reply message for missing devs
libdmmp: handle failures in _process_cmd
multipath: get rid of unnecessary retries variable
multipath: Don't locally retry deletgated remove failures
multipath: if delegation times out mark as not delegated
multipathd: sync features on flush_map failure corner case
multipathd: fix null pointer dereference in uev_update_path
multipathd: fix auto-resize configuration
libmultipath: fix displaying auto_resize config setting
multipath-tools tests: add void parameter to functions

@brianatpurestorage (2):
multipathd: the local path change is not considered
libmultipath: stop PURE FlashArray from detecting priority

@jsoref (9):
spelling: anymore
spelling: cannot
spelling: case-insensitive
spelling: case-sensitive
spelling: configuration
spelling: correctly
spelling: numerically
spelling: preexisting
spelling: than

@mwilck (29):
libmultipath: avoid temporarily enabling queueing on reload
libmultipath: fix ANA prioritizer enablement logic
multipathd: init_unwinder: protect pthread_cond_wait() call with a loop
libmultipath: is_uevent_busy(): check servicing_uev under lock
libmultipath: tur: protect pthread_cond_timedwait with a loop
multipathd: make update_prio static, and rename refresh_all param
multipathd: don't activate socket activation by default
multipath: udev rules: use configured $(bindir) in udev rules
multipath-tools: Makefile.inc: set _FILE_OFFSET_BITS=64
GitHub workflows: update workflow events ()
GitHub workflows: update distros to run for (
)
GitHub workflows: split multiarch workflow in rolling/stable ()
GitHub Workflows: run expensive workflows only on relevant changes (
)
GitHub workflows: add spell checker ()
Spelling fixes found by check-spelling action (
)
GitHub Workflows: abi: only run if C code has changed ()
GitHub workflows: abi: error if reference ABI can't be downloaded (
)
GitHub workflows: run on PRs against "queue", too ()
multipath.conf.5: fix documentation for find_multipaths (fixes #75) (
)
libmultipath: hwtable: fix fast_io_fail for Infinibox
11-dm-mpath.rules: fix list of imported properties
11-dm-mpath.rules: use import logic like 13-dm-disk.rules
11-dm-mpath.rules: don't import DM_UDEV_DISABLE_OTHER_RULES_FLAG
11-dm-mpath.rules: handle reloads during coldplug events
11-dm-mpath.rules: don't save DM_UDEV_DISABLE_OTHER_RULES_FLAG_OLD
11-dm-mpath.rules: clear DM_DISABLE_OTHER_RULES_FLAG for coldplug events
libmultipath: bump version to 0.9.8 ()
multipath-tools: added NEWS.md (
)

@xosevp (3):
multipath-tools: update ml
multipath-tools: fix an assignment ambiguity
multipath-tools: remove extra hyphen from CFLAGS std option

xosevp and others added 30 commits December 18, 2023 15:22
Cc: Martin Wilck <[email protected]>
Cc: Benjamin Marzinski <[email protected]>
Cc: Christophe Varoqui <[email protected]>
Cc: DM-DEVEL ML <[email protected]>
Signed-off-by: Xose Vazquez Perez <[email protected]>
Reviewed-by: Benjamin Marzinski <[email protected]>
Instead of always enabling queueing when a map is reloaded with
no_path_retry set to a positive number, check if the map has timed out
in recovery mode, and only enable queueing if it has not. This saves
multipathd from having to disable queueing on the map immediately after
the reload.

Signed-off-by: Benjamin Marzinski <[email protected]>
Reviewed-by: Martin Wilck <[email protected]>
If a multipath device has no_path_retry set to a number and has lost all
paths, gone into recovery mode, and timed out, it will disable
queue_if_no_paths. After that, if the device is reloaded by multipath
outside of multipathd, it will re-enable queuieng on the device. When
multipathd later calls set_no_path_retry() to update the queueing state,
it will not disable queue_if_no_paths, since the device is still in the
recovery state, so it believes no work needs to be done. The device will
remain in the recovery state, with retry_ticks at 0, and queueing
enabled, even though there are no usable paths.

To fix this, in set_no_path_retry(), if no_path_retry is set to a number
and the device is queueing but it is in recovery mode and out of
retries with no usable paths, manually disable queue_if_no_path.

Signed-off-by: Benjamin Marzinski <[email protected]>
Reviewed-by: Martin Wilck <[email protected]>
The commands to show a multipath device are supposed to return its
current state without updating it. Even when reset is 0,
update_multipath() still can update the device.

To fix this, split __setup_multipath() into two functions:
refresh_multipath(), that updates the table and status, and
setup_multipath(), which works as before but now calls
refresh_multipath(). Make the multipathd show commands call
refresh_multipath() instead of update_multipath().

With the show commands calling refresh_multipath(), all callers of
update_multipath() set the reset argument, so remove it and always call
setup_multipath() from update_multipath().

Signed-off-by: Benjamin Marzinski <[email protected]>
Reviewed-by: Martin Wilck <[email protected]>
Make multipathd update mpp->features when in enables or disables
queuing. This patch handles all the cases except failed removes by
dm_suspend_and_flush_map(), which is never called by multipathd.

Signed-off-by: Benjamin Marzinski <[email protected]>
Reviewed-by: Martin Wilck <[email protected]>
Make the restorequeueing command only do something if disablequeueing
has first been run on the map. Also update the man page to explain
what restorequeuing actually does.

Signed-off-by: Benjamin Marzinski <[email protected]>
Reviewed-by: Martin Wilck <[email protected]>
Fixes: 3e71d8a ("multipath-tools Makefiles: create config.mk")

Suggested-by: Lidong Zhong <[email protected]>
Signed-off-by: Martin Wilck <[email protected]>
Reviewed-by: Benjamin Marzinski <[email protected]>
Signed-off-by: Josh Soref <[email protected]>
Reviewed-by: Martin Wilck <[email protected]>
Signed-off-by: Josh Soref <[email protected]>
Reviewed-by: Martin Wilck <[email protected]>
Signed-off-by: Josh Soref <[email protected]>
Reviewed-by: Martin Wilck <[email protected]>
Signed-off-by: Josh Soref <[email protected]>
Reviewed-by: Martin Wilck <[email protected]>
Signed-off-by: Josh Soref <[email protected]>
Reviewed-by: Martin Wilck <[email protected]>
Signed-off-by: Josh Soref <[email protected]>
Reviewed-by: Martin Wilck <[email protected]>
Signed-off-by: Josh Soref <[email protected]>
Reviewed-by: Martin Wilck <[email protected]>
Signed-off-by: Josh Soref <[email protected]>
Reviewed-by: Martin Wilck <[email protected]>
Signed-off-by: Josh Soref <[email protected]>
Reviewed-by: Martin Wilck <[email protected]>
Instead of flush_map() handling both user requested flushes and
automatic flushes when the last path has been deleted, make
flush_map_nopaths() handle the automatic flushes itself, since a later
patch will change the behavior of flush_map().

Reviewed-by: Martin Wilck <[email protected]>
Signed-off-by: Benjamin Marzinski <[email protected]>
When the multipath command tries to delete a multipath device, it first
disables queueing and then suspends the device to force the IOs to get
flushed. Then it attempts to delete the device and any kpartx
partitions.  multipathd, on the other hand, simply tries to delete the
device and kpartx partitions, without disabling queueing or suspending.
If there are no paths but there is outstanding IO, multipathd will hang
trying to delete the last kpartx device. This is because it must be the
last opener of the multipath device (multipath won't try to delete the
device if it has any openers besides the kpartx devices) and the kernel
will not allow the last opener of a block device to close until all the
outstanding IO is flushed.  This hang can be avoided if multipathd calls
dm_suspend_and_flush_map() like the multipath command does, instead of
dm_flush_map().

Reviewed-by: Martin Wilck <[email protected]>
Signed-off-by: Benjamin Marzinski <[email protected]>
Make cli_del_maps() call dm_suspend_and_flush_map() for the unknown
multipath devices as well.

After this change, all callers of cli_del_maps() set need_suspend, so
simplify dm_flush_maps().

Signed-off-by: Benjamin Marzinski <[email protected]>
Reviewed-by: Martin Wilck <[email protected]>
cli_del_map() does a lot of unnecessary work to match the arguments of
ev_remove_map(), of which it is the only caller.  Then ev_remove_map()
does more unnecessary work to verify the arguments passed in. remove
ev_remove_map() and make cli_del_map() get the mpp like the rest of
the client handlers do.

Reviewed-by: Martin Wilck <[email protected]>
Signed-off-by: Benjamin Marzinski <[email protected]>
A future patch will add an additional return code, so make this an enum
instead of just using numbers.

Reviewed-by: Martin Wilck <[email protected]>
Signed-off-by: Benjamin Marzinski <[email protected]>
dm_remove_partmaps() is only used in devmapper.c, so make it static. It
does need to be declared early, since remove_partmaps() and it call
eachother.

Reviewed-by: Martin Wilck <[email protected]>
Signed-off-by: Benjamin Marzinski <[email protected]>
When multipathd interactive commands fail for certain reasons, like
timing out or incorrect permissions, they do not reply with "fail\n".
Currently, multipath and multipathc expect that a reply other than
"fail\n" means success. Instead, prefix all failure replies with
"fail\n", and adapt multipath and multipathc to return failure for any
reply starting with "fail\n".

Signed-off-by: Benjamin Marzinski <[email protected]>
Reviewed-by: Martin Wilck <[email protected]>
If removing a multipath device fails because the device is in use,
return DM_FLUSH_BUSY from remove_functions, which causes cli_del_map()
to return -EBUSY, which will now print extra information in
default_reply().

Reviewed-by: Martin Wilck <[email protected]>
Signed-off-by: Benjamin Marzinski <[email protected]>
Make cli_del_maps() return -EBUSY like cli_del_map() if it fails because
a device is in use and it doesn't run into any other type of failures.

Reviewed-by: Martin Wilck <[email protected]>
Signed-off-by: Benjamin Marzinski <[email protected]>
None of the callers of find_mp_by_str() print any message if they fail
because the map name is invalid. Print one in find_mp_by_str() to save
the effort of adding it to all the callers.

Signed-off-by: Benjamin Marzinski <[email protected]>
Reviewed-by: Martin Wilck <[email protected]>
multiple client handlers simply open coded find_mp_by_str(). Just
use the function instead.

Reviewed-by: Martin Wilck <[email protected]>
Signed-off-by: Benjamin Marzinski <[email protected]>
Some of the client handlers checked for paths by both dev and devt, but
not all. Also, many of the client handlers don't print anything if they
failed to find a path. Make all the client handlers which work on path
devices use a new function, find_path_by_str(), which will try both
methods to find a path, and can print out an error message if none is
found.

(mwilck: find_path_by_str() tries to match device names by devt (major:minor)
first, then by device name, whereas the open-coded lookup checked for
the device name first. This is for consistency with other device lookup
algorithms. It doesn't make a difference in practice, because no real devices
have names in the major:minor format.)

Signed-off-by: Benjamin Marzinski <[email protected]>
Reviewed-by: Martin Wilck <[email protected]>
When the cli_handlers cannot find the requested map or path, they will
now return -ENODEV, which prints extra information in default_reply().

Reviewed-by: Martin Wilck <[email protected]>
Signed-off-by: Benjamin Marzinski <[email protected]>
currently, most of the callers of _process_cmd() do not gracefully
handle the case where multipathd returns "fail\n". dmmp_flush_mpath()
does, but it does extra work to try to figure out after the fact why the
flush command failed. Instead, handle fail replies in _process_cmd()
using the appropriate DMMP error codes.

Cc: Gris Ge <[email protected]>
Reviewed-by: Martin Wilck <[email protected]>
Signed-off-by: Benjamin Marzinski <[email protected]>
Run the expensive multi-arch workflows only if actual source code
has changed.

Signed-off-by: Martin Wilck <[email protected]>
Only check end-user visible files:
 - README
 - man pages
 - public header files
 - udev rules and systemd unit files

Also don't check file names, and make sure that the spelling workflow
is only rung pushes that alter any of the files above.

Add custom patterns (patterns.txt) and dictionary words (expect.txt) to
make the spell check pass.

See https://github.com/check-spelling/check-spelling/wiki/Configuration-Examples

Signed-off-by: Martin Wilck <[email protected]>
The reference ABI (being a workflow artifact) can expire, causing
the download to fail. In this case, the abi workflow should not report
success. Also, add the "workflow_dispatch" event for the abi workflow,
so that we can run it manually if a previous run failed.

Signed-off-by: Martin Wilck <[email protected]>
Document "on" and "off", and mention that "yes"/"1" and "no"/"0" are
still accepted as alias values.

Suggested-by: Paul Donohue <[email protected]>
Signed-off-by: Martin Wilck <[email protected]>
@mwilck
Copy link
Contributor Author

mwilck commented Feb 12, 2024

Pushed a minor change to the _FILE_OFFSET_BITS=64 patch: use $(OPTFLAGS) in the cpp command (although it's rather pointless) to avoid warning about missing optimization flags when building the unit tests.

The Infinibox hwtable entry had fast_io_fail and dev_loss both set to 15,
which has the effect that fast_io_fail is switched off (dev_loss timeout
must be higher than fast_io_fail timeout).

Remove the dev_loss value, which will cause the default of 600 to be used.
Experience shows that dev_loss_tmo values below a few minutes are seldom
helpful.

Fixes: 55da608 ("libmultipath: update INFINIDAT builtin config")
Signed-off-by: Martin Wilck <[email protected]>
Cc: Arnon Yaari <[email protected]>
Cc: Xose Vazquez Perez <[email protected]>
Cc: Tom Abraham <[email protected]>
Cc: Jiri Belka <[email protected]>
Reviewed-by: Benjamin Marzinski <[email protected]>
@mwilck
Copy link
Contributor Author

mwilck commented Feb 14, 2024

Pushed another patch after review from @bmarzins.

Once [PATCH v3] 11-dm-mpath.rules: handle reloads during coldplug events gets reviewed, I can add the udev rule series, too.

@mwilck
Copy link
Contributor Author

mwilck commented Feb 14, 2024

The CI run for "multiarch test for rolling distros / build-current (debian-sid, arm/v7)" has apparently failed because of an issue with downloading the test container. @cvaroqui please restart the workflow if you have time.

Make sure we import all properties that are also imported in
13-dm-disk.rules. Keep importing ID_FS_TYPE for now to avoid
breakage, even if 13-dm-disk.rules does not.

Signed-off-by: Martin Wilck <[email protected]>
Reviewed-by: Benjamin Marzinski <[email protected]>
We have to import the properties if either DM_NOSCAN or
DM_DISABLE_OTHER_RULES_FLAG is set, because blkid will be skipped
in both cases. Also, if DM_UDEV_PRIMARY_SOURCE_FLAG is not set,
it makes no sense to try and import the properties.

Signed-off-by: Martin Wilck <[email protected]>
Reviewed-by: Benjamin Marzinski <[email protected]>
DM_UDEV_DISABLE_OTHER_RULES_FLAG is handled by 10-dm.rules, which imports
it from db if necessary. There is no need to do this again here.

Signed-off-by: Martin Wilck <[email protected]>
Reviewed-by: Benjamin Marzinski <[email protected]>
Copy link
Contributor

@bmarzins bmarzins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

If a map reload happens while udev is processing rules for a coldplug
event, DM_SUSPENDED may be set if the respective test in 10-dm.rules
happens while the device is suspened. This will cause the rules for all
higher block device layers to be skipped. Record this situation in an udev
property. The reload operation will trigger another "change" uevent later,
which would normally be treated as a reload, and be ignored without
rescanning the device. If a previous "coldplug while suspended" situation is
detected, perform a full device rescan instead.

Signed-off-by: Martin Wilck <[email protected]>
Reviewed-by: Benjamin Marzinski <[email protected]>
The current rules overwrite DM_UDEV_DISABLE_OTHER_RULES_FLAG and save its
value in DM_UDEV_DISABLE_OTHER_RULES_FLAG_OLD if MPATH_DEVICE_READY changes
from 1 to 0, and restore DM_UDEV_DISABLE_OTHER_RULES_FLAG from
DM_UDEV_DISABLE_OTHER_RULES_FLAG_OLD when MPATH_DEVICE_READY changes back from
0 to 1.

This is wrong for multiple reasons. For "spurious" events, 10-dm.rules will
read DM_UDEV_DISABLE_OTHER_RULES_FLAG from the udev db and obtain the value
saved by 11-dm-mpath.rules rather than it's own saved value, which confuses
the logic in 10-dm.rules. 10-dm.rules sets the flag if either DISK_RO==1 or
DM_SUSPENDED==1, and never clears it (it can only be cleared by a new genuine
libdm event that doesn't have DM_UDEV_DISABLE_OTHER_RULES_FLAG set, while the
device is not suspended). lvm commands may set the flag, too, but AFAICS this
is only done for certain types of logical volumes, not for multipath maps.

If a previously suspended device is resumed, DM_UDEV_DISABLE_OTHER_RULES_FLAG
would be cleared, and it would be wrong for 11-dm-mpath.rules to overwrite it
with a previuosly saved value. Likewise, if a uevent arrives for a suspended
map, and MPATH_DEVICE_READY happens to switch from 0 to 1, it would be wrong
to clear DM_UDEV_DISABLE_OTHER_RULES_FLAG by setting it to a previously saved
value.

We need to set DM_UDEV_DISABLE_OTHER_RULES_FLAG=1 to prevent follow-up rules
from attempting I/O on the device. But don't try to save and restore
DM_UDEV_DISABLE_OTHER_RULES_FLAG between uevents. Rather, reset
DM_UDEV_DISABLE_OTHER_RULES_FLAG to the value we got from 10-dm.rules in a
late rule. I chose "99-z-" for this rule's prefix to make sure it runs
after 99-systemd.rules.

Signed-off-by: Martin Wilck <[email protected]>
Reviewed-by: Benjamin Marzinski <[email protected]>
For all "spurious" events, which includes coldplug events,
DM_DISABLE_OTHER_RULES_FLAG will be read from the udev DB in
10-dm.rules. Thus if a previous event saw the device in suspended
state, the flag will be set even if the device has meanwhile
resumed. Reset the flag if none of the conditions hold that
would cause it to be set in a genuine uevent in 10-dm.rules.

It would be cleaner to do this in 10-dm.rules directly, but it
cannot be done easily, because the flag can also have an origin
inside lvm itself; lvm sets it for various kinds of logical
volumes. For generic (non-LVM) dm devices, the flag is only set
in 10-dm.rules though, so doing this is safe for multipath.

Signed-off-by: Martin Wilck <[email protected]>
Reviewed-by: Benjamin Marzinski <[email protected]>
If paths become available while the device is suspended, don't
activate. Another uevent will arrive when the device is resumed.

Signed-off-by: Martin Wilck <[email protected]>
Reviewed-by: Benjamin Marzinski <[email protected]>
@mwilck
Copy link
Contributor Author

mwilck commented Feb 15, 2024

Pushed the the patch series for 11-dm-mpath.rules now, as the series has been completely reviewed.

@mwilck mwilck marked this pull request as ready for review February 15, 2024 08:39
@mwilck
Copy link
Contributor Author

mwilck commented Feb 15, 2024

Pushed version bump to 0.9.8

This file sums up the content of the GitHub PRs since 0.9.0, giving
a high-level overview of the changes in the past releases. I thought it
might be useful for users and/or distribution maintainers.

Signed-off-by: Martin Wilck <[email protected]>
@mwilck
Copy link
Contributor Author

mwilck commented Feb 23, 2024

I added a the file NEWS.md that gives an overview about the changes in our code over the last 2 years.

@mwilck
Copy link
Contributor Author

mwilck commented Feb 23, 2024

@cvaroqui, I believe this should be good for merging now.

@cvaroqui cvaroqui merged commit 3daacfd into opensvc:master Feb 23, 2024
53 of 60 checks passed
@mwilck
Copy link
Contributor Author

mwilck commented Feb 23, 2024

Again, the workflow failures seem to have been caused by problems pulling the test containers.

@mwilck
Copy link
Contributor Author

mwilck commented Feb 23, 2024

@cvaroqui, please also push a 0.9.8 tag.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

0.9.7: test suite build fails with gcc 14.x Socket activation doesn't play well with udev rule
6 participants