multipathd crash when stopping #1

Closed
hexiaowen opened this issue Jan 25, 2021 · 8 comments

@hexiaowen

(gdb) bt
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1 0x0000ffff87d9e81c in __GI_abort () at abort.c:79
#2 0x0000ffff87dd7818 in __libc_message (action=action@entry=do_abort,
fmt=fmt@entry=0xffff87e97888 "%s\n") at ../sysdeps/posix/libc_fatal.c:181
#3 0x0000ffff87dddf6c in malloc_printerr (
str=str@entry=0xffff87e950d0 "free(): invalid pointer") at malloc.c:5389
#4 0x0000ffff87ddf780 in _int_free (av=0xffff87ed7a58 <main_arena>, p=0xffff80000070,
have_lock=0) at malloc.c:4172
#5 0x0000ffff880f55a8 in internal_hashmap_clear (h=h@entry=0xffff80027980,
default_free_key=, default_free_value=)
at ../src/basic/hashmap.c:902
#6 0x0000ffff880f56a0 in internal_hashmap_free (h=,
default_free_key=, default_free_value=,
default_free_value=, default_free_key=, h=)
at ../src/basic/hashmap.c:874
#7 0x0000ffff880f582c in ordered_hashmap_free_free_free () at ../src/basic/hashmap.h:118
#8 device_free (device=0xffff80027820) at ../src/libsystemd/sd-device/sd-device.c:68
#9 sd_device_unref (p=) at ../src/libsystemd/sd-device/sd-device.c:78
#10 0x0000ffff88100978 in sd_device_unrefp () at ../src/systemd/sd-device.h:118
#11 device_new_from_nulstr (len=, nulstr=0xffff877f93d0 "",
ret=) at ../src/libsystemd/sd-device/device-private.c:448
#12 device_monitor_receive_device (m=0xffff80000b20, ret=ret@entry=0xffff877fb388)
at ../src/libsystemd/sd-device/device-monitor.c:447
#13 0x0000ffff881028a4 in udev_monitor_receive_sd_device (ret=0xffff877fb388,
udev_monitor=0xffff80000c70) at ../src/libudev/libudev-monitor.c:207
#14 udev_monitor_receive_device (udev_monitor=0xffff80000c70,
udev_monitor@entry=0xffff877fb3a0) at ../src/libudev/libudev-monitor.c:253
#15 0x0000ffff881a3478 in uevent_listen (udev=0xffff877fbf40) at uevent.c:853
#16 0x0000aaaadc524514 in ueventloop (ap=0xffffc4134bd0) at main.c:1518
#17 0x0000ffff880827ac in start_thread (arg=0xffff8821e380) at pthread_create.c:486
#18 0x0000ffff87e3c47c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78

multipathd has crashed twice with almost the same call stack.
At first we suspected the udev API. However, hashmap is a common data structure in systemd, and systemd itself has never produced this call stack.
Can someone help?

The test case runs kill -9 on multipathd repeatedly, then restarts the system and checks whether it still works normally.

@hexiaowen
Author

Something looks odd here.
In frame 11, nulstr=0xffff877f93d0 is shown as an empty string (""),
but in frame 12 the buffer contains:

x/32bs (uint8_t*) &buf.raw[bufpos]
0xffff877f9360: "ACTION"
0xffff877f9367: "change"
0xffff877f936e: "DEVPATH"
0xffff877f9376: "/devices/virtual/block/dm-69"
0xffff877f9393: "SUBSYSTEM"
0xffff877f939d: "block"
0xffff877f93a3: "DM_COOKIE"
0xffff877f93ad: "23068672"
0xffff877f93b6: "DEVNAME"
0xffff877f93be: "/dev/dm-69"
0xffff877f93c9: "DEVTYPE"
0xffff877f93d1: "disk"
0xffff877f93d6: "SEQNUM"
0xffff877f93dd: "14437"
0xffff877f93e3: "USEC_INITIALIZED"
0xffff877f93f4: "8213096220"
0xffff877f93ff: "MAJOR"
0xffff877f9405: "253"
0xffff877f9409: "MINOR"
0xffff877f940f: "69"
0xffff877f9412: "DM_UDEV_DISABLE_LIBRARY_FALLBACK_FLAG"
0xffff877f9438: "1"
0xffff877f943a: "DM_UDEV_PRIMARY_SOURCE_FLAG"
0xffff877f9456: "1"
0xffff877f9458: "DM_SUBSYSTEM_UDEV_FLAG0"
0xffff877f9470: "1"
0xffff877f9472: "DM_ACTIVATION"
0xffff877f9480: "0"
0xffff877f9482: "DM_NAME"
0xffff877f948a: "36e02861100592fcc99ad3c3800000195"
0xffff877f94ac: "DM_UUID"
0xffff877f94b4: "mpath-36e02861100592fcc99ad3c3800000195"
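
A note on the addresses in this dump: "DEVTYPE" starts at 0xffff877f93c9 and is 7 bytes long, so 0xffff877f93d0 (the nulstr value in frame 11) is its terminating NUL, and "disk" follows at 0xffff877f93d1. A pointer to that byte prints as "". Why frame 11 shows that address is not established here; the sketch below only reproduces the effect on a made-up buffer. The payload contents and both helper names are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a uevent payload: NUL-separated strings. */
static const char payload[] = "DEVNAME\0/dev/dm-69\0DEVTYPE\0disk";

/* Return the offset of the first entry equal to key in a NUL-separated
 * buffer of length len, or -1 if there is no such entry. */
static long find_entry(const char *buf, size_t len, const char *key)
{
    size_t off = 0;

    while (off < len) {
        if (strcmp(buf + off, key) == 0)
            return (long)off;
        off += strlen(buf + off) + 1;   /* skip the string and its NUL */
    }
    return -1;
}

/* Pointer to the terminating NUL of the entry named key.
 * Read as a C string, this pointer yields "". */
static const char *entry_terminator(const char *key)
{
    long off = find_entry(payload, sizeof(payload), key);

    assert(off >= 0);
    return payload + off + strlen(key);
}
```

Calling entry_terminator("DEVTYPE") gives a pointer that reads as an empty string, while the byte after it starts the next entry, "disk". This mirrors 0xffff877f93c9 + 7 = 0xffff877f93d0 in the dump.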

@hexiaowen changed the title from "kill -9 multipathd command repeatedly and then restart the multipathd cause crash" to "multipathd crash when stopping" on Jan 26, 2021
@mwilck
Contributor

mwilck commented Jan 26, 2021

As noted on dm-devel, could you check if it helps to disable pthread_cancel() while calling udev_monitor_receive_device()?

I don't think libudev is generally safe to be used in multithreaded programs. We're not aware of any issues, but this might be one.
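
For readers unfamiliar with the suggestion: cancellation is usually disabled around a call with pthread_setcancelstate(). The following is a minimal sketch under stated assumptions, not the patch that was later applied to multipath-tools. fake_receive() is a hypothetical stand-in for udev_monitor_receive_device(), and call_uncancellable() is an invented helper name.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Cancel state observed inside the protected call (illustration only). */
static int state_seen_in_call = -1;

/* Hypothetical stand-in for udev_monitor_receive_device(). */
static void *fake_receive(void *arg)
{
    int cur, ignored;

    /* Read the current cancel state, then put it back unchanged. */
    pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &cur);
    pthread_setcancelstate(cur, &ignored);
    state_seen_in_call = cur;
    return arg;
}

/* Run fn(arg) with cancellation disabled, then restore the old state. */
static void *call_uncancellable(void *(*fn)(void *), void *arg)
{
    int old, ignored, rc;
    void *ret;

    rc = pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &old);
    assert(rc == 0);
    ret = fn(arg);
    rc = pthread_setcancelstate(old, &ignored);
    assert(rc == 0);
    return ret;
}
```

A cancellation request that arrives while the state is disabled stays pending and is acted on at the next cancellation point after the state is restored, so the thread can still be stopped, just not in the middle of the library call.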

@lixiaokeng
Contributor

Disabling pthread_cancel() while calling udev_monitor_receive_device() helps. Please provide a patch. Thanks.

@mwilck
Contributor

mwilck commented Feb 19, 2021

This is a major change in multipath-tools, and can't be rushed. I've been sick lately and have not been able to work on it. Please check whether you can fix the issue in OpenEuler simply by not using -fexceptions for libudev and libsystemd.

@lixiaokeng
Contributor

This is fixed by not using -fexceptions. Thanks!
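
One plausible reading of why -fexceptions matters, which the thread itself does not spell out: frame 10 of the backtrace (sd_device_unrefp) looks like a scope-exit cleanup handler. GCC documents that functions registered with `__attribute__((cleanup))` also run during stack unwinding when the code is built with -fexceptions, and glibc implements thread cancellation by unwinding the stack. A cancelled thread could therefore run such a handler at a point the code never expected. The sketch below shows only the ordinary scope-exit behaviour of the attribute. All names in it are hypothetical and it is not systemd code.

```c
#include <assert.h>
#include <stdlib.h>

static int frees_seen;  /* counts cleanup invocations, for illustration */

/* Cleanup handler: receives a pointer to the annotated variable. */
static void free_bufp(char **p)
{
    assert(p != NULL);
    free(*p);
    *p = NULL;
    frees_seen++;
}

/* Hypothetical function using a GCC/Clang scope-exit cleanup. */
static int parse_something(int fail_early)
{
    __attribute__((cleanup(free_bufp))) char *buf = malloc(16);

    if (buf == NULL || fail_early)
        return -1;      /* free_bufp(&buf) runs automatically here */
    buf[0] = '\0';
    return 0;           /* ... and here */
}
```

Every return path frees buf exactly once. The handler assumes the variable is in a consistent state whenever it runs, which is the assumption that an unwind triggered from outside the normal control flow can break.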

@mwilck
Contributor

mwilck commented Mar 16, 2021

FTR, there was another issue, fixed with openSUSE@38ffd89 from https://github.com/openSUSE/multipath-tools/tree/queue.

@mwilck
Contributor

mwilck commented Mar 16, 2021

I believe this issue can be closed.

@mwilck
Contributor

mwilck commented Apr 29, 2021

@cvaroqui, would you mind closing this issue?

cvaroqui pushed a commit that referenced this issue Dec 2, 2021
... by the paths and pg vectors of the map to be removed.

Original bug report from Lixiaokeng ("libmultipath: clear removed path from mpp"):

multipathd[3525635]: ==3525635==ERROR: AddressSanitizer: heap-use-after-free on address 0xffffa4902fc0 at pc 0xffffac7d5b88 bp 0xffffa948dac0 sp 0xffffa948dae0
multipathd[3525635]: READ of size 8 at 0xffffa4902fc0 thread T7
multipathd[3525635]:    #0 0xffffac7d5b87 in free_multipath (/usr/lib64/libmultipath.so.0+0x4bb87)
multipathd[3525635]:    #1 0xaaaad6cf7057  (/usr/sbin/multipathd+0x17057)
multipathd[3525635]:    #2 0xaaaad6cf78eb  (/usr/sbin/multipathd+0x178eb)
multipathd[3525635]:    #3 0xaaaad6cff4df  (/usr/sbin/multipathd+0x1f4df)
multipathd[3525635]:    #4 0xaaaad6cfffe7  (/usr/sbin/multipathd+0x1ffe7)
multipathd[3525635]:    #5 0xffffac807be3 in uevent_dispatch (/usr/lib64/libmultipath.so.0+0x7dbe3)
multipathd[3525635]:    #6 0xaaaad6cf563f  (/usr/sbin/multipathd+0x1563f)
multipathd[3525635]:    #7 0xffffac6877af  (/usr/lib64/libpthread.so.0+0x87af)
multipathd[3525635]:    #8 0xffffac44118b  (/usr/lib64/libc.so.6+0xd518b)
multipathd[3525635]: 0xffffa4902fc0 is located 1344 bytes inside of 1440-byte region [0xffffa4902a80,0xffffa4903020)
multipathd[3525635]: freed by thread T7 here:
multipathd[3525635]:    #0 0xffffac97d703 in free (/usr/lib64/libasan.so.4+0xd0703)
multipathd[3525635]:    #1 0xffffac824827 in orphan_paths (/usr/lib64/libmultipath.so.0+0x9a827)
multipathd[3525635]:    #2 0xffffac824a43 in remove_map (/usr/lib64/libmultipath.so.0+0x9aa43)
multipathd[3525635]:    #3 0xaaaad6cf7057  (/usr/sbin/multipathd+0x17057)
multipathd[3525635]:    #4 0xaaaad6cf78eb  (/usr/sbin/multipathd+0x178eb)
multipathd[3525635]:    #5 0xaaaad6cff4df  (/usr/sbin/multipathd+0x1f4df)
multipathd[3525635]:    #6 0xaaaad6cfffe7  (/usr/sbin/multipathd+0x1ffe7)
multipathd[3525635]:    #7 0xffffac807be3 in uevent_dispatch (/usr/lib64/libmultipath.so.0+0x7dbe3)
multipathd[3525635]:    #8 0xaaaad6cf563f  (/usr/sbin/multipathd+0x1563f)
multipathd[3525635]:    #9 0xffffac6877af  (/usr/lib64/libpthread.so.0+0x87af)
multipathd[3525635]:    #10 0xffffac44118b  (/usr/lib64/libc.so.6+0xd518b)

When an mpp has only one path and that path is logged out, ASan reports an error.
In remove_mpp, the pp is first freed in orphan_path but is later accessed and
changed in free_multipath. Before free_path(pp), the pp should be
cleared from pp->mpp.

Reported-by: Lixiaokeng <[email protected]>
Tested-by: Lixiaokeng <[email protected]>
Reviewed-by: Benjamin Marzinski <[email protected]>
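
The rule stated in the commit message (clear the path from its map before freeing it) can be sketched with simplified data structures. The structs and function names below are hypothetical and much smaller than libmultipath's real path, multipath, and vector types. They only illustrate the ordering.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_PATHS 4

struct map;

struct path {
    struct map *mpp;                /* owning map, or NULL when orphaned */
};

struct map {
    struct path *paths[MAX_PATHS];  /* references to member paths */
    int npaths;
};

/* Drop every reference the map holds to pp, compacting the array. */
static void clear_path_from_map(struct map *m, const struct path *pp)
{
    int i, kept = 0;

    for (i = 0; i < m->npaths; i++)
        if (m->paths[i] != pp)
            m->paths[kept++] = m->paths[i];
    while (m->npaths > kept)
        m->paths[--m->npaths] = NULL;
}

/* Free a path only after its map no longer points at it, so that a
 * later teardown of the map cannot touch freed memory. */
static void remove_path(struct path *pp)
{
    assert(pp != NULL);
    if (pp->mpp != NULL)
        clear_path_from_map(pp->mpp, pp);
    free(pp);
}
```

With this ordering, code that later walks the map's path array never sees a pointer to freed memory, which is the condition the AddressSanitizer report above flags.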
4 participants