NULL pointer dereference in __mutex_unlock_slowpath #2939
Thanks for filing this. This is a duplicate of #2523, which is being worked on.
behlendorf pushed a commit to openzfs/spl that referenced this issue on Dec 19, 2014:
It is known that mutexes in Linux are not safe when used to synchronize the freeing of the object in which the mutex is embedded: http://lwn.net/Articles/575477/

The known places in ZFS suspected to suffer from this race condition are zio->io_lock and dbuf->db_mtx:

* zio uses zio->io_lock and zio->io_cv to synchronize freeing between zio_wait() and zio_done().
* dbuf uses dbuf->db_mtx to protect reference counting.

This patch fixes this kind of race by forcing serialization on mutex_exit() with a spin lock, making the mutex safe by sacrificing a bit of performance and memory overhead.

This issue most commonly manifests itself as a deadlock in the zio pipeline, caused by a process spinning on the damaged mutex. Similar deadlocks have been reported for the dbuf->db_mtx mutex, and it can also cause a NULL dereference or bad paging request under the right circumstances. This issue and many like it are linked off the openzfs/zfs#2523 issue. Specifically, this fix resolves at least the following outstanding issues:

openzfs/zfs#401 openzfs/zfs#2523 openzfs/zfs#2679 openzfs/zfs#2684 openzfs/zfs#2704 openzfs/zfs#2708 openzfs/zfs#2517 openzfs/zfs#2827 openzfs/zfs#2850 openzfs/zfs#2891 openzfs/zfs#2897 openzfs/zfs#2247 openzfs/zfs#2939

Signed-off-by: Chunwei Chen <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Signed-off-by: Richard Yao <[email protected]>
Closes #421
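To make the failure mode concrete, here is a minimal sketch of the unsafe pattern the commit message describes, with the waiter's cv_wait() loop reduced to polling. The struct and function names (obj, obj_complete, obj_wait_and_free) are illustrative, not the actual ZFS code:

#include <linux/mutex.h>
#include <linux/sched.h>
#include <linux/slab.h>

/* Illustrative object with an embedded mutex (cf. zio_t, dmu_buf_impl_t). */
struct obj {
	struct mutex	lock;	/* embedded: freed together with the object */
	int		done;
};

/* Thread A (cf. zio_done()): signal completion under the lock. */
static void obj_complete(struct obj *o)
{
	mutex_lock(&o->lock);
	o->done = 1;
	mutex_unlock(&o->lock);
	/*
	 * mutex_unlock() can hand the lock to a waiter before its slowpath
	 * has finished touching o->lock's internals, so thread B may
	 * already own -- and soon free -- the object at this point.
	 */
}

/* Thread B (cf. zio_wait()): wait for completion, then free the object. */
static void obj_wait_and_free(struct obj *o)
{
	mutex_lock(&o->lock);
	while (!o->done) {		/* stands in for cv_wait() */
		mutex_unlock(&o->lock);
		cond_resched();
		mutex_lock(&o->lock);	/* can succeed while A is still
					 * inside its unlock slowpath */
	}
	mutex_unlock(&o->lock);
	kfree(o);			/* frees the mutex A may still be
					 * touching: the oops reported here */
}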
This issue, which is a duplicate of #2523, was resolved by the following commit. Full details can be found in the commit message and the related LWN article: openzfs/spl@a3c1eb7 mutex: force serialization on mutex_exit() to fix races
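For reference, the shape of that fix is shown below: a minimal sketch of forcing serialization on mutex_exit() with a spin lock, per the commit message. The kmutex_t layout and names (m_mutex, m_spin, kmutex_enter, kmutex_exit) are assumed for illustration; this is not the verbatim SPL patch:

#include <linux/mutex.h>
#include <linux/spinlock.h>

typedef struct kmutex {
	struct mutex	m_mutex;	/* the underlying Linux mutex */
	spinlock_t	m_spin;		/* serializes the exit path */
} kmutex_t;

static inline void kmutex_init(kmutex_t *mp)
{
	mutex_init(&mp->m_mutex);
	spin_lock_init(&mp->m_spin);
}

static inline void kmutex_enter(kmutex_t *mp)
{
	mutex_lock(&mp->m_mutex);
}

static inline void kmutex_exit(kmutex_t *mp)
{
	/*
	 * Holding m_spin across mutex_unlock() means the thread that will
	 * free the object cannot complete its own kmutex_exit() -- and so
	 * cannot reach the free -- until this unlock, slowpath included,
	 * is fully done with the embedded mutex.
	 */
	spin_lock(&mp->m_spin);
	mutex_unlock(&mp->m_mutex);
	spin_unlock(&mp->m_spin);
}

The cost is one spinlock_t per mutex and one spin lock/unlock pair per exit, which is the "bit of performance and memory overhead" the commit message trades for safety.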
dajhorn added a commit to zfsonlinux/pkg-spl that referenced this issue on Dec 20, 2014:
Commit: openzfs/zfs@a3c1eb7
From: Chunwei Chen <[email protected]>
Date: Fri, 19 Dec 2014 11:31:59 +0800
Subject: mutex: force serialization on mutex_exit() to fix races
Backported-by: Darik Horn <[email protected]>
Closes #421

Conflicts:
	include/sys/mutex.h
behlendorf pushed a commit to openzfs/spl that referenced this issue on Dec 23, 2014:
mutex: force serialization on mutex_exit() to fix races (Closes #421)
ryao pushed a commit to ryao/spl that referenced this issue on Feb 19, 2015:
mutex: force serialization on mutex_exit() to fix races (Closes openzfs#421; Conflicts: include/sys/mutex.h)
Original report:
[617214.295684] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
[617214.295814] IP: [] __mutex_unlock_slowpath+0x25/0x40
[617214.295920] PGD 12dc65067 PUD 26afe7067 PMD 0
[617214.295998] Oops: 0000 [#1] SMP
[617214.296053] Modules linked in: nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfsd auth_rpcgss oid_registry nfs_acl nfs lockd sunrpc fscache binfmt_misc ext4
crc16 mbcache jbd2 iTCO_wdt iTCO_vendor_support intel_powerclamp radeon coretemp kvm_intel ttm drm_kms_helper kvm evdev drm psmouse serio_raw pcspkr
i2c_algo_bit hpilo i2c_core hpwdt lpc_ich i7core_edac mfd_core edac_core ipmi_si ipmi_msghandler acpi_power_meter button shpchp processor autofs4 zfs(PO)
zunicode(PO) zcommon(PO) znvpair(PO) zavl(PO) spl(O) sha256_ssse3 sha256_generic algif_skcipher af_alg dm_crypt dm_mod raid1 raid0 md_mod sd_mod crc_t10dif
ses enclosure crct10dif_generic sg sr_mod cdrom ata_generic hid_generic usbhid hid crct10dif_pclmul crct10dif_common crc32_pclmul crc32c_intel
ghash_clmulni_intel ata_piix aesni_intel aes_x86_64 lrw
[617214.297440] gf128mul bfa glue_helper ablk_helper mpt2sas cryptd libata uhci_hcd ehci_pci scsi_transport_fc raid_class scsi_tgt ehci_hcd
scsi_transport_sas usbcore bnx2 thermal usb_common scsi_mod bna thermal_sys
[617214.297820] CPU: 5 PID: 4463 Comm: z_rd_int/0 Tainted: P IO 3.16.0-4-amd64 #1 Debian 3.16.7-2
[617214.297954] Hardware name: HP ProLiant DL380 G6, BIOS P62 01/30/2011
[617214.298042] task: ffff880c02aa2c20 ti: ffff880bbfca4000 task.ti: ffff880bbfca4000
[617214.298149] RIP: 0010:[] [] __mutex_unlock_slowpath+0x25/0x40
[617214.298281] RSP: 0018:ffff880bbfca7c88 EFLAGS: 00010217
[617214.298356] RAX: 0000000000000000 RBX: ffff88085a0d3040 RCX: 0000000000000000
[617214.298458] RDX: ffff88085a0d3048 RSI: 0000000000000246 RDI: ffff88085a0d3044
[617214.298576] RBP: 0000000000000000 R08: ffffffff8160dd48 R09: 0000000000000001
[617214.298704] R10: 0000000000014240 R11: 0000000000000010 R12: 0000000000000000
[617214.298806] R13: 0000000000200000 R14: ffff88085a0d3040 R15: 0000000000000000
[617214.298908] FS: 0000000000000000(0000) GS:ffff880c1fa40000(0000) knlGS:0000000000000000
[617214.299022] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[617214.299105] CR2: 0000000000000010 CR3: 00000004910b1000 CR4: 00000000000007e0
[617214.299207] Stack:
[617214.299237] ffff88085a0d2d30 ffffffffa0533cea ffffffffa037f607 0000000000000000
[617214.299361] ffff88085a0d3040 ffff88085a0d2d30 ffff8808d27a19d8 0000000000000000
[617214.299483] 0000000000000000 0000000000200000 ffff880c02aa2c20 ffff88085a0d2d30
[617214.299605] Call Trace:
[617214.299682] [] ? zio_done+0x61a/0xcf0 [zfs]
[617214.299773] [] ? spl_kmem_cache_free+0x137/0x3b0 [spl]
[617214.299877] [] ? zio_done+0x7a1/0xcf0 [zfs]
[617214.299974] [] ? spa_config_exit+0x69/0x90 [zfs]
[617214.300069] [] ? zio_done+0x7a1/0xcf0 [zfs]
[617214.300161] [] ? zio_done+0x7a1/0xcf0 [zfs]
[617214.300252] [] ? zio_wait_for_children+0x4e/0x60 [zfs]
[617214.300357] [] ? zio_execute+0xa7/0x140 [zfs]
[617214.300445] [] ? taskq_thread+0x224/0x490 [spl]
[617214.300534] [] ? wake_up_state+0x10/0x10
[617214.300614] [] ? taskq_cancel_id+0x1e0/0x1e0 [spl]
[617214.300708] [] ? kthread+0xbd/0xe0
[617214.300812] [] ? kthread_create_on_node+0x180/0x180
[617214.300908] [] ? ret_from_fork+0x7c/0xb0
[617214.300987] [] ? kthread_create_on_node+0x180/0x180
[617214.305465] Code: 84 00 00 00 00 00 66 66 66 66 90 53 48 89 fb c7 07 01 00 00 00 48 8d 7f 04 e8 e8 15 00 00 48 8b 43 08 48 8d 53 08 48 39 d0 74 09 <48> 8b
78 10 e8 e2 98 b8 ff 66 83 43 04 01 5b c3 66 66 2e 0f 1f
[617214.315137] RIP [] __mutex_unlock_slowpath+0x25/0x40
[617214.319811] RSP
[617214.324396] CR2: 0000000000000010
[617214.488943] ---[ end trace 63cdc141ec3acdb5 ]---
Version info:
[   43.290429] ZFS: Loaded module v0.6.3-21~7b2d78~wheezy, ZFS pool version 5000, ZFS filesystem version 5

Using the Debian daily build via apt-get, refreshed about 4 days ago. Two zpools: root is on a 3-way ZFS mirror composed of dm-crypt vdevs; mass data is on a 24-way raidz3 composed of dm-crypt vdevs, which are in turn composed of single-drive md-raid0 arrays. Most IO activity was on the large zpool, so that's probably where this bug was triggered.