zdb -S ASSERT at ../../module/zfs/dsl_deadlist.c:308:dsl_deadlist_open() Aborted (core dumped) #15423

Open
stuartthebruce opened this issue Oct 19, 2023 · 3 comments
Labels
Type: Defect Incorrect behavior (e.g. crash, hang)


@stuartthebruce

System information

Type | Version/Name
--- | ---
Distribution Name | Rocky Linux
Distribution Version | 8.8
Kernel Version | 4.18.0-477.21.1.el8_8
Architecture | x86_64
OpenZFS Version | 2.1.12

Describe the problem you're observing

zdb -S fails an assertion in dsl_deadlist_open() and aborts with a core dump.

Describe how to reproduce the problem

[root@zfs9 ~]# time zdb -S scratch
dmu_bonus_hold(os, object, dl, &dl->dl_dbuf) == 0 (0x34 == 0)                                             
ASSERT at ../../module/zfs/dsl_deadlist.c:308:dsl_deadlist_open()Aborted (core dumped)                    

real    1m25.791s
user    0m20.698s
sys     1m17.600s

Note: after the debug information below was gathered, a second attempt resulted in the same assertion failure:

[root@zfs9 ~]# time zdb -S scratch
dmu_bonus_hold(os, object, dl, &dl->dl_dbuf) == 0 (0x34 == 0)
ASSERT at ../../module/zfs/dsl_deadlist.c:308:dsl_deadlist_open()Aborted (core dumped)

real    2m8.076s
user    0m29.799s
sys     1m52.933s

Include any warning/errors/backtraces from the system logs

[root@zfs9 ~]# coredumpctl
TIME                            PID   UID   GID SIG COREFILE  EXE
Wed 2023-10-18 16:29:35 PDT  3545979     0     0   6 truncated /usr/sbin/zdb

[root@zfs9 ~]# coredumpctl info
           PID: 3545979 (zdb)
           UID: 0 (root)
           GID: 0 (root)
        Signal: 6 (ABRT)
     Timestamp: Wed 2023-10-18 16:29:30 PDT (17h ago)
  Command Line: zdb -S scratch
    Executable: /usr/sbin/zdb
 Control Group: /
         Slice: -.slice
       Boot ID: 290edc1d68bd4b86ac50781816df51f8
    Machine ID: 514973d944d7471fbbe61edf10e6f5dd
      Hostname: zfs9
       Storage: /var/lib/systemd/coredump/core.zdb.0.290edc1d68bd4b86ac50781816df51f8.3545979.1697671770000000.lz4 (truncated)
       Message: Process 3545979 (zdb) of user 0 dumped core.
                
                Stack trace of thread 3548538:
                #0  0x00007fc9b1ce4acf n/a (n/a)
                #1  0x00007fc9b3a6f238 n/a (n/a)
                #2  0x00007fc9b3a66382 n/a (n/a)
                #3  0x00007fc9b3a35bd3 n/a (n/a)
                #4  0x00007fc9b3a35fb9 n/a (n/a)
                #5  0x00007fc9b39e46fd n/a (n/a)
                #6  0x00007fc9b20631ca n/a (n/a)

[root@zfs9 ~]# coredumpctl debug
           PID: 3545979 (zdb)
           UID: 0 (root)
           GID: 0 (root)
        Signal: 6 (ABRT)
     Timestamp: Wed 2023-10-18 16:29:30 PDT (17h ago)
  Command Line: zdb -S scratch
    Executable: /usr/sbin/zdb
 Control Group: /
         Slice: -.slice
       Boot ID: 290edc1d68bd4b86ac50781816df51f8
    Machine ID: 514973d944d7471fbbe61edf10e6f5dd
      Hostname: zfs9
       Storage: /var/lib/systemd/coredump/core.zdb.0.290edc1d68bd4b86ac50781816df51f8.3545979.1697671770000000.lz4 (truncated)
       Message: Process 3545979 (zdb) of user 0 dumped core.
                
                Stack trace of thread 3548538:
                #0  0x00007fc9b1ce4acf n/a (n/a)
                #1  0x00007fc9b3a6f238 n/a (n/a)
                #2  0x00007fc9b3a66382 n/a (n/a)
                #3  0x00007fc9b3a35bd3 n/a (n/a)
                #4  0x00007fc9b3a35fb9 n/a (n/a)
                #5  0x00007fc9b39e46fd n/a (n/a)
                #6  0x00007fc9b20631ca n/a (n/a)

GNU gdb (GDB) Red Hat Enterprise Linux 8.2-19.el8
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
.
Find the GDB manual and other documentation resources online at:
    .

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/sbin/zdb...Reading symbols from /usr/lib/debug/usr/sbin/zdb-2.1.12-1.el8.x86_64.debug...done.
done.
BFD: warning: /var/tmp/coredump-HU8bwJ is truncated: expected core file size >= 2430345216, found: 2147483648
[New LWP 3548538]
[New LWP 3545992]
[New LWP 3548706]

... (total of 1875 New LWP lines)

[New LWP 3547649]
[New LWP 3547653]
[New LWP 3547715]

warning: Error reading shared library list entry at 0x74726174735f636f

warning: Error reading shared library list entry at 0x4900000008
Failed to read a valid object file image from memory.
Core was generated by `zdb -S scratch'.
Program terminated with signal SIGABRT, Aborted.
#0  0x00007fc9b1ce4acf in ?? ()
[Current thread is 1 (LWP 3548538)]
(gdb) where
#0  0x00007fc9b1ce4acf in ?? ()
#1  0x0000000000000000 in ?? ()
(gdb) quit
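
The BFD warning in the gdb session above is the likely reason the backtrace shows only `??` frames: the stored core file is shorter than gdb expects. The sketch below just does the arithmetic on the two sizes from that warning. The observation that the stored size is exactly 2 GiB is consistent with a size cap on saved core dumps (systemd-coredump has `ProcessSizeMax=` and `ExternalSizeMax=` settings in coredump.conf), but whether such a cap is what truncated this particular dump is an assumption, not something confirmed in this report. The function name is made up for illustration.

```python
# Sizes in bytes, copied from the BFD warning printed by gdb.
EXPECTED_MIN = 2_430_345_216
FOUND = 2_147_483_648


def truncation_report(expected_min: int, found: int) -> dict:
    """Summarize how much of a truncated core file is missing.

    Hypothetical helper for illustration; not part of zdb, gdb, or systemd.
    """
    missing = expected_min - found
    return {
        # True when the stored size is exactly 2 GiB (2**31 bytes).
        "found_is_2GiB": found == 2**31,
        "missing_bytes": missing,
        "missing_MiB": round(missing / 2**20, 2),
    }


report = truncation_report(EXPECTED_MIN, FOUND)
for key, value in report.items():
    print(f"{key}: {value}")
```

If the cap is the cause, raising the coredump size limits before reproducing the crash should give gdb a complete core and a usable stack trace.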
@stuartthebruce stuartthebruce added the Type: Defect Incorrect behavior (e.g. crash, hang) label Oct 19, 2023
@stuartthebruce
Author

After upgrading to 2.2.0,

[root@zfs9 ~]# cat /sys/module/zfs/version 
2.2.0-1

I now get either an empty table or the error "zdb: can't open 'scratch': Invalid exchange":

[root@zfs9 ~]# time zdb -S scratch
Simulated DDT histogram:

bucket              allocated                       referenced                                  
______   ______________________________   ______________________________                        
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE                        
------   ------   -----   -----   -----   ------   -----   -----   -----                        


real    0m26.009s
user    0m20.604s
sys     0m28.306s
[root@zfs9 ~]# time zdb -S scratch
zdb: can't open 'scratch': Invalid exchange

real    0m2.802s
user    0m3.877s
sys     0m3.901s
[root@zfs9 ~]# time zdb -S scratch
zdb: can't open 'scratch': Invalid exchange

real    0m3.488s
user    0m3.836s
sys     0m4.214s
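
The `0x34` in the original assertion message and the "Invalid exchange" text here may well be the same error code seen through two different code paths. A minimal sketch for decoding the value with the Python standard library follows; the helper name is hypothetical, and the result depends on the platform's errno table.

```python
import errno
import os


def decode_errno(code: int) -> tuple:
    """Return (symbolic name, message) for an errno value on this platform.

    Hypothetical helper for illustration only.
    """
    name = errno.errorcode.get(code, "unknown")
    return name, os.strerror(code)


# 0x34 is the left-hand value printed by the failed assertion
# "dmu_bonus_hold(os, object, dl, &dl->dl_dbuf) == 0 (0x34 == 0)".
code = int("0x34", 16)
name, message = decode_errno(code)
print(code, name, message)
```

On Linux with glibc, errno 52 is EBADE, whose message is "Invalid exchange". To my knowledge OpenZFS on Linux reuses EBADE for its internal ECKSUM (checksum error) code, which would mean both failures report a checksum error while reading pool metadata; treat that mapping as an assumption to verify against the OpenZFS headers.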

@stuckj

stuckj commented Sep 18, 2024

I am seeing the same assertion when running zdb -Lbbbs DATASET.

This is on Proxmox 8.2.5: Linux proxmox 6.8.12-1-pve #1 SMP PREEMPT_DYNAMIC PMX 6.8.12-1 (2024-08-05T16:17Z) x86_64 GNU/Linux

ZFS version: zfs-2.2.6-pve1

It worked on a second run, so the issue appears to be transient. The cluster was not under heavy load at the time, and a scrub run a few days earlier had reported no errors. The pool is a mirror of 6 mirrors plus a special vdev containing 2 mirrors. I was trying to view the block histogram in order to tune the special_small_blocks parameter for the special vdev.

root@proxmox:~# zdb -Lbbbs root-dataset

Traversing all blocks ...

31.0T completed (4608MB/s) estimated time remaining: 0hr 18min 56sec        ASSERT at module/zfs/dsl_deadlist.c:308:dsl_deadlist_open()
dmu_bonus_hold(os, object, dl, &dl->dl_dbuf) == 0 (0x34 == 0)
  PID: 803385    COMM: zdb
  TID: 803385    NAME: zdb
Call trace:
  /lib/x86_64-linux-gnu/libzpool.so.5(libspl_assertf+0x157) [0x79f70e95b627]
  /lib/x86_64-linux-gnu/libzpool.so.5(dsl_deadlist_open+0x118) [0x79f70e770b58]
  /lib/x86_64-linux-gnu/libzpool.so.5(dsl_dataset_hold_obj+0x514) [0x79f70e767d84]
  /lib/x86_64-linux-gnu/libzpool.so.5(dsl_dataset_hold_obj+0xc04) [0x79f70e768474]
  /lib/x86_64-linux-gnu/libzpool.so.5(traverse_pool+0x15c) [0x79f70e74765c]  
  zdb(+0x1a388) [0x630056a0e388]
  zdb(+0x213ad) [0x630056a153ad]
  zdb(+0xafd1) [0x6300569fefd1]
  /lib/x86_64-linux-gnu/libc.so.6(+0x2724a) [0x79f70e00524a]
  /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x85) [0x79f70e005305]
  zdb(+0xc7f1) [0x630056a007f1]
zdb(+0x13db3)[0x630056a07db3]
/lib/x86_64-linux-gnu/libc.so.6(+0x3c050)[0x79f70e01a050]
/lib/x86_64-linux-gnu/libc.so.6(+0x8ae3c)[0x79f70e068e3c]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x12)[0x79f70e019fb2]
/lib/x86_64-linux-gnu/libc.so.6(abort+0xd3)[0x79f70e004472]
/lib/x86_64-linux-gnu/libzpool.so.5(+0x57a97)[0x79f70e6b6a97]
/lib/x86_64-linux-gnu/libzpool.so.5(dsl_deadlist_open+0x118)[0x79f70e770b58]
/lib/x86_64-linux-gnu/libzpool.so.5(dsl_dataset_hold_obj+0x514)[0x79f70e767d84]
/lib/x86_64-linux-gnu/libzpool.so.5(dsl_dataset_hold_obj+0xc04)[0x79f70e768474]
/lib/x86_64-linux-gnu/libzpool.so.5(traverse_pool+0x15c)[0x79f70e74765c]
zdb(+0x1a388)[0x630056a0e388]
zdb(+0x213ad)[0x630056a153ad]
zdb(+0xafd1)[0x6300569fefd1]
/lib/x86_64-linux-gnu/libc.so.6(+0x2724a)[0x79f70e00524a]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x85)[0x79f70e005305]
zdb(+0xc7f1)[0x630056a007f1]
Aborted
root@proxmox:~#

@stuartthebruce
Author

FYI, the original system on which I reported this problem now works when running RL8.10 and ZFS 2.2.5:

[root@zfs9 ~]# uname -a
Linux zfs9 4.18.0-553.16.1.el8_10.x86_64 #1 SMP Thu Aug 8 17:47:08 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

[root@zfs9 ~]# cat /sys/module/zfs/version 
2.2.5-1

[root@zfs9 ~]# time zdb -S scratch
Simulated DDT histogram:

bucket              allocated                       referenced          
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
     1    34.4M   3.67T   3.05T   3.14T    34.4M   3.67T   3.05T   3.14T
     2    4.84M    566G    484G    495G    10.8M   1.24T   1.07T   1.09T
     4     733K   89.9G   84.9G   85.5G    3.25M    408G    386G    389G
     8    38.2K   4.57G   3.46G   3.59G     412K   49.3G   37.6G   39.1G
    16    95.4K   9.34G   7.53G   7.95G    1.70M    175G    141G    149G
    32    4.97K    476M    352M    376M     250K   23.5G   17.5G   18.7G
    64    1.88K    146M   44.3M   61.4M     182K   13.6G   4.04G   5.68G
   128      897    103M   47.6M   52.4M     160K   18.3G   8.29G   9.15G
   256      658   76.1M   26.4M   30.4M     235K   27.2G   9.48G   10.9G
   512      537   61.9M   19.9M   23.0M     376K   43.5G   14.0G   16.2G
    1K      307   35.0M   11.6M   13.2M     426K   48.2G   16.0G   18.2G
    2K      117   12.5M   4.15M   4.77M     311K   32.7G   11.0G   12.6G
    4K      105   11.7M    850K   1.70M     447K   48.4G   4.10G   7.79G
    8K       45   4.56M   4.44M   4.52M     416K   40.8G   39.6G   40.4G
   16K       39   4.49M   4.32M   4.36M    1.12M    134G    129G    130G
   32K        1    128K     18K   27.4K    32.7K   4.09G    589M    897M
  256K        3    384K    292K    302K     827K    103G   78.4G   81.2G
 Total    40.0M   4.32T   3.62T   3.72T    55.2M   6.05T   5.00T   5.14T

dedup = 1.38, compress = 1.21, copies = 1.03, dedup * compress / copies = 1.63


real	55m55.754s
user	35m30.085s
sys	23m50.757s

So, for my part, this issue could be closed.
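
As a sanity check on the working output, the summary line under the histogram can be reproduced from the Total row. The sketch below assumes that dedup is referenced DSIZE over allocated DSIZE, compress is referenced LSIZE over referenced PSIZE, and copies is referenced DSIZE over referenced PSIZE. That reading is inferred from the numbers themselves and is not confirmed anywhere in this thread; the function name is made up for illustration.

```python
# Total row of the simulated DDT histogram above, in the units zdb printed (T).
ALLOC_DSIZE = 3.72
REF_LSIZE, REF_PSIZE, REF_DSIZE = 6.05, 5.00, 5.14


def ddt_ratios(alloc_dsize, ref_lsize, ref_psize, ref_dsize):
    """Recompute the summary ratios from the Total row (assumed formulas)."""
    dedup = ref_dsize / alloc_dsize
    compress = ref_lsize / ref_psize
    copies = ref_dsize / ref_psize
    # Algebraically this reduces to ref_lsize / alloc_dsize.
    combined = dedup * compress / copies
    return dedup, compress, copies, combined


labels = ("dedup", "compress", "copies", "dedup * compress / copies")
values = ddt_ratios(ALLOC_DSIZE, REF_LSIZE, REF_PSIZE, REF_DSIZE)
formatted = [f"{v:.2f}" for v in values]
for label, text in zip(labels, formatted):
    print(f"{label} = {text}")
```

Under these assumptions the four values round to 1.38, 1.21, 1.03 and 1.63, matching the zdb output. Multiplying the already-rounded ratios instead (1.38 * 1.21 / 1.03) gives about 1.62, so the last figure is evidently computed from unrounded values.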
