btrfs replace start failed with segmentation fault, and seems to hang my system #363

Open
icebluey opened this issue Apr 19, 2021 · 4 comments
Labels: bug, kernel (something in kernel has to be done too)

Comments

@icebluey

btrfs-progs: v5.11.1
linux kernel: 5.10.31

After the segmentation fault, even 'systemctl --force --force reboot' cannot reboot the system.

Here's what I did:

# umount /mnt ; wipefs -a -f /dev/sd[b-g]
umount: /mnt: not mounted
/dev/sdc: 8 bytes were erased at offset 0x00010040 (btrfs): 5f 42 48 52 66 53 5f 4d
# mkfs.btrfs --checksum xxhash -d raid1 -m raid1 /dev/sd[bc]
btrfs-progs v5.11.1 
See https://btrfs.wiki.kernel.org for more information.

Label:              (null)
UUID:               1bba71d2-b89a-448f-9189-6354be3e06cc
Node size:          16384
Sector size:        4096
Filesystem size:    4.00GiB
Block group profiles:
  Data:             RAID1           204.75MiB
  Metadata:         RAID1           256.00MiB
  System:           RAID1             8.00MiB
SSD detected:       no
Incompat features:  extref, skinny-metadata
Runtime features:   
Checksum:           xxhash64
Number of devices:  2
Devices:
   ID        SIZE  PATH
    1     2.00GiB  /dev/sdb
    2     2.00GiB  /dev/sdc

# mount -v UUID=1bba71d2-b89a-448f-9189-6354be3e06cc /mnt
mount: /dev/sdc mounted on /mnt.

# dd status=progress if=/dev/urandom of=/mnt/delete.dd1 bs=1M count=257 iflag=fullblock oflag=dsync
257+0 records in
257+0 records out
269484032 bytes (269 MB) copied, 9.0711 s, 29.7 MB/s

# dd status=progress if=/dev/urandom of=/mnt/delete.dd2 bs=1M count=257 iflag=fullblock oflag=dsync
257+0 records in
257+0 records out
269484032 bytes (269 MB) copied, 10.3438 s, 26.1 MB/s

# dd status=progress if=/dev/urandom of=/mnt/delete.dd3 bs=1M count=257 iflag=fullblock oflag=dsync
257+0 records in
257+0 records out
269484032 bytes (269 MB) copied, 7.91096 s, 34.1 MB/s

# sleep 60 ; dd status=progress if=/dev/urandom of=/mnt/test.dd bs=1M iflag=fullblock oflag=dsync
dd: error writing ‘/mnt/test.dd’: No space left on device
1012+0 records in
1011+0 records out
1060765696 bytes (1.1 GB) copied, 30.3296 s, 35.0 MB/s

# sleep 60 ; dd status=progress if=/dev/urandom of=/mnt/test.dd1 bs=1K iflag=fullblock oflag=dsync
dd: error writing ‘/mnt/test.dd1’: No space left on device
190+0 records in
189+0 records out
193536 bytes (194 kB) copied, 0.428447 s, 452 kB/s

# umount /mnt

# wipefs -a -f /dev/sdb
/dev/sdb: 8 bytes were erased at offset 0x00010040 (btrfs): 5f 42 48 52 66 53 5f 4d

# mount -v -o degraded UUID=1bba71d2-b89a-448f-9189-6354be3e06cc /mnt 
mount: /dev/sdc mounted on /mnt.
# rm -fr /mnt/delete.dd1
# rm -fr /mnt/delete.dd2

# btrfs replace start -B 1 /dev/sdf /mnt
ERROR: ioctl(DEV_REPLACE_START) failed on "/mnt": No space left on device

# rm -fr /mnt/delete.dd3 

# gdb -q --args btrfs replace start -B 1 /dev/sdf /mnt
Reading symbols from /usr/sbin/btrfs...(no debugging symbols found)...done.
(gdb) run
Starting program: /sbin/btrfs replace start -B 1 /dev/sdf /mnt
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
ERROR: ioctl(DEV_REPLACE_START) failed on "/mnt": No space left on device
[Inferior 1 (process 1622) exited with code 01]
(gdb) run
Starting program: /sbin/btrfs replace start -B 1 /dev/sdf /mnt
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
ERROR: ioctl(DEV_REPLACE_START) failed on "/mnt": Read-only file system
[Inferior 1 (process 1633) exited with code 01]
(gdb) run
Starting program: /sbin/btrfs replace start -B 1 /dev/sdf /mnt
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
ERROR: ioctl(DEV_REPLACE_START) failed on "/mnt": Read-only file system
[Inferior 1 (process 1638) exited with code 01]
(gdb) run
Starting program: /sbin/btrfs replace start -B 1 /dev/sdf /mnt
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
ERROR: ioctl(DEV_REPLACE_START) failed on "/mnt": Read-only file system
[Inferior 1 (process 1643) exited with code 01]
(gdb) quit

# umount -v /mnt
umount: /mnt (/dev/sdc) unmounted

# mount -v -o degraded UUID=1bba71d2-b89a-448f-9189-6354be3e06cc /mnt
mount: /dev/sdc mounted on /mnt.

# gdb -q --args btrfs replace start -B 1 /dev/sdf /mnt
Reading symbols from /usr/sbin/btrfs...(no debugging symbols found)...done.
(gdb) run
Starting program: /sbin/btrfs replace start -B 1 /dev/sdf /mnt
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".

Program terminated with signal SIGSEGV, Segmentation fault.
The program no longer exists.
(gdb) backtrace
No stack.
(gdb) quit

In /var/log/messages:


kernel: assertion failed: 0, in fs/btrfs/dev-replace.c:499
kernel: ------------[ cut here ]------------
kernel: kernel BUG at fs/btrfs/ctree.h:3291!
kernel: invalid opcode: 0000 [#1] SMP PTI
kernel: CPU: 0 PID: 1679 Comm: btrfs Tainted: G        W         5.10.31-1.el7.x86_64 #1
kernel: RIP: 0010:assertfail.constprop.0+0x18/0x1a
kernel: Code: 00 00 00 74 05 e8 3a 97 04 00 48 83 c4 30 5d 41 5c c3 89 f1 48 c7 c2 14 96 c4 ad 48 89 fe 48 c7 c7 28 fe c3 ad e8 51 69 fe ff <0f> 0b 66 66 66 66 90 3e 80 a7 40 0a 00 00 f7 31 c9 48 81 c7 d8 0b
kernel: RSP: 0018:ffffa45441d33d70 EFLAGS: 00010246
kernel: RAX: 0000000000000032 RBX: ffff994444fc0000 RCX: 0000000000000000
kernel: RDX: 0000000000000000 RSI: ffff99447ce18a80 RDI: ffff99447ce18a80
kernel: RBP: ffff994444c64000 R08: ffff99447ffc7da8 R09: 0000000000027ffb
kernel: R10: 00000000ffff8000 R11: 3fffffffffffffff R12: ffff994446506e00
kernel: R13: ffff99444141d140 R14: ffff994444c64b10 R15: ffff994445962a00
kernel: FS:  00007ffff7ee78c0(0000) GS:ffff99447ce00000(0000) knlGS:0000000000000000
kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 00007f49d24cfe58 CR3: 0000000035430000 CR4: 00000000000006f0
kernel: Call Trace:
kernel: btrfs_dev_replace_by_ioctl.cold+0x5e/0x276
kernel: btrfs_ioctl+0x2867/0x3000
kernel: ? do_sigaction+0x1c6/0x240
kernel: ? __x64_sys_ioctl+0x83/0xb0
kernel: __x64_sys_ioctl+0x83/0xb0
kernel: do_syscall_64+0x33/0x80
kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
kernel: RIP: 0033:0x7ffff7280307
kernel: Code: 44 00 00 48 8b 05 69 1b 2d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 39 1b 2d 00 f7 d8 64 89 01 48
kernel: RSP: 002b:00007fffffffc938 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
kernel: RAX: ffffffffffffffda RBX: 00007fffffffe5fd RCX: 00007ffff7280307
kernel: RDX: 00007fffffffcd70 RSI: 00000000ca289435 RDI: 0000000000000007
kernel: RBP: 0000000000000008 R08: 0000000000000000 R09: 00007fffffffc890
kernel: R10: 0000000000000008 R11: 0000000000000246 R12: 000055555564e050
kernel: R13: 000055555564e050 R14: 0000000000000001 R15: 0000000000000007
kernel: ---[ end trace e150833b1864e129 ]---
kernel: RIP: 0010:assertfail.constprop.0+0x18/0x1a
kernel: Code: 00 00 00 74 05 e8 3a 97 04 00 48 83 c4 30 5d 41 5c c3 89 f1 48 c7 c2 14 96 c4 ad 48 89 fe 48 c7 c7 28 fe c3 ad e8 51 69 fe ff <0f> 0b 66 66 66 66 90 3e 80 a7 40 0a 00 00 f7 31 c9 48 81 c7 d8 0b
kernel: RSP: 0018:ffffa45441d33d70 EFLAGS: 00010246
kernel: RAX: 0000000000000032 RBX: ffff994444fc0000 RCX: 0000000000000000
kernel: RDX: 0000000000000000 RSI: ffff99447ce18a80 RDI: ffff99447ce18a80
kernel: RBP: ffff994444c64000 R08: ffff99447ffc7da8 R09: 0000000000027ffb
kernel: R10: 00000000ffff8000 R11: 3fffffffffffffff R12: ffff994446506e00
kernel: R13: ffff99444141d140 R14: ffff994444c64b10 R15: ffff994445962a00
kernel: FS:  00007ffff7ee78c0(0000) GS:ffff99447ce00000(0000) knlGS:0000000000000000
kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 00007f49d24cfe58 CR3: 0000000035430000 CR4: 00000000000006f0
kernel: ------------[ cut here ]------------
kernel: WARNING: CPU: 0 PID: 1673 at fs/btrfs/volumes.c:2973 btrfs_remove_chunk+0x638/0x6a0
kernel: CPU: 0 PID: 1673 Comm: btrfs-cleaner Tainted: G      D W         5.10.31-1.el7.x86_64 #1
kernel: RIP: 0010:btrfs_remove_chunk+0x638/0x6a0
kernel: Code: 8b 55 50 3e 48 0f ba aa 40 0a 00 00 02 8b 04 24 72 1d 83 f8 fb 74 39 83 f8 e2 74 34 89 c6 48 c7 c7 70 05 c4 ad e8 f4 49 79 00 <0f> 0b 8b 04 24 89 c1 ba 9d 0b 00 00 4c 89 ef 89 04 24 48 c7 c6 d0
kernel: RSP: 0018:ffffa45441dc3d90 EFLAGS: 00010286
kernel: RAX: 0000000000000000 RBX: 000000001a000000 RCX: ffff99447ce18a88
kernel: RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffff99447ce18a80
kernel: RBP: ffff9944794bd3f0 R08: ffff99447ffc7da8 R09: 0000000000027ffb
kernel: R10: 00000000ffff8000 R11: 3fffffffffffffff R12: ffff994446448000
kernel: R13: ffff994444b8d270 R14: ffff994444c64380 R15: ffff994444c64380
kernel: FS:  0000000000000000(0000) GS:ffff99447ce00000(0000) knlGS:0000000000000000
kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 00007f1cbf559000 CR3: 00000000054bc000 CR4: 00000000000006f0
kernel: Call Trace:
kernel: btrfs_delete_unused_bgs+0x6b6/0x770
kernel: cleaner_kthread+0xef/0x120
kernel: ? btree_invalidatepage+0x40/0x40
kernel: kthread+0x11b/0x140
kernel: ? __kthread_bind_mask+0x60/0x60
kernel: ret_from_fork+0x22/0x30
kernel: ---[ end trace e150833b1864e12a ]---
kernel: BTRFS: error (device sdc) in btrfs_remove_chunk:2973: errno=-28 No space left
kernel: BTRFS info (device sdc): forced readonly

@kdave kdave added bug kernel something in kernel has to be done too labels Sep 20, 2021
@kdave
Owner

kdave commented Sep 20, 2021

assertion failed: 0, in fs/btrfs/dev-replace.c:499

Matching code from 5.10.31:

 491         down_write(&dev_replace->rwsem);
 492         switch (dev_replace->replace_state) {
 493         case BTRFS_IOCTL_DEV_REPLACE_STATE_NEVER_STARTED:
 494         case BTRFS_IOCTL_DEV_REPLACE_STATE_FINISHED:
 495         case BTRFS_IOCTL_DEV_REPLACE_STATE_CANCELED:
 496                 break;
 497         case BTRFS_IOCTL_DEV_REPLACE_STATE_STARTED:
 498         case BTRFS_IOCTL_DEV_REPLACE_STATE_SUSPENDED:
 499                 ASSERT(0);
 500                 ret = BTRFS_IOCTL_DEV_REPLACE_RESULT_ALREADY_STARTED;
 501                 up_write(&dev_replace->rwsem);
 502                 goto leave;
 503         }

The assertion was introduced by commit https://git.kernel.org/linus/5c06147128fbb, which says it is impossible to hit. From the reproducer, it seems that when replace fails to start due to ENOSPC, the filesystem is left in a state that the replace state machine does not expect, or at least the assertion is based on a wrong assumption.
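To make the control flow of the quoted switch easier to follow, here is a small shell model of it. This is only a sketch of the excerpt above, not kernel code: the function name `replace_start_precheck` and the `proceed` output are made up for illustration, and the state names are written without their `BTRFS_IOCTL_DEV_REPLACE_STATE_` prefix.

```shell
#!/bin/sh
# Toy model of the switch at fs/btrfs/dev-replace.c:492-503 (5.10.31).
# replace_start_precheck is a hypothetical name used only in this sketch.
replace_start_precheck() {
    case "$1" in
        NEVER_STARTED|FINISHED|CANCELED)
            # These states break out of the switch and continue the start path.
            echo "proceed" ;;
        STARTED|SUSPENDED)
            # The kernel reaches ASSERT(0) here first. On a kernel built
            # without CONFIG_BTRFS_ASSERT the code after it would run and
            # set this result instead.
            echo "ALREADY_STARTED" ;;
        *)
            echo "unknown state: $1" >&2
            return 1 ;;
    esac
}

for state in NEVER_STARTED FINISHED CANCELED STARTED SUSPENDED; do
    printf '%-14s -> %s\n' "$state" "$(replace_start_precheck "$state")"
done
```

Read this way, the crash in the report means the replace state was STARTED or SUSPENDED when the second `btrfs replace start` arrived, even though the earlier attempts had all returned errors to userspace.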

@KES777

KES777 commented Feb 7, 2025

This is probably related to #948, which also ended with "No space left".

@KES777

KES777 commented Feb 7, 2025

I ran a lot of experiments. The plan was as follows:

#!/bin/sh

set -x

## Part 1
cd ~/work/projects/raid/
dd if=/dev/zero of=disk1.img bs=1M count=500
dd if=/dev/zero of=disk2.img bs=1M count=500


# dd if=/dev/zero of=disk1.img bs=1M count=500 # seek=$((500))
# dd if=/dev/zero of=disk2.img bs=1M count=250 # seek=$((250))
# dd if=/dev/zero of=disk3.img bs=1M count=250 # seek=$((250))

sudo losetup /dev/loop10 disk1.img
sudo losetup /dev/loop11 disk2.img
#sudo losetup /dev/loop12 disk3.img
sudo losetup -a

sudo mkfs.btrfs -f -m raid1 -d raid1 /dev/loop10 /dev/loop11 #/dev/loop12

# Test that it does not matter which device is mounted
sudo mkdir -p /mnt/raid_test
sudo mount /dev/loop10 /mnt/raid_test
sleep 5
sudo umount /mnt/raid_test
sudo mount /dev/loop11 /mnt/raid_test

sudo btrfs filesystem usage /mnt/raid_test -T
sudo btrfs filesystem show  /mnt/raid_test
sudo btrfs filesystem df    /mnt/raid_test
sudo btrfs device stats     /mnt/raid_test

#
sudo dd if=/dev/urandom of=/mnt/raid_test/test bs=1M count=150
md5sum /mnt/raid_test/test | sudo tee -a /mnt/raid_test/md5sum
ls -la /mnt/raid_test/test
sudo btrfs filesystem usage /mnt/raid_test -T


## Part 2
sudo umount /mnt/raid_test
sudo losetup -d /dev/loop10
sudo losetup -d /dev/loop11
sudo losetup -d /dev/loop12

# Corruption
dd if=/dev/urandom of=replace.img bs=1M count=500
sudo losetup /dev/loop10 disk1.img
sudo losetup /dev/loop11 replace.img
# 11 is broken
# 10 can not be mounted without all devices
sudo mount /dev/loop11 /mnt/raid_test
sudo mount /dev/loop10 /mnt/raid_test
# Mount as degraded
sudo mount -o degraded /dev/loop10 /mnt/raid_test
df -h | grep raid
mount | grep loop
# /dev/loop10     500M  407M   31M  94% /mnt/raid_test
# /dev/loop10 on /mnt/raid_test type btrfs (rw,relatime,degraded,ssd,discard=async,space_cache=v2,subvolid=5,subvol=/)

sudo btrfs filesystem usage /mnt/raid_test -T
sudo btrfs filesystem show  /mnt/raid_test
sudo btrfs filesystem df    /mnt/raid_test
sudo btrfs device stats     /mnt/raid_test


#1 Fix
sudo losetup -a
sudo btrfs replace start -B 2 /dev/loop11 /mnt/raid_test/
sudo btrfs filesystem usage /mnt/raid_test -T
sudo btrfs balance start /mnt/raid_test
sudo btrfs filesystem usage /mnt/raid_test -T
cat /mnt/raid_test/md5sum
md5sum /mnt/raid_test/test

# Errors with files of 400 MB and 230 MB
# Works with files of 130 MB and 150 MB

#XXXX Switch second disk
sudo umount /mnt/raid_test
sudo losetup -d /dev/loop10
sudo losetup -d /dev/loop11
sudo losetup -d /dev/loop12

#1   disk2
# Corruption #2    Here we create second replaceX file
dd if=/dev/urandom of=replaceX.img bs=1M count=500
sudo losetup /dev/loop11 replace.img
sudo losetup /dev/loop12 replaceX.img
# 11 is broken
# 10 can not be mounted without all devices
sudo mount /dev/loop11 /mnt/raid_test
sudo mount /dev/loop12 /mnt/raid_test
# Mount as degraded
sudo mount -o degraded /dev/loop11 /mnt/raid_test
df -h | grep raid
mount | grep loop
# /dev/loop10     500M  407M   31M  94% /mnt/raid_test
# /dev/loop10 on /mnt/raid_test type btrfs (rw,relatime,degraded,ssd,discard=async,space_cache=v2,subvolid=5,subvol=/)

sudo btrfs filesystem usage /mnt/raid_test -T

#1 Disk 2 Fix
sudo losetup -a
sudo btrfs replace start -B 1 /dev/loop12 /mnt/raid_test/
sudo btrfs filesystem usage /mnt/raid_test -T
sudo btrfs balance start /mnt/raid_test
sudo btrfs filesystem usage /mnt/raid_test -T
cat /mnt/raid_test/md5sum
md5sum /mnt/raid_test/test



#2
sudo btrfs device add /dev/loop11 /mnt/raid_test
dd if=/dev/urandom of=replaceX.img bs=1M count=500
sudo losetup /dev/loop12 replaceX.img
sudo btrfs replace start -B 2 /dev/loop12 /mnt/raid_test/
sudo btrfs filesystem usage /mnt/raid_test -T
mount | grep loop
# Error

#3
sudo btrfs device add /dev/loop11 /mnt/raid_test
sudo btrfs device remove missing  /mnt/raid_test


# ???
sudo btrfs filesystem df /mnt/raid_test
sudo btrfs device stats /mnt/raid_test


## Cleanup
sudo umount /mnt/raid_test
sudo losetup -d /dev/loop10
sudo losetup -d /dev/loop11
sudo losetup -d /dev/loop12

# btrfs fi du /mnt/raid_test

Solutions #2 and #3 never worked. Usually it crashes at the "#1 Fix" step, before "XXXX Switch second disk".

With larger files, such as 400 MB and 230 MB on a 500 MB device, replacing the broken drive fails.

The steps "#1 disk2" and "#1 Disk 2 Fix" are there to break the first disk and replace it with another one, so at the end of the experiment I have two new disks in place of the originals.

@KES777

KES777 commented Feb 7, 2025

Here are the stack traces:

Stack traces
[  603.844806] BTRFS info (device loop10): first mount of filesystem a8d3057f-0c13-46d6-95b4-b96f54e12173
[  603.844845] BTRFS info (device loop10): using crc32c (crc32c-intel) checksum algorithm
[  603.844857] BTRFS info (device loop10): using free-space-tree
[  603.846922] BTRFS info (device loop10): checking UUID tree
[  603.855277] BTRFS info (device loop10): balance: resume -dusage=90 -musage=90 -susage=90
[  603.855364] BTRFS info (device loop10): relocating block group 726663168 flags data|raid1
[  603.862649] BTRFS info (device loop10): relocating block group 704643072 flags data|raid1
[  603.870555] BTRFS info (device loop10): relocating block group 675282944 flags data|raid1
[  603.877144] BTRFS info (device loop10): relocating block group 654311424 flags system|raid1
[  603.883592] BTRFS info (device loop10): found 1 extents, stage: move data extents
[  603.890095] BTRFS info (device loop10): relocating block group 82837504 flags data|raid1
[  603.893744] ------------[ cut here ]------------
[  603.893746] BTRFS: Transaction aborted (error -28)
[  603.893790] WARNING: CPU: 0 PID: 218 at fs/btrfs/extent-tree.c:3257 __btrfs_free_extent.isra.0+0xc72/0x11e0 [btrfs]
[  603.893844] Modules linked in: snd_seq_dummy snd_hrtimer overlay qrtr binfmt_misc nls_iso8859_1 snd_hda_codec_hdmi xe snd_hda_codec_realtek snd_hda_codec_generic drm_gpuvm drm_exec gpu_sched drm_suballoc_helper drm_ttm_helper snd_sof_pci_intel_tgl snd_sof_intel_hda_common soundwire_intel snd_sof_intel_hda_mlink intel_rapl_msr soundwire_cadence intel_rapl_common snd_sof_intel_hda intel_uncore_frequency snd_sof_pci intel_uncore_frequency_common snd_sof_xtensa_dsp snd_sof snd_sof_utils snd_soc_hdac_hda snd_hda_ext_core snd_soc_acpi_intel_match snd_soc_acpi soundwire_generic_allocation soundwire_bus snd_soc_core x86_pkg_temp_thermal intel_powerclamp coretemp snd_compress ac97_bus snd_pcm_dmaengine snd_usb_audio cmdlinepart kvm_intel snd_hda_intel spi_nor snd_intel_dspcfg snd_usbmidi_lib ee1004 mei_pxp mei_hdcp mtd snd_ump i915 snd_intel_sdw_acpi snd_hda_codec uvcvideo kvm videobuf2_vmalloc uvc snd_seq_midi irqbypass 8812au(OE) mfd_aaeon videobuf2_memops eeepc_wmi videobuf2_v4l2 snd_hda_core snd_seq_midi_event videodev
[  603.893871]  asus_wmi snd_rawmidi drm_buddy videobuf2_common rapl ledtrig_audio snd_seq snd_hwdep sparse_keymap ttm mc snd_pcm intel_cstate snd_seq_device spi_intel_pci drm_display_helper i2c_i801 platform_profile cfg80211 snd_timer wmi_bmof snd cec spi_intel i2c_smbus mei_me soundcore rc_core mei i2c_algo_bit intel_pmc_core intel_vsec pmt_telemetry pmt_class acpi_pad acpi_tad input_leds joydev mac_hid sch_fq_codel nfsd msr auth_rpcgss parport_pc nfs_acl ppdev lockd lp grace parport efi_pstore sunrpc nfnetlink dmi_sysfs ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 dm_mirror dm_region_hash dm_log hid_logitech_hidpp hid_logitech_dj hid_generic usbhid hid crct10dif_pclmul nvme crc32_pclmul e1000e polyval_clmulni polyval_generic ghash_clmulni_intel intel_lpss_pci ahci sha256_ssse3 sha1_ssse3 nvme_core xhci_pci intel_lpss libahci nvme_auth vmd xhci_pci_renesas idma64 video wmi pinctrl_alderlake aesni_intel
[  603.893909]  crypto_simd cryptd
[  603.893911] CPU: 0 PID: 218 Comm: kworker/u16:4 Tainted: G        W  OE      6.8.0-52-generic #53-Ubuntu
[  603.893913] Hardware name: ASUS System Product Name/PRIME H610M-A D4, BIOS 0412 09/29/2021
[  603.893914] Workqueue: events_unbound btrfs_async_reclaim_metadata_space [btrfs]
[  603.893959] RIP: 0010:__btrfs_free_extent.isra.0+0xc72/0x11e0 [btrfs]
[  603.893991] Code: 48 8b 78 60 e8 ef 65 0e 00 c7 85 74 ff ff ff 01 00 00 00 e9 aa fb ff ff 8b b5 74 ff ff ff 48 c7 c7 c0 fb a5 c0 e8 ae b2 43 dc <0f> 0b e9 cf f9 ff ff 4c 89 ff e8 ef d1 fe ff 48 8b 7d 80 48 8d 55
[  603.893992] RSP: 0018:ffffae8cc0f07b60 EFLAGS: 00010246
[  603.893994] RAX: 0000000000000000 RBX: 00000000ffffffe4 RCX: 0000000000000000
[  603.893995] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[  603.893995] RBP: ffffae8cc0f07c20 R08: 0000000000000000 R09: 0000000000000000
[  603.893996] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
[  603.893996] R13: ffffa05209182c98 R14: fffffffffffffff7 R15: ffffa0526ddef620
[  603.893997] FS:  0000000000000000(0000) GS:ffffa0556f400000(0000) knlGS:0000000000000000
[  603.893998] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  603.893999] CR2: 000060f683c301c8 CR3: 0000000021a3c006 CR4: 0000000000f70ef0
[  603.894000] PKRU: 55555554
[  603.894000] Call Trace:
[  603.894001]  <TASK>
[  603.894003]  ? show_regs+0x6d/0x80
[  603.894006]  ? __warn+0x89/0x160
[  603.894008]  ? __btrfs_free_extent.isra.0+0xc72/0x11e0 [btrfs]
[  603.894038]  ? report_bug+0x17e/0x1b0
[  603.894042]  ? handle_bug+0x51/0xa0
[  603.894045]  ? exc_invalid_op+0x18/0x80
[  603.894046]  ? asm_exc_invalid_op+0x1b/0x20
[  603.894050]  ? __btrfs_free_extent.isra.0+0xc72/0x11e0 [btrfs]
[  603.894078]  ? load_balance+0x10c/0x8b0
[  603.894081]  run_delayed_tree_ref+0x92/0x200 [btrfs]
[  603.894109]  btrfs_run_delayed_refs_for_head+0x2d0/0x550 [btrfs]
[  603.894136]  __btrfs_run_delayed_refs+0x101/0x1b0 [btrfs]
[  603.894163]  btrfs_run_delayed_refs+0x34/0x130 [btrfs]
[  603.894190]  flush_space+0x270/0x290 [btrfs]
[  603.894232]  btrfs_async_reclaim_metadata_space+0x101/0x220 [btrfs]
[  603.894273]  process_one_work+0x175/0x350
[  603.894276]  worker_thread+0x306/0x440
[  603.894278]  ? __pfx_worker_thread+0x10/0x10
[  603.894280]  kthread+0xef/0x120
[  603.894282]  ? __pfx_kthread+0x10/0x10
[  603.894283]  ret_from_fork+0x44/0x70
[  603.894285]  ? __pfx_kthread+0x10/0x10
[  603.894287]  ret_from_fork_asm+0x1b/0x30
[  603.894289]  </TASK>
[  603.894290] ---[ end trace 0000000000000000 ]---
[  603.894291] BTRFS info (device loop10: state A): dumping space info:
[  603.894293] BTRFS info (device loop10: state A): space_info DATA has 80609280 free, is not full
[  603.894293] BTRFS info (device loop10: state A): space_info total=500170752, used=419430400, pinned=0, reserved=0, may_use=0, readonly=131072 zone_unusable=0
[  603.894295] BTRFS info (device loop10: state A): space_info METADATA has -6815744 free, is full
[  603.894296] BTRFS info (device loop10: state A): space_info total=52428800, used=589824, pinned=0, reserved=16384, may_use=6815744, readonly=51822592 zone_unusable=0
[  603.894297] BTRFS info (device loop10: state A): space_info SYSTEM has 22003712 free, is not full
[  603.894298] BTRFS info (device loop10: state A): space_info total=22020096, used=16384, pinned=0, reserved=0, may_use=0, readonly=0 zone_unusable=0
[  603.894299] BTRFS info (device loop10: state A): global_block_rsv: size 5767168 reserved 5767168
[  603.894300] BTRFS info (device loop10: state A): trans_block_rsv: size 0 reserved 0
[  603.894301] BTRFS info (device loop10: state A): chunk_block_rsv: size 0 reserved 0
[  603.894302] BTRFS info (device loop10: state A): delayed_block_rsv: size 0 reserved 0
[  603.894302] BTRFS info (device loop10: state A): delayed_refs_rsv: size 1048576 reserved 1048576
[  603.894303] BTRFS: error (device loop10: state A) in __btrfs_free_extent:3257: errno=-28 No space left
[  603.894305] BTRFS info (device loop10: state EA): forced readonly
[  603.894306] BTRFS error (device loop10: state EA): failed to run delayed ref for logical 31801344 num_bytes 16384 type 176 action 2 ref_mod 1: -28
[  603.894308] BTRFS: error (device loop10: state EA) in btrfs_run_delayed_refs:2261: errno=-28 No space left
[  603.894320] BTRFS info (device loop10: state EA): 1 enospc errors during balance
[  603.894329] BTRFS info (device loop10: state EA): balance: ended with status: -5
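The "free" figures in the space_info dump above can be re-derived from the other fields on the same line. The sketch below assumes that free = total minus the sum of used, pinned, reserved, may_use, readonly and zone_unusable, which is consistent with all three space_info entries in this log. `space_info_free` is a helper name made up for this example.

```shell
#!/bin/sh
# Recompute "free" for a btrfs space_info dump line.
# Assumption: free = total - (used + pinned + reserved + may_use
#                             + readonly + zone_unusable).
# space_info_free is a hypothetical helper, not part of btrfs-progs.
space_info_free() (
    set -f                      # no globbing while the line is word-split
    total=0
    taken=0
    # Turn commas into spaces so every key=value pair is its own word.
    for field in $(printf '%s' "$1" | tr ',' ' '); do
        case "$field" in
            total=*)
                total=${field#total=} ;;
            used=*|pinned=*|reserved=*|may_use=*|readonly=*|zone_unusable=*)
                taken=$((taken + ${field#*=})) ;;
        esac
    done
    echo $((total - taken))
)

# METADATA line from the dump above.
space_info_free 'space_info total=52428800, used=589824, pinned=0, reserved=16384, may_use=6815744, readonly=51822592 zone_unusable=0'
# DATA line from the dump above.
space_info_free 'space_info total=500170752, used=419430400, pinned=0, reserved=0, may_use=0, readonly=131072 zone_unusable=0'
```

This prints -6815744 and 80609280, matching "METADATA has -6815744 free, is full" and "DATA has 80609280 free, is not full". The metadata line shows where the space went: only 589824 bytes are used, while 51822592 of the 52428800 bytes are counted as readonly.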



btrfs-progs v6.6.3





experiment 2

kes@work ~/work/projects/raid $ sudo btrfs filesystem usage /mnt/raid_test -T
Overall:
    Device size:		1000.00MiB
    Device allocated:		 958.00MiB
    Device unallocated:		  42.00MiB
    Device missing:		 500.00MiB
    Device slack:		     0.00B
    Used:			 801.16MiB
    Free (estimated):		  42.00MiB	(min: 42.00MiB)
    Free (statfs, df):		  31.00MiB
    Data ratio:			      2.00
    Metadata ratio:		      2.00
    Global reserve:		   5.50MiB	(used: 0.00B)
    Multiple profiles:		        no

               Data      Metadata  System                               
Id Path        RAID1     RAID1     RAID1    Unallocated Total      Slack
-- ----------- --------- --------- -------- ----------- ---------- -----
 1 /dev/loop10 421.00MiB  50.00MiB  8.00MiB    21.00MiB  500.00MiB     -
 2 missing     421.00MiB  50.00MiB  8.00MiB    21.00MiB  500.00MiB     -
-- ----------- --------- --------- -------- ----------- ---------- -----
   Total       421.00MiB  50.00MiB  8.00MiB    42.00MiB 1000.00MiB 0.00B
   Used        400.00MiB 576.00KiB 16.00KiB                             
kes@work ~/work/projects/raid $ sudo btrfs device remove missing /mnt/raid_test/
ERROR: error removing device 'missing': unable to go below two devices on raid1
kes@work ~/work/projects/raid $ sudo btrfs device add /dev/loop11 /mnt/raid_test
Performing full device TRIM /dev/loop11 (500.00MiB) ...
kes@work ~/work/projects/raid $ sudo btrfs filesystem usage /mnt/raid_test -T
Overall:
    Device size:		   1.46GiB
    Device allocated:		 958.00MiB
    Device unallocated:		 542.00MiB
    Device missing:		 500.00MiB
    Device slack:		     0.00B
    Used:			 801.16MiB
    Free (estimated):		 292.00MiB	(min: 292.00MiB)
    Free (statfs, df):		  41.00MiB
    Data ratio:			      2.00
    Metadata ratio:		      2.00
    Global reserve:		   5.50MiB	(used: 0.00B)
    Multiple profiles:		        no

               Data      Metadata  System                              
Id Path        RAID1     RAID1     RAID1    Unallocated Total     Slack
-- ----------- --------- --------- -------- ----------- --------- -----
 1 /dev/loop10 421.00MiB  50.00MiB  8.00MiB    21.00MiB 500.00MiB     -
 2 missing     421.00MiB  50.00MiB  8.00MiB    21.00MiB 500.00MiB     -
 3 /dev/loop11         -         -        -   500.00MiB 500.00MiB     -
-- ----------- --------- --------- -------- ----------- --------- -----
   Total       421.00MiB  50.00MiB  8.00MiB   542.00MiB   1.46GiB 0.00B
   Used        400.00MiB 576.00KiB 16.00KiB                            
kes@work ~/work/projects/raid $ sudo btrfs device remove missing /mnt/raid_test/
ERROR: error removing device 'missing': Read-only file system
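The "Free (estimated)" values in the two usage outputs above can be cross-checked against the tables. The sketch below assumes the estimate is (data allocated - data used) + unallocated / data ratio, with all values in MiB. That formula is inferred from the numbers shown here, not taken from btrfs-progs, and `free_estimated_mib` is a made-up helper name.

```shell
#!/bin/sh
# Re-derive "Free (estimated)" from the usage table, all values in MiB.
# Assumption: estimate = (data allocated - data used) + unallocated / data ratio.
# free_estimated_mib is a hypothetical helper, not part of btrfs-progs.
free_estimated_mib() {
    data_alloc=$1
    data_used=$2
    unallocated=$3
    ratio=$4
    echo $(( (data_alloc - data_used) + unallocated / ratio ))
}

free_estimated_mib 421 400 42 2     # before "btrfs device add"
free_estimated_mib 421 400 542 2    # after adding /dev/loop11
```

This prints 42 and 292, the same as the two "Free (estimated)" lines. In the first output, half of the 42 MiB of unallocated space sits on the missing device (id 2), and metadata has only 50 MiB allocated per device in both cases.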




[ 1579.768477] BTRFS info (device loop10): disk added /dev/loop11
[ 1598.415473] BTRFS info (device loop10): relocating block group 502267904 flags data|raid1
[ 1598.421181] BTRFS info (device loop10): relocating block group 384827392 flags data|raid1
[ 1598.428535] BTRFS info (device loop10): relocating block group 267386880 flags data|raid1
[ 1598.436970] BTRFS info (device loop10): relocating block group 149946368 flags data|raid1
[ 1598.443438] BTRFS info (device loop10): relocating block group 82837504 flags data|raid1
[ 1598.451108] BTRFS info (device loop10): relocating block group 30408704 flags metadata|raid1
[ 1598.451132] ------------[ cut here ]------------
[ 1598.451134] BTRFS: Transaction aborted (error -28)
[ 1598.451196] WARNING: CPU: 4 PID: 12985 at fs/btrfs/extent-tree.c:3257 __btrfs_free_extent.isra.0+0xc72/0x11e0 [btrfs]
[ 1598.451313] Modules linked in: snd_seq_dummy snd_hrtimer overlay qrtr binfmt_misc nls_iso8859_1 snd_hda_codec_hdmi xe snd_hda_codec_realtek snd_hda_codec_generic drm_gpuvm drm_exec gpu_sched drm_suballoc_helper drm_ttm_helper snd_sof_pci_intel_tgl snd_sof_intel_hda_common soundwire_intel snd_sof_intel_hda_mlink intel_rapl_msr soundwire_cadence intel_rapl_common snd_sof_intel_hda intel_uncore_frequency snd_sof_pci intel_uncore_frequency_common snd_sof_xtensa_dsp snd_sof snd_sof_utils snd_soc_hdac_hda snd_hda_ext_core snd_soc_acpi_intel_match snd_soc_acpi soundwire_generic_allocation soundwire_bus snd_soc_core x86_pkg_temp_thermal intel_powerclamp coretemp snd_compress ac97_bus snd_pcm_dmaengine snd_usb_audio cmdlinepart kvm_intel snd_hda_intel spi_nor snd_intel_dspcfg snd_usbmidi_lib ee1004 mei_pxp mei_hdcp mtd snd_ump i915 snd_intel_sdw_acpi snd_hda_codec uvcvideo kvm videobuf2_vmalloc uvc snd_seq_midi irqbypass 8812au(OE) mfd_aaeon videobuf2_memops eeepc_wmi videobuf2_v4l2 snd_hda_core snd_seq_midi_event videodev
[ 1598.451381]  asus_wmi snd_rawmidi drm_buddy videobuf2_common rapl ledtrig_audio snd_seq snd_hwdep sparse_keymap ttm mc snd_pcm intel_cstate snd_seq_device spi_intel_pci drm_display_helper i2c_i801 platform_profile cfg80211 snd_timer wmi_bmof snd cec spi_intel i2c_smbus mei_me soundcore rc_core mei i2c_algo_bit intel_pmc_core intel_vsec pmt_telemetry pmt_class acpi_pad acpi_tad input_leds joydev mac_hid sch_fq_codel nfsd msr auth_rpcgss parport_pc nfs_acl ppdev lockd lp grace parport efi_pstore sunrpc nfnetlink dmi_sysfs ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 dm_mirror dm_region_hash dm_log hid_logitech_hidpp hid_logitech_dj hid_generic usbhid hid crct10dif_pclmul nvme crc32_pclmul e1000e polyval_clmulni polyval_generic ghash_clmulni_intel intel_lpss_pci ahci sha256_ssse3 sha1_ssse3 nvme_core xhci_pci intel_lpss libahci nvme_auth vmd xhci_pci_renesas idma64 video wmi pinctrl_alderlake aesni_intel
[ 1598.451469]  crypto_simd cryptd
[ 1598.451473] CPU: 4 PID: 12985 Comm: btrfs Tainted: G        W  OE      6.8.0-52-generic #53-Ubuntu
[ 1598.451478] Hardware name: ASUS System Product Name/PRIME H610M-A D4, BIOS 0412 09/29/2021
[ 1598.451480] RIP: 0010:__btrfs_free_extent.isra.0+0xc72/0x11e0 [btrfs]
[ 1598.451594] Code: 48 8b 78 60 e8 ef 65 0e 00 c7 85 74 ff ff ff 01 00 00 00 e9 aa fb ff ff 8b b5 74 ff ff ff 48 c7 c7 c0 fb a5 c0 e8 ae b2 43 dc <0f> 0b e9 cf f9 ff ff 4c 89 ff e8 ef d1 fe ff 48 8b 7d 80 48 8d 55
[ 1598.451598] RSP: 0018:ffffae8ccd79f800 EFLAGS: 00010246
[ 1598.451604] RAX: 0000000000000000 RBX: 00000000ffffffe4 RCX: 0000000000000000
[ 1598.451614] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 1598.451616] RBP: ffffae8ccd79f8c0 R08: 0000000000000000 R09: 0000000000000000
[ 1598.451619] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
[ 1598.451622] R13: ffffa0520efd4618 R14: fffffffffffffff7 R15: ffffa0524fbeea80
[ 1598.451625] FS:  000074c0c0d7e380(0000) GS:ffffa0556f600000(0000) knlGS:0000000000000000
[ 1598.451630] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1598.451633] CR2: 000075d954491ec0 CR3: 000000016d074002 CR4: 0000000000f70ef0
[ 1598.451636] PKRU: 55555554
[ 1598.451639] Call Trace:
[ 1598.451642]  <TASK>
[ 1598.451647]  ? show_regs+0x6d/0x80
[ 1598.451656]  ? __warn+0x89/0x160
[ 1598.451661]  ? __btrfs_free_extent.isra.0+0xc72/0x11e0 [btrfs]
[ 1598.451788]  ? report_bug+0x17e/0x1b0
[ 1598.451799]  ? handle_bug+0x51/0xa0
[ 1598.451807]  ? exc_invalid_op+0x18/0x80
[ 1598.451815]  ? asm_exc_invalid_op+0x1b/0x20
[ 1598.451826]  ? __btrfs_free_extent.isra.0+0xc72/0x11e0 [btrfs]
[ 1598.451950]  run_delayed_tree_ref+0x92/0x200 [btrfs]
[ 1598.452078]  ? prb_read_valid+0x1c/0x30
[ 1598.452086]  btrfs_run_delayed_refs_for_head+0x2d0/0x550 [btrfs]
[ 1598.452216]  __btrfs_run_delayed_refs+0x101/0x1b0 [btrfs]
[ 1598.452343]  btrfs_run_delayed_refs+0x34/0x130 [btrfs]
[ 1598.452465]  btrfs_commit_transaction+0x6a/0xbe0 [btrfs]
[ 1598.452631]  prepare_to_relocate+0x141/0x1d0 [btrfs]
[ 1598.452820]  relocate_block_group+0x6a/0x560 [btrfs]
[ 1598.452995]  ? btrfs_wait_nocow_writers+0x29/0xd0 [btrfs]
[ 1598.453123]  btrfs_relocate_block_group+0x28c/0x3e0 [btrfs]
[ 1598.453247]  btrfs_relocate_chunk+0x40/0x1b0 [btrfs]
[ 1598.453370]  btrfs_shrink_device+0x27e/0x610 [btrfs]
[ 1598.453489]  btrfs_rm_device+0x18f/0x700 [btrfs]
[ 1598.453622]  ? __check_object_size.part.0+0x72/0x150
[ 1598.453633]  ? btrfs_get_dev_args_from_path+0x5b/0x240 [btrfs]
[ 1598.453802]  btrfs_ioctl_rm_dev_v2+0x1a6/0x250 [btrfs]
[ 1598.453952]  btrfs_ioctl+0x10e6/0x13a0 [btrfs]
[ 1598.454068]  ? __memcg_slab_free_hook+0x115/0x180
[ 1598.454076]  __x64_sys_ioctl+0xa0/0xf0
[ 1598.454083]  x64_sys_call+0x12a3/0x25a0
[ 1598.454087]  do_syscall_64+0x7f/0x180
[ 1598.454092]  ? __fput_sync+0x1c/0x30
[ 1598.454097]  ? syscall_exit_to_user_mode+0x86/0x260
[ 1598.454103]  ? do_syscall_64+0x8c/0x180
[ 1598.454107]  ? exc_page_fault+0x94/0x1b0
[ 1598.454111]  entry_SYSCALL_64_after_hwframe+0x78/0x80
[ 1598.454117] RIP: 0033:0x74c0c0b24ded
[ 1598.454148] Code: 04 25 28 00 00 00 48 89 45 c8 31 c0 48 8d 45 10 c7 45 b0 10 00 00 00 48 89 45 b8 48 8d 45 d0 48 89 45 c0 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1a 48 8b 45 c8 64 48 2b 04 25 28 00 00 00
[ 1598.454151] RSP: 002b:00007ffd37055240 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 1598.454156] RAX: ffffffffffffffda RBX: 00007ffd37057488 RCX: 000074c0c0b24ded
[ 1598.454158] RDX: 00007ffd370552c8 RSI: 000000005000943a RDI: 0000000000000003
[ 1598.454160] RBP: 00007ffd37055290 R08: 0000000000000013 R09: 0000000000000000
[ 1598.454162] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
[ 1598.454164] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 1598.454168]  </TASK>
[ 1598.454170] ---[ end trace 0000000000000000 ]---
[ 1598.454174] BTRFS info (device loop10: state A): dumping space info:
[ 1598.454178] BTRFS info (device loop10: state A): space_info DATA has 42926080 free, is full
[ 1598.454181] BTRFS info (device loop10: state A): space_info total=462422016, used=419430400, pinned=0, reserved=0, may_use=0, readonly=65536 zone_unusable=0
[ 1598.454186] BTRFS info (device loop10: state A): space_info METADATA has -11010048 free, is full
[ 1598.454189] BTRFS info (device loop10: state A): space_info total=52428800, used=589824, pinned=0, reserved=16384, may_use=11010048, readonly=51822592 zone_unusable=0
[ 1598.454193] BTRFS info (device loop10: state A): space_info SYSTEM has 8372224 free, is not full
[ 1598.454195] BTRFS info (device loop10: state A): space_info total=8388608, used=16384, pinned=0, reserved=0, may_use=0, readonly=0 zone_unusable=0
[ 1598.454199] BTRFS info (device loop10: state A): global_block_rsv: size 5767168 reserved 5767168
[ 1598.454201] BTRFS info (device loop10: state A): trans_block_rsv: size 0 reserved 0
[ 1598.454203] BTRFS info (device loop10: state A): chunk_block_rsv: size 0 reserved 0
[ 1598.454205] BTRFS info (device loop10: state A): delayed_block_rsv: size 0 reserved 0
[ 1598.454207] BTRFS info (device loop10: state A): delayed_refs_rsv: size 1048576 reserved 1048576
[ 1598.454210] BTRFS: error (device loop10: state A) in __btrfs_free_extent:3257: errno=-28 No space left
[ 1598.454216] BTRFS info (device loop10: state EA): forced readonly
[ 1598.454220] BTRFS error (device loop10: state EA): failed to run delayed ref for logical 31735808 num_bytes 16384 type 176 action 2 ref_mod 1: -28
[ 1598.454226] BTRFS: error (device loop10: state EA) in btrfs_run_delayed_refs:2261: errno=-28 No space left
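
A note on reading the space_info dump above: the "free" value on each header line appears to be `total` minus every other counter on the following line. This relation is inferred from the numbers in this log, not quoted from the kernel source, so treat it as an assumption. The minimal Python sketch below (the helper names `parse_space_info` and `computed_free` are made up for illustration) parses the dumped lines and recomputes the figure. For METADATA it shows that `readonly` (51822592) already accounts for almost all of `total` (52428800), so the outstanding `may_use` reservation pushes the result negative.

```python
import re

# Lines copied from the dmesg output above, with the timestamp and
# "BTRFS info (device loop10: state A):" prefix trimmed.
DUMP = """\
space_info DATA has 42926080 free, is full
space_info total=462422016, used=419430400, pinned=0, reserved=0, may_use=0, readonly=65536 zone_unusable=0
space_info METADATA has -11010048 free, is full
space_info total=52428800, used=589824, pinned=0, reserved=16384, may_use=11010048, readonly=51822592 zone_unusable=0
space_info SYSTEM has 8372224 free, is not full
space_info total=8388608, used=16384, pinned=0, reserved=0, may_use=0, readonly=0 zone_unusable=0
"""


def parse_space_info(text):
    """Return {name: (reported_free, counters)} for each space_info pair."""
    result = {}
    name = None
    for line in text.splitlines():
        header = re.search(r"space_info (\w+) has (-?\d+) free", line)
        if header:
            name = header.group(1)
            result[name] = (int(header.group(2)), {})
        elif name and "total=" in line:
            counters = {k: int(v) for k, v in re.findall(r"(\w+)=(\d+)", line)}
            result[name] = (result[name][0], counters)
    return result


def computed_free(counters):
    """Assumed relation: free = total - (sum of all other counters)."""
    return counters["total"] - sum(
        value for key, value in counters.items() if key != "total"
    )


info = parse_space_info(DUMP)
for name, (reported, counters) in info.items():
    print(name, "reported:", reported, "recomputed:", computed_free(counters))
```

For all three space types the recomputed value matches the reported one, which is why the relation looks plausible for this dump.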



Just another confusing message:
sudo btrfs device remove missing  /mnt/raid_test
ERROR: error removing device 'missing': No space left on device

I am trying to remove the missing device from the filesystem. There appears to be plenty of space:
$ sudo btrfs filesystem usage /mnt/raid_test -T
Overall:
    Device size:                   1.46GiB
    Device allocated:            916.00MiB
    Device unallocated:          584.00MiB
    Device missing:              500.00MiB
    Device slack:                    0.00B
    Used:                        460.81MiB
    Free (estimated):            462.00MiB      (min: 462.00MiB)
    Free (statfs, df):           211.00MiB
    Data ratio:                       2.00
    Metadata ratio:                   2.00
    Global reserve:                5.50MiB      (used: 0.00B)
    Multiple profiles:                  no

               Data      Metadata  System                              
Id Path        RAID1     RAID1     RAID1    Unallocated Total     Slack
-- ----------- --------- --------- -------- ----------- --------- -----
 1 /dev/loop10 400.00MiB  50.00MiB  8.00MiB    42.00MiB 500.00MiB     -
 2 missing     400.00MiB  50.00MiB  8.00MiB    42.00MiB 500.00MiB     -
 3 /dev/loop11         -         -        -   500.00MiB 500.00MiB     -
-- ----------- --------- --------- -------- ----------- --------- -----
   Total       400.00MiB  50.00MiB  8.00MiB   584.00MiB   1.46GiB 0.00B
   Used        230.00MiB 400.00KiB 16.00KiB                            
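
As a sanity check on the figures above, the "Overall" numbers can be reproduced from the per-device table. The Python sketch below is only one way to arrive at them. The formula used for "Free (estimated)" is an assumption that happens to match this output and is not taken from the btrfs-progs source. Note that of the 584 MiB unallocated, only 42 MiB is on /dev/loop10, 42 MiB is on the missing device, and the remaining 500 MiB is on /dev/loop11.

```python
# Per-device figures in MiB, copied from the table above:
# (data, metadata, system, device size)
devices = {
    "/dev/loop10": (400, 50, 8, 500),
    "missing": (400, 50, 8, 500),
    "/dev/loop11": (0, 0, 0, 500),
}

DATA_RATIO = 2.0          # "Data ratio" (RAID1 keeps two copies)
DATA_USED_LOGICAL = 230   # "Used" row, Data column, in MiB

# Raw totals across all devices.
device_size = sum(d[3] for d in devices.values())
allocated = sum(sum(d[:3]) for d in devices.values())
unallocated = device_size - allocated

# Logical size of the data chunks: raw data allocation divided by the ratio.
data_chunks_logical = sum(d[0] for d in devices.values()) / DATA_RATIO

# Assumed estimate: unused room in existing data chunks plus the
# unallocated raw space divided by the data ratio.
free_estimated = (data_chunks_logical - DATA_USED_LOGICAL) + unallocated / DATA_RATIO

print("Device size (GiB):", round(device_size / 1024, 2))
print("Device allocated (MiB):", allocated)
print("Device unallocated (MiB):", unallocated)
print("Free estimated (MiB):", free_estimated)
```

Under these assumptions the script gives 916 MiB allocated, 584 MiB unallocated and 462 MiB estimated free, matching the output above.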
