Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BSOD on mounting encrypted volume #417

Open
aspseka opened this issue Nov 22, 2024 · 16 comments
Open

BSOD on mounting encrypted volume #417

aspseka opened this issue Nov 22, 2024 · 16 comments

Comments

@aspseka
Copy link

aspseka commented Nov 22, 2024

System information

Type Version/Name
Distribution Name Windows 11 Pro
Distribution Version Dev
Kernel Version 27729.1000
Architecture x86 64bit
OpenZFS Version zfswin-2.2.6rc10

Describe the problem you're observing

Trying to mount an encrypted zfs results in BSOD

Describe how to reproduce the problem

  1. import zpool
  2. load key file
  3. zfs mount

Include any warning/errors/backtraces from the system logs

Stack trace

Unable to get exception context, HRESULT 0x8000FFFF
 # Child-SP          RetAddr               Call Site
00 fffff987`e45dd648 fffff807`8e9b313d     nt!KeBugCheckEx
01 fffff987`e45dd650 fffff807`8e9b2f09     nt!KeRestoreExtendedAndSupervisorState+0x21d
02 fffff987`e45dd690 fffff807`3684c010     nt!KeRestoreExtendedProcessorState+0x9
03 fffff987`e45dd6c0 00000000`00000000     OpenZFS!gcm_init(struct gcm_ctx * ctx = 0xffffe583`96727fc0, unsigned char * iv = 0x0000000e`e45dd808 "--- memory read error at address 0x0000000e`e45dd808 ---", unsigned int64 iv_len = 0xffffe583`93166200, unsigned char * auth_data = 0xfffff987`e45ddc20 "m???", unsigned int64 auth_data_len = 0, unsigned int64 block_size = 0xffffe583`8e8a4080, <function> * encrypt_block = 0xffffe583`00000000, <function> * copy_block = 0x00000000`00000004, <function> * xor_block = 0x00000001`00000340)+0x150 [C:\src\openzfs\module\icp\algs\modes\gcm.c @ 565] 
FAULTING_SOURCE_LINE:  C:\src\openzfs\module\icp\algs\modes\gcm.c
FAULTING_SOURCE_FILE:  C:\src\openzfs\module\icp\algs\modes\gcm.c
FAULTING_SOURCE_LINE_NUMBER:  565

FAULTING_SOURCE_CODE:  
   561: 			 * buffer with zero
   562: 			 */
   563: 
   564: 			if (auth_data != NULL) {
>  565: 				memset(authp, 0, block_size);
   566: 				memcpy(authp, &(auth_data[processed]),
   567: 				    remainder);
   568: 			} else {
   569: 				ASSERT0(remainder);
   570: 			}


SYMBOL_NAME:  OpenZFS!gcm_init+150

further context:

2: kd> dx -r1 ((OpenZFS!gcm_ctx *)0xffffe58396727fc0)
((OpenZFS!gcm_ctx *)0xffffe58396727fc0)                 : 0xffffe58396727fc0 [Type: gcm_ctx *]
    [+0x000] gcm_common       [Type: common_ctx]
    [+0x050] gcm_tag_len      : Unable to read memory at Address 0xffffe58396728010
    [+0x058] gcm_processed_data_len : Unable to read memory at Address 0xffffe58396728018
    [+0x060] gcm_pt_buf_len   : Unable to read memory at Address 0xffffe58396728020
    [+0x068] gcm_tmp          [Type: unsigned int [4]]
    [+0x078] gcm_ghash        [Type: unsigned __int64 [2]]
    [+0x088] gcm_H            [Type: unsigned __int64 [2]]
    [+0x098] gcm_Htable       : Unable to read memory at Address 0xffffe58396728058
    [+0x0a0] gcm_htab_len     : Unable to read memory at Address 0xffffe58396728060
    [+0x0a8] gcm_J0           [Type: unsigned __int64 [2]]
    [+0x0b8] gcm_len_a_len_c  [Type: unsigned __int64 [2]]
    [+0x0c8] gcm_pt_buf       : Unable to read memory at Address 0xffffe58396728088
    [+0x0d0] gcm_use_avx      : Unable to read memory at Address 0xffffe58396728090
@lundman
Copy link

lundman commented Nov 22, 2024

That's a new one. I'll see if I can replicate or spot where the pointer went

@aspseka
Copy link
Author

aspseka commented Nov 23, 2024

An expection occurs every time. However, after switching to kernel mode dump (above was a minidump), I got:

INVALID_EXTENDED_PROCESSOR_STATE (131)
Invalid combination of parameters was detected while saving or restoring
extended processor state.
Arguments:
Arg1: 0000000000000007, Indicates which validity check failed.
Arg2: fffffc09d2b6d770
Arg3: fffffc09d2b6d7e0
Arg4: 0000000000000000
 # Child-SP          RetAddr               Call Site
00 fffffc09`d2b6d648 fffff801`9abb313d     nt!KeBugCheckEx
01 fffffc09`d2b6d650 fffff801`9abb2f09     nt!KeRestoreExtendedAndSupervisorState+0x21d
02 fffffc09`d2b6d690 fffff801`3095c410     nt!KeRestoreExtendedProcessorState+0x9
03 fffffc09`d2b6d6c0 fffff801`3095bbd5     OpenZFS!gcm_init_avx(struct gcm_ctx * ctx = 0xfffffc09`d2b6db98, unsigned char * iv = 0xffffa409`142bc88c "k#???", unsigned int64 iv_len = 0xc, unsigned char * auth_data = 0xffffa409`169f5000 "", unsigned int64 auth_data_len = 0x130b0, unsigned int64 block_size = 0x10)+0x320 [C:\src\openzfs\module\icp\algs\modes\gcm.c @ 1493] 
04 fffffc09`d2b6d8e0 fffff801`30956f1b     OpenZFS!gcm_init_ctx(struct gcm_ctx * gcm_ctx = 0xfffffc09`d2b6db98, char * param = 0xfffffc09`d2b6ddb8 "???", unsigned int64 block_size = 0x10, <function> * encrypt_block = 0xfffff801`30958890, <function> * copy_block = 0xfffff801`3095d3d0, <function> * xor_block = 0xfffff801`3095d530)+0x4d5 [C:\src\openzfs\module\icp\algs\modes\gcm.c @ 698] 
05 fffffc09`d2b6da70 fffff801`309568c6     OpenZFS!aes_common_init_ctx(struct aes_ctx * aes_ctx = 0xfffffc09`d2b6db98, void ** template = 0x00000000`00000000, struct crypto_mechanism * mechanism = 0xfffffc09`d2b6dd00, struct crypto_key_t * key = 0xfffffc09`d2b6df58, int kmflag = 0n0, <unnamed-tag> is_encrypt_init = B_FALSE (0n0))+0x21b [C:\src\openzfs\module\icp\io\aes.c @ 483] 
06 fffffc09`d2b6db10 fffff801`309366c3     OpenZFS!aes_decrypt_atomic(struct crypto_mechanism * mechanism = 0xfffffc09`d2b6dd00, struct crypto_key_t * key = 0xfffffc09`d2b6df58, struct crypto_data * ciphertext = 0xfffffc09`d2b6ddf8, struct crypto_data * plaintext = 0xfffffc09`d2b6de20, void * template = 0x00000000`00000000)+0xa6 [C:\src\openzfs\module\icp\io\aes.c @ 288] 
07 fffffc09`d2b6dc80 fffff801`307c135a     OpenZFS!crypto_decrypt(struct crypto_mechanism * mech = 0xfffffc09`d2b6dda0, struct crypto_data * ciphertext = 0xfffffc09`d2b6ddf8, struct crypto_key_t * key = 0xfffffc09`d2b6df58, void * tmpl = 0x00000000`00000000, struct crypto_data * plaintext = 0xfffffc09`d2b6de20)+0x1b3 [C:\src\openzfs\module\icp\api\kcf_cipher.c @ 164] 
08 fffffc09`d2b6dd40 fffff801`307c3f07     OpenZFS!zio_do_crypt_uio(<unnamed-tag> encrypt = B_FALSE (0n0), unsigned int64 crypt = 8, struct crypto_key_t * key = 0xfffffc09`d2b6df58, void * tmpl = 0x00000000`00000000, unsigned char * ivbuf = 0xffffa409`142bc88c "k#???", unsigned int datalen = 0x7f10, struct zfs_uio * puio = 0xfffffc09`d2b6dfa0, struct zfs_uio * cuio = 0xfffffc09`d2b6df68, unsigned char * authbuf = 0xffffa409`169f5000 "", unsigned int auth_len = 0x130b0)+0x2ca [C:\src\openzfs\module\os\windows\zfs\zio_crypt.c @ 457] 
09 fffffc09`d2b6de90 fffff801`305a6292     OpenZFS!zio_do_crypt_data(<unnamed-tag> encrypt = B_FALSE (0n0), struct zio_crypt_key * key = 0xffffa409`148e2020, dmu_object_type ot = DMU_OT_INTENT_LOG (0n9), <unnamed-tag> byteswap = B_FALSE (0n0), unsigned char * salt = 0xffffa409`142bc884 "???", unsigned char * iv = 0xffffa409`142bc88c "k#???", unsigned char * mac = 0xffffa409`142bc898 "???", unsigned int datalen = 0x1b000, unsigned char * plainbuf = 0xffffa409`169b5000 "", unsigned char * cipherbuf = 0xffffa409`169d5000 "", <unnamed-tag> * no_crypt = 0xfffffc09`d2b6e214)+0x367 [C:\src\openzfs\module\os\windows\zfs\zio_crypt.c @ 1964] 
0a fffffc09`d2b6e050 fffff801`30694a47     OpenZFS!spa_do_crypt_abd(<unnamed-tag> encrypt = B_FALSE (0n0), struct spa * spa = 0xffffa409`13879000, struct zbookmark_phys * zb = 0xfffffc09`d2b6e730, dmu_object_type ot = DMU_OT_INTENT_LOG (0n9), <unnamed-tag> dedup = B_FALSE (0n0), <unnamed-tag> bswap = B_FALSE (0n0), unsigned char * salt = 0xffffa409`142bc884 "???", unsigned char * iv = 0xffffa409`142bc88c "k#???", unsigned char * mac = 0xffffa409`142bc898 "???", unsigned int datalen = 0x1b000, struct abd * pabd = 0xffffa409`146e9bc0, struct abd * cabd = 0xffffa409`1415a000, <unnamed-tag> * no_crypt = 0xfffffc09`d2b6e214)+0x2e2 [C:\src\openzfs\module\zfs\dsl_crypt.c @ 2873] 
0b fffffc09`d2b6e120 fffff801`306943bf     OpenZFS!arc_hdr_decrypt(struct arc_buf_hdr * hdr = 0xffffa409`142bc7b8, struct spa * spa = 0xffffa409`13879000, struct zbookmark_phys * zb = 0xfffffc09`d2b6e730)+0x147 [C:\src\openzfs\module\zfs\arc.c @ 1840] 
0c fffffc09`d2b6e250 fffff801`30684b06     OpenZFS!arc_fill_hdr_crypt(struct arc_buf_hdr * hdr = 0xffffa409`142bc7b8, struct kmutex * hash_lock = 0x00000000`00000000, struct spa * spa = 0xffffa409`13879000, struct zbookmark_phys * zb = 0xfffffc09`d2b6e730, <unnamed-tag> noauth = B_FALSE (0n0))+0xef [C:\src\openzfs\module\zfs\arc.c @ 1919] 
0d fffffc09`d2b6e2b0 fffff801`3068692f     OpenZFS!arc_buf_fill(struct arc_buf * buf = 0xffffa409`139c6cb0, struct spa * spa = 0xffffa409`13879000, struct zbookmark_phys * zb = 0xfffffc09`d2b6e730, arc_fill_flags flags = ARC_FILL_LOCKED (0n1))+0x1e6 [C:\src\openzfs\module\zfs\arc.c @ 2013] 
0e fffffc09`d2b6e400 fffff801`30688cc4     OpenZFS!arc_buf_alloc_impl(struct arc_buf_hdr * hdr = 0xffffa409`142bc7b8, struct spa * spa = 0xffffa409`13879000, struct zbookmark_phys * zb = 0xfffffc09`d2b6e730, void * tag = 0xfffffc09`d2b6e868, <unnamed-tag> encrypted = B_FALSE (0n0), <unnamed-tag> compressed = B_FALSE (0n0), <unnamed-tag> noauth = B_FALSE (0n0), <unnamed-tag> fill = B_TRUE (0n1), struct arc_buf ** ret = 0xfffffc09`d2b6e638)+0x39f [C:\src\openzfs\module\zfs\arc.c @ 2801] 
0f fffffc09`d2b6e490 fffff801`306768e4     OpenZFS!arc_read(struct zio * pio = 0x00000000`00000000, struct spa * spa = 0xffffa409`13879000, struct blkptr * bp = 0xfffffc09`d2b6ea00, <function> * done = 0xfffff801`30687f70, void * private = 0xfffffc09`d2b6e868, zio_priority priority = ZIO_PRIORITY_SYNC_READ (0n0), int zio_flags = 0n128, arc_flags * arc_flags = 0xfffffc09`d2b6e754, struct zbookmark_phys * zb = 0xfffffc09`d2b6e730)+0x9c4 [C:\src\openzfs\module\zfs\arc.c @ 5719] 
10 fffffc09`d2b6e670 fffff801`306763d0     OpenZFS!zil_read_log_block(struct zilog * zilog = 0xffffa409`14844040, <unnamed-tag> decrypt = B_TRUE (0n1), struct blkptr * bp = 0xfffffc09`d2b6ea00, struct blkptr * nbp = 0xfffffc09`d2b6e980, char ** begin = 0xfffffc09`d2b6e878, char ** end = 0xfffffc09`d2b6e870, struct arc_buf ** abuf = 0xfffffc09`d2b6e868)+0x194 [C:\src\openzfs\module\zfs\zil.c @ 265] 
11 fffffc09`d2b6e790 fffff801`3067a9d7     OpenZFS!zil_parse(struct zilog * zilog = 0xffffa409`14844040, <function> * parse_blk_func = 0xfffff801`3067ad70, <function> * parse_lr_func = 0xfffff801`3067aa50, void * arg = 0xfffffc09`d2b6eb00, unsigned int64 txg = 0x9f57, <unnamed-tag> decrypt = B_TRUE (0n1))+0x310 [C:\src\openzfs\module\zfs\zil.c @ 511] 
12 fffffc09`d2b6ea90 fffff801`307b1b72     OpenZFS!zil_replay(struct objset * os = 0xffffa409`14857340, void * arg = 0xffffa409`148e0000, <function> ** replay_func = 0xfffff801`30a51220)+0x127 [C:\src\openzfs\module\zfs\zil.c @ 4333] 
13 fffffc09`d2b6eb30 fffff801`307aec73     OpenZFS!zfsvfs_setup(struct zfsvfs * zfsvfs = 0xffffa409`148e0000, <unnamed-tag> mounting = B_TRUE (0n1))+0x122 [C:\src\openzfs\module\os\windows\zfs\zfs_vfsops.c @ 842] 
14 fffffc09`d2b6eb70 fffff801`307af4c6     OpenZFS!zfs_domount(struct mount * vfsp = 0xffffa409`346a4d60, unsigned int64 mount_dev = 0, char * osname = 0xffffa409`13881400 "backup2/Fotos", struct vfs_context * ctx = 0x00000000`00000000)+0x293 [C:\src\openzfs\module\os\windows\zfs\zfs_vfsops.c @ 1012] 
15 fffffc09`d2b6ed00 fffff801`307d550d     OpenZFS!zfs_vfs_mount(struct mount * vfsp = 0xffffa409`346a4d60, struct vnode * mvp = 0x00000000`00000000, unsigned int64 data = 0xfffffc09`d2b6ee68, struct vfs_context * context = 0x00000000`00000000)+0x716 [C:\src\openzfs\module\os\windows\zfs\zfs_vfsops.c @ 1300] 
16 fffffc09`d2b6ede0 fffff801`3077f2f3     OpenZFS!zfs_windows_mount(struct zfs_cmd * zc = 0xffffa409`13efc000)+0x7ad [C:\src\openzfs\module\os\windows\zfs\zfs_vnops_windows_mount.c @ 1109] 
17 fffffc09`d2b6f330 fffff801`3053ee7b     OpenZFS!zfs_ioc_mount(struct zfs_cmd * zc = 0xffffa409`13efc000)+0x13 [C:\src\openzfs\module\os\windows\zfs\zfs_ioctl_os.c @ 910] 
18 fffffc09`d2b6f360 fffff801`3077f1a6     OpenZFS!zfsdev_ioctl_common(unsigned int vecnum = 0x8a, struct zfs_cmd * zc = 0xffffa409`13efc000, int flag = 0n0)+0x7db [C:\src\openzfs\module\zfs\zfs_ioctl.c @ 8030] 
19 fffffc09`d2b6f440 fffff801`30793887     OpenZFS!zfsdev_ioctl(struct _DEVICE_OBJECT * DeviceObject = 0xffffa409`1ec10670 Device for "\FileSystem\OpenZFS", struct _IRP * Irp = 0xffffa409`2b568920, int flag = 0n0)+0x166 [C:\src\openzfs\module\os\windows\zfs\zfs_ioctl_os.c @ 877] 
1a fffffc09`d2b6f4f0 fffff801`30792c0c     OpenZFS!ioctlDispatcher(struct _DEVICE_OBJECT * DeviceObject = 0xffffa409`1ec10670 Device for "\FileSystem\OpenZFS", struct _IRP ** PIrp = 0xfffffc09`d2b6f6d8, struct _IO_STACK_LOCATION * IrpSp = 0xffffa409`2b5689f0 IRP_MJ_DEVICE_CONTROL / 0x0 for Device for "\FileSystem\OpenZFS")+0x2b7 [C:\src\openzfs\module\os\windows\zfs\zfs_vnops_windows.c @ 6980] 
1b fffffc09`d2b6f5c0 fffff801`9aa29701     OpenZFS!dispatcher(struct _DEVICE_OBJECT * DeviceObject = 0xffffa409`1ec10670 Device for "\FileSystem\OpenZFS", struct _IRP * Irp = 0xffffa409`2b568920)+0x1ec [C:\src\openzfs\module\os\windows\zfs\zfs_vnops_windows.c @ 8053] 
1c fffffc09`d2b6f6f0 fffff801`9aa29683     nt!IopfCallDriver+0x55
1d fffffc09`d2b6f730 fffff801`9b102f67     nt!IofCallDriver+0x13
1e fffffc09`d2b6f760 fffff801`9b0eaf89     nt!IopSynchronousServiceTail+0x197
1f fffffc09`d2b6f810 fffff801`9b0ea25e     nt!IopXxxControlFile+0xd1d
20 fffffc09`d2b6f940 fffff801`9ae97e25     nt!NtDeviceIoControlFile+0x5e
21 fffffc09`d2b6f9b0 00007ffb`5d341e04     nt!KiSystemServiceCopyEnd+0x25
22 00000080`000f1238 00000000`00000000     0x00007ffb`5d341e04
FAULTING_SOURCE_LINE:  C:\src\openzfs\module\icp\algs\modes\gcm.c
FAULTING_SOURCE_FILE:  C:\src\openzfs\module\icp\algs\modes\gcm.c
FAULTING_SOURCE_LINE_NUMBER:  1493

FAULTING_SOURCE_CODE:  
  1489: 		GHASH_AVX(ctx, datap, chunk_size);
  1490: 		datap += chunk_size;
  1491: 		clear_fpu_regs();
  1492: 		kfpu_end();
> 1493: 		kfpu_begin();
  1494: 	}
  1495: 	/* Ghash the remainder and handle possible incomplete GCM block. */
  1496: 	if (bleft > 0) {
  1497: 		size_t incomp = bleft % block_size;
  1498: 

SYMBOL_NAME:  OpenZFS!gcm_init_avx+320

@lundman
Copy link

lundman commented Nov 23, 2024

Hmm perhaps nesting the kfpu_begin/kfpu_end is a bad idea, but upstream put it that way. Also, need to keep an eye on the scope of the stored area.

Ah yeah, this wont work;

https://github.com/openzfsonwindows/openzfs/blob/development/module/icp/algs/modes/gcm.c#L1457-L1494

The scope of XSTATE_SAVE is all over the place:
https://github.com/openzfsonwindows/openzfs/blob/development/include/os/windows/spl/sys/simd.h#L105-L108

Needs to be replaced by the kfpu_begin_ctx variants

@lundman
Copy link

lundman commented Nov 23, 2024

@aspseka
Copy link
Author

aspseka commented Nov 24, 2024

Thank you!

First tests:
Got an access violation inside zfs.exe. Mounting did suceed, however! Could not unmount again: zfs unmount seems to hang displaying zunmount(testpool/encr,D:/encr/) running

Trace of zfs.exe:

(5d94.5d98): Access violation - code c0000005 (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
ntdll!RtlpWaitOnCriticalSection+0xc3:
00007ffb`f2c27fe3 ff4024          inc     dword ptr [rax+24h] ds:00000000`00000024=????????

(not subsequently handled)

 # Child-SP          RetAddr               Call Site
00 00000080`000f45b0 00007ffb`f2c27c85     ntdll!RtlpWaitOnCriticalSection+0xc3
01 00000080`000f46b0 00007ffb`f2bf2d27     ntdll!RtlpEnterCriticalSectionContended+0x1f5
02 00000080`000f4730 00007ff7`79c0adc4     ntdll!RtlEnterCriticalSection+0xf7
03 00000080`000f4770 00007ff7`79c0afe9     zfs!pthread_mutex_lock(struct _RTL_CRITICAL_SECTION * m = 0x00000080`005a6c78)+0x14 [C:\src\openzfs\lib\os\windows\libpthread\pthread.h @ 274] 
04 00000080`000f47a0 00007ff7`79c4f94a     zfs!libzfs_mnttab_add(struct libzfs_handle * hdl = 0x00000080`005a6440, char * special = 0x00000080`005a6e30 "testpool/encr", char * mountp = 0x00000080`000f7a40 "/testpool/encr", char * mntopts = 0x00000080`000f69e0 "defaults,noatime,dev,exec,rw,setuid,nomand,zfsutil")+0x29 [C:\src\openzfs\lib\libzfs\libzfs_dataset.c @ 952] 
05 00000080`000f47f0 00007ff7`79c4f0ed     zfs!zfs_mount_at(struct zfs_handle * zhp = 0x00000080`005a6e20, char * options = 0x00000000`00000000 "", int flags = 0n128, char * mountpoint = 0x00000080`000f7a40 "/testpool/encr")+0x63a [C:\src\openzfs\lib\libzfs\libzfs_mount.c @ 540] 
06 00000080`000f79f0 00007ff7`79bd6a8e     zfs!zfs_mount(struct zfs_handle * zhp = 0x00000080`005a6e20, char * options = 0x00000000`00000000 "", int flags = 0n0)+0x7d [C:\src\openzfs\lib\libzfs\libzfs_mount.c @ 350] 
07 00000080`000f9a50 00007ff7`79bd595b     zfs!share_mount_one(struct zfs_handle * zhp = 0x00000080`005a6e20, int op = 0n1, int flags = 0n0, sa_protocol protocol = 0n-1 (No matching enumerant), <unnamed-tag> explicit = B_TRUE (0n1), char * options = 0x00000000`00000000 "")+0xcbe [C:\src\openzfs\cmd\zfs\zfs_main.c @ 7334] 
08 00000080`000ffbf0 00007ff7`79bcd730     zfs!share_mount(int op = 0n1, int argc = 0n1, char ** argv = 0x00000080`005a34b0)+0x96b [C:\src\openzfs\cmd\zfs\zfs_main.c @ 7655] 
09 00000080`000ffd90 00007ff7`79bc5ca0     zfs!zfs_do_mount(int argc = 0n2, char ** argv = 0x00000080`005a34a8)+0x20 [C:\src\openzfs\cmd\zfs\zfs_main.c @ 7677] 
0a 00000080`000ffdd0 00007ff7`7a7ee564     zfs!main(int argc = 0n3, char ** argv = 0x00000080`0058a040)+0x320 [C:\src\openzfs\cmd\zfs\zfs_main.c @ 9248] 
0b (Inline Function) --------`--------     zfs!invoke_main(void)+0x22 [D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl @ 78] 
0c 00000080`000ffe50 00007ffb`f1285767     zfs!__scrt_common_main_seh(void)+0x10c [D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl @ 288] 
0d 00000080`000ffe90 00007ffb`f2c8d03c     KERNEL32!BaseThreadInitThunk+0x17
0e 00000080`000ffec0 00000000`00000000     ntdll!RtlUserThreadStart+0x2c

Attached to hanging zfs unmount:

 # Child-SP          RetAddr               Call Site
00 00000080`000f8498 00007ffb`f06b2913     ntdll!NtDeviceIoControlFile+0x14
01 00000080`000f84a0 00007ffb`f12778d5     KERNELBASE!DeviceIoControl+0x73
02 00000080`000f8510 00007ff7`79c5cd87     KERNEL32!DeviceIoControlImplementation+0x75
03 00000080`000f8560 00007ff7`79c5cc51     zfs!zcmd_ioctl_compat(int fd = 0n376, int request = 0n139, struct zfs_cmd * zc = 0x00000080`000fa820, int cflag = 0n0)+0xe7 [C:\src\openzfs\lib\libzfs_core\os\windows\libzfs_core_ioctl.c @ 54] 
04 00000080`000f85f0 00007ff7`79c2eef8     zfs!lzc_ioctl_fd(int fd = 0n376, unsigned long request = 0x8b, struct zfs_cmd * zc = 0x00000080`000fa820)+0x41 [C:\src\openzfs\lib\libzfs_core\os\windows\libzfs_core_ioctl.c @ 110] 
05 00000080`000f8640 00007ff7`79c4e499     zfs!zfs_ioctl(struct libzfs_handle * hdl = 0x00000080`005c5e40, int request = 0n139, struct zfs_cmd * zc = 0x00000080`000fa820)+0x28 [C:\src\openzfs\lib\libzfs\os\windows\libzfs_util_os.c @ 55] 
06 00000080`000f8680 00007ff7`79c4e25b     zfs!do_unmount_impl(struct zfs_handle * zhp = 0x00000080`00628230, char * mntpt = 0x00000080`005c55d0 "D:/encr/", int flags = 0n0)+0xf9 [C:\src\openzfs\lib\libzfs\os\windows\libzfs_mount_os.c @ 322] 
07 00000080`000fb9e0 00007ff7`79c501a7     zfs!do_unmount(struct zfs_handle * zhp = 0x00000080`00628230, char * mntpt = 0x00000080`005c55d0 "D:/encr/", int flags = 0n0)+0x3b [C:\src\openzfs\lib\libzfs\os\windows\libzfs_mount_os.c @ 371] 
08 00000080`000fba20 00007ff7`79c4fe32     zfs!unmount_one(struct zfs_handle * zhp = 0x00000080`00628230, char * mountpoint = 0x00000080`005c55d0 "D:/encr/", int flags = 0n0)+0x27 [C:\src\openzfs\lib\libzfs\libzfs_mount.c @ 551] 
09 00000080`000fba70 00007ff7`79c54b35     zfs!zfs_unmount(struct zfs_handle * zhp = 0x00000080`00628230, char * mountpoint = 0x00000000`00000000 "", int flags = 0n0)+0x112 [C:\src\openzfs\lib\libzfs\libzfs_mount.c @ 620] 
0a 00000080`000fbb10 00007ff7`79c5052f     zfs!changelist_prefix(struct prop_changelist * clp = 0x00000080`0060be60)+0x185 [C:\src\openzfs\lib\libzfs\libzfs_changelist.c @ 144] 
0b 00000080`000fbb80 00007ff7`79bd7f34     zfs!zfs_unmountall(struct zfs_handle * zhp = 0x00000080`005c6f20, int flags = 0n0)+0x4f [C:\src\openzfs\lib\libzfs\libzfs_mount.c @ 676] 
0c 00000080`000fbbd0 00007ff7`79bcd760     zfs!unshare_unmount(int op = 0n1, int argc = 0n1, char ** argv = 0x00000080`005c6770)+0xe34 [C:\src\openzfs\cmd\zfs\zfs_main.c @ 8092] 
0d 00000080`000ffd20 00007ff7`79bc5ca0     zfs!zfs_do_unmount(int argc = 0n2, char ** argv = 0x00000080`005c6768)+0x20 [C:\src\openzfs\cmd\zfs\zfs_main.c @ 8113] 
0e 00000080`000ffd60 00007ff7`7a7ee564     zfs!main(int argc = 0n3, char ** argv = 0x00000080`005aa4b0)+0x320 [C:\src\openzfs\cmd\zfs\zfs_main.c @ 9248] 
0f (Inline Function) --------`--------     zfs!invoke_main(void)+0x22 [D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl @ 78] 
10 00000080`000ffde0 00007ffb`f1285767     zfs!__scrt_common_main_seh(void)+0x10c [D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl @ 288] 
11 00000080`000ffe20 00007ffb`f2c8d03c     KERNEL32!BaseThreadInitThunk+0x17
12 00000080`000ffe50 00000000`00000000     ntdll!RtlUserThreadStart+0x2c

@lundman
Copy link

lundman commented Nov 25, 2024

OK so are we saying the encryption is better at least, or don't you get that far?

Not sure what the zfs violation is about, but we have changed things in the mounting code, so it is likely to be something. Does it happen every time?
zfs!pthread_mutex_lock(struct _RTL_CRITICAL_SECTION * m = 0x00000080005a6c78)+0x14says we are lockingpthread_mutex_lock(&hdl->libzfs_mnttab_cache_lock);at a valid handle butntdll!RtlpWaitOnCriticalSection+0xc3: ds:0000000000000024 claims we tried to lock NULL. Almost as if hdl was released while we were still mounting.

Unmount looks to get stuck in IoCallDevice:

https://github.com/openzfsonwindows/openzfs/blob/development/module/os/windows/zfs/zfs_vnops_windows_mount.c#L1729-L1958

There's about 4 in there. With mount, I had to put IoCallDevice as a separate thread due to locking, so it is probably something similar needed here. Which one though.

@aspseka
Copy link
Author

aspseka commented Dec 1, 2024

After additional testing:

  1. mounting and reading/writing to the encrypted volume works fine. (The exception is apparently thrown on generating fstab entry (?), but after mounting the volume)
  2. automatic unmounting while rebooting works fine
  3. manual unmounting gets stuck

@lundman
Copy link

lundman commented Dec 2, 2024

OK - Windows doesn't have a fstab so it is Linux code running when it shouldn't be. Any hints/clues as to where?

  1. I will work on fixing this hang

@lundman
Copy link

lundman commented Dec 3, 2024

Actually, no idea where unmount is stuck, could you make an unmount stick, then use Registry to write out cbuf?

https://openzfsonosx.org/wiki/Windows_cbuf

@lundman
Copy link

lundman commented Dec 3, 2024

On the updating fstab, do your datasets have anything set in the sharing properties?

  • ah yep, setting sharenfs causes assertion - I can fix that.

@lundman
Copy link

lundman commented Dec 3, 2024

OK this should take care of the access violation

zpool.zip

@aspseka
Copy link
Author

aspseka commented Dec 8, 2024

Yes, the binaries from the zip fix the assertion.

(I have sharenfs off default )

Unmounting still seems to hang, but it seems zfs-internal:

FFFFB7891EA60080: vnode_couplefileobject: vp FFFFB78934D3AB88 fo FFFFB789451591B0
FFFFB7891EA60080: vnode_fileobject_add: added FO FFFFB789451591B0 to vp FFFFB78934D3AB88
FFFFB7891EA60080: vnode_decouplefileobject: fo FFFFB789451591B0 -X-> FFFFB78934D3AB88
FFFFB7891EA60080: vnode_fileobject_remove: remed FO FFFFB789451591B0 fm vp FFFFB78934D3AB88
FFFFB7891CCE5080: vnode_couplefileobject: vp FFFFB78934D3AB88 fo FFFFB789278CB380
FFFFB7891CCE5080: vnode_fileobject_add: added FO FFFFB789278CB380 to vp FFFFB78934D3AB88
FFFFB7891CCE5080: vnode_decouplefileobject: fo FFFFB789278CB380 -X-> FFFFB78934D3AB88
FFFFB7891CCE5080: vnode_fileobject_remove: remed FO FFFFB789278CB380 fm vp FFFFB78934D3AB88
FFFFB7891AA0D080: vnode_couplefileobject: vp FFFFB78934D3AB88 fo FFFFB789278E3700
FFFFB7891AA0D080: vnode_fileobject_add: added FO FFFFB789278E3700 to vp FFFFB78934D3AB88
FFFFB7891AA0D080: vnode_decouplefileobject: fo FFFFB789278E3700 -X-> FFFFB78934D3AB88
FFFFB7891AA0D080: vnode_fileobject_remove: remed FO FFFFB789278E3700 fm vp FFFFB78934D3AB88
FFFFB78929F8C080: vnode_couplefileobject: vp FFFFB78934D3AB88 fo FFFFB7894512DA60
FFFFB78929F8C080: vnode_fileobject_add: added FO FFFFB7894512DA60 to vp FFFFB78934D3AB88
FFFFB78929F8C080: vnode_decouplefileobject: fo FFFFB7894512DA60 -X-> FFFFB78934D3AB88
FFFFB78929F8C080: vnode_fileobject_remove: remed FO FFFFB7894512DA60 fm vp FFFFB78934D3AB88

It iterates quite slowly through its objects. Meanwhile, the external HDD is idle/stops spinning, so there seems to be no I/O.
Unmounting using zfs/linux does not exhibit this delay.

@aspseka
Copy link
Author

aspseka commented Dec 8, 2024

In between the vnode_* there is also

FFFFB7892817A080: metaslab.c:2617:metaslab_unload(): metaslab_unload: txg 52645, spa testpool, vdev_id 0, ms_id 163, weight 800000000000001, selected txg 52525 (600481 ms ago), alloc_txg 52523, loaded 1268919 ms ago, max_size 7454482432
FFFFB7892817A080: metaslab.c:2617:metaslab_unload(): metaslab_unload: txg 52645, spa testpool, vdev_id 0, ms_id 166, weight 800000000000001, selected txg 52525 (600481 ms ago), alloc_txg 0, loaded 1268847 ms ago, max_size 8274784256
FFFFB7892817A080: metaslab.c:2617:metaslab_unload(): metaslab_unload: txg 52645, spa testpool, vdev_id 0, ms_id 168, weight 800000000000001, selected txg 52525 (600481 ms ago), alloc_txg 0, loaded 1268972 ms ago, max_size 8371707904
FFFFB7892817A080: metaslab.c:2617:metaslab_unload(): metaslab_unload: txg 52645, spa testpool, vdev_id 0, ms_id 169, weight 800000000000001, selected txg 52525 (600481 ms ago), alloc_txg 0, loaded 1268937 ms ago, max_size 8589262848
FFFFB7892817A080: metaslab.c:2617:metaslab_unload(): metaslab_unload: txg 52645, spa testpool, vdev_id 0, ms_id 170, weight 800000000000001, selected txg 52525 (600481 ms ago), alloc_txg 0, loaded 1268898 ms ago, max_size 8589848576
FFFFB7892817A080: metaslab.c:2617:metaslab_unload(): metaslab_unload: txg 52645, spa testpool, vdev_id 0, ms_id 171, weight 800000000000001, selected txg 52525 (600481 ms ago), alloc_txg 0, loaded 1268818 ms ago, max_size 8589918208
FFFFB7892817A080: metaslab.c:2617:metaslab_unload(): metaslab_unload: txg 52645, spa testpool, vdev_id 0, ms_id 172, weight 800000000000001, selected txg 52525 (600481 ms ago), alloc_txg 0, loaded 1268805 ms ago, max_size 8589529088

@lundman
Copy link

lundman commented Dec 9, 2024

Neat, what we are looking for are the printf from
https://github.com/openzfsonwindows/openzfs/blob/development/module/os/windows/zfs/zfs_vnops_windows_mount.c#L1736-L1958
In particular, if we see any of

dprintf("Set UNMOUNTING\n");
dprintf("vp %p\n", vp);
dprintf("Waiting for mount root to be released\n");
dprintf("%s: calling zfs_vfs_unmount\n", __func__);

There are two fixme in there that are pretty likely culprits.

@aspseka
Copy link
Author

aspseka commented Dec 30, 2024

Was not able to capture these printfs as they already wrapped out of cbuf by the time I was able to dump it.

Also got segfaults similar to those in zfs mount inside zfs list as well.
So wanted to try again today, using rc11 to have symbols.

Importing leads to BSOD. This is most likely a separate issue, opened !428 for this problem.

@aspseka
Copy link
Author

aspseka commented Dec 30, 2024

I don't seem to be able to install .99-3117-g773833ebc anymore: gettting heap corruption on zfsinstaller install with inconsistent backtraces. (What changed besides trying to install rc11 in between? New Windows dev build...)

On that note: I did experience sporadic BSOD (heap corruption) on boot that I was not directly able to trace back to OpenZFSonWindows besides the fact that they only occurred with the driver installed.

(By the way, running zfsinstaller install with a relative path segfaults as it expects a non-zero-length path.)

Edit: Was finally able to zfsintaller install. But get BSOD on every boot (with .99-3117).

Edit2: Installed rc11 and got a similar heap corruption during installation. Subsequent boots are fine, however, and driver loads.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants