nvme: testcases for TLS support #158

Open
wants to merge 6 commits into
base: master

hreinecke
Contributor

This pull request adds two new testcases for nvme TLS support, one for 'plain' TLS with TLS PSKs, and the other one for testing 'secure concatenation' where TLS is started after DH-HMAC-CHAP authentication.

tests/nvme/059 Outdated
return 1
fi

systemctl start tlshd
Contributor

I think you need to check that it exists as a dependency

Contributor

Maybe also check the version of ktls-utils?
Or just explain in a comment if you have any expectations from it?

Contributor Author

Yeah, good point. Will check what we can do here.

Collaborator

According to the "EXIT STATUS" section of "man systemctl", the systemctl command returns exit status 4 for "no such unit". So it would work to check whether the exit status of "systemctl status tlshd" is 4.

I use Fedora, and needed to install the "ktls-utils" package to run the test case. It would be better to mention "ktls-utils" in the SKIP_REASONS message to help users understand what is missing.
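
The check suggested in this comment could be sketched roughly as below. This is only an illustration under stated assumptions: the function names `have_systemctl_unit_sketch` and `require_tlshd_sketch` are hypothetical, the wording of the skip message is made up, and `SKIP_REASONS` is declared locally here so the snippet stands alone. The only behavior relied on is the documented exit status 4 for "no such unit".

```shell
#!/bin/bash
# Hypothetical sketch, not the helper that ended up in this series.
SKIP_REASONS=()

# Succeed when systemctl knows the given unit, fail otherwise.
have_systemctl_unit_sketch() {
	local unit="$1"
	local status=0

	# Without systemctl there is nothing to query.
	command -v systemctl >/dev/null 2>&1 || return 1

	systemctl status "$unit" >/dev/null 2>&1 || status=$?
	# "man systemctl", EXIT STATUS: 4 means "no such unit".
	[ "$status" -ne 4 ]
}

# Record a skip reason that names the package the user has to install.
require_tlshd_sketch() {
	if ! have_systemctl_unit_sketch tlshd; then
		SKIP_REASONS+=("tlshd unit not found (is ktls-utils installed?)")
		return 1
	fi
	return 0
}
```

Any exit status other than 4 (for example 3 for an inactive unit) is treated as "the unit exists", which is all a dependency check needs.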

tests/nvme/059 Outdated
_nvmet_target_setup --blkdev file --tls

# Test unencrypted connection
echo "Test unencrypted connection w/ tls not required"
Contributor

umm, looks pretty useless...

Contributor Author

Don't think so. This is testing the 'not required' setting in nvmet, which should accept both TLS and non-TLS connections even if TLS is enabled on the target.
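
The expectation described here can be written down as a small decision table. The sketch below is a toy model, not kernel or blktests code: the function name is hypothetical, and the strings "required" and "not required" are just labels borrowed from the test output wording. It encodes only what this thread states, namely that "not required" accepts both kinds of connection while "required" rejects unencrypted ones.

```shell
#!/bin/bash
# Toy model: would the target accept the connection?
#   $1: target TLS requirement, "required" or "not required"
#   $2: connection type, "tls" or "plain"
target_accepts_sketch() {
	local treq="$1" conn="$2"

	case "$treq:$conn" in
	"not required:tls" | "not required:plain")
		# TLS is enabled but optional, so both kinds are accepted.
		return 0
		;;
	"required:tls")
		return 0
		;;
	*)
		# Includes "required:plain", the "(should fail)" case.
		return 1
		;;
	esac
}
```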

tests/nvme/059 Outdated
echo "WARNING: connection is not encrypted"
fi

_nvme_disconnect_subsys
Contributor

is there any room to test passing explicit keys and private keyrings to this test?

Contributor Author

I'd rather not do that here. This is for testing the 'default' case, where PSKs are pre-populated in the keyring and the connection picks up the keys automatically. Explicit keys and keyrings are really just for testing.
But we should have a separate testcase for that, true.

Collaborator

@kawasaki left a comment

The commit 320b9b6 does not seem to add value. The helper function requires more types than a direct call of "_require_nvme_trtype tcp".

Collaborator

@kawasaki left a comment

The commit bc544f8 introduces the --concat option of _nvme_connect_subsys(), but it is not used anywhere. Do we need this commit in this PR? If it is preparation for the next PR, I suggest moving this commit to that PR.

@kawasaki
Collaborator

@hreinecke Thanks for rebasing the series. I ran the test case in my environment using the kernel v6.13 and the latest nvme-cli (2.10.2-77-gb4628c3, with libnvme 1.11.1-48-gacc19fc), but it fails.

nvme/059 (tr=tcp) (Create TLS-encrypted connections)         [failed]
    runtime    ...  4.690s
    --- tests/nvme/059.out      2025-01-29 17:10:17.090513738 +0900
    +++ /home/shin/Blktests/blktests/results/nodev_tr_tcp/nvme/059.out.bad      2025-01-30 13:21:58.468322103 +0900
    @@ -2,9 +2,13 @@
     Test unencrypted connection w/ tls not required
     disconnected 1 controller(s)
     Test encrypted connection w/ tls not required
    -disconnected 1 controller(s)
    +cat: /sys/class/nvme//tls_key: No such file or directory
    +WARNING: connection is not encrypted
    +disconnected 0 controller(s)
    ...
    (Run 'diff -u tests/nvme/059.out /home/shin/Blktests/blktests/results/nodev_tr_tcp/nvme/059.out.bad' to see the entire diff)

The 059.full file contains the following logs:

NQN:blktests-subsystem-1 disconnected 1 controller(s)
NQN:blktests-subsystem-1 disconnected 0 controller(s)
NQN:blktests-subsystem-1 disconnected 0 controller(s)
NQN:blktests-subsystem-1 disconnected 0 controller(s)

The kernel messages were as follows:

[   53.709438][ T1008] run blktests nvme/059 at 2025-01-30 13:21:53
[   53.852869][ T1088] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
[   53.871778][ T1089] nvmet: Allow non-TLS connections while TLS1.3 is enabled
[   53.882422][ T1092] nvmet_tcp: enabling port 0 (127.0.0.1:4420)
[   53.956688][ T1099] nvme nvme1: failed to connect socket: -512
[   53.966599][   T47] nvmet_tcp: failed to allocate queue, error -107
[   53.972570][  T225] nvmet: creating nvm controller 1 for subsystem blktests-subsystem-1 for NQN nqn.20.
[   53.978282][ T1099] nvme nvme1: Please enable CONFIG_NVME_MULTIPATH for full support of multi-port dev.
[   53.981615][ T1099] nvme nvme1: creating 4 I/O queues.
[   53.985261][ T1099] nvme nvme1: mapped 4/0/0 default/read/poll queues.
[   53.988654][ T1099] nvme nvme1: new ctrl: NQN "blktests-subsystem-1", addr 127.0.0.1:4420, hostnqn: nq9
[   54.139181][ T1118] nvme nvme1: Removing ctrl: NQN "blktests-subsystem-1"
[   54.319783][ T1125] nvme nvme1: failed to connect socket: -512
[   54.329235][   T47] nvmet_tcp: failed to allocate queue, error -107
[   55.691522][ T1182] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
[   55.714859][ T1186] nvmet_tcp: enabling port 0 (127.0.0.1:4420)
[   55.776929][ T1193] nvme_tcp: queue 0: failed to receive icresp, error -4
[   57.044777][ T1237] nvme nvme1: failed to connect socket: -512

I'm not sure if this catches a kernel bug. Still the test case may need improvement, or I may be missing something. If you have any insights about this failure, please let me know.

@hreinecke
Contributor Author

> The commit 320b9b6 does not seem to add value. The helper function requires more types than a direct call of "_require_nvme_trtype tcp".

Okay, I'll fix it up.

@hreinecke
Contributor Author

> The commit bc544f8 introduces the --concat option of _nvme_connect_subsys(), but it is not used anywhere. Do we need this commit in this PR? If it is preparation for the next PR, I suggest moving this commit to that PR.

It would if I had pushed the testcase for secure concatenation...

@hreinecke
Contributor Author

> @hreinecke Thanks for rebasing the series. I ran the test case in my environment using the kernel v6.13 and the latest nvme-cli (2.10.2-77-gb4628c3, with libnvme 1.11.1-48-gacc19fc), but it fails.
>
> I'm not sure if this catches a kernel bug. Still the test case may need improvement, or I may be missing something. If you have any insights about this failure, please let me know.

_check_ctrl_tls() needs to redirect stderr to /dev/null, not stdout (as it currently does in two places). I will fix up the testcase.
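
The difference between the two redirections can be seen in a small standalone sketch. Both function names below are hypothetical stand-ins, not the body of `_check_ctrl_tls()`; the path argument is whatever attribute file the caller wants to probe, such as the `tls_key` file that shows up in the failure log.

```shell
#!/bin/bash
# Intended form: 2>/dev/null drops only the error message, so a missing
# file yields empty output while an existing file still yields its value.
read_attr_quietly_sketch() {
	local path="$1"

	cat "$path" 2>/dev/null
}

# Mistaken form, for contrast: >/dev/null throws the value away and lets
# "No such file or directory" through on stderr, where it ends up in the
# test output.
read_attr_wrong_sketch() {
	local path="$1"

	cat "$path" >/dev/null
}
```

With the first form, a caller can test for an empty result to decide whether the connection is encrypted, without stray `cat:` errors leaking into the `.out` comparison.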

@kawasaki
Collaborator

@hreinecke Thanks for updating the patches. A question: which kernel should I use to run the test case?
I used the kernel with the tag "nvme-6.14-2025-01-28" plus your patch series titled "[PATCHv14 00/10] nvme: implement secure concatenation". Blktests is hreinecke/tls.v3 at git hash 990fc84. But I still see the test cases fail. Do you see the new test cases pass?

The nvme/059 failure looks like this:

nvme/059 (tr=tcp) (Create TLS-encrypted connections)         [failed]
    runtime  4.666s  ...  6.473s
    --- tests/nvme/059.out      2025-01-31 11:10:39.925656241 +0900
    +++ /home/shin/Blktests/blktests/results/nodev_tr_tcp/nvme/059.out.bad      2025-01-31 17:39:08.291736971 +0900
    @@ -2,9 +2,11 @@
     Test unencrypted connection w/ tls not required
     disconnected 1 controller(s)
     Test encrypted connection w/ tls not required
    -disconnected 1 controller(s)
    +WARNING: connection is not encrypted
    +disconnected 0 controller(s)
     Test unencrypted connection w/ tls required (should fail)
    ...
    (Run 'diff -u tests/nvme/059.out /home/shin/Blktests/blktests/results/nodev_tr_tcp/nvme/059.out.bad' to see the entire diff)

Also, the nvme/060 run terminated midway. The kernel reported a BUG:

[  112.851185] [   T1360] run blktests nvme/060 at 2025-01-31 17:39:14
[  112.984431] [   T1462] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
[  113.001125] [   T1463] nvmet: Allow non-TLS connections while TLS1.3 is enabled
[  113.008784] [   T1466] nvmet_tcp: enabling port 0 (127.0.0.1:4420)
[  113.146659] [   T1477] nvme nvme1: failed to connect socket: -512
[  113.155644] [     T68] nvmet_tcp: failed to allocate queue, error -107
[  113.164733] [     T65] nvmet: Created nvm controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349 with DH-HMAC-CHAP.
[  113.176065] [     T65] ==================================================================
[  113.176747] [     T65] BUG: KASAN: slab-out-of-bounds in vsnprintf+0x1589/0x18f0
[  113.177324] [     T65] Write of size 1 at addr ffff88812effdec3 by task kworker/2:1H/65

[  113.178094] [     T65] CPU: 2 UID: 0 PID: 65 Comm: kworker/2:1H Not tainted 6.13.0-rc4+ #397
[  113.178687] [     T65] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-3.fc41 04/01/2014
[  113.179412] [     T65] Workqueue: nvmet_tcp_wq nvmet_tcp_io_work [nvmet_tcp]
[  113.179924] [     T65] Call Trace:
[  113.180200] [     T65]  <TASK>
[  113.180417] [     T65]  dump_stack_lvl+0x6a/0x90
[  113.180762] [     T65]  ? vsnprintf+0x1589/0x18f0
[  113.181126] [     T65]  print_report+0x174/0x505
[  113.181464] [     T65]  ? vsnprintf+0x1589/0x18f0
[  113.181800] [     T65]  ? __virt_addr_valid+0x208/0x430
[  113.182207] [     T65]  ? vsnprintf+0x1589/0x18f0
[  113.183526] [     T65]  kasan_report+0xa7/0x170
[  113.184840] [     T65]  ? format_decode+0x676/0xa40
[  113.186188] [     T65]  ? vsnprintf+0x1589/0x18f0
[  113.187502] [     T65]  vsnprintf+0x1589/0x18f0
[  113.188795] [     T65]  ? __pfx_vsnprintf+0x10/0x10
[  113.190137] [     T65]  sprintf+0xb5/0xf0
[  113.191349] [     T65]  ? __pfx_sprintf+0x10/0x10
[  113.192578] [     T65]  ? __kmalloc_noprof+0x3c4/0x550
[  113.193834] [     T65]  ? nvme_auth_derive_tls_psk+0x15c/0x2df [nvme_auth]
[  113.195242] [     T65]  nvme_auth_derive_tls_psk+0x1da/0x2df [nvme_auth]
[  113.196606] [     T65]  nvmet_auth_insert_psk+0x2fb/0x680 [nvmet]
[  113.197937] [     T65]  ? __pfx_nvmet_auth_insert_psk+0x10/0x10 [nvmet]
[  113.199303] [     T65]  ? rcu_is_watching+0x11/0xb0
[  113.200488] [     T65]  ? nvmet_execute_auth_send+0x157a/0x3380 [nvmet]
[  113.201785] [     T65]  ? __asan_memcpy+0x38/0x60
[  113.202956] [     T65]  nvmet_execute_auth_send+0x2f92/0x3380 [nvmet]
[  113.204260] [     T65]  ? sock_recvmsg+0x179/0x220
[  113.205397] [     T65]  nvmet_tcp_io_work+0x19d1/0x2970 [nvmet_tcp]
[  113.206602] [     T65]  ? __pfx_nvmet_tcp_io_work+0x10/0x10 [nvmet_tcp]
[  113.207843] [     T65]  ? __pfx_lock_release+0x10/0x10
[  113.208985] [     T65]  process_one_work+0x85a/0x1460
[  113.210122] [     T65]  ? __pfx_lock_acquire+0x10/0x10
[  113.211222] [     T65]  ? __pfx_process_one_work+0x10/0x10
[  113.212327] [     T65]  ? assign_work+0x16c/0x240
[  113.213383] [     T65]  ? lock_is_held_type+0xd5/0x130
[  113.214468] [     T65]  worker_thread+0x5e2/0xfc0
[  113.215513] [     T65]  ? __kthread_parkme+0xb1/0x1d0
[  113.216571] [     T65]  ? __pfx_worker_thread+0x10/0x10
[  113.217636] [     T65]  ? __pfx_worker_thread+0x10/0x10
[  113.218681] [     T65]  kthread+0x2d1/0x3a0
[  113.219644] [     T65]  ? _raw_spin_unlock_irq+0x24/0x50
[  113.220696] [     T65]  ? __pfx_kthread+0x10/0x10
[  113.221695] [     T65]  ret_from_fork+0x30/0x70
[  113.222660] [     T65]  ? __pfx_kthread+0x10/0x10
[  113.223597] [     T65]  ret_from_fork_asm+0x1a/0x30
[  113.224536] [     T65]  </TASK>

[  113.226064] [     T65] Allocated by task 65:
[  113.226906] [     T65]  kasan_save_stack+0x2c/0x50
[  113.227809] [     T65]  kasan_save_track+0x10/0x30
[  113.228696] [     T65]  __kasan_kmalloc+0xa6/0xb0
[  113.229536] [     T65]  __kmalloc_noprof+0x1c6/0x550
[  113.230391] [     T65]  nvme_auth_derive_tls_psk+0x15c/0x2df [nvme_auth]
[  113.231375] [     T65]  nvmet_auth_insert_psk+0x2fb/0x680 [nvmet]
[  113.232336] [     T65]  nvmet_execute_auth_send+0x2f92/0x3380 [nvmet]
[  113.233325] [     T65]  nvmet_tcp_io_work+0x19d1/0x2970 [nvmet_tcp]
[  113.234294] [     T65]  process_one_work+0x85a/0x1460
[  113.235182] [     T65]  worker_thread+0x5e2/0xfc0
[  113.236194] [     T65]  kthread+0x2d1/0x3a0
[  113.237011] [     T65]  ret_from_fork+0x30/0x70
[  113.237877] [     T65]  ret_from_fork_asm+0x1a/0x30

[  113.239492] [     T65] The buggy address belongs to the object at ffff88812effde80
                           which belongs to the cache kmalloc-96 of size 96
[  113.241571] [     T65] The buggy address is located 0 bytes to the right of
                           allocated 67-byte region [ffff88812effde80, ffff88812effdec3)

[  113.244454] [     T65] The buggy address belongs to the physical page:
[  113.245464] [     T65] page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x12effd
[  113.246691] [     T65] ksm flags: 0x17ffffc0000000(node=0|zone=2|lastcpupid=0x1fffff)
[  113.247846] [     T65] page_type: f5(slab)
[  113.248740] [     T65] raw: 0017ffffc0000000 ffff888100042280 ffffea0004d9d8c0 0000000000000007
[  113.249964] [     T65] raw: 0000000000000000 0000000080200020 00000001f5000000 0000000000000000
[  113.251188] [     T65] page dumped because: kasan: bad access detected

[  113.252996] [     T65] Memory state around the buggy address:
[  113.254004] [     T65]  ffff88812effdd80: 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc fc
[  113.255205] [     T65]  ffff88812effde00: 00 00 00 00 00 00 00 00 00 00 fc fc fc fc fc fc
[  113.256376] [     T65] >ffff88812effde80: 00 00 00 00 00 00 00 00 03 fc fc fc fc fc fc fc
[  113.257538] [     T65]                                            ^
[  113.258574] [     T65]  ffff88812effdf00: 00 00 00 00 00 00 00 00 00 00 fc fc fc fc fc fc
[  113.259775] [     T65]  ffff88812effdf80: 00 00 00 00 00 00 00 00 00 00 fc fc fc fc fc fc
[  113.260954] [     T65] ==================================================================
[  113.262217] [     T65] Disabling lock debugging due to kernel taint
[  113.264174] [    T113] nvme nvme1: qid 0: authenticated with hash hmac(sha256) dhgroup ffdhe2048
[  113.265694] [   T1477] nvme nvme1: qid 0: authenticated
[  113.268242] [   T1477] nvme nvme1: Please enable CONFIG_NVME_MULTIPATH for full support of multi-port devices.

@kawasaki
Collaborator

As for the tlshd service existence check, I quickly created a patch which introduces a helper function. Using this, nvme/059 can do the check like this. If you think it's useful, feel free to pick them up.

Collaborator

@kawasaki left a comment

Three lines need straight-forward fixes.

@kawasaki
Collaborator

kawasaki commented Feb 4, 2025

I tried the updated hreinecke/tls.v3 branch, using the v6.14-rc1 kernel and the "[PATCHv14 00/10] nvme: implement secure concatenation" series. I still observed the nvme/059 failure and the "BUG: KASAN: slab-out-of-bounds" at nvme/060.

@kawasaki
Collaborator

kawasaki commented Feb 18, 2025

@hreinecke I took a closer look, and succeeded in making nvme/059 and nvme/060 pass on my test node. I fell into two pitfalls: 1) I did not use the latest tlshd and ktls-utils, and 2) I used v6.14-rcX kernels. Once I used the linux-block/block-6.14 branch kernel with your patch series "[PATCHv14 00/10] nvme: implement secure concatenation", I managed to make the test cases pass.

To make nvme/060 pass, I needed several fixes in nvme/060. Please refer to this commit.

I also created some test cases for blktests common scripts to:

  • ensure tlshd version
  • ensure systemctl tlshd service availability
  • not change systemctl tlshd status after running the test cases

My patches are available here. Please take a look and see if they make sense.

If you like them, I can post the whole series to the linux-nvme and linux-block lists for wider review. Or, I can post my changes separately, and after they have settled in the master branch, you can rebase this series on top. Please let me know your preference for the next action.
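
The third goal in the list above, leaving the tlshd service state unchanged after a test run, could be approached along the following lines. This is a rough sketch and not the linked patches: the function names and the `UNIT_WAS_ACTIVE` variable are hypothetical, and the only systemctl commands used are `is-active --quiet`, `start`, and `stop`.

```shell
#!/bin/bash
# Hypothetical helpers: remember whether a unit was active before the
# test and put it back in that state afterwards.
UNIT_WAS_ACTIVE=0

unit_start_sketch() {
	local unit="$1"

	if systemctl is-active --quiet "$unit"; then
		# Already running: nothing to start, nothing to undo later.
		UNIT_WAS_ACTIVE=1
	else
		UNIT_WAS_ACTIVE=0
		systemctl start "$unit"
	fi
}

unit_restore_sketch() {
	local unit="$1"

	# Only stop the unit if this test was the one that started it.
	if [ "$UNIT_WAS_ACTIVE" -eq 0 ]; then
		systemctl stop "$unit"
	fi
}
```

A test would call `unit_start_sketch tlshd` during setup and `unit_restore_sketch tlshd` during cleanup, so a system that already ran tlshd keeps running it.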

@kawasaki
Collaborator

FYI, I ran nvme/059 with my blktests patch on top of your blktests patches, using the kernel v6.14-rc3 and the patch series "[PATCHv14 00/10] nvme: implement secure concatenation". I then got the kernel message below, and the test hung.

[   65.770111] [   T1024] run blktests nvme/059 at 2025-02-18 21:06:51
[   65.933728] [   T1109] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
[   65.950772] [   T1110] nvmet: Allow non-TLS connections while TLS1.3 is enabled
[   65.959426] [   T1113] nvmet_tcp: enabling port 0 (127.0.0.1:4420)
[   66.033776] [     T65] nvmet: Created nvm controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
[   66.038205] [   T1120] nvme nvme1: Please enable CONFIG_NVME_MULTIPATH for full support of multi-port devices.
[   66.040590] [   T1120] nvme nvme1: creating 4 I/O queues.
[   66.044446] [   T1120] nvme nvme1: mapped 4/0/0 default/read/poll queues.
[   66.049207] [   T1120] nvme nvme1: new ctrl: NQN "blktests-subsystem-1", addr 127.0.0.1:4420, hostnqn: nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349
[   66.183920] [   T1140] nvme nvme1: Removing ctrl: NQN "blktests-subsystem-1"
[   66.496706] [     T65] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x105ee8
[   66.498192] [     T65] head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
[   66.498919] [     T65] flags: 0x17ffffc0000040(head|node=0|zone=2|lastcpupid=0x1fffff)
[   66.499663] [     T65] page_type: f5(slab)
[   66.500034] [     T65] raw: 0017ffffc0000040 ffff888100042dc0 dead000000000100 dead000000000122
[   66.500857] [     T65] raw: 0000000000000000 0000000080100010 00000000f5000000 0000000000000000
[   66.501585] [     T65] head: 0017ffffc0000040 ffff888100042dc0 dead000000000100 dead000000000122
[   66.502350] [     T65] head: 0000000000000000 0000000080100010 00000000f5000000 0000000000000000
[   66.503176] [     T65] head: 0017ffffc0000003 ffffea000417ba01 ffffffffffffffff 0000000000000000
[   66.503935] [     T65] head: 0000000000000008 0000000000000000 00000000ffffffff 0000000000000000
[   66.504683] [     T65] page dumped because: VM_BUG_ON_FOLIO(((unsigned int) folio_ref_count(folio) + 127u <= 127u))
[   66.505537] [     T65] ------------[ cut here ]------------
[   66.506003] [     T65] kernel BUG at ./include/linux/mm.h:1455!
[   66.506484] [     T65] Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
[   66.507183] [     T65] CPU: 2 UID: 0 PID: 65 Comm: kworker/2:1H Not tainted 6.14.0-rc3+ #268
[   66.507866] [     T65] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-3.fc41 04/01/2014
[   66.508596] [     T65] Workqueue: nvme_tcp_wq nvme_tcp_io_work [nvme_tcp]
[   66.509101] [     T65] RIP: 0010:__iov_iter_get_pages_alloc+0x158a/0x1ef0
[   66.509688] [     T65] Code: 06 00 0f 85 fd 06 00 00 49 8b 47 48 48 8d 70 ff a8 01 4c 0f 45 fe e9 c3 fe ff ff 48 c7 c6 20 c2 3c 8c 4c 89 f7 e8 56 27 55 ff <0f> 0b 4c 8d 70 ff e9 fb f7 ff ff 0f 0b 48 85 c0 0f 85 9e f5 ff ff
[   66.511390] [     T65] RSP: 0018:ffff8881021b71e8 EFLAGS: 00010282
[   66.511950] [     T65] RAX: 000000000000005c RBX: ffff8881021b7468 RCX: 0000000000000000
[   66.512667] [     T65] RDX: 000000000000005c RSI: ffffffff8c3d3bc0 RDI: ffffed1020436e2c
[   66.513356] [     T65] RBP: ffffea000417bb80 R08: 0000000000000001 R09: ffffed1020436de8
[   66.514054] [     T65] R10: ffff8881021b6f47 R11: 0000000000000001 R12: ffff8881021b7460
[   66.514773] [     T65] R13: dffffc0000000000 R14: ffffea000417ba00 R15: ffffea000417ba34
[   66.515457] [     T65] FS:  0000000000000000(0000) GS:ffff8883ae100000(0000) knlGS:0000000000000000
[   66.516239] [     T65] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   66.516830] [     T65] CR2: 00007f311f1a0148 CR3: 000000012fe40000 CR4: 00000000000006f0
[   66.517550] [     T65] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   66.519309] [     T65] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   66.521080] [     T65] Call Trace:
[   66.522325] [     T65]  <TASK>
[   66.523552] [     T65]  ? __die_body.cold+0x19/0x27
[   66.524935] [     T65]  ? die+0x2a/0x50
[   66.526258] [     T65]  ? do_trap+0x1e6/0x2d0
[   66.527614] [     T65]  ? __iov_iter_get_pages_alloc+0x158a/0x1ef0
[   66.529112] [     T65]  ? do_error_trap+0xa3/0x160
[   66.530415] [     T65]  ? __iov_iter_get_pages_alloc+0x158a/0x1ef0
[   66.532026] [     T65]  ? handle_invalid_op+0x2c/0x40
[   66.533289] [     T65]  ? __iov_iter_get_pages_alloc+0x158a/0x1ef0
[   66.534671] [     T65]  ? exc_invalid_op+0x29/0x40
[   66.535936] [     T65]  ? asm_exc_invalid_op+0x16/0x20
[   66.537149] [     T65]  ? __iov_iter_get_pages_alloc+0x158a/0x1ef0
[   66.538463] [     T65]  ? __iov_iter_get_pages_alloc+0x158a/0x1ef0
[   66.539746] [     T65]  ? __pfx_mark_lock+0x10/0x10
[   66.540950] [     T65]  ? __pfx___iov_iter_get_pages_alloc+0x10/0x10
[   66.542194] [     T65]  ? __pfx_check_noncircular+0x10/0x10
[   66.543383] [     T65]  iov_iter_get_pages2+0x68/0xa0
[   66.544546] [     T65]  ? __pfx_iov_iter_get_pages2+0x10/0x10
[   66.545707] [     T65]  sk_msg_zerocopy_from_iter+0x1ae/0x7b0
[   66.546847] [     T65]  ? mark_lock+0xf5/0x1650
[   66.547884] [     T65]  ? __mutex_lock+0x44c/0x1360
[   66.548937] [     T65]  ? __pfx_sk_msg_zerocopy_from_iter+0x10/0x10
[   66.550078] [     T65]  ? lock_acquire+0x1b1/0x540
[   66.551118] [     T65]  ? lock_acquire+0x1c1/0x540
[   66.552148] [     T65]  ? sk_msg_alloc+0xeb/0xae0
[   66.553164] [     T65]  ? mark_held_locks+0x94/0xe0
[   66.554187] [     T65]  ? __local_bh_enable_ip+0xab/0x140
[   66.555275] [     T65]  tls_sw_sendmsg+0xd30/0x24e0 [tls]
[   66.556361] [     T65]  ? sock_has_perm+0x21b/0x2b0
[   66.557393] [     T65]  ? __pfx_tls_sw_sendmsg+0x10/0x10 [tls]
[   66.558499] [     T65]  ? __pfx_tls_sw_sendmsg+0x10/0x10 [tls]
[   66.559579] [     T65]  ? mark_lock+0xf5/0x1650
[   66.560527] [     T65]  sock_sendmsg+0x2f9/0x410
[   66.561484] [     T65]  ? __pfx_sock_sendmsg+0x10/0x10
[   66.562476] [     T65]  ? __pfx_sock_sendmsg+0x10/0x10
[   66.563446] [     T65]  nvme_tcp_try_send_data+0x417/0xe20 [nvme_tcp]
[   66.564505] [     T65]  ? __pfx_nvme_tcp_try_send_data+0x10/0x10 [nvme_tcp]
[   66.565571] [     T65]  ? __pfx___lock_acquire+0x10/0x10
[   66.566484] [     T65]  nvme_tcp_try_send+0x54a/0x9a0 [nvme_tcp]
[   66.567474] [     T65]  ? mutex_trylock+0x16d/0x310
[   66.568366] [     T65]  ? nvme_tcp_io_work+0xfa/0x1e0 [nvme_tcp]
[   66.569355] [     T65]  ? __pfx_nvme_tcp_try_send+0x10/0x10 [nvme_tcp]
[   66.570377] [     T65]  ? __pfx_mutex_trylock+0x10/0x10
[   66.571305] [     T65]  nvme_tcp_io_work+0x106/0x1e0 [nvme_tcp]
[   66.572283] [     T65]  ? __pfx_nvme_tcp_io_work+0x10/0x10 [nvme_tcp]
[   66.573291] [     T65]  process_one_work+0x85a/0x1460
[   66.574200] [     T65]  ? __pfx_lock_acquire+0x10/0x10
[   66.575102] [     T65]  ? __pfx_process_one_work+0x10/0x10
[   66.576011] [     T65]  ? assign_work+0x16c/0x240
[   66.576851] [     T65]  ? lock_is_held_type+0xd5/0x130
[   66.577721] [     T65]  worker_thread+0x5e2/0xfc0
[   66.578572] [     T65]  ? __kthread_parkme+0xb1/0x1d0
[   66.579444] [     T65]  ? __pfx_worker_thread+0x10/0x10
[   66.580351] [     T65]  ? __pfx_worker_thread+0x10/0x10
[   66.581260] [     T65]  kthread+0x39d/0x750
[   66.582061] [     T65]  ? __pfx_kthread+0x10/0x10
[   66.582891] [     T65]  ? __pfx_kthread+0x10/0x10
[   66.583709] [     T65]  ? __pfx_kthread+0x10/0x10
[   66.584520] [     T65]  ret_from_fork+0x30/0x70
[   66.585337] [     T65]  ? __pfx_kthread+0x10/0x10
[   66.586176] [     T65]  ret_from_fork_asm+0x1a/0x30
[   66.587000] [     T65]  </TASK>
[   66.587692] [     T65] Modules linked in: tls nvmet_tcp nvmet nvme_tcp nvme_fabrics nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables qrtr sunrpc ppdev 9pnet_virtio 9pnet pcspkr netfs i2c_piix4 e1000 i2c_smbus parport_pc parport fuse loop dm_multipath nfnetlink zram bochs drm_client_lib drm_shmem_helper drm_kms_helper xfs nvme nvme_core sym53c8xx drm scsi_transport_spi nvme_keyring nvme_auth floppy serio_raw ata_generic pata_acpi qemu_fw_cfg
[   66.593694] [     T65] ---[ end trace 0000000000000000 ]---
[   66.594801] [     T65] RIP: 0010:__iov_iter_get_pages_alloc+0x158a/0x1ef0
[   66.595927] [     T65] Code: 06 00 0f 85 fd 06 00 00 49 8b 47 48 48 8d 70 ff a8 01 4c 0f 45 fe e9 c3 fe ff ff 48 c7 c6 20 c2 3c 8c 4c 89 f7 e8 56 27 55 ff <0f> 0b 4c 8d 70 ff e9 fb f7 ff ff 0f 0b 48 85 c0 0f 85 9e f5 ff ff
[   66.598710] [     T65] RSP: 0018:ffff8881021b71e8 EFLAGS: 00010282
[   66.599864] [     T65] RAX: 000000000000005c RBX: ffff8881021b7468 RCX: 0000000000000000
[   66.601143] [     T65] RDX: 000000000000005c RSI: ffffffff8c3d3bc0 RDI: ffffed1020436e2c
[   66.602465] [     T65] RBP: ffffea000417bb80 R08: 0000000000000001 R09: ffffed1020436de8
[   66.603792] [     T65] R10: ffff8881021b6f47 R11: 0000000000000001 R12: ffff8881021b7460
[   66.605093] [     T65] R13: dffffc0000000000 R14: ffffea000417ba00 R15: ffffea000417ba34
[   66.606430] [     T65] FS:  0000000000000000(0000) GS:ffff8883ae100000(0000) knlGS:0000000000000000
[   66.607846] [     T65] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   66.609067] [     T65] CR2: 00007f311f1a0148 CR3: 000000012fe40000 CR4: 00000000000006f0
[   66.610429] [     T65] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   66.611794] [     T65] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  128.846435] [    T207] nvme nvme1: I/O tag 0 (0000) type 4 opcode 0x7f (Fabrics Cmd) QID 0 timeout
[  128.854242] [   T1147] nvme nvme1: Connect command failed, error wo/DNR bit: 881


@hreinecke
Contributor Author

> [ 66.038205] [ T1120] nvme nvme1: Please enable CONFIG_NVME_MULTIPATH for full support of multi-port devices.

Maybe you should follow that advice?

@hreinecke
Contributor Author

But yeah, it clearly shouldn't crash. I'll check.

@hreinecke
Contributor Author

But this again looks like the infamous page reference issue. If you have compound pages, the page reference is only set on the first page, not the following ones. So if you then walk the pages one by one and release the page reference on each page, you see this error.

@hreinecke
Contributor Author

For reference:

> But this again looks like the infamous page reference issue. If you have compound pages, the page reference is only set on the first page, not the following ones. So if you then walk the pages one by one and release the page reference on each page, you see this error.

And this should have been prevented by the call to 'sendpage_ok()' in nvme_tcp_try_send_data(). Hmm.

To start TLS-encrypted connections.

Signed-off-by: Hannes Reinecke <[email protected]>

Add --tls option to _create_nvmet_subsystem and allow to specify
the tls requirements in _create_nvmet_port.

Signed-off-by: Hannes Reinecke <[email protected]>

To check that the test system has a specific systemctl unit, introduce
the new helper function _have_systemctl_unit.

Signed-off-by: Shin'ichiro Kawasaki <[email protected]>
Signed-off-by: Hannes Reinecke <[email protected]>

TCP connections can be encrypted using in-kernel TLS, so add a
testcase to exercise the various combinations.

Signed-off-by: Hannes Reinecke <[email protected]>

To start secure concatenation the option '--concat' has to be passed
to the 'nvme connect' command.

Signed-off-by: Hannes Reinecke <[email protected]>

NVMe-TCP has a 'secure concatenation' mode, where the TLS PSK is
generated from the secret negotiated by the DH-HMAC-CHAP authentication,
and the TLS connection is started after authentication.

Signed-off-by: Hannes Reinecke <[email protected]>
@hreinecke
Contributor Author

> For reference:
>
> > But this again looks like the infamous page reference issue. If you have compound pages, the page reference is only set on the first page, not the following ones. So if you then walk the pages one by one and release the page reference on each page, you see this error.
>
> And this should have been prevented by the call to 'sendpage_ok()' in nvme_tcp_try_send_data(). Hmm.

Turns out there's an issue with 6.14 (cf my post to linux-nvme just now). Doesn't seem to be related to the concat patches (as it's happening with the stock kernel, too), but you never know.
