selftests/bpf: define string const as global for test_sysctl_prog.c #43

kernel-patches-bot · 2020-09-10T22:15:10Z

Pull request for series with
subject: selftests/bpf: define string const as global for test_sysctl_prog.c
version: 1
url: https://patchwork.ozlabs.org/project/netdev/list/?series=200972

kernel-patches-bot · 2020-09-10T22:15:11Z

Master branch: 2f7de98
series: https://patchwork.ozlabs.org/project/netdev/list/?series=200972
version: 1

patch https://patchwork.ozlabs.org/project/netdev/patch/[email protected]/ applied successfully

kernel-patches-bot · 2020-09-11T00:37:02Z

Master branch: e3b9626
series: https://patchwork.ozlabs.org/project/netdev/list/?series=200972
version: 1

patch https://patchwork.ozlabs.org/project/netdev/patch/[email protected]/ applied successfully

with the following error: libbpf: elf: skipping unrecognized data section(6) .rodata.str1.1 libbpf: prog 'sysctl_tcp_mem': bad map relo against '.L__const.is_tcp_mem.tcp_mem_name' in section '.rodata.str1.1' Error: failed to open BPF object file: Relocation failed make: *** [/work/net-next/tools/testing/selftests/bpf/test_sysctl_prog.skel.h] Error 255 make: *** Deleting file `/work/net-next/tools/testing/selftests/bpf/test_sysctl_prog.skel.h' The local string constant "tcp_mem_name" is put into '.rodata.str1.1' section which libbpf cannot handle. Using untweaked upstream llvm, "tcp_mem_name" is completely inlined after loop unrolling. Commit 7fb5eef ("selftests/bpf: Fix test_sysctl_loop{1, 2} failure due to clang change") solved a similar problem by defining the string const as a global. Let us do the same here for test_sysctl_prog.c so it can weather future potential llvm changes. Signed-off-by: Yonghong Song <[email protected]> --- tools/testing/selftests/bpf/progs/test_sysctl_prog.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)

kernel-patches-bot · 2020-09-11T01:07:51Z

Master branch: d66423f
series: https://patchwork.ozlabs.org/project/netdev/list/?series=200972
version: 1

patch https://patchwork.ozlabs.org/project/netdev/patch/[email protected]/ applied successfully

kernel-patches-bot · 2020-09-11T03:04:20Z

At least one diff in series https://patchwork.ozlabs.org/project/netdev/list/?series=200972 irrelevant now. Closing PR.

This reverts commit 568262b. The commit causes the following panic when shutting down a rockpro64-v2 board: [..] [ 41.684569] xhci-hcd xhci-hcd.2.auto: USB bus 1 deregistered [ 41.686301] Unable to handle kernel NULL pointer dereference at virtual address 00000000000000a0 [ 41.687096] Mem abort info: [ 41.687345] ESR = 0x96000004 [ 41.687615] EC = 0x25: DABT (current EL), IL = 32 bits [ 41.688082] SET = 0, FnV = 0 [ 41.688352] EA = 0, S1PTW = 0 [ 41.688628] Data abort info: [ 41.688882] ISV = 0, ISS = 0x00000004 [ 41.689219] CM = 0, WnR = 0 [ 41.689481] user pgtable: 4k pages, 48-bit VAs, pgdp=00000000073b2000 [ 41.690046] [00000000000000a0] pgd=0000000000000000, p4d=0000000000000000 [ 41.690654] Internal error: Oops: 96000004 [#1] PREEMPT SMP [ 41.691143] Modules linked in: [ 41.691416] CPU: 5 PID: 1 Comm: shutdown Not tainted 5.13.0-rc4 #43 [ 41.691966] Hardware name: Pine64 RockPro64 v2.0 (DT) [ 41.692409] pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--) [ 41.692937] pc : down_read_interruptible+0xec/0x200 [ 41.693373] lr : simple_recursive_removal+0x48/0x280 [ 41.693815] sp : ffff800011fab910 [ 41.694107] x29: ffff800011fab910 x28: ffff0000008fe480 x27: ffff0000008fe4d8 [ 41.694736] x26: ffff800011529a90 x25: 00000000000000a0 x24: ffff800011edd030 [ 41.695364] x23: 0000000000000080 x22: 0000000000000000 x21: ffff800011f23994 [ 41.695992] x20: ffff800011f23998 x19: ffff0000008fe480 x18: ffffffffffffffff [ 41.696620] x17: 000c0400bb44ffff x16: 0000000000000009 x15: ffff800091faba3d [ 41.697248] x14: 0000000000000004 x13: 0000000000000000 x12: 0000000000000020 [ 41.697875] x11: 0101010101010101 x10: 7f7f7f7f7f7f7f7f x9 : 6f6c746364716e62 [ 41.698502] x8 : 7f7f7f7f7f7f7f7f x7 : fefefeff6364626d x6 : 0000000000000440 [ 41.699130] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 00000000000000a0 [ 41.699758] x2 : 0000000000000001 x1 : 0000000000000000 x0 : 00000000000000a0 [ 41.700386] Call trace: [ 41.700602] down_read_interruptible+0xec/0x200 [ 41.701003] debugfs_remove+0x5c/0x80 [ 41.701328] dwc3_debugfs_exit+0x1c/0x6c [ 41.701676] dwc3_remove+0x34/0x1a0 [ 41.701988] platform_remove+0x28/0x60 [ 41.702322] __device_release_driver+0x188/0x22c [ 41.702730] device_release_driver+0x2c/0x44 [ 41.703106] bus_remove_device+0x124/0x130 [ 41.703468] device_del+0x16c/0x424 [ 41.703777] platform_device_del.part.0+0x1c/0x90 [ 41.704193] platform_device_unregister+0x28/0x44 [ 41.704608] of_platform_device_destroy+0xe8/0x100 [ 41.705031] device_for_each_child_reverse+0x64/0xb4 [ 41.705470] of_platform_depopulate+0x40/0x84 [ 41.705853] __dwc3_of_simple_teardown+0x20/0xd4 [ 41.706260] dwc3_of_simple_shutdown+0x14/0x20 [ 41.706652] platform_shutdown+0x28/0x40 [ 41.706998] device_shutdown+0x158/0x330 [ 41.707344] kernel_power_off+0x38/0x7c [ 41.707684] __do_sys_reboot+0x16c/0x2a0 [ 41.708029] __arm64_sys_reboot+0x28/0x34 [ 41.708383] invoke_syscall+0x48/0x114 [ 41.708716] el0_svc_common.constprop.0+0x44/0xdc [ 41.709131] do_el0_svc+0x28/0x90 [ 41.709426] el0_svc+0x2c/0x54 [ 41.709698] el0_sync_handler+0xa4/0x130 [ 41.710045] el0_sync+0x198/0x1c0 [ 41.710342] Code: c8047c62 35ffff84 17fffe5f f9800071 (c85ffc60) [ 41.710881] ---[ end trace 406377df5178f75c ]--- [ 41.711299] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b [ 41.712084] Kernel Offset: disabled [ 41.712391] CPU features: 0x10001031,20000846 [ 41.712775] Memory Limit: none [ 41.713049] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]--- As Felipe explained: "dwc3_shutdown() is just called dwc3_remove() directly, then we end up calling debugfs_remove_recursive() twice." Reverting the commit fixes the panic. Fixes: 568262b ("usb: dwc3: core: Add shutdown callback for dwc3") Acked-by: Felipe Balbi <[email protected]> Signed-off-by: Alexandru Elisei <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>

[BUG] When running fsstress with subpage RW support, there are random BUG_ON()s triggered with the following trace: kernel BUG at fs/btrfs/file-item.c:667! Internal error: Oops - BUG: 0 [#1] SMP CPU: 1 PID: 3486 Comm: kworker/u13:2 5.11.0-rc4-custom+ #43 Hardware name: Radxa ROCK Pi 4B (DT) Workqueue: btrfs-worker-high btrfs_work_helper [btrfs] pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--) pc : btrfs_csum_one_bio+0x420/0x4e0 [btrfs] lr : btrfs_csum_one_bio+0x400/0x4e0 [btrfs] Call trace: btrfs_csum_one_bio+0x420/0x4e0 [btrfs] btrfs_submit_bio_start+0x20/0x30 [btrfs] run_one_async_start+0x28/0x44 [btrfs] btrfs_work_helper+0x128/0x1b4 [btrfs] process_one_work+0x22c/0x430 worker_thread+0x70/0x3a0 kthread+0x13c/0x140 ret_from_fork+0x10/0x30 [CAUSE] Above BUG_ON() means there is some bio range which doesn't have ordered extent, which indeed is worth a BUG_ON(). Unlike regular sectorsize == PAGE_SIZE case, in subpage we have extra subpage dirty bitmap to record which range is dirty and should be written back. This means, if we submit bio for a subpage range, we do not only need to clear page dirty, but also need to clear subpage dirty bits. In __extent_writepage_io(), we will call btrfs_page_clear_dirty() for any range we submit a bio. But there is loophole, if we hit a range which is beyond i_size, we just call btrfs_writepage_endio_finish_ordered() to finish the ordered io, then break out, without clearing the subpage dirty. This means, if we hit above branch, the subpage dirty bits are still there, if other range of the page get dirtied and we need to writeback that page again, we will submit bio for the old range, leaving a wild bio range which doesn't have ordered extent. [FIX] Fix it by always calling btrfs_page_clear_dirty() in __extent_writepage_io(). Also to avoid such problem from happening again, add a new assert, btrfs_page_assert_not_dirty(), to make sure both page dirty and subpage dirty bits are cleared before exiting __extent_writepage_io(). Signed-off-by: Qu Wenruo <[email protected]> Signed-off-by: David Sterba <[email protected]>

Since f35a2a9 ("drm/encoder: make encoder control functions optional") drm_mode_config_validate would print warnings if both cursor plane and cursor functions are provided. Restore separate set of drm_crtc_funcs to be used if separate cursor plane is provided. [ 6.556046] ------------[ cut here ]------------ [ 6.556071] [CRTC:93:crtc-0] must not have both a cursor plane and a cursor_set func [ 6.556091] WARNING: CPU: 1 PID: 76 at drivers/gpu/drm/drm_mode_config.c:648 drm_mode_config_validate+0x238/0x4d0 [ 6.567453] Modules linked in: [ 6.577604] CPU: 1 PID: 76 Comm: kworker/u8:2 Not tainted 5.15.0-rc1-dirty #43 [ 6.580557] Hardware name: Qualcomm Technologies, Inc. DB820c (DT) [ 6.587763] Workqueue: events_unbound deferred_probe_work_func [ 6.593926] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 6.599740] pc : drm_mode_config_validate+0x238/0x4d0 [ 6.606596] lr : drm_mode_config_validate+0x238/0x4d0 [ 6.611804] sp : ffff8000121b3980 [ 6.616838] x29: ffff8000121b3990 x28: 0000000000000000 x27: 0000000000000001 [ 6.620140] x26: ffff8000114cde50 x25: ffff8000114cdd40 x24: ffff0000987282d8 [ 6.627258] x23: 0000000000000000 x22: 0000000000000000 x21: 0000000000000001 [ 6.634376] x20: ffff000098728000 x19: ffff000080a39000 x18: ffffffffffffffff [ 6.641494] x17: 3136564e3631564e x16: 0000000000000324 x15: ffff800011c78709 [ 6.648613] x14: 0000000000000000 x13: ffff800011a22850 x12: 00000000000009ab [ 6.655730] x11: 0000000000000339 x10: ffff800011a22850 x9 : ffff800011a22850 [ 6.662848] x8 : 00000000ffffefff x7 : ffff800011a7a850 x6 : ffff800011a7a850 [ 6.669966] x5 : 000000000000bff4 x4 : 40000000fffff339 x3 : 0000000000000000 [ 6.677084] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff00008093b800 [ 6.684205] Call trace: [ 6.691319] drm_mode_config_validate+0x238/0x4d0 [ 6.693577] drm_dev_register+0x17c/0x210 [ 6.698435] msm_drm_bind+0x4b4/0x694 [ 6.702429] try_to_bring_up_master+0x164/0x1d0 [ 6.706075] __component_add+0xa0/0x170 [ 6.710415] component_add+0x14/0x20 [ 6.714234] msm_hdmi_dev_probe+0x1c/0x2c [ 6.718053] platform_probe+0x68/0xe0 [ 6.721959] really_probe.part.0+0x9c/0x30c [ 6.725606] __driver_probe_device+0x98/0x144 [ 6.729600] driver_probe_device+0xc8/0x15c [ 6.734114] __device_attach_driver+0xb4/0x120 [ 6.738106] bus_for_each_drv+0x78/0xd0 [ 6.742619] __device_attach+0xdc/0x184 [ 6.746351] device_initial_probe+0x14/0x20 [ 6.750172] bus_probe_device+0x9c/0xa4 [ 6.754337] deferred_probe_work_func+0x88/0xc0 [ 6.758158] process_one_work+0x1d0/0x370 [ 6.762671] worker_thread+0x2c8/0x470 [ 6.766839] kthread+0x15c/0x170 [ 6.770483] ret_from_fork+0x10/0x20 [ 6.773870] ---[ end trace 5884eb76cd26d274 ]--- [ 6.777500] ------------[ cut here ]------------ [ 6.782043] [CRTC:93:crtc-0] must not have both a cursor plane and a cursor_move func [ 6.782063] WARNING: CPU: 1 PID: 76 at drivers/gpu/drm/drm_mode_config.c:654 drm_mode_config_validate+0x290/0x4d0 [ 6.794362] Modules linked in: [ 6.804600] CPU: 1 PID: 76 Comm: kworker/u8:2 Tainted: G W 5.15.0-rc1-dirty #43 [ 6.807555] Hardware name: Qualcomm Technologies, Inc. DB820c (DT) [ 6.816148] Workqueue: events_unbound deferred_probe_work_func [ 6.822311] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 6.828126] pc : drm_mode_config_validate+0x290/0x4d0 [ 6.834981] lr : drm_mode_config_validate+0x290/0x4d0 [ 6.840189] sp : ffff8000121b3980 [ 6.845223] x29: ffff8000121b3990 x28: 0000000000000000 x27: 0000000000000001 [ 6.848525] x26: ffff8000114cde50 x25: ffff8000114cdd40 x24: ffff0000987282d8 [ 6.855643] x23: 0000000000000000 x22: 0000000000000000 x21: 0000000000000001 [ 6.862763] x20: ffff000098728000 x19: ffff000080a39000 x18: ffffffffffffffff [ 6.869879] x17: 3136564e3631564e x16: 0000000000000324 x15: ffff800011c790c2 [ 6.876998] x14: 0000000000000000 x13: ffff800011a22850 x12: 0000000000000a2f [ 6.884116] x11: 0000000000000365 x10: ffff800011a22850 x9 : ffff800011a22850 [ 6.891234] x8 : 00000000ffffefff x7 : ffff800011a7a850 x6 : ffff800011a7a850 [ 6.898351] x5 : 000000000000bff4 x4 : 40000000fffff365 x3 : 0000000000000000 [ 6.905470] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff00008093b800 [ 6.912590] Call trace: [ 6.919702] drm_mode_config_validate+0x290/0x4d0 [ 6.921960] drm_dev_register+0x17c/0x210 [ 6.926821] msm_drm_bind+0x4b4/0x694 [ 6.930813] try_to_bring_up_master+0x164/0x1d0 [ 6.934459] __component_add+0xa0/0x170 [ 6.938799] component_add+0x14/0x20 [ 6.942619] msm_hdmi_dev_probe+0x1c/0x2c [ 6.946438] platform_probe+0x68/0xe0 [ 6.950345] really_probe.part.0+0x9c/0x30c [ 6.953991] __driver_probe_device+0x98/0x144 [ 6.957984] driver_probe_device+0xc8/0x15c [ 6.962498] __device_attach_driver+0xb4/0x120 [ 6.966492] bus_for_each_drv+0x78/0xd0 [ 6.971004] __device_attach+0xdc/0x184 [ 6.974737] device_initial_probe+0x14/0x20 [ 6.978556] bus_probe_device+0x9c/0xa4 [ 6.982722] deferred_probe_work_func+0x88/0xc0 [ 6.986543] process_one_work+0x1d0/0x370 [ 6.991057] worker_thread+0x2c8/0x470 [ 6.995223] kthread+0x15c/0x170 [ 6.998869] ret_from_fork+0x10/0x20 [ 7.002255] ---[ end trace 5884eb76cd26d275 ]--- Fixes: aa649e8 ("drm/msm/mdp5: mdp5_crtc: Restore cursor state only if LM cursors are enabled") Signed-off-by: Dmitry Baryshkov <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Rob Clark <[email protected]>

Some function calls are not implemented in rxrpc_no_security, there are preparse_server_key, free_preparse_server_key and destroy_server_key. When rxrpc security type is rxrpc_no_security, user can easily trigger a null-ptr-deref bug via ioctl. So judgment should be added to prevent it The crash log: user@syzkaller:~$ ./rxrpc_preparse_s [ 37.956878][T15626] BUG: kernel NULL pointer dereference, address: 0000000000000000 [ 37.957645][T15626] #PF: supervisor instruction fetch in kernel mode [ 37.958229][T15626] #PF: error_code(0x0010) - not-present page [ 37.958762][T15626] PGD 4aadf067 P4D 4aadf067 PUD 4aade067 PMD 0 [ 37.959321][T15626] Oops: 0010 [#1] PREEMPT SMP [ 37.959739][T15626] CPU: 0 PID: 15626 Comm: rxrpc_preparse_ Not tainted 5.17.0-01442-gb47d5a4f6b8d #43 [ 37.960588][T15626] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1 04/01/2014 [ 37.961474][T15626] RIP: 0010:0x0 [ 37.961787][T15626] Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6. [ 37.962480][T15626] RSP: 0018:ffffc9000d9abdc0 EFLAGS: 00010286 [ 37.963018][T15626] RAX: ffffffff84335200 RBX: ffff888012a1ce80 RCX: 0000000000000000 [ 37.963727][T15626] RDX: 0000000000000000 RSI: ffffffff84a736dc RDI: ffffc9000d9abe48 [ 37.964425][T15626] RBP: ffffc9000d9abe48 R08: 0000000000000000 R09: 0000000000000002 [ 37.965118][T15626] R10: 000000000000000a R11: f000000000000000 R12: ffff888013145680 [ 37.965836][T15626] R13: 0000000000000000 R14: ffffffffffffffec R15: ffff8880432aba80 [ 37.966441][T15626] FS: 00007f2177907700(0000) GS:ffff88803ec00000(0000) knlGS:0000000000000000 [ 37.966979][T15626] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 37.967384][T15626] CR2: ffffffffffffffd6 CR3: 000000004aaf1000 CR4: 00000000000006f0 [ 37.967864][T15626] Call Trace: [ 37.968062][T15626] <TASK> [ 37.968240][T15626] rxrpc_preparse_s+0x59/0x90 [ 37.968541][T15626] key_create_or_update+0x174/0x510 [ 37.968863][T15626] __x64_sys_add_key+0x139/0x1d0 [ 37.969165][T15626] do_syscall_64+0x35/0xb0 [ 37.969451][T15626] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 37.969824][T15626] RIP: 0033:0x43a1f9 Signed-off-by: Xiaolong Huang <[email protected]> Tested-by: Xiaolong Huang <[email protected]> Signed-off-by: David Howells <[email protected]> Acked-by: Marc Dionne <[email protected]> cc: [email protected] Link: http://lists.infradead.org/pipermail/linux-afs/2022-March/005069.html Fixes: 12da59f ("rxrpc: Hand server key parsing off to the security class") Link: https://lore.kernel.org/r/164865013439.2941502.8966285221215590921.stgit@warthog.procyon.org.uk Signed-off-by: Paolo Abeni <[email protected]>

Under CONFIG_DEBUG_ATOMIC_SLEEP=y and CONFIG_DEBUG_PREEMPT=y, we can see the following messages on LoongArch, this is because using might_sleep() in preemption disable context. [ 0.001127] smp: Bringing up secondary CPUs ... [ 0.001222] Booting CPU#1... [ 0.001244] 64-bit Loongson Processor probed (LA464 Core) [ 0.001247] CPU1 revision is: 0014c012 (Loongson-64bit) [ 0.001250] FPU1 revision is: 00000000 [ 0.001252] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:283 [ 0.001255] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 0, name: swapper/1 [ 0.001257] preempt_count: 1, expected: 0 [ 0.001258] RCU nest depth: 0, expected: 0 [ 0.001259] Preemption disabled at: [ 0.001261] [<9000000000223800>] arch_dup_task_struct+0x20/0x110 [ 0.001272] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 6.2.0-rc7+ #43 [ 0.001275] Hardware name: Loongson Loongson-3A5000-7A1000-1w-A2101/Loongson-LS3A5000-7A1000-1w-A2101, BIOS vUDK2018-LoongArch-V4.0.05132-beta10 12/13/202 [ 0.001277] Stack : 0072617764726148 0000000000000000 9000000000222f1c 90000001001e0000 [ 0.001286] 90000001001e3be0 90000001001e3be8 0000000000000000 0000000000000000 [ 0.001292] 90000001001e3be8 0000000000000040 90000001001e3cb8 90000001001e3a50 [ 0.001297] 9000000001642000 90000001001e3be8 be694d10ce4139dd 9000000100174500 [ 0.001303] 0000000000000001 0000000000000001 00000000ffffe0a2 0000000000000020 [ 0.001309] 000000000000002f 9000000001354116 00000000056b0000 ffffffffffffffff [ 0.001314] 0000000000000000 0000000000000000 90000000014f6e90 9000000001642000 [ 0.001320] 900000000022b69c 0000000000000001 0000000000000000 9000000001736a90 [ 0.001325] 9000000100038000 0000000000000000 9000000000222f34 0000000000000000 [ 0.001331] 00000000000000b0 0000000000000004 0000000000000000 0000000000070000 [ 0.001337] ... [ 0.001339] Call Trace: [ 0.001342] [<9000000000222f34>] show_stack+0x5c/0x180 [ 0.001346] [<90000000010bdd80>] dump_stack_lvl+0x60/0x88 [ 0.001352] [<9000000000266418>] __might_resched+0x180/0x1cc [ 0.001356] [<90000000010c742c>] mutex_lock+0x20/0x64 [ 0.001359] [<90000000002a8ccc>] irq_find_matching_fwspec+0x48/0x124 [ 0.001364] [<90000000002259c4>] constant_clockevent_init+0x68/0x204 [ 0.001368] [<900000000022acf4>] start_secondary+0x40/0xa8 [ 0.001371] [<90000000010c0124>] smpboot_entry+0x60/0x64 Here are the complete call chains: smpboot_entry() start_secondary() constant_clockevent_init() get_timer_irq() irq_find_matching_fwnode() irq_find_matching_fwspec() mutex_lock() might_sleep() __might_sleep() __might_resched() In order to avoid the above issue, we should break the call chains, using timer_irq_installed variable as check condition to only call get_timer_irq() once in constant_clockevent_init() is a simple and proper way. Signed-off-by: Tiezhu Yang <[email protected]> Signed-off-by: Huacai Chen <[email protected]>

When CONFIG_FRAME_POINTER is unset, the stack unwinding function walk_stackframe randomly reads the stack and then, when KASAN is enabled, it can lead to the following backtrace: [ 0.000000] ================================================================== [ 0.000000] BUG: KASAN: stack-out-of-bounds in walk_stackframe+0xa6/0x11a [ 0.000000] Read of size 8 at addr ffffffff81807c40 by task swapper/0 [ 0.000000] [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 6.2.0-12919-g24203e6db61f #43 [ 0.000000] Hardware name: riscv-virtio,qemu (DT) [ 0.000000] Call Trace: [ 0.000000] [<ffffffff80007ba8>] walk_stackframe+0x0/0x11a [ 0.000000] [<ffffffff80099ecc>] init_param_lock+0x26/0x2a [ 0.000000] [<ffffffff80007c4a>] walk_stackframe+0xa2/0x11a [ 0.000000] [<ffffffff80c49c80>] dump_stack_lvl+0x22/0x36 [ 0.000000] [<ffffffff80c3783e>] print_report+0x198/0x4a8 [ 0.000000] [<ffffffff80099ecc>] init_param_lock+0x26/0x2a [ 0.000000] [<ffffffff80007c4a>] walk_stackframe+0xa2/0x11a [ 0.000000] [<ffffffff8015f68a>] kasan_report+0x9a/0xc8 [ 0.000000] [<ffffffff80007c4a>] walk_stackframe+0xa2/0x11a [ 0.000000] [<ffffffff80007c4a>] walk_stackframe+0xa2/0x11a [ 0.000000] [<ffffffff8006e99c>] desc_make_final+0x80/0x84 [ 0.000000] [<ffffffff8009a04e>] stack_trace_save+0x88/0xa6 [ 0.000000] [<ffffffff80099fc2>] filter_irq_stacks+0x72/0x76 [ 0.000000] [<ffffffff8006b95e>] devkmsg_read+0x32a/0x32e [ 0.000000] [<ffffffff8015ec16>] kasan_save_stack+0x28/0x52 [ 0.000000] [<ffffffff8006e998>] desc_make_final+0x7c/0x84 [ 0.000000] [<ffffffff8009a04a>] stack_trace_save+0x84/0xa6 [ 0.000000] [<ffffffff8015ec52>] kasan_set_track+0x12/0x20 [ 0.000000] [<ffffffff8015f22e>] __kasan_slab_alloc+0x58/0x5e [ 0.000000] [<ffffffff8015e7ea>] __kmem_cache_create+0x21e/0x39a [ 0.000000] [<ffffffff80e133ac>] create_boot_cache+0x70/0x9c [ 0.000000] [<ffffffff80e17ab2>] kmem_cache_init+0x6c/0x11e [ 0.000000] [<ffffffff80e00fd6>] mm_init+0xd8/0xfe [ 0.000000] [<ffffffff80e011d8>] start_kernel+0x190/0x3ca [ 0.000000] [ 0.000000] The buggy address belongs to stack of task swapper/0 [ 0.000000] and is located at offset 0 in frame: [ 0.000000] stack_trace_save+0x0/0xa6 [ 0.000000] [ 0.000000] This frame has 1 object: [ 0.000000] [32, 56) 'c' [ 0.000000] [ 0.000000] The buggy address belongs to the physical page: [ 0.000000] page:(____ptrval____) refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x81a07 [ 0.000000] flags: 0x1000(reserved|zone=0) [ 0.000000] raw: 0000000000001000 ff600003f1e3d150 ff600003f1e3d150 0000000000000000 [ 0.000000] raw: 0000000000000000 0000000000000000 00000001ffffffff [ 0.000000] page dumped because: kasan: bad access detected [ 0.000000] [ 0.000000] Memory state around the buggy address: [ 0.000000] ffffffff81807b00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 0.000000] ffffffff81807b80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 0.000000] >ffffffff81807c00: 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00 00 00 f3 [ 0.000000] ^ [ 0.000000] ffffffff81807c80: f3 f3 f3 f3 00 00 00 00 00 00 00 00 00 00 00 00 [ 0.000000] ffffffff81807d00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 0.000000] ================================================================== Fix that by using READ_ONCE_NOCHECK when reading the stack in imprecise mode. Fixes: 5d8544e ("RISC-V: Generic library routines and assembly") Reported-by: Chathura Rajapaksha <[email protected]> Link: https://lore.kernel.org/all/CAD7mqryDQCYyJ1gAmtMm8SASMWAQ4i103ptTb0f6Oda=tPY2=A@mail.gmail.com/ Suggested-by: Dmitry Vyukov <[email protected]> Signed-off-by: Alexandre Ghiti <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Palmer Dabbelt <[email protected]>

when the checked address is illegal,the corresponding shadow address from kasan_mem_to_shadow may have no mapping in mmu table. Access such shadow address causes kernel oops. Here is a sample about oops on arm64(VA 39bit) with KASAN_SW_TAGS and KASAN_OUTLINE on: [ffffffb80aaaaaaa] pgd=000000005d3ce003, p4d=000000005d3ce003, pud=000000005d3ce003, pmd=0000000000000000 Internal error: Oops: 0000000096000006 [#1] PREEMPT SMP Modules linked in: CPU: 3 PID: 100 Comm: sh Not tainted 6.6.0-rc1-dirty #43 Hardware name: linux,dummy-virt (DT) pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : __hwasan_load8_noabort+0x5c/0x90 lr : do_ib_ob+0xf4/0x110 ffffffb80aaaaaaa is the shadow address for efffff80aaaaaaaa. The problem is reading invalid shadow in kasan_check_range. The generic kasan also has similar oops. It only reports the shadow address which causes oops but not the original address. Commit 2f004ee("x86/kasan: Print original address on #GP") introduce to kasan_non_canonical_hook but limit it to KASAN_INLINE. This patch extends it to KASAN_OUTLINE mode. Link: https://lkml.kernel.org/r/[email protected] Fixes: 2f004ee("x86/kasan: Print original address on #GP") Signed-off-by: Haibo Li <[email protected]> Reviewed-by: Andrey Konovalov <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: AngeloGioacchino Del Regno <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Haibo Li <[email protected]> Cc: Matthias Brugger <[email protected]> Cc: Vincenzo Frascino <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Kees Cook <[email protected]> Signed-off-by: Andrew Morton <[email protected]>

.probe() (ahci_init_one()) calls sysfs_add_file_to_group(), however, if probe() fails after this call, we currently never call sysfs_remove_file_from_group(). (The sysfs_remove_file_from_group() call in .remove() (ahci_remove_one()) does not help, as .remove() is not called on .probe() error.) Thus, if probe() fails after the sysfs_add_file_to_group() call, the next time we insmod the module we will get: sysfs: cannot create duplicate filename '/devices/pci0000:00/0000:00:04.0/remapped_nvme' CPU: 11 PID: 954 Comm: modprobe Not tainted 6.10.0-rc5 kernel-patches#43 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-2.fc40 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x5d/0x80 sysfs_warn_dup.cold+0x17/0x23 sysfs_add_file_mode_ns+0x11a/0x130 sysfs_add_file_to_group+0x7e/0xc0 ahci_init_one+0x31f/0xd40 [ahci] Fixes: 894fba7 ("ata: ahci: Add sysfs attribute to show remapped NVMe device count") Cc: [email protected] Reviewed-by: Damien Le Moal <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Niklas Cassel <[email protected]>

This fixes a NULL pointer dereference bug due to a data race which looks like this: BUG: kernel NULL pointer dereference, address: 0000000000000008 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] SMP PTI CPU: 33 PID: 16573 Comm: kworker/u97:799 Not tainted 6.8.7-cm4all1-hp+ #43 Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 10/17/2018 Workqueue: events_unbound netfs_rreq_write_to_cache_work RIP: 0010:cachefiles_prepare_write+0x30/0xa0 Code: 57 41 56 45 89 ce 41 55 49 89 cd 41 54 49 89 d4 55 53 48 89 fb 48 83 ec 08 48 8b 47 08 48 83 7f 10 00 48 89 34 24 48 8b 68 20 <48> 8b 45 08 4c 8b 38 74 45 49 8b 7f 50 e8 4e a9 b0 ff 48 8b 73 10 RSP: 0018:ffffb4e78113bde0 EFLAGS: 00010286 RAX: ffff976126be6d10 RBX: ffff97615cdb8438 RCX: 0000000000020000 RDX: ffff97605e6c4c68 RSI: ffff97605e6c4c60 RDI: ffff97615cdb8438 RBP: 0000000000000000 R08: 0000000000278333 R09: 0000000000000001 R10: ffff97605e6c4600 R11: 0000000000000001 R12: ffff97605e6c4c68 R13: 0000000000020000 R14: 0000000000000001 R15: ffff976064fe2c00 FS: 0000000000000000(0000) GS:ffff9776dfd40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000008 CR3: 000000005942c002 CR4: 00000000001706f0 Call Trace: <TASK> ? __die+0x1f/0x70 ? page_fault_oops+0x15d/0x440 ? search_module_extables+0xe/0x40 ? fixup_exception+0x22/0x2f0 ? exc_page_fault+0x5f/0x100 ? asm_exc_page_fault+0x22/0x30 ? cachefiles_prepare_write+0x30/0xa0 netfs_rreq_write_to_cache_work+0x135/0x2e0 process_one_work+0x137/0x2c0 worker_thread+0x2e9/0x400 ? __pfx_worker_thread+0x10/0x10 kthread+0xcc/0x100 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x30/0x50 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1b/0x30 </TASK> Modules linked in: CR2: 0000000000000008 ---[ end trace 0000000000000000 ]--- This happened because fscache_cookie_state_machine() was slow and was still running while another process invoked fscache_unuse_cookie(); this led to a fscache_cookie_lru_do_one() call, setting the FSCACHE_COOKIE_DO_LRU_DISCARD flag, which was picked up by fscache_cookie_state_machine(), withdrawing the cookie via cachefiles_withdraw_cookie(), clearing cookie->cache_priv. At the same time, yet another process invoked cachefiles_prepare_write(), which found a NULL pointer in this code line: struct cachefiles_object *object = cachefiles_cres_object(cres); The next line crashes, obviously: struct cachefiles_cache *cache = object->volume->cache; During cachefiles_prepare_write(), the "n_accesses" counter is non-zero (via fscache_begin_operation()). The cookie must not be withdrawn until it drops to zero. The counter is checked by fscache_cookie_state_machine() before switching to FSCACHE_COOKIE_STATE_RELINQUISHING and FSCACHE_COOKIE_STATE_WITHDRAWING (in "case FSCACHE_COOKIE_STATE_FAILED"), but not for FSCACHE_COOKIE_STATE_LRU_DISCARDING ("case FSCACHE_COOKIE_STATE_ACTIVE"). This patch adds the missing check. With a non-zero access counter, the function returns and the next fscache_end_cookie_access() call will queue another fscache_cookie_state_machine() call to handle the still-pending FSCACHE_COOKIE_DO_LRU_DISCARD. Fixes: 12bb21a ("fscache: Implement cookie user counting and resource pinning") Signed-off-by: Max Kellermann <[email protected]> Signed-off-by: David Howells <[email protected]> Link: https://lore.kernel.org/r/[email protected] cc: Jeff Layton <[email protected]> cc: [email protected] cc: [email protected] cc: [email protected] Signed-off-by: Christian Brauner <[email protected]>

Doing an async decryption (large read) crashes with a slab-use-after-free way down in the crypto API. Reproducer: # mount.cifs -o ...,seal,esize=1 //srv/share /mnt # dd if=/mnt/largefile of=/dev/null ... [ 194.196391] ================================================================== [ 194.196844] BUG: KASAN: slab-use-after-free in gf128mul_4k_lle+0xc1/0x110 [ 194.197269] Read of size 8 at addr ffff888112bd0448 by task kworker/u77:2/899 [ 194.197707] [ 194.197818] CPU: 12 UID: 0 PID: 899 Comm: kworker/u77:2 Not tainted 6.11.0-lku-00028-gfca3ca14a17a-dirty kernel-patches#43 [ 194.198400] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.2-3-gd478f380-prebuilt.qemu.org 04/01/2014 [ 194.199046] Workqueue: smb3decryptd smb2_decrypt_offload [cifs] [ 194.200032] Call Trace: [ 194.200191] <TASK> [ 194.200327] dump_stack_lvl+0x4e/0x70 [ 194.200558] ? gf128mul_4k_lle+0xc1/0x110 [ 194.200809] print_report+0x174/0x505 [ 194.201040] ? __pfx__raw_spin_lock_irqsave+0x10/0x10 [ 194.201352] ? srso_return_thunk+0x5/0x5f [ 194.201604] ? __virt_addr_valid+0xdf/0x1c0 [ 194.201868] ? gf128mul_4k_lle+0xc1/0x110 [ 194.202128] kasan_report+0xc8/0x150 [ 194.202361] ? gf128mul_4k_lle+0xc1/0x110 [ 194.202616] gf128mul_4k_lle+0xc1/0x110 [ 194.202863] ghash_update+0x184/0x210 [ 194.203103] shash_ahash_update+0x184/0x2a0 [ 194.203377] ? __pfx_shash_ahash_update+0x10/0x10 [ 194.203651] ? srso_return_thunk+0x5/0x5f [ 194.203877] ? crypto_gcm_init_common+0x1ba/0x340 [ 194.204142] gcm_hash_assoc_remain_continue+0x10a/0x140 [ 194.204434] crypt_message+0xec1/0x10a0 [cifs] [ 194.206489] ? __pfx_crypt_message+0x10/0x10 [cifs] [ 194.208507] ? srso_return_thunk+0x5/0x5f [ 194.209205] ? srso_return_thunk+0x5/0x5f [ 194.209925] ? srso_return_thunk+0x5/0x5f [ 194.210443] ? srso_return_thunk+0x5/0x5f [ 194.211037] decrypt_raw_data+0x15f/0x250 [cifs] [ 194.212906] ? __pfx_decrypt_raw_data+0x10/0x10 [cifs] [ 194.214670] ? srso_return_thunk+0x5/0x5f [ 194.215193] smb2_decrypt_offload+0x12a/0x6c0 [cifs] This is because TFM is being used in parallel. Fix this by allocating a new AEAD TFM for async decryption, but keep the existing one for synchronous READ cases (similar to what is done in smb3_calc_signature()). Also remove the calls to aead_request_set_callback() and crypto_wait_req() since it's always going to be a synchronous operation. Signed-off-by: Enzo Matsumiya <[email protected]> Signed-off-by: Steve French <[email protected]>

The current sk memory accounting logic in __SK_REDIRECT is pre-uncharging tosend bytes, which is either msg->sg.size or a smaller value apply_bytes. Potential problems with this strategy are as follows: - If the actual sent bytes are smaller than tosend, we need to charge some bytes back, as in line 487, which is okay but seems not clean. - When tosend is set to apply_bytes, as in line 417, and (ret < 0), we may miss uncharging (msg->sg.size - apply_bytes) bytes. 415 tosend = msg->sg.size; 416 if (psock->apply_bytes && psock->apply_bytes < tosend) 417 tosend = psock->apply_bytes; ... 443 sk_msg_return(sk, msg, tosend); 444 release_sock(sk); 446 origsize = msg->sg.size; 447 ret = tcp_bpf_sendmsg_redir(sk_redir, redir_ingress, 448 msg, tosend, flags); 449 sent = origsize - msg->sg.size; ... 454 lock_sock(sk); 455 if (unlikely(ret < 0)) { 456 int free = sk_msg_free_nocharge(sk, msg); 458 if (!cork) 459 *copied -= free; 460 } ... 487 if (eval == __SK_REDIRECT) 488 sk_mem_charge(sk, tosend - sent); When running the selftest test_txmsg_redir_wait_sndmem with txmsg_apply, the following warning will be reported, ------------[ cut here ]------------ WARNING: CPU: 6 PID: 57 at net/ipv4/af_inet.c:156 inet_sock_destruct+0x190/0x1a0 Modules linked in: CPU: 6 UID: 0 PID: 57 Comm: kworker/6:0 Not tainted 6.12.0-rc1.bm.1-amd64+ #43 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014 Workqueue: events sk_psock_destroy RIP: 0010:inet_sock_destruct+0x190/0x1a0 RSP: 0018:ffffad0a8021fe08 EFLAGS: 00010206 RAX: 0000000000000011 RBX: ffff9aab4475b900 RCX: ffff9aab481a0800 RDX: 0000000000000303 RSI: 0000000000000011 RDI: ffff9aab4475b900 RBP: ffff9aab4475b990 R08: 0000000000000000 R09: ffff9aab40050ec0 R10: 0000000000000000 R11: ffff9aae6fdb1d01 R12: ffff9aab49c60400 R13: ffff9aab49c60598 R14: ffff9aab49c60598 R15: dead000000000100 FS: 0000000000000000(0000) GS:ffff9aae6fd80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007ffec7e47bd8 CR3: 00000001a1a1c004 CR4: 0000000000770ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <TASK> ? __warn+0x89/0x130 ? inet_sock_destruct+0x190/0x1a0 ? report_bug+0xfc/0x1e0 ? handle_bug+0x5c/0xa0 ? exc_invalid_op+0x17/0x70 ? asm_exc_invalid_op+0x1a/0x20 ? inet_sock_destruct+0x190/0x1a0 __sk_destruct+0x25/0x220 sk_psock_destroy+0x2b2/0x310 process_scheduled_works+0xa3/0x3e0 worker_thread+0x117/0x240 ? __pfx_worker_thread+0x10/0x10 kthread+0xcf/0x100 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x31/0x40 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1a/0x30 </TASK> ---[ end trace 0000000000000000 ]--- In __SK_REDIRECT, a more concise way is delaying the uncharging after sent bytes are finalized, and uncharge this value. When (ret < 0), we shall invoke sk_msg_free. Same thing happens in case __SK_DROP, when tosend is set to apply_bytes, we may miss uncharging (msg->sg.size - apply_bytes) bytes. The same warning will be reported in selftest. 468 case __SK_DROP: 469 default: 470 sk_msg_free_partial(sk, msg, tosend); 471 sk_msg_apply_bytes(psock, tosend); 472 *copied -= (tosend + delta); 473 return -EACCES; So instead of sk_msg_free_partial we can do sk_msg_free here. Fixes: 604326b ("bpf, sockmap: convert to generic sk_msg interface") Fixes: 8ec95b9 ("bpf, sockmap: Fix the sk->sk_forward_alloc warning of sk_stream_kill_queues") Signed-off-by: Zijian Zhang <[email protected]>

The current sk memory accounting logic in __SK_REDIRECT is pre-uncharging tosend bytes, which is either msg->sg.size or a smaller value apply_bytes. Potential problems with this strategy are as follows: - If the actual sent bytes are smaller than tosend, we need to charge some bytes back, as in line 487, which is okay but seems not clean. - When tosend is set to apply_bytes, as in line 417, and (ret < 0), we may miss uncharging (msg->sg.size - apply_bytes) bytes. 415 tosend = msg->sg.size; 416 if (psock->apply_bytes && psock->apply_bytes < tosend) 417 tosend = psock->apply_bytes; ... 443 sk_msg_return(sk, msg, tosend); 444 release_sock(sk); 446 origsize = msg->sg.size; 447 ret = tcp_bpf_sendmsg_redir(sk_redir, redir_ingress, 448 msg, tosend, flags); 449 sent = origsize - msg->sg.size; ... 454 lock_sock(sk); 455 if (unlikely(ret < 0)) { 456 int free = sk_msg_free_nocharge(sk, msg); 458 if (!cork) 459 *copied -= free; 460 } ... 487 if (eval == __SK_REDIRECT) 488 sk_mem_charge(sk, tosend - sent); When running the selftest test_txmsg_redir_wait_sndmem with txmsg_apply, the following warning will be reported, ------------[ cut here ]------------ WARNING: CPU: 6 PID: 57 at net/ipv4/af_inet.c:156 inet_sock_destruct+0x190/0x1a0 Modules linked in: CPU: 6 UID: 0 PID: 57 Comm: kworker/6:0 Not tainted 6.12.0-rc1.bm.1-amd64+ #43 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014 Workqueue: events sk_psock_destroy RIP: 0010:inet_sock_destruct+0x190/0x1a0 RSP: 0018:ffffad0a8021fe08 EFLAGS: 00010206 RAX: 0000000000000011 RBX: ffff9aab4475b900 RCX: ffff9aab481a0800 RDX: 0000000000000303 RSI: 0000000000000011 RDI: ffff9aab4475b900 RBP: ffff9aab4475b990 R08: 0000000000000000 R09: ffff9aab40050ec0 R10: 0000000000000000 R11: ffff9aae6fdb1d01 R12: ffff9aab49c60400 R13: ffff9aab49c60598 R14: ffff9aab49c60598 R15: dead000000000100 FS: 0000000000000000(0000) GS:ffff9aae6fd80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007ffec7e47bd8 CR3: 00000001a1a1c004 CR4: 0000000000770ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <TASK> ? __warn+0x89/0x130 ? inet_sock_destruct+0x190/0x1a0 ? report_bug+0xfc/0x1e0 ? handle_bug+0x5c/0xa0 ? exc_invalid_op+0x17/0x70 ? asm_exc_invalid_op+0x1a/0x20 ? inet_sock_destruct+0x190/0x1a0 __sk_destruct+0x25/0x220 sk_psock_destroy+0x2b2/0x310 process_scheduled_works+0xa3/0x3e0 worker_thread+0x117/0x240 ? __pfx_worker_thread+0x10/0x10 kthread+0xcf/0x100 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x31/0x40 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1a/0x30 </TASK> ---[ end trace 0000000000000000 ]--- In __SK_REDIRECT, a more concise way is delaying the uncharging after sent bytes are finalized, and uncharge this value. When (ret < 0), we shall invoke sk_msg_free. Same thing happens in case __SK_DROP, when tosend is set to apply_bytes, we may miss uncharging (msg->sg.size - apply_bytes) bytes. The same warning will be reported in selftest. 468 case __SK_DROP: 469 default: 470 sk_msg_free_partial(sk, msg, tosend); 471 sk_msg_apply_bytes(psock, tosend); 472 *copied -= (tosend + delta); 473 return -EACCES; So instead of sk_msg_free_partial we can do sk_msg_free here. Fixes: 604326b ("bpf, sockmap: convert to generic sk_msg interface") Fixes: 8ec95b9 ("bpf, sockmap: Fix the sk->sk_forward_alloc warning of sk_stream_kill_queues") Signed-off-by: Zijian Zhang <[email protected]> Acked-by: John Fastabend <[email protected]>

The current sk memory accounting logic in __SK_REDIRECT is pre-uncharging tosend bytes, which is either msg->sg.size or a smaller value apply_bytes. Potential problems with this strategy are as follows: - If the actual sent bytes are smaller than tosend, we need to charge some bytes back, as in line 487, which is okay but seems not clean. - When tosend is set to apply_bytes, as in line 417, and (ret < 0), we may miss uncharging (msg->sg.size - apply_bytes) bytes. [...] 415 tosend = msg->sg.size; 416 if (psock->apply_bytes && psock->apply_bytes < tosend) 417 tosend = psock->apply_bytes; [...] 443 sk_msg_return(sk, msg, tosend); 444 release_sock(sk); 446 origsize = msg->sg.size; 447 ret = tcp_bpf_sendmsg_redir(sk_redir, redir_ingress, 448 msg, tosend, flags); 449 sent = origsize - msg->sg.size; [...] 454 lock_sock(sk); 455 if (unlikely(ret < 0)) { 456 int free = sk_msg_free_nocharge(sk, msg); 458 if (!cork) 459 *copied -= free; 460 } [...] 487 if (eval == __SK_REDIRECT) 488 sk_mem_charge(sk, tosend - sent); [...] When running the selftest test_txmsg_redir_wait_sndmem with txmsg_apply, the following warning will be reported: ------------[ cut here ]------------ WARNING: CPU: 6 PID: 57 at net/ipv4/af_inet.c:156 inet_sock_destruct+0x190/0x1a0 Modules linked in: CPU: 6 UID: 0 PID: 57 Comm: kworker/6:0 Not tainted 6.12.0-rc1.bm.1-amd64+ #43 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014 Workqueue: events sk_psock_destroy RIP: 0010:inet_sock_destruct+0x190/0x1a0 RSP: 0018:ffffad0a8021fe08 EFLAGS: 00010206 RAX: 0000000000000011 RBX: ffff9aab4475b900 RCX: ffff9aab481a0800 RDX: 0000000000000303 RSI: 0000000000000011 RDI: ffff9aab4475b900 RBP: ffff9aab4475b990 R08: 0000000000000000 R09: ffff9aab40050ec0 R10: 0000000000000000 R11: ffff9aae6fdb1d01 R12: ffff9aab49c60400 R13: ffff9aab49c60598 R14: ffff9aab49c60598 R15: dead000000000100 FS: 0000000000000000(0000) GS:ffff9aae6fd80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007ffec7e47bd8 CR3: 00000001a1a1c004 CR4: 0000000000770ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <TASK> ? __warn+0x89/0x130 ? inet_sock_destruct+0x190/0x1a0 ? report_bug+0xfc/0x1e0 ? handle_bug+0x5c/0xa0 ? exc_invalid_op+0x17/0x70 ? asm_exc_invalid_op+0x1a/0x20 ? inet_sock_destruct+0x190/0x1a0 __sk_destruct+0x25/0x220 sk_psock_destroy+0x2b2/0x310 process_scheduled_works+0xa3/0x3e0 worker_thread+0x117/0x240 ? __pfx_worker_thread+0x10/0x10 kthread+0xcf/0x100 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x31/0x40 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1a/0x30 </TASK> ---[ end trace 0000000000000000 ]--- In __SK_REDIRECT, a more concise way is delaying the uncharging after sent bytes are finalized, and uncharge this value. When (ret < 0), we shall invoke sk_msg_free. Same thing happens in case __SK_DROP, when tosend is set to apply_bytes, we may miss uncharging (msg->sg.size - apply_bytes) bytes. The same warning will be reported in selftest. [...] 468 case __SK_DROP: 469 default: 470 sk_msg_free_partial(sk, msg, tosend); 471 sk_msg_apply_bytes(psock, tosend); 472 *copied -= (tosend + delta); 473 return -EACCES; [...] So instead of sk_msg_free_partial we can do sk_msg_free here. Fixes: 604326b ("bpf, sockmap: convert to generic sk_msg interface") Fixes: 8ec95b9 ("bpf, sockmap: Fix the sk->sk_forward_alloc warning of sk_stream_kill_queues") Signed-off-by: Zijian Zhang <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: John Fastabend <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]

[ Upstream commit ca70b8b ] The current sk memory accounting logic in __SK_REDIRECT is pre-uncharging tosend bytes, which is either msg->sg.size or a smaller value apply_bytes. Potential problems with this strategy are as follows: - If the actual sent bytes are smaller than tosend, we need to charge some bytes back, as in line 487, which is okay but seems not clean. - When tosend is set to apply_bytes, as in line 417, and (ret < 0), we may miss uncharging (msg->sg.size - apply_bytes) bytes. [...] 415 tosend = msg->sg.size; 416 if (psock->apply_bytes && psock->apply_bytes < tosend) 417 tosend = psock->apply_bytes; [...] 443 sk_msg_return(sk, msg, tosend); 444 release_sock(sk); 446 origsize = msg->sg.size; 447 ret = tcp_bpf_sendmsg_redir(sk_redir, redir_ingress, 448 msg, tosend, flags); 449 sent = origsize - msg->sg.size; [...] 454 lock_sock(sk); 455 if (unlikely(ret < 0)) { 456 int free = sk_msg_free_nocharge(sk, msg); 458 if (!cork) 459 *copied -= free; 460 } [...] 487 if (eval == __SK_REDIRECT) 488 sk_mem_charge(sk, tosend - sent); [...] When running the selftest test_txmsg_redir_wait_sndmem with txmsg_apply, the following warning will be reported: ------------[ cut here ]------------ WARNING: CPU: 6 PID: 57 at net/ipv4/af_inet.c:156 inet_sock_destruct+0x190/0x1a0 Modules linked in: CPU: 6 UID: 0 PID: 57 Comm: kworker/6:0 Not tainted 6.12.0-rc1.bm.1-amd64+ kernel-patches#43 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014 Workqueue: events sk_psock_destroy RIP: 0010:inet_sock_destruct+0x190/0x1a0 RSP: 0018:ffffad0a8021fe08 EFLAGS: 00010206 RAX: 0000000000000011 RBX: ffff9aab4475b900 RCX: ffff9aab481a0800 RDX: 0000000000000303 RSI: 0000000000000011 RDI: ffff9aab4475b900 RBP: ffff9aab4475b990 R08: 0000000000000000 R09: ffff9aab40050ec0 R10: 0000000000000000 R11: ffff9aae6fdb1d01 R12: ffff9aab49c60400 R13: ffff9aab49c60598 R14: ffff9aab49c60598 R15: dead000000000100 FS: 0000000000000000(0000) GS:ffff9aae6fd80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007ffec7e47bd8 CR3: 00000001a1a1c004 CR4: 0000000000770ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <TASK> ? __warn+0x89/0x130 ? inet_sock_destruct+0x190/0x1a0 ? report_bug+0xfc/0x1e0 ? handle_bug+0x5c/0xa0 ? exc_invalid_op+0x17/0x70 ? asm_exc_invalid_op+0x1a/0x20 ? inet_sock_destruct+0x190/0x1a0 __sk_destruct+0x25/0x220 sk_psock_destroy+0x2b2/0x310 process_scheduled_works+0xa3/0x3e0 worker_thread+0x117/0x240 ? __pfx_worker_thread+0x10/0x10 kthread+0xcf/0x100 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x31/0x40 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1a/0x30 </TASK> ---[ end trace 0000000000000000 ]--- In __SK_REDIRECT, a more concise way is delaying the uncharging after sent bytes are finalized, and uncharge this value. When (ret < 0), we shall invoke sk_msg_free. Same thing happens in case __SK_DROP, when tosend is set to apply_bytes, we may miss uncharging (msg->sg.size - apply_bytes) bytes. The same warning will be reported in selftest. [...] 468 case __SK_DROP: 469 default: 470 sk_msg_free_partial(sk, msg, tosend); 471 sk_msg_apply_bytes(psock, tosend); 472 *copied -= (tosend + delta); 473 return -EACCES; [...] So instead of sk_msg_free_partial we can do sk_msg_free here. Fixes: 604326b ("bpf, sockmap: convert to generic sk_msg interface") Fixes: 8ec95b9 ("bpf, sockmap: Fix the sk->sk_forward_alloc warning of sk_stream_kill_queues") Signed-off-by: Zijian Zhang <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: John Fastabend <[email protected]> Link: https://lore.kernel.org/bpf/[email protected] Signed-off-by: Sasha Levin <[email protected]>

kernel-patches-bot added bpf-next new V1 labels Sep 10, 2020

kernel-patches-bot force-pushed the series/200972 branch from f181ce9 to a6d4d9a Compare September 11, 2020 00:37

kernel-patches-bot and others added 2 commits September 10, 2020 18:07

adding ci files

29f421e

kernel-patches-bot force-pushed the series/200972 branch from a6d4d9a to e535f88 Compare September 11, 2020 01:07

kernel-patches-bot added accepted and removed new labels Sep 11, 2020

kernel-patches-bot closed this Sep 11, 2020

kernel-patches-bot deleted the series/200972 branch September 15, 2020 17:50

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

selftests/bpf: define string const as global for test_sysctl_prog.c #43

selftests/bpf: define string const as global for test_sysctl_prog.c #43

kernel-patches-bot commented Sep 10, 2020

kernel-patches-bot commented Sep 10, 2020

kernel-patches-bot commented Sep 11, 2020

kernel-patches-bot commented Sep 11, 2020

kernel-patches-bot commented Sep 11, 2020

selftests/bpf: define string const as global for test_sysctl_prog.c #43

selftests/bpf: define string const as global for test_sysctl_prog.c #43

Conversation

kernel-patches-bot commented Sep 10, 2020

kernel-patches-bot commented Sep 10, 2020

kernel-patches-bot commented Sep 11, 2020

kernel-patches-bot commented Sep 11, 2020

kernel-patches-bot commented Sep 11, 2020