SECO-94 completion part 2 #16

bmastbergen
Collaborator

@bmastbergen bmastbergen commented Nov 25, 2024

Vulnerabilities and CVEs addressed include:

jira VULN-72
CVE-2021-4204

jira VULN-144
CVE-2022-23222

jira VULN-7854
CVE-2022-48929

build/install/boot log
kernel-build.log

[brett@vuln_72_8 ~]$ uname -a
Linux vuln_72_8 4.18.0-CVE-2021-4204+ #1 SMP Fri Nov 22 17:21:12 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

kselftests before and after
kernel-selftests-before.log
kernel-selftests-after.log

kselftests run with lockdep, kmemleak, and stress
kernel-selftests-stress-lockdep-kmemleak.log

The backported commits target bpf, so here are the bpf-specific kselftests before and after
bpf-selftests-before.log
bpf-selftests-after.log

Generally, I did what stable 5.15 did to address these two CVEs
https://lore.kernel.org/lkml/[email protected]/

Note: CVE-2022-48929 didn't have a ticket, but it is a fix to a change made to address CVE-2022-0500. The backports referenced above folded this fix into one of the 5 changes they made. I wanted to call out the change in its own commit. That's why this PR has 6 commits and stable only had 5.

Note^: There was some discussion in PR #12 about how we ended up with a slightly different conditional in btf.c than what rocky8_10 has. That will be the case here as well. It seems like the backport of 45ce4b4 was not done correctly there. The conditional we end up with in this PR matches what is in current upstream.

Note^^: RH changelog would lead you to believe that c25b2ae addresses CVE-2022-0500 and CVE-2022-23222

  • bpf: Replace PTR_TO_XXX_OR_NULL with PTR_TO_XXX | PTR_MAYBE_NULL (Viktor Malik) [RHEL-8473 RHEL-8476 RHEL-8925 RHEL-9037] {CVE-2022-0500 CVE-2022-23222}

But cve.org references a different upstream commit as the fix for CVE-2022-23222
https://www.cve.org/CVERecord/?id=CVE-2022-23222
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=64620e0a1e712a778095bd35cbb277dc2259281f

This ^^^ is what I have backported to address CVE-2022-23222 and matches what stable did for 5.15

jira VULN-72
cve-pre CVE-2021-4204
commit-author Kumar Kartikeya Dwivedi <[email protected]>
commit 3363bd0
upstream-diff Slight change due to previously backported commit
              38f85a6 (bpf: Fix crash due to out of bounds
              access into reg2btf_ids.)

Allow passing PTR_TO_CTX, if the kfunc expects a matching struct type,
and punt to PTR_TO_MEM block if reg->type does not fall in one of
PTR_TO_BTF_ID or PTR_TO_SOCK* types. This will be used by future commits
to get access to XDP and TC PTR_TO_CTX, and pass various data (flags,
l4proto, netns_id, etc.) encoded in opts struct passed as pointer to
kfunc.

For PTR_TO_MEM support, arguments are currently limited to pointer to
scalar, or pointer to struct composed of scalars. This is done so that
unsafe scenarios (like passing PTR_TO_MEM where PTR_TO_BTF_ID of
in-kernel valid structure is expected, which may have pointers) are
avoided. Since the argument checking happens based on argument register
type, it is not easy to ascertain what the expected type is. In the
future, support for PTR_TO_MEM for kfunc can be extended to serve other
usecases. The struct type whose pointer is passed in may have maximum
nesting depth of 4, all recursively composed of scalars or struct with
scalars.
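
The "struct composed of scalars, at most 4 levels deep" rule can be pictured with a small Python model. This is only a sketch, not the verifier code: the list-based struct description, the names `is_scalar_only` and `nested`, and the choice to count the outermost struct as level 1 are all assumptions made for illustration.

```python
MAX_NESTING_DEPTH = 4  # limit quoted in the commit message


def is_scalar_only(members, depth=1):
    """Return True if a struct (a list of members) is built only from scalars.

    Each member is the string "scalar", the string "pointer", or a nested
    list that stands for an embedded struct. Counting the outermost struct
    as depth 1 is an assumption of this model.
    """
    if depth > MAX_NESTING_DEPTH:
        return False
    for member in members:
        if member == "scalar":
            continue
        if isinstance(member, list):
            if not is_scalar_only(member, depth + 1):
                return False
        else:
            return False  # pointers and anything unknown are rejected
    return True


def nested(levels):
    """Build a struct nested `levels` deep with a single scalar at the bottom."""
    struct = ["scalar"]
    for _ in range(levels - 1):
        struct = [struct]
    return struct
```

In this model `is_scalar_only(["scalar", ["scalar", "scalar"]])` is accepted, while any struct containing `"pointer"` or nested deeper than the limit is rejected.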

Future commits will add negative tests that check whether these
restrictions imposed for kfunc arguments are duly rejected by BPF
verifier or not.

	Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]>
	Signed-off-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
(cherry picked from commit 3363bd0)
	Signed-off-by: Brett Mastbergen <[email protected]>
jira VULN-72
cve CVE-2021-4204
commit-author Daniel Borkmann <[email protected]>
commit be80a1d

Generalize the check_ctx_reg() helper function into a more generic named one
so that it can be reused for other register types as well to check whether
their offset is non-zero. No functional change.

	Signed-off-by: Daniel Borkmann <[email protected]>
	Acked-by: John Fastabend <[email protected]>
	Acked-by: Alexei Starovoitov <[email protected]>
(cherry picked from commit be80a1d)
	Signed-off-by: Brett Mastbergen <[email protected]>
@PlaidCat
Collaborator

Note^: There was some discussion in PR #12 about how we ended up with a slightly different conditional in btf.c than what rocky8_10 has. That will be the case here as well. It seems like the backport of 45ce4b4 was not done correctly there. The conditional we end up with in this PR matches what is in current upstream.

I think this was because I was looking at how the 553.16.1 kernel looked versus how the pre-req commit reorganized the code: bpf: Extend kfunc with PTR_TO_CTX, PTR_TO_MEM argument support cd19f9c

Collaborator

@PlaidCat PlaidCat left a comment

Everything is looking pretty good.

Do we know if the following tests just now work, or are they on the flappy list?

Screenshot 2024-11-25 at 5 03 00 PM

Thanks

@gvrose8192
Collaborator

Everything is looking pretty good.

Do we know if the following tests just now work, or are they on the flappy list?

Screenshot 2024-11-25 at 5 03 00 PM

Thanks

Wow - interesting. If those results are consistent that is a big fix.

@gvrose8192
Collaborator

The code looks fine to me - Thanks!
I'd be very interested in how consistent the results are from the above bpf testing comments.

Collaborator

@gvrose8192 gvrose8192 left a comment

LGTM - Thanks!

@bmastbergen
Collaborator Author

Everything is looking pretty good.

Do we know if the following tests just now work, or are they on the flappy list?

Screenshot 2024-11-25 at 5 03 00 PM

Thanks

I assumed that these are "flappy". But I guess I don't know for sure.

jira VULN-72
jira VULN-7854
cve CVE-2021-4204
cve CVE-2022-48929
commit-author Kumar Kartikeya Dwivedi <[email protected]>
commit 45ce4b4
upstream-diff Part of this upstream change was already backported, but
              because commit 3363bd0 ("bpf: Extend kfunc with
              PTR_TO_CTX, PTR_TO_MEM argument support") had not been
              backported at that time, the out of bound access it
              introduced was not fixed in that backport.  Since we
              have now backported 3363bd0, we need to backport
              the remaining change from the upstream fix

When commit e6ac245 ("bpf: Support bpf program calling kernel function") added
kfunc support, it defined reg2btf_ids as a cheap way to translate the verifier
reg type to the appropriate btf_vmlinux BTF ID, however
commit c25b2ae ("bpf: Replace PTR_TO_XXX_OR_NULL with PTR_TO_XXX | PTR_MAYBE_NULL")
moved the __BPF_REG_TYPE_MAX from the last member of bpf_reg_type enum to after
the base register types, and defined other variants using type flag
composition. However, now, the direct usage of reg->type to index into
reg2btf_ids may no longer fall into __BPF_REG_TYPE_MAX range, and hence lead to
out of bounds access and kernel crash on dereference of bad pointer.
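
A minimal Python model of the indexing problem described above. The constant values, the four-entry table, and the names `lookup_unmasked` and `lookup_masked` are assumptions for illustration. Masking off the flag bits is shown only as one way to keep the index inside the table; whether a flagged type should then be accepted is a separate decision for the verifier.

```python
BASE_TYPE_BITS = 8                       # assumed width of the base-type field
BASE_TYPE_MASK = (1 << BASE_TYPE_BITS) - 1
PTR_MAYBE_NULL = 1 << BASE_TYPE_BITS     # flag stored above the base type

# Illustrative base types; the real enum has many more members.
NOT_INIT, SCALAR_VALUE, PTR_TO_CTX, PTR_TO_SOCKET, REG_TYPE_MAX = range(5)

# One slot per base type, like a C array sized by __BPF_REG_TYPE_MAX.
reg2btf_ids = [0] * REG_TYPE_MAX
reg2btf_ids[PTR_TO_SOCKET] = 1234        # made-up BTF ID


def base_type(reg_type):
    """Strip the flag bits and keep only the base register type."""
    return reg_type & BASE_TYPE_MASK


def lookup_unmasked(reg_type):
    """Index with the raw value: out of range once any flag bit is set."""
    return reg2btf_ids[reg_type]


def lookup_masked(reg_type):
    """Index with the base type only, which stays below REG_TYPE_MAX here."""
    return reg2btf_ids[base_type(reg_type)]
```

With these values `PTR_TO_SOCKET | PTR_MAYBE_NULL` is 259, so `lookup_unmasked` raises `IndexError` in Python where C would read past the end of the array, while `lookup_masked` still lands on the `PTR_TO_SOCKET` slot.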

Fixes: c25b2ae ("bpf: Replace PTR_TO_XXX_OR_NULL with PTR_TO_XXX | PTR_MAYBE_NULL")
	Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]>
	Signed-off-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
(cherry picked from commit 45ce4b4)
	Signed-off-by: Brett Mastbergen <[email protected]>
jira VULN-72
cve CVE-2021-4204
commit-author Daniel Borkmann <[email protected]>
commit 6788ab2

Right now the assertion on check_ptr_off_reg() is only enforced for register
types PTR_TO_CTX (and open coded also for PTR_TO_BTF_ID), however, this is
insufficient since many other PTR_TO_* register types such as PTR_TO_FUNC do
not handle/expect register offsets when passed to helper functions.

Given this can slip-through easily when adding new types, make this an explicit
allow-list and reject all other current and future types by default if this is
encountered.

Also, extend check_ptr_off_reg() to handle PTR_TO_BTF_ID as well instead of
duplicating it. For PTR_TO_BTF_ID, reg->off is used for BTF to match expected
BTF ids if struct offset is used. This part still needs to be allowed, but the
dynamic off from the tnum must be rejected.
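
The allow-list idea can be sketched in Python as follows. This is a simplified model, not the kernel implementation: the `Reg` fields, the contents of `OFFSET_ALLOWED`, and the wrapper name `check_helper_arg` are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Reg:
    type: str
    off: int = 0                 # fixed offset, stands in for reg->off
    var_off_const: bool = True   # True when the tnum is a known constant
    var_off_value: int = 0       # value of that constant


# Illustrative allow-list of types that may carry an offset into a helper.
OFFSET_ALLOWED = {"PTR_TO_PACKET", "PTR_TO_MAP_VALUE", "PTR_TO_MEM", "PTR_TO_STACK"}


def check_ptr_off_reg(reg, fixed_off_ok=False):
    """Return True when the register's offsets are acceptable."""
    if not fixed_off_ok and reg.off != 0:
        return False             # fixed offset not permitted for this type
    if not reg.var_off_const or reg.var_off_value != 0:
        return False             # dynamic (tnum) offset is never permitted
    return True


def check_helper_arg(reg):
    """Allow-listed types pass; every other type is offset-checked by default."""
    if reg.type in OFFSET_ALLOWED:
        return True
    return check_ptr_off_reg(reg, fixed_off_ok=(reg.type == "PTR_TO_BTF_ID"))
```

In this model a `PTR_TO_FUNC` register with `off=8` is rejected, a `PTR_TO_BTF_ID` register with a fixed `off=8` is accepted, and the same register with a non-constant variable offset is rejected.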

Fixes: 69c087b ("bpf: Add bpf_for_each_map_elem() helper")
Fixes: eaa6bcb ("bpf: Introduce bpf_per_cpu_ptr()")
	Signed-off-by: Daniel Borkmann <[email protected]>
	Acked-by: John Fastabend <[email protected]>
	Acked-by: Alexei Starovoitov <[email protected]>
(cherry picked from commit 6788ab2)
	Signed-off-by: Brett Mastbergen <[email protected]>
jira VULN-144
cve CVE-2022-23222
commit-author Daniel Borkmann <[email protected]>
commit 64620e0

Both bpf_ringbuf_submit() and bpf_ringbuf_discard() have ARG_PTR_TO_ALLOC_MEM
in their bpf_func_proto definition as their first argument. They both expect
the result from a prior bpf_ringbuf_reserve() call which has a return type of
RET_PTR_TO_ALLOC_MEM_OR_NULL.

Meaning, after a NULL check in the code, the verifier will promote the register
type in the non-NULL branch to a PTR_TO_MEM and in the NULL branch to a known
zero scalar. Generally, pointer arithmetic on PTR_TO_MEM is allowed, so the
latter could have an offset.

The ARG_PTR_TO_ALLOC_MEM expects a PTR_TO_MEM register type. However, the non-
zero result from bpf_ringbuf_reserve() must be fed into either bpf_ringbuf_submit()
or bpf_ringbuf_discard() but with the original offset given it will then read
out the struct bpf_ringbuf_hdr mapping.

The verifier missed to enforce a zero offset, so that out of bounds access
can be triggered which could be used to escalate privileges if unprivileged
BPF was enabled (disabled by default in kernel).
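
To see why a non-zero offset matters, here is a small Python model of the address arithmetic. It assumes an 8-byte record header placed directly in front of the sample; the function names and the example addresses are hypothetical.

```python
HDR_SZ = 8  # assumed header size in this model


def reserve(record_start):
    """Return the sample pointer handed to the program: just past the header."""
    return record_start + HDR_SZ


def header_addr(sample_ptr):
    """Where submit/discard would look for the header of this sample."""
    return sample_ptr - HDR_SZ


def verifier_accepts_release(fixed_off):
    """Model of the added check: only the unmodified pointer may be released."""
    return fixed_off == 0
```

With `sample = reserve(0x1000)`, `header_addr(sample)` is the real header at `0x1000`, but `header_addr(sample + 16)` points into the sample payload, which the program controls. Rejecting any non-zero offset closes that gap.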

Fixes: 457f443 ("bpf: Implement BPF ring buffer and verifier support for it")
	Reported-by: <[email protected]> (SecCoder Security Lab)
	Signed-off-by: Daniel Borkmann <[email protected]>
	Acked-by: John Fastabend <[email protected]>
	Acked-by: Alexei Starovoitov <[email protected]>
(cherry picked from commit 64620e0)
	Signed-off-by: Brett Mastbergen <[email protected]>
jira VULN-72
cve CVE-2021-4204
commit-author Daniel Borkmann <[email protected]>
commit a672b2e

The bpf_ringbuf_submit() and bpf_ringbuf_discard() have ARG_PTR_TO_ALLOC_MEM
in their bpf_func_proto definition as their first argument, and thus both expect
the result from a prior bpf_ringbuf_reserve() call which has a return type of
RET_PTR_TO_ALLOC_MEM_OR_NULL.

While the non-NULL memory from bpf_ringbuf_reserve() can be passed to other
helpers, the two sinks (bpf_ringbuf_submit(), bpf_ringbuf_discard()) right now
only enforce a register type of PTR_TO_MEM.

This can lead to potential type confusion since it would allow other PTR_TO_MEM
memory to be passed into the two sinks which did not come from bpf_ringbuf_reserve().

Add a new MEM_ALLOC composable type attribute for PTR_TO_MEM, and enforce that:

 - bpf_ringbuf_reserve() returns NULL or PTR_TO_MEM | MEM_ALLOC
 - bpf_ringbuf_submit() and bpf_ringbuf_discard() only take PTR_TO_MEM | MEM_ALLOC
   but not plain PTR_TO_MEM arguments via ARG_PTR_TO_ALLOC_MEM
 - however, other helpers might treat PTR_TO_MEM | MEM_ALLOC as plain PTR_TO_MEM
   to populate the memory area when they use ARG_PTR_TO_{UNINIT_,}MEM in their
   func proto description
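
The three rules above can be expressed as a small compatibility check. This Python sketch uses made-up numeric values for `PTR_TO_MEM` and `MEM_ALLOC`, and the function name `arg_accepts` is hypothetical; only the accept/reject behavior is taken from the list above.

```python
BASE_TYPE_BITS = 8                       # assumed width of the base-type field
PTR_TO_MEM = 5                           # illustrative base type value
MEM_ALLOC = 1 << (BASE_TYPE_BITS + 2)    # illustrative flag bit above the base type


def arg_accepts(arg_kind, reg_type):
    """Return True if a register of reg_type may be passed for arg_kind."""
    if arg_kind == "ARG_PTR_TO_ALLOC_MEM":
        # The ringbuf sinks take only memory that came from a reserve call.
        return reg_type == (PTR_TO_MEM | MEM_ALLOC)
    if arg_kind in ("ARG_PTR_TO_MEM", "ARG_PTR_TO_UNINIT_MEM"):
        # Other helpers may treat reserved memory as plain memory.
        return reg_type in (PTR_TO_MEM, PTR_TO_MEM | MEM_ALLOC)
    return False
```

Plain `PTR_TO_MEM` is therefore rejected for `ARG_PTR_TO_ALLOC_MEM` but still accepted for `ARG_PTR_TO_MEM`, while `PTR_TO_MEM | MEM_ALLOC` is accepted for both.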

Fixes: 457f443 ("bpf: Implement BPF ring buffer and verifier support for it")
	Reported-by: Alexei Starovoitov <[email protected]>
	Signed-off-by: Daniel Borkmann <[email protected]>
	Acked-by: John Fastabend <[email protected]>
	Acked-by: Alexei Starovoitov <[email protected]>
(cherry picked from commit a672b2e)
	Signed-off-by: Brett Mastbergen <[email protected]>
@bmastbergen bmastbergen force-pushed the bmastbergen_fips-legacy-8-compliant-vuln-72-144 branch from e06dc0c to 5ca0732 Compare November 26, 2024 14:45
@bmastbergen
Collaborator Author

Force pushed just to add reference to newly created VULN-7854 in the commit log

@PlaidCat
Collaborator

PlaidCat commented Nov 26, 2024

Everything is looking pretty good.
Do we know if the following tests just now work, or are they on the flappy list?
Screenshot 2024-11-25 at 5 03 00 PM
Thanks

I assumed that these are "flappy". But I guess I don't know for sure.

Do you still have the VM that's upgraded?
Could you rerun the test 5x to get a higher sample size?

I have a basic test I've been using for the releases by leveraging the built RPM for kernel-selftests-internal ... this should be pretty easy to transcribe to your BPF test execution

$ cat run_kerselftests.sh
#!/bin/bash
# Run the installed kselftests N times (N = first argument), keeping one log per run.

KERNEL_VERSION=$(uname -r)

for i in $(seq 1 "$1");
do
	echo "Starting Test Loop ${i}"
	sudo /usr/libexec/kselftests/run_kselftest.sh > "/mnt/code/kernel_${KERNEL_VERSION}_iteration_${i}.log"
	echo "Test Loop ${i} Done"
	# Drop the "#"-prefixed test output so only the result lines remain.
	grep -v "^#" "/mnt/code/kernel_${KERNEL_VERSION}_iteration_${i}.log" > "/mnt/code/kernel_${KERNEL_VERSION}_iteration_${i}_nocomments.log"
done
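
To turn several iteration logs into a flappy list, something like the following Python sketch could work. It assumes the `_nocomments.log` files contain TAP-style result lines of the form `ok N selftests: <suite>: <test>` or `not ok N selftests: <suite>: <test>`; the function names are hypothetical.

```python
import re
from collections import defaultdict

# Assumed result-line format, e.g. "not ok 12 selftests: bpf: test_maps # exit=1".
RESULT_RE = re.compile(r"^(ok|not ok)\s+\d+\s+(selftests: .+?)(?:\s+#.*)?$")


def parse_results(log_text):
    """Map each test name found in one log to 'ok' or 'not ok'."""
    results = {}
    for line in log_text.splitlines():
        match = RESULT_RE.match(line.strip())
        if match:
            results[match.group(2)] = match.group(1)
    return results


def find_flaky(logs):
    """Return the sorted names of tests whose result differs between runs."""
    seen = defaultdict(set)
    for log_text in logs:
        for name, status in parse_results(log_text).items():
            seen[name].add(status)
    return sorted(name for name, statuses in seen.items() if len(statuses) > 1)
```

Reading each iteration's log into a string and passing the list to `find_flaky` gives the tests that both passed and failed across the runs; tests that fail every time do not show up in that list.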

@bmastbergen
Collaborator Author

I do have the VM, so I can try. I actually ran the bpf tests out of the source tree, not with kernel-selftests-internal. Maybe I'll do a run of 5 both ways to see what shakes out.

@bmastbergen bmastbergen merged commit 7964fac into fips-legacy-8-compliant/4.18.0-425.13.1 Nov 26, 2024
4 checks passed
@bmastbergen bmastbergen deleted the bmastbergen_fips-legacy-8-compliant-vuln-72-144 branch November 26, 2024 20:29
PlaidCat added a commit that referenced this pull request Dec 16, 2024
jira LE-2157
Rebuild_History Non-Buildable kernel-5.14.0-503.15.1.el9_5
commit-author Jamie Bainbridge <[email protected]>
commit a699781

A sysfs reader can race with a device reset or removal, attempting to
read device state when the device is not actually present. eg:

     [exception RIP: qed_get_current_link+17]
  #8 [ffffb9e4f2907c48] qede_get_link_ksettings at ffffffffc07a994a [qede]
  #9 [ffffb9e4f2907cd8] __rh_call_get_link_ksettings at ffffffff992b01a3
 #10 [ffffb9e4f2907d38] __ethtool_get_link_ksettings at ffffffff992b04e4
 #11 [ffffb9e4f2907d90] duplex_show at ffffffff99260300
 #12 [ffffb9e4f2907e38] dev_attr_show at ffffffff9905a01c
 #13 [ffffb9e4f2907e50] sysfs_kf_seq_show at ffffffff98e0145b
 #14 [ffffb9e4f2907e68] seq_read at ffffffff98d902e3
 #15 [ffffb9e4f2907ec8] vfs_read at ffffffff98d657d1
 #16 [ffffb9e4f2907f00] ksys_read at ffffffff98d65c3f
 #17 [ffffb9e4f2907f38] do_syscall_64 at ffffffff98a052fb

 crash> struct net_device.state ffff9a9d21336000
    state = 5,

state 5 is __LINK_STATE_START (0b1) and __LINK_STATE_NOCARRIER (0b100).
The device is not present, note lack of __LINK_STATE_PRESENT (0b10).
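
The bit arithmetic above can be checked with a few lines of Python. The bit positions follow the values quoted in this commit message (START = 0b1, PRESENT = 0b10, NOCARRIER = 0b100); the function names are hypothetical.

```python
# Bit positions as quoted above: START = 0b1, PRESENT = 0b10, NOCARRIER = 0b100.
LINK_STATE_BITS = [
    "__LINK_STATE_START",
    "__LINK_STATE_PRESENT",
    "__LINK_STATE_NOCARRIER",
]


def decode_state(state):
    """Return the names of the link-state bits set in `state`."""
    return [name for bit, name in enumerate(LINK_STATE_BITS) if state & (1 << bit)]


def device_present(state):
    """True when the __LINK_STATE_PRESENT bit (0b10) is set."""
    return bool(state & 0b10)
```

`decode_state(5)` yields `__LINK_STATE_START` and `__LINK_STATE_NOCARRIER`, and `device_present(5)` is false, which matches the crash dump.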

This is the same sort of panic as observed in commit 4224cfd
("net-sysfs: add check for netdevice being present to speed_show").

There are many other callers of __ethtool_get_link_ksettings() which
don't have a device presence check.

Move this check into ethtool to protect all callers.

Fixes: d519e17 ("net: export device speed and duplex via sysfs")
Fixes: 4224cfd ("net-sysfs: add check for netdevice being present to speed_show")
	Signed-off-by: Jamie Bainbridge <[email protected]>
Link: https://patch.msgid.link/8bae218864beaa44ed01628140475b9bf641c5b0.1724393671.git.jamie.bainbridge@gmail.com
	Signed-off-by: Jakub Kicinski <[email protected]>
(cherry picked from commit a699781)
	Signed-off-by: Jonathan Maple <[email protected]>
PlaidCat added a commit that referenced this pull request Dec 17, 2024
jira LE-2169
Rebuild_History Non-Buildable kernel-4.18.0-553.27.1.el8_10
commit-author Jamie Bainbridge <[email protected]>
commit a699781

A sysfs reader can race with a device reset or removal, attempting to
read device state when the device is not actually present. eg:

     [exception RIP: qed_get_current_link+17]
  #8 [ffffb9e4f2907c48] qede_get_link_ksettings at ffffffffc07a994a [qede]
  #9 [ffffb9e4f2907cd8] __rh_call_get_link_ksettings at ffffffff992b01a3
 #10 [ffffb9e4f2907d38] __ethtool_get_link_ksettings at ffffffff992b04e4
 #11 [ffffb9e4f2907d90] duplex_show at ffffffff99260300
 #12 [ffffb9e4f2907e38] dev_attr_show at ffffffff9905a01c
 #13 [ffffb9e4f2907e50] sysfs_kf_seq_show at ffffffff98e0145b
 #14 [ffffb9e4f2907e68] seq_read at ffffffff98d902e3
 #15 [ffffb9e4f2907ec8] vfs_read at ffffffff98d657d1
 #16 [ffffb9e4f2907f00] ksys_read at ffffffff98d65c3f
 #17 [ffffb9e4f2907f38] do_syscall_64 at ffffffff98a052fb

 crash> struct net_device.state ffff9a9d21336000
    state = 5,

state 5 is __LINK_STATE_START (0b1) and __LINK_STATE_NOCARRIER (0b100).
The device is not present, note lack of __LINK_STATE_PRESENT (0b10).

This is the same sort of panic as observed in commit 4224cfd
("net-sysfs: add check for netdevice being present to speed_show").

There are many other callers of __ethtool_get_link_ksettings() which
don't have a device presence check.

Move this check into ethtool to protect all callers.

Fixes: d519e17 ("net: export device speed and duplex via sysfs")
Fixes: 4224cfd ("net-sysfs: add check for netdevice being present to speed_show")
	Signed-off-by: Jamie Bainbridge <[email protected]>
Link: https://patch.msgid.link/8bae218864beaa44ed01628140475b9bf641c5b0.1724393671.git.jamie.bainbridge@gmail.com
	Signed-off-by: Jakub Kicinski <[email protected]>
(cherry picked from commit a699781)
	Signed-off-by: Jonathan Maple <[email protected]>
pvts-mat pushed a commit to pvts-mat/kernel-src-tree that referenced this pull request Jan 14, 2025
jira LE-1907
Rebuild_History Non-Buildable kernel-rt-5.14.0-284.30.1.rt14.315.el9_2
commit-author Stefan Assmann <[email protected]>
commit 4e264be

When a system with E810 with existing VFs gets rebooted the following
hang may be observed.

 Pid 1 is hung in iavf_remove(), part of a network driver:
 PID: 1        TASK: ffff965400e5a340  CPU: 24   COMMAND: "systemd-shutdow"
  #0 [ffffaad04005fa50] __schedule at ffffffff8b3239cb
  #1 [ffffaad04005fae8] schedule at ffffffff8b323e2d
  #2 [ffffaad04005fb00] schedule_hrtimeout_range_clock at ffffffff8b32cebc
  #3 [ffffaad04005fb80] usleep_range_state at ffffffff8b32c930
  #4 [ffffaad04005fbb0] iavf_remove at ffffffffc12b9b4c [iavf]
  #5 [ffffaad04005fbf0] pci_device_remove at ffffffff8add7513
  #6 [ffffaad04005fc10] device_release_driver_internal at ffffffff8af08baa
  #7 [ffffaad04005fc40] pci_stop_bus_device at ffffffff8adcc5fc
  #8 [ffffaad04005fc60] pci_stop_and_remove_bus_device at ffffffff8adcc81e
  #9 [ffffaad04005fc70] pci_iov_remove_virtfn at ffffffff8adf9429
 #10 [ffffaad04005fca8] sriov_disable at ffffffff8adf98e4
 #11 [ffffaad04005fcc8] ice_free_vfs at ffffffffc04bb2c8 [ice]
 #12 [ffffaad04005fd10] ice_remove at ffffffffc04778fe [ice]
 #13 [ffffaad04005fd38] ice_shutdown at ffffffffc0477946 [ice]
 #14 [ffffaad04005fd50] pci_device_shutdown at ffffffff8add58f1
 #15 [ffffaad04005fd70] device_shutdown at ffffffff8af05386
 #16 [ffffaad04005fd98] kernel_restart at ffffffff8a92a870
 #17 [ffffaad04005fda8] __do_sys_reboot at ffffffff8a92abd6
 #18 [ffffaad04005fee0] do_syscall_64 at ffffffff8b317159
 #19 [ffffaad04005ff08] __context_tracking_enter at ffffffff8b31b6fc
 #20 [ffffaad04005ff18] syscall_exit_to_user_mode at ffffffff8b31b50d
 #21 [ffffaad04005ff28] do_syscall_64 at ffffffff8b317169
 #22 [ffffaad04005ff50] entry_SYSCALL_64_after_hwframe at ffffffff8b40009b
     RIP: 00007f1baa5c13d7  RSP: 00007fffbcc55a98  RFLAGS: 00000202
     RAX: ffffffffffffffda  RBX: 0000000000000000  RCX: 00007f1baa5c13d7
     RDX: 0000000001234567  RSI: 0000000028121969  RDI: 00000000fee1dead
     RBP: 00007fffbcc55ca0   R8: 0000000000000000   R9: 00007fffbcc54e90
     R10: 00007fffbcc55050  R11: 0000000000000202  R12: 0000000000000005
     R13: 0000000000000000  R14: 00007fffbcc55af0  R15: 0000000000000000
     ORIG_RAX: 00000000000000a9  CS: 0033  SS: 002b

During reboot all drivers PM shutdown callbacks are invoked.
In iavf_shutdown() the adapter state is changed to __IAVF_REMOVE.
In ice_shutdown() the call chain above is executed, which at some point
calls iavf_remove(). However iavf_remove() expects the VF to be in one
of the states __IAVF_RUNNING, __IAVF_DOWN or __IAVF_INIT_FAILED. If
that's not the case it sleeps forever.
So if iavf_shutdown() gets invoked before iavf_remove() the system will
hang indefinitely because the adapter is already in state __IAVF_REMOVE.

Fix this by returning from iavf_remove() if the state is __IAVF_REMOVE,
as we already went through iavf_shutdown().
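
The hang and the fix can be modeled in a few lines of Python. This is a sketch of the state logic only: the function name, the bounded poll loop standing in for the endless sleep, and the returned strings are assumptions for illustration.

```python
# States in which the model lets remove proceed, as listed above.
WAITABLE = {"__IAVF_RUNNING", "__IAVF_DOWN", "__IAVF_INIT_FAILED"}


def iavf_remove_model(state, early_return_on_remove, max_polls=1000):
    """Return 'removed', 'skipped', or 'hung' for a given adapter state.

    The real driver sleeps and re-reads the state; in this model nothing
    changes the state, so a bounded loop stands in for waiting forever.
    """
    if early_return_on_remove and state == "__IAVF_REMOVE":
        return "skipped"          # shutdown already ran, nothing left to do
    for _ in range(max_polls):
        if state in WAITABLE:
            return "removed"
    return "hung"
```

Without the early return, a state of `__IAVF_REMOVE` never satisfies the wait condition and the model reports `hung`; with it, the call returns immediately.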

Fixes: 9745780 ("iavf: Add waiting so the port is initialized in remove")
Fixes: a841733 ("iavf: Fix race condition between iavf_shutdown and iavf_remove")
	Reported-by: Marius Cornea <[email protected]>
	Signed-off-by: Stefan Assmann <[email protected]>
	Reviewed-by: Michal Kubiak <[email protected]>
	Tested-by: Rafal Romanowski <[email protected]>
	Signed-off-by: Tony Nguyen <[email protected]>
(cherry picked from commit 4e264be)
	Signed-off-by: Jonathan Maple <[email protected]>
pvts-mat pushed a commit to pvts-mat/kernel-src-tree that referenced this pull request Jan 14, 2025
jira LE-1907
Rebuild_History Non-Buildable kernel-rt-5.14.0-284.30.1.rt14.315.el9_2
commit-author Eelco Chaudron <[email protected]>
commit de9df6c

Currently, the per cpu upcall counters are allocated after the vport is
created and inserted into the system. This could lead to the datapath
accessing the counters before they are allocated resulting in a kernel
Oops.

Here is an example:

  PID: 59693    TASK: ffff0005f4f51500  CPU: 0    COMMAND: "ovs-vswitchd"
   #0 [ffff80000a39b5b0] __switch_to at ffffb70f0629f2f4
   #1 [ffff80000a39b5d0] __schedule at ffffb70f0629f5cc
   #2 [ffff80000a39b650] preempt_schedule_common at ffffb70f0629fa60
   #3 [ffff80000a39b670] dynamic_might_resched at ffffb70f0629fb58
   #4 [ffff80000a39b680] mutex_lock_killable at ffffb70f062a1388
   #5 [ffff80000a39b6a0] pcpu_alloc at ffffb70f0594460c
   #6 [ffff80000a39b750] __alloc_percpu_gfp at ffffb70f05944e68
   #7 [ffff80000a39b760] ovs_vport_cmd_new at ffffb70ee6961b90 [openvswitch]
   ...

  PID: 58682    TASK: ffff0005b2f0bf00  CPU: 0    COMMAND: "kworker/0:3"
   #0 [ffff80000a5d2f40] machine_kexec at ffffb70f056a0758
   #1 [ffff80000a5d2f70] __crash_kexec at ffffb70f057e2994
   #2 [ffff80000a5d3100] crash_kexec at ffffb70f057e2ad8
   #3 [ffff80000a5d3120] die at ffffb70f0628234c
   #4 [ffff80000a5d31e0] die_kernel_fault at ffffb70f062828a8
   #5 [ffff80000a5d3210] __do_kernel_fault at ffffb70f056a31f4
   #6 [ffff80000a5d3240] do_bad_area at ffffb70f056a32a4
   #7 [ffff80000a5d3260] do_translation_fault at ffffb70f062a9710
   #8 [ffff80000a5d3270] do_mem_abort at ffffb70f056a2f74
   #9 [ffff80000a5d32a0] el1_abort at ffffb70f06297dac
  #10 [ffff80000a5d32d0] el1h_64_sync_handler at ffffb70f06299b24
  #11 [ffff80000a5d3410] el1h_64_sync at ffffb70f056812dc
  #12 [ffff80000a5d3430] ovs_dp_upcall at ffffb70ee6963c84 [openvswitch]
  #13 [ffff80000a5d3470] ovs_dp_process_packet at ffffb70ee6963fdc [openvswitch]
  #14 [ffff80000a5d34f0] ovs_vport_receive at ffffb70ee6972c78 [openvswitch]
  #15 [ffff80000a5d36f0] netdev_port_receive at ffffb70ee6973948 [openvswitch]
  #16 [ffff80000a5d3720] netdev_frame_hook at ffffb70ee6973a28 [openvswitch]
  #17 [ffff80000a5d3730] __netif_receive_skb_core.constprop.0 at ffffb70f06079f90

We moved the per cpu upcall counter allocation to the existing vport
alloc and free functions to solve this.
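
The ordering problem can be illustrated with a small Python model. It is a sketch under stated assumptions: the class, the function names, the counter field names, and the `on_publish` callback (which stands in for a packet arriving right after the vport becomes visible) are all hypothetical.

```python
class Vport:
    def __init__(self):
        self.upcall_stats = None  # stands in for the per-cpu counter pointer


def new_stats():
    return {"n_success": 0, "n_fail": 0}


def upcall(vport):
    """Datapath side: count an upcall. Fails if the counters are missing."""
    vport.upcall_stats["n_success"] += 1


def create_vport(datapath, alloc_first, on_publish=lambda vport: None):
    """Create a vport and make it visible by appending it to `datapath`."""
    vport = Vport()
    if alloc_first:
        vport.upcall_stats = new_stats()   # fixed order: allocate, then publish
    datapath.append(vport)
    on_publish(vport)                      # a packet may arrive at this point
    if not alloc_first:
        vport.upcall_stats = new_stats()   # old order: too late for that packet
    return vport
```

With `alloc_first=False` and `on_publish=upcall`, the model raises `TypeError` on the missing counters, the Python analogue of the NULL dereference in the trace; with `alloc_first=True` the same packet is counted normally.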

Fixes: 95637d9 ("net: openvswitch: release vport resources on failure")
Fixes: 1933ea3 ("net: openvswitch: Add support to count upcall packets")
	Signed-off-by: Eelco Chaudron <[email protected]>
	Reviewed-by: Simon Horman <[email protected]>
	Acked-by: Aaron Conole <[email protected]>
	Signed-off-by: David S. Miller <[email protected]>
(cherry picked from commit de9df6c)
	Signed-off-by: Jonathan Maple <[email protected]>
3 participants