selftests/bpf: Fix pyperf180 compilation failure with clang18 #20

bmastbergen · 2024-12-09T15:58:53Z

jira LE-2125

Running bpf selftests can result in the fatal error below with certain CPU and LLVM version combinations:

CLNG-BPF [test_maps] pyperf180.o
fatal error: error in backend: Branch target out of insn range
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.      Program arguments: clang -g -Werror -D__TARGET_ARCH_x86 -mlittle-endian -I/home/g.v.rose/prj/kernel-build-tmp/tools/testing/selftests/bpf/tools/include -I/home/g.v.rose/prj/kernel-build-tmp/tools/testing/selftests/bpf -I/home/g.v.rose/prj/kernel-build-tmp/tools/include/uapi -I/home/g.v.rose/prj/kernel-build-tmp/tools/testing/selftests/usr/include -idirafter /usr/bin/../lib/clang/18/include -idirafter /usr/local/include -idirafter /usr/include -Wno-compare-distinct-pointer-types -DENABLE_ATOMICS_TESTS -O2 -target bpf -c progs/pyperf180.c -mcpu=v3 -o /home/g.v.rose/prj/kernel-build-tmp/tools/testing/selftests/bpf/pyperf180.o
1.      <eof> parser at end of file
2.      Code generation
 #0 0x00007ff275cfdcba llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/lib64/libLLVM.so.18.1+0x8fdcba)
 #1 0x00007ff275cfb464 llvm::sys::RunSignalHandlers() (/lib64/libLLVM.so.18.1+0x8fb464)
 #2 0x00007ff275c2d2a2 (/lib64/libLLVM.so.18.1+0x82d2a2)
 #3 0x00007ff275c2d25f (/lib64/libLLVM.so.18.1+0x82d25f)
 #4 0x00007ff275cf792d (/lib64/libLLVM.so.18.1+0x8f792d)
 #5 0x0000000000213717 (/usr/bin/clang-18+0x213717)
 #6 0x00007ff275c3ebd7 llvm::report_fatal_error(llvm::Twine const&, bool) (/lib64/libLLVM.so.18.1+0x83ebd7)
 #7 0x00007ff275c3ea9a (/lib64/libLLVM.so.18.1+0x83ea9a)
 #8 0x00007ff278e95ca1 (/lib64/libLLVM.so.18.1+0x3a95ca1)
 #9 0x00007ff2776c1ab1 llvm::MCAssembler::layout(llvm::MCAsmLayout&) (/lib64/libLLVM.so.18.1+0x22c1ab1)
#10 0x00007ff2776c1d4b llvm::MCAssembler::Finish() (/lib64/libLLVM.so.18.1+0x22c1d4b)
#11 0x00007ff2776e4bda llvm::MCELFStreamer::finishImpl() (/lib64/libLLVM.so.18.1+0x22e4bda)
#12 0x00007ff276727dab llvm::AsmPrinter::doFinalization(llvm::Module&) (/lib64/libLLVM.so.18.1+0x1327dab)
#13 0x00007ff275e8a4c1 llvm::FPPassManager::doFinalization(llvm::Module&) (/lib64/libLLVM.so.18.1+0xa8a4c1)
#14 0x00007ff275e83e11 llvm::legacy::PassManagerImpl::run(llvm::Module&) (/lib64/libLLVM.so.18.1+0xa83e11)
#15 0x00007ff27e19520b clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::StringRef, llvm::Module*, clang::BackendAction, llvm::IntrusiveRefCntPtr<llvm::vfs::FileSystem>, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream>>, clang::BackendConsumer*) (/lib64/libclang-cpp.so.18.1+0x279520b)
#16 0x00007ff27e593680 clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) (/lib64/libclang-cpp.so.18.1+0x2b93680)
#17 0x00007ff27ced4cd6 clang::ParseAST(clang::Sema&, bool, bool) (/lib64/libclang-cpp.so.18.1+0x14d4cd6)
#18 0x00007ff27f15ade6 clang::FrontendAction::Execute() (/lib64/libclang-cpp.so.18.1+0x375ade6)
#19 0x00007ff27f0d2210 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (/lib64/libclang-cpp.so.18.1+0x36d2210)
#20 0x00007ff27f1d938e clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (/lib64/libclang-cpp.so.18.1+0x37d938e)
#21 0x0000000000213486 cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (/usr/bin/clang-18+0x213486)
#22 0x00000000002100d4 (/usr/bin/clang-18+0x2100d4)
#23 0x00007ff27ecff5dd (/lib64/libclang-cpp.so.18.1+0x32ff5dd)
#24 0x00007ff275c2d234 llvm::CrashRecoveryContext::RunSafely(llvm::function_ref<void ()>) (/lib64/libLLVM.so.18.1+0x82d234)
#25 0x00007ff27ecff197 clang::driver::CC1Command::Execute(llvm::ArrayRef<std::optional<llvm::StringRef>>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>*, bool*) const (/lib64/libclang-cpp.so.18.1+0x32ff197)
#26 0x00007ff27ecc4b97 clang::driver::Compilation::ExecuteCommand(clang::driver::Command const&, clang::driver::Command const*&, bool) const (/lib64/libclang-cpp.so.18.1+0x32c4b97)
#27 0x00007ff27ecc4df7 clang::driver::Compilation::ExecuteJobs(clang::driver::JobList const&, llvm::SmallVectorImpl<std::pair<int, clang::driver::Command const*>>&, bool) const (/lib64/libclang-cpp.so.18.1+0x32c4df7)
#28 0x00007ff27ece2eae clang::driver::Driver::ExecuteCompilation(clang::driver::Compilation&, llvm::SmallVectorImpl<std::pair<int, clang::driver::Command const*>>&) (/lib64/libclang-cpp.so.18.1+0x32e2eae)
#29 0x000000000020f930 clang_main(int, char**, llvm::ToolContext const&) (/usr/bin/clang-18+0x20f930)
#30 0x000000000021d76a main (/usr/bin/clang-18+0x21d76a)
#31 0x00007ff274c295d0 __libc_start_call_main (/lib64/libc.so.6+0x295d0)
#32 0x00007ff274c29680 __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x29680)
#33 0x000000000020c835 _start (/usr/bin/clang-18+0x20c835)
clang: error: clang frontend command failed with exit code 70 (use -v to see invocation)
clang version 18.1.8 (RESF 18.1.8-3.el9)
Target: bpf
Thread model: posix
InstalledDir: /bin
clang: note: diagnostic msg:
********************

PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT:
Preprocessed source(s) and associated run script(s) are located at:
clang: note: diagnostic msg: /tmp/pyperf180-99665f.c
clang: note: diagnostic msg: /tmp/pyperf180-99665f.sh
clang: note: diagnostic msg:

********************
make: *** [Makefile:526: /home/g.v.rose/prj/kernel-build-tmp/tools/testing/selftests/bpf/pyperf180.o] Error 1

This PR backports the following upstream fix for this situation:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/tools/testing/selftests/bpf/progs/pyperf180.c?h=v6.13-rc1&id=100888fb6d8a185866b1520031ee7e3182b173de

Below are logs from running bpf selftests before and after the fix:
bpf_selftests_before.log
bpf_selftests_after.log

jira LE-2125 commit-author Yonghong Song <[email protected]> commit 100888f With latest clang18 (main branch of llvm-project repo), when building bpf selftests, [~/work/bpf-next (master)]$ make -C tools/testing/selftests/bpf LLVM=1 -j The following compilation error happens: fatal error: error in backend: Branch target out of insn range ... Stack dump: 0. Program arguments: clang -g -Wall -Werror -D__TARGET_ARCH_x86 -mlittle-endian -I/home/yhs/work/bpf-next/tools/testing/selftests/bpf/tools/include -I/home/yhs/work/bpf-next/tools/testing/selftests/bpf -I/home/yhs/work/bpf-next/tools/include/uapi -I/home/yhs/work/bpf-next/tools/testing/selftests/usr/include -idirafter /home/yhs/work/llvm-project/llvm/build.18/install/lib/clang/18/include -idirafter /usr/local/include -idirafter /usr/include -Wno-compare-distinct-pointer-types -DENABLE_ATOMICS_TESTS -O2 --target=bpf -c progs/pyperf180.c -mcpu=v3 -o /home/yhs/work/bpf-next/tools/testing/selftests/bpf/pyperf180.bpf.o 1. <eof> parser at end of file 2. Code generation ... The compilation failure only happens to cpu=v2 and cpu=v3. cpu=v4 is okay since cpu=v4 supports 32-bit branch target offset. The above failure is due to upstream llvm patch [1] where some inlining behavior are changed in clang18. To workaround the issue, previously all 180 loop iterations are fully unrolled. The bpf macro __BPF_CPU_VERSION__ (implemented in clang18 recently) is used to avoid unrolling changes if cpu=v4. If __BPF_CPU_VERSION__ is not available and the compiler is clang18, the unrollng amount is unconditionally reduced. [1] llvm/llvm-project@1a2e77c Signed-off-by: Yonghong Song <[email protected]> Signed-off-by: Andrii Nakryiko <[email protected]> Tested-by: Alan Maguire <[email protected]> Link: https://lore.kernel.org/bpf/[email protected] (cherry picked from commit 100888f) Signed-off-by: Brett Mastbergen <[email protected]>

PlaidCat · 2024-12-09T18:15:29Z

Does this address the issues @gvrose8192 had with: #19 (review)
where we don't have to set a separate directory?

gvrose8192 · 2024-12-09T18:18:45Z

Does this address the issues @gvrose8192 had with: #19 (review) where we don't have to set a separate directory?

I think that's a separate issue.

bmastbergen · 2024-12-09T18:22:41Z

Does this address the issues @gvrose8192 had with: #19 (review) where we don't have to set a separate directory?

I don't believe this change has any impact on that.

FWIW I have been running 'make -C tools/testing/selftests/bpf' from the top level kernel repo directory. That works before (up to the pyperf180 testcase) and after the change.

gvrose8192

LGTM

PlaidCat

…s_lock For storing a value to a queue attribute, the queue_attr_store function first freezes the queue (->q_usage_counter(io)) and then acquire ->sysfs_lock. This seems not correct as the usual ordering should be to acquire ->sysfs_lock before freezing the queue. This incorrect ordering causes the following lockdep splat which we are able to reproduce always simply by accessing /sys/kernel/debug file using ls command: [ 57.597146] WARNING: possible circular locking dependency detected [ 57.597154] 6.12.0-10553-gb86545e02e8c #20 Tainted: G W [ 57.597162] ------------------------------------------------------ [ 57.597168] ls/4605 is trying to acquire lock: [ 57.597176] c00000003eb56710 (&mm->mmap_lock){++++}-{4:4}, at: __might_fault+0x58/0xc0 [ 57.597200] but task is already holding lock: [ 57.597207] c0000018e27c6810 (&sb->s_type->i_mutex_key#3){++++}-{4:4}, at: iterate_dir+0x94/0x1d4 [ 57.597226] which lock already depends on the new lock. [ 57.597233] the existing dependency chain (in reverse order) is: [ 57.597241] -> #5 (&sb->s_type->i_mutex_key#3){++++}-{4:4}: [ 57.597255] down_write+0x6c/0x18c [ 57.597264] start_creating+0xb4/0x24c [ 57.597274] debugfs_create_dir+0x2c/0x1e8 [ 57.597283] blk_register_queue+0xec/0x294 [ 57.597292] add_disk_fwnode+0x2e4/0x548 [ 57.597302] brd_alloc+0x2c8/0x338 [ 57.597309] brd_init+0x100/0x178 [ 57.597317] do_one_initcall+0x88/0x3e4 [ 57.597326] kernel_init_freeable+0x3cc/0x6e0 [ 57.597334] kernel_init+0x34/0x1cc [ 57.597342] ret_from_kernel_user_thread+0x14/0x1c [ 57.597350] -> #4 (&q->debugfs_mutex){+.+.}-{4:4}: [ 57.597362] __mutex_lock+0xfc/0x12a0 [ 57.597370] blk_register_queue+0xd4/0x294 [ 57.597379] add_disk_fwnode+0x2e4/0x548 [ 57.597388] brd_alloc+0x2c8/0x338 [ 57.597395] brd_init+0x100/0x178 [ 57.597402] do_one_initcall+0x88/0x3e4 [ 57.597410] kernel_init_freeable+0x3cc/0x6e0 [ 57.597418] kernel_init+0x34/0x1cc [ 57.597426] ret_from_kernel_user_thread+0x14/0x1c [ 57.597434] -> #3 (&q->sysfs_lock){+.+.}-{4:4}: [ 57.597446] __mutex_lock+0xfc/0x12a0 [ 57.597454] queue_attr_store+0x9c/0x110 [ 57.597462] sysfs_kf_write+0x70/0xb0 [ 57.597471] kernfs_fop_write_iter+0x1b0/0x2ac [ 57.597480] vfs_write+0x3dc/0x6e8 [ 57.597488] ksys_write+0x84/0x140 [ 57.597495] system_call_exception+0x130/0x360 [ 57.597504] system_call_common+0x160/0x2c4 [ 57.597516] -> #2 (&q->q_usage_counter(io)#21){++++}-{0:0}: [ 57.597530] __submit_bio+0x5ec/0x828 [ 57.597538] submit_bio_noacct_nocheck+0x1e4/0x4f0 [ 57.597547] iomap_readahead+0x2a0/0x448 [ 57.597556] xfs_vm_readahead+0x28/0x3c [ 57.597564] read_pages+0x88/0x41c [ 57.597571] page_cache_ra_unbounded+0x1ac/0x2d8 [ 57.597580] filemap_get_pages+0x188/0x984 [ 57.597588] filemap_read+0x13c/0x4bc [ 57.597596] xfs_file_buffered_read+0x88/0x17c [ 57.597605] xfs_file_read_iter+0xac/0x158 [ 57.597614] vfs_read+0x2d4/0x3b4 [ 57.597622] ksys_read+0x84/0x144 [ 57.597629] system_call_exception+0x130/0x360 [ 57.597637] system_call_common+0x160/0x2c4 [ 57.597647] -> #1 (mapping.invalidate_lock#2){++++}-{4:4}: [ 57.597661] down_read+0x6c/0x220 [ 57.597669] filemap_fault+0x870/0x100c [ 57.597677] xfs_filemap_fault+0xc4/0x18c [ 57.597684] __do_fault+0x64/0x164 [ 57.597693] __handle_mm_fault+0x1274/0x1dac [ 57.597702] handle_mm_fault+0x248/0x484 [ 57.597711] ___do_page_fault+0x428/0xc0c [ 57.597719] hash__do_page_fault+0x30/0x68 [ 57.597727] do_hash_fault+0x90/0x35c [ 57.597736] data_access_common_virt+0x210/0x220 [ 57.597745] _copy_from_user+0xf8/0x19c [ 57.597754] sel_write_load+0x178/0xd54 [ 57.597762] vfs_write+0x108/0x6e8 [ 57.597769] ksys_write+0x84/0x140 [ 57.597777] system_call_exception+0x130/0x360 [ 57.597785] system_call_common+0x160/0x2c4 [ 57.597794] -> #0 (&mm->mmap_lock){++++}-{4:4}: [ 57.597806] __lock_acquire+0x17cc/0x2330 [ 57.597814] lock_acquire+0x138/0x400 [ 57.597822] __might_fault+0x7c/0xc0 [ 57.597830] filldir64+0xe8/0x390 [ 57.597839] dcache_readdir+0x80/0x2d4 [ 57.597846] iterate_dir+0xd8/0x1d4 [ 57.597855] sys_getdents64+0x88/0x2d4 [ 57.597864] system_call_exception+0x130/0x360 [ 57.597872] system_call_common+0x160/0x2c4 [ 57.597881] other info that might help us debug this: [ 57.597888] Chain exists of: &mm->mmap_lock --> &q->debugfs_mutex --> &sb->s_type->i_mutex_key#3 [ 57.597905] Possible unsafe locking scenario: [ 57.597911] CPU0 CPU1 [ 57.597917] ---- ---- [ 57.597922] rlock(&sb->s_type->i_mutex_key#3); [ 57.597932] lock(&q->debugfs_mutex); [ 57.597940] lock(&sb->s_type->i_mutex_key#3); [ 57.597950] rlock(&mm->mmap_lock); [ 57.597958] *** DEADLOCK *** [ 57.597965] 2 locks held by ls/4605: [ 57.597971] #0: c0000000137c12f8 (&f->f_pos_lock){+.+.}-{4:4}, at: fdget_pos+0xcc/0x154 [ 57.597989] #1: c0000018e27c6810 (&sb->s_type->i_mutex_key#3){++++}-{4:4}, at: iterate_dir+0x94/0x1d4 Prevent the above lockdep warning by acquiring ->sysfs_lock before freezing the queue while storing a queue attribute in queue_attr_store function. Later, we also found[1] another function __blk_mq_update_nr_ hw_queues where we first freeze queue and then acquire the ->sysfs_lock. So we've also updated lock ordering in __blk_mq_update_nr_hw_queues function and ensured that in all code paths we follow the correct lock ordering i.e. acquire ->sysfs_lock before freezing the queue. [1] https://lore.kernel.org/all/CAFj5m9Ke8+EHKQBs_Nk6hqd=LGXtk4mUxZUN5==ZcCjnZSBwHw@mail.gmail.com/ Reported-by: [email protected] Fixes: af28141 ("block: freeze the queue in queue_attr_store") Tested-by: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Signed-off-by: Nilay Shroff <[email protected]> Reviewed-by: Ming Lei <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>

jira LE-1907 Rebuild_History Non-Buildable kernel-rt-5.14.0-284.30.1.rt14.315.el9_2 commit-author Stefan Assmann <[email protected]> commit 4e264be When a system with E810 with existing VFs gets rebooted the following hang may be observed. Pid 1 is hung in iavf_remove(), part of a network driver: PID: 1 TASK: ffff965400e5a340 CPU: 24 COMMAND: "systemd-shutdow" #0 [ffffaad04005fa50] __schedule at ffffffff8b3239cb ctrliq#1 [ffffaad04005fae8] schedule at ffffffff8b323e2d ctrliq#2 [ffffaad04005fb00] schedule_hrtimeout_range_clock at ffffffff8b32cebc ctrliq#3 [ffffaad04005fb80] usleep_range_state at ffffffff8b32c930 ctrliq#4 [ffffaad04005fbb0] iavf_remove at ffffffffc12b9b4c [iavf] ctrliq#5 [ffffaad04005fbf0] pci_device_remove at ffffffff8add7513 ctrliq#6 [ffffaad04005fc10] device_release_driver_internal at ffffffff8af08baa ctrliq#7 [ffffaad04005fc40] pci_stop_bus_device at ffffffff8adcc5fc ctrliq#8 [ffffaad04005fc60] pci_stop_and_remove_bus_device at ffffffff8adcc81e ctrliq#9 [ffffaad04005fc70] pci_iov_remove_virtfn at ffffffff8adf9429 ctrliq#10 [ffffaad04005fca8] sriov_disable at ffffffff8adf98e4 ctrliq#11 [ffffaad04005fcc8] ice_free_vfs at ffffffffc04bb2c8 [ice] ctrliq#12 [ffffaad04005fd10] ice_remove at ffffffffc04778fe [ice] ctrliq#13 [ffffaad04005fd38] ice_shutdown at ffffffffc0477946 [ice] ctrliq#14 [ffffaad04005fd50] pci_device_shutdown at ffffffff8add58f1 ctrliq#15 [ffffaad04005fd70] device_shutdown at ffffffff8af05386 ctrliq#16 [ffffaad04005fd98] kernel_restart at ffffffff8a92a870 ctrliq#17 [ffffaad04005fda8] __do_sys_reboot at ffffffff8a92abd6 ctrliq#18 [ffffaad04005fee0] do_syscall_64 at ffffffff8b317159 ctrliq#19 [ffffaad04005ff08] __context_tracking_enter at ffffffff8b31b6fc ctrliq#20 [ffffaad04005ff18] syscall_exit_to_user_mode at ffffffff8b31b50d ctrliq#21 [ffffaad04005ff28] do_syscall_64 at ffffffff8b317169 ctrliq#22 [ffffaad04005ff50] entry_SYSCALL_64_after_hwframe at ffffffff8b40009b RIP: 00007f1baa5c13d7 RSP: 00007fffbcc55a98 RFLAGS: 00000202 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f1baa5c13d7 RDX: 0000000001234567 RSI: 0000000028121969 RDI: 00000000fee1dead RBP: 00007fffbcc55ca0 R8: 0000000000000000 R9: 00007fffbcc54e90 R10: 00007fffbcc55050 R11: 0000000000000202 R12: 0000000000000005 R13: 0000000000000000 R14: 00007fffbcc55af0 R15: 0000000000000000 ORIG_RAX: 00000000000000a9 CS: 0033 SS: 002b During reboot all drivers PM shutdown callbacks are invoked. In iavf_shutdown() the adapter state is changed to __IAVF_REMOVE. In ice_shutdown() the call chain above is executed, which at some point calls iavf_remove(). However iavf_remove() expects the VF to be in one of the states __IAVF_RUNNING, __IAVF_DOWN or __IAVF_INIT_FAILED. If that's not the case it sleeps forever. So if iavf_shutdown() gets invoked before iavf_remove() the system will hang indefinitely because the adapter is already in state __IAVF_REMOVE. Fix this by returning from iavf_remove() if the state is __IAVF_REMOVE, as we already went through iavf_shutdown(). Fixes: 9745780 ("iavf: Add waiting so the port is initialized in remove") Fixes: a841733 ("iavf: Fix race condition between iavf_shutdown and iavf_remove") Reported-by: Marius Cornea <[email protected]> Signed-off-by: Stefan Assmann <[email protected]> Reviewed-by: Michal Kubiak <[email protected]> Tested-by: Rafal Romanowski <[email protected]> Signed-off-by: Tony Nguyen <[email protected]> (cherry picked from commit 4e264be) Signed-off-by: Jonathan Maple <[email protected]>

bmastbergen requested review from gvrose8192 and PlaidCat December 9, 2024 15:59

gvrose8192 approved these changes Dec 9, 2024

View reviewed changes

PlaidCat approved these changes Dec 13, 2024

View reviewed changes

bmastbergen merged commit c566432 into ciqlts9_2 Dec 13, 2024
7 checks passed

bmastbergen deleted the bmastbergen_ciqlts9_2_pyperf180 branch December 13, 2024 14:43

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

selftests/bpf: Fix pyperf180 compilation failure with clang18 #20

selftests/bpf: Fix pyperf180 compilation failure with clang18 #20

bmastbergen commented Dec 9, 2024

PlaidCat commented Dec 9, 2024

gvrose8192 commented Dec 9, 2024

bmastbergen commented Dec 9, 2024

gvrose8192 left a comment

PlaidCat left a comment

selftests/bpf: Fix pyperf180 compilation failure with clang18 #20

selftests/bpf: Fix pyperf180 compilation failure with clang18 #20

Conversation

bmastbergen commented Dec 9, 2024

PlaidCat commented Dec 9, 2024

gvrose8192 commented Dec 9, 2024

bmastbergen commented Dec 9, 2024

gvrose8192 left a comment

Choose a reason for hiding this comment

PlaidCat left a comment

Choose a reason for hiding this comment