Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

'Failed to allocate trace_probe' on Arch Linux kernel 4.15.9-1 #1634

Closed
enchantner opened this issue Mar 18, 2018 · 12 comments
Closed

'Failed to allocate trace_probe' on Arch Linux kernel 4.15.9-1 #1634

enchantner opened this issue Mar 18, 2018 · 12 comments

Comments

@enchantner
Copy link

enchantner commented Mar 18, 2018

I have Arch Linux and one of the latest stable kernels (just default linux package):

Linux version 4.15.9-1-ARCH (builduser@heftig-13420) (gcc version 7.3.0 (GCC)) #1 SMP PREEMPT Sun Mar 11 17:54:33 UTC 2018

Bcc is installed from AUR using yaourt without any issues. When trying to execute tests or run trace, for example, I'm getting this message:

check dmesg output for possible cause
Failed to attach BPF to kprobe

When checking dmesg, I see this:

[17251.859127] trace_kprobe: Failed to allocate trace_probe.(-22)

Or, when running tests, something like this:

[ 4735.987595] trace_kprobe: Could not insert probe at vfs_caches_init_early+0: -22
[ 4736.106403] trace_kprobe: Could not insert probe at vfs_caches_init+0: -22
[ 4870.225214] trace_kprobe: Could not insert probe at md_flush_request+0: -2

Kernel has all the required params:

CONFIG_BPF=y
CONFIG_BPF_SYSCALL=y
CONFIG_NET_CLS_BPF=m
CONFIG_NET_ACT_BPF=m
CONFIG_BPF_JIT=y
CONFIG_HAVE_BPF_JIT=y
CONFIG_BPF_EVENTS=y
CONFIG_NET_SCH_SFQ=m
CONFIG_NET_ACT_POLICE=m
CONFIG_NET_ACT_GACT=m
CONFIG_DUMMY=m
CONFIG_VXLAN=m

I also found there are some parameters that are used by tool like SystemTap (https://wiki.archlinux.org/index.php/SystemTap) and those are not there by default:

CONFIG_DEBUG_INFO=y
CONFIG_DEBUG_INFO_REDUCED=n
CONFIG_DEBUG_INFO_VTA=y

Should I recompile my kernel with these now? Or what should I do?

@enchantner
Copy link
Author

This may theoretically be connected to anti-Spectre patches in new kernel https://www.phoronix.com/scan.php?page=news_item&px=Linux-4.15-rc8-BPF-Security

@yonghong-song
Copy link
Collaborator

Both vfs_caches_init_early and vfs_caches_init has annotations __init in kernel definitions. So they will be called during kernel initial stage and then will be called again. So I suspect the reason is due to these function addresses are not in valid address ranges (e.g., init sections are freed) and hence kprobe attaching will fail.

For example, in your /boot/vmlinux-<kernel_version> file, you may have:

  [17] .init.text        PROGBITS         ffffffff81f5a000  0135a000
       000000000005b3d2  0000000000000000  AX       0     0     16
  [18] .altinstr_aux     PROGBITS         ffffffff81fb53d2  013b53d2
       000000000000069c  0000000000000000  AX       0     0     1

And in your kallsyms, you may have

ffffffff81f867ee T vfs_caches_init_early
ffffffff81f86863 T vfs_caches_init

Both the above two functions are in .init.text which in general will be freed.

The tool can be improved to exclude these symbols whose address range is in .init.text.
A few APIs, attach_kprobe, attach_kretprobe and get_kprobe_functions will need this fix.

yonghong-song added a commit that referenced this issue Mar 23, 2018
Fix issue #1634.

When kernel is about to attach a kprobe, the following functions
are called:
  register_kprobe
    check_kprobe_address_safe
      kernel_text_address
        core_kernel_text

In core_kernel_text, we have:
        if (addr >= (unsigned long)_stext &&
            addr < (unsigned long)_etext)
                return 1;

Basically, any address outside of [_stext, _etext] will
be rejected.

The functions marked as __init are outside [_stext, _etext].
That is why vfs_caches_init_early and vfs_caches_init
are rejected by trace_kprobe.
Given a regex, this patch avoided attaching these functions
if their func addresses are outside [_stext, _etext] range.

Signed-off-by: Yonghong Song <[email protected]>
yonghong-song added a commit that referenced this issue Mar 23, 2018
Fix issue #1634.

When kernel is about to attach a kprobe, the following functions
are called:
  register_kprobe
    check_kprobe_address_safe
      kernel_text_address
        core_kernel_text

In core_kernel_text, we have:
        if (addr >= (unsigned long)_stext &&
            addr < (unsigned long)_etext)
                return 1;

Basically, any address outside of [_stext, _etext] will
be rejected.

The functions marked as __init are outside [_stext, _etext].
That is why vfs_caches_init_early and vfs_caches_init
are rejected by trace_kprobe.
Given a regex, this patch avoided attaching these functions
if their func addresses are outside [_stext, _etext] range.

Signed-off-by: Yonghong Song <[email protected]>
@enchantner
Copy link
Author

enchantner commented Mar 25, 2018

Tried to run tests with your last patch, @yonghong-song . Most of them still failed, even though dmesg looks better now.

Failed tests:

The following tests FAILED:
2 - c_test_static (Failed)
3 - test_libbcc (Failed)
4 - py_test_stat1_b (Failed)
5 - py_test_bpf_log (Failed)
6 - py_test_stat1_c (Failed)
7 - py_test_xlate1_c (Failed)
8 - py_test_call1 (Failed)
15 - py_test_brb (Failed)
16 - py_test_brb2 (Failed)
18 - py_test_histogram (Failed)
27 - py_test_tools_smoke (Failed)
28 - py_test_tools_memleak (Failed)

Here's an updated dmesg:

[212368.048456] eth0: renamed from py_stat1_b.in
[212368.137070] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[212368.219888] IPv6: ADDRCONF(NETDEV_UP): py_stat1_b.out: link is not ready
[212368.219897] IPv6: ADDRCONF(NETDEV_CHANGE): py_stat1_b.out: link becomes ready
[212368.219970] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[212368.712729] eth0: renamed from py_stat1_c.in
[212368.797213] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[212368.867234] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[212369.379029] eth0: renamed from py_xlate1_c.in
[212369.456782] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[212369.536178] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[212369.743621] eth0: renamed from py_call1_c.in
[212369.832936] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[212369.913932] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[212518.618793] trace_kprobe: Could not insert probe at md_flush_request+0: -2

It's also a little bit weird it's arguing about "eth0" even though I have systemd and my ifaces are named like "enp7s0". Some hardcode?

@yonghong-song
Copy link
Collaborator

The eth0 is used in the namespace, so it is fine. Let us look at an example,
in bcc/tests/python/CMakefile.txt, we have

add_test(NAME py_test_stat1_c WORKING_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR}
  COMMAND ${TEST_WRAPPER} py_stat1_c namespace ${CMAKE_CURRENT_SOURCE_DIR}/test_stat1.py test_stat1.c)

The test run command is

${TEST_WRAPPER} py_stat1_c namespace ${CMAKE_CURRENT_SOURCE_DIR}/test_stat1.py test_stat1.c

The TEST_WRAPPER is at build/tests/wrapper.sh. If it requires a namespace, it will create a networking namespace like below and run

function ns_run() {
  sudo ip netns add $ns
  sudo ip link add $ns.in type veth peer name $ns.out
  sudo ip link set $ns.in netns $ns
  sudo ip netns exec $ns ip link set $ns.in name eth0
  sudo ip netns exec $ns ip addr add dev eth0 172.16.1.2/24
  sudo ip netns exec $ns ip link set eth0 up
  sudo ip netns exec $ns ethtool -K eth0 tx off
  sudo ip addr add dev $ns.out 172.16.1.1/24
  sudo ip link set $ns.out up
  sudo bash -c "PYTHONPATH=$PYTHONPATH LD_LIBRARY_PATH=$LD_LIBRARY_PATH ip netns exec $ns $cmd $1 $2"
  return $?
}

In your case, based on my past experience, it is due to two things:

  • some python package is not installed (e.g., python-netaddr or maybe others)
  • the kernel config is missing some entries, esp. related to linux traffic control subsystem. In kernel config file, it is under Queueing/Scheduling.

Unfortunately, you need to run individual test to debug this. Could you try this and may submit a patch for documentation with your finding, so it becomes clear what additional configuration steps are required for the test to run? Thanks!

@enchantner
Copy link
Author

enchantner commented Mar 30, 2018

Okay, so here's the verbose output for test_libbcc:

Parse error:
4@i%ra+1r
-------^

test_libbcc is a Catch v1.4.0 host application.
Run with -? for options

-------------------------------------------------------------------------------
test listing all USDT probes in Ruby/MRI
  without a running Ruby process
-------------------------------------------------------------------------------
/home/enchantner/Trunk/bcc/tests/cc/test_usdt_probes.cc:118
...............................................................................

/home/enchantner/Trunk/bcc/tests/cc/test_usdt_probes.cc:127: FAILED:
  REQUIRE( ctx.num_probes() > 10 )
with expansion:
  0 > 10

-------------------------------------------------------------------------------
test listing all USDT probes in Ruby/MRI
  with a running Ruby process
  get probe in running process
-------------------------------------------------------------------------------
/home/enchantner/Trunk/bcc/tests/cc/test_usdt_probes.cc:118
...............................................................................

/home/enchantner/Trunk/bcc/tests/cc/test_usdt_probes.cc:206: FAILED:
  REQUIRE( probe )
with expansion:
  false

===============================================================================
test cases:  24 |  23 passed | 1 failed
assertions: 521 | 519 passed | 2 failed

As far as I understand, it fails to attach probes somewhere deep inside the code. Also, failing regexp at the beginning looks weird. What should I do?

@yonghong-song
Copy link
Collaborator

The parsing error is expected. So it is not an error.

The assertion at line 127 indicates that the ruby in your system does not have USDTs.
I guess you can ignore this failure or help improve it by using readelf -n to check ruby library whether any USDT available.

[yhs@localhost bcc]$ ldd /usr/bin/ruby-mri
	linux-vdso.so.1 (0x00007ffd0c1ec000)
	libruby.so.2.4 => /lib64/libruby.so.2.4 (0x00007f7b6168b000)
	libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f7b6146c000)
	libdl.so.2 => /lib64/libdl.so.2 (0x00007f7b61268000)
	libcrypt.so.1 => /lib64/libcrypt.so.1 (0x00007f7b61032000)
	libm.so.6 => /lib64/libm.so.6 (0x00007f7b60cdd000)
	libc.so.6 => /lib64/libc.so.6 (0x00007f7b608fa000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f7b61d2b000)
	libfreebl3.so => /lib64/libfreebl3.so (0x00007f7b606f7000)
[yhs@localhost bcc]$ readelf -n /lib64/libruby.so.2.4

Displaying notes found in: .note.gnu.build-id
  Owner                 Data size	Description
  GNU                  0x00000014	NT_GNU_BUILD_ID (unique build ID bitstring)
    Build ID: 1bc6a9549e2bb3b73ec63790556b16d2154f0483

Displaying notes found in: .note.stapsdt
  Owner                 Data size	Description
  stapsdt              0x00000046	NT_STAPSDT (SystemTap probe descriptors)
    Provider: ruby
    Name: array__create
    Location: 0x000000000002e9a8, Base: 0x000000000023f285, Semaphore: 0x000000000048d1dc
    Arguments: -8@%rbx 8@%rax -4@12(%rsp)
...

@enchantner
Copy link
Author

enchantner commented Mar 31, 2018

Thanks, I'll research this ruby thing. But I think we're getting further and further from the original question "why attaching kprobe fails". Let's go a little back, so, it doesn't really matter that tests fail. Here's the strace dump from trying to run "trace" program. Any ideas?

bpf(BPF_PROG_LOAD, {prog_type=BPF_PROG_TYPE_KPROBE, insn_cnt=28, insns=0x7f1625f307d0, license="GPL", log_level=0, log_size=0, log_buf=0, kern_version=265997, prog_flags=0, ...}, 64) = 4
openat(AT_FDCWD, "/sys/kernel/debug/tracing/kprobe_events", O_WRONLY|O_APPEND) = 5
getpid()                                = 20175
write(5, "p:kprobes/p__/test_bcc_20175 ./test", 35) = -1 EINVAL (Invalid argument)
write(2, "check dmesg output for possible cause\n", 38check dmesg output for possible cause
) = 38
close(5)                                = 0
munmap(NULL, 36864)                     = 0
write(1, "Failed to attach BPF to kprobe\n", 31) = 31
munmap(0x7f1625f30000, 4096)            = 0
munmap(0x7f1625f2f000, 4096)            = 0
close(3)                                = 0

@yonghong-song
Copy link
Collaborator

For your original failure, vfs function related should have been fixed. The md_flush_request may be a kernel config issue and your kernel config may not have CONFIG_BLK_DEV_MD.

What is your trace command line for the above kprobe failure?

@enchantner
Copy link
Author

I have CONFIG_BLK_DEV_MD=m in current kernel config. Basically, I've compiled the simplest C++ program:

#include <stdio.h>

int func1(int a, int b, int c) {
    printf("%d %d %d\n", a, b, c);
}

int main(void) {
    func1(1, 2, 3);
}

... and then I'm calling the trace utility from bcc over this test executable:

sudo strace -s 128 trace ./test 2> trace_strace.log

@yonghong-song
Copy link
Collaborator

Your command line is incorrect. It should be trace './test:<func_name>' for uprobe, e.g., trace './test:func1'. You can look at bcc/tools/trace_example.txt for how to use it.

@enchantner
Copy link
Author

Nice! So much for informative error messages 🥇 Thanks a lot for clarification!

@yonghong-song
Copy link
Collaborator

The original cannot attaching kprobe issue has been fixed. Let us close this issue.
Different issues can be filed for other problems.

banh-gao pushed a commit to banh-gao/bcc that referenced this issue Jun 21, 2018
Fix issue iovisor#1634.

When kernel is about to attach a kprobe, the following functions
are called:
  register_kprobe
    check_kprobe_address_safe
      kernel_text_address
        core_kernel_text

In core_kernel_text, we have:
        if (addr >= (unsigned long)_stext &&
            addr < (unsigned long)_etext)
                return 1;

Basically, any address outside of [_stext, _etext] will
be rejected.

The functions marked as __init are outside [_stext, _etext].
That is why vfs_caches_init_early and vfs_caches_init
are rejected by trace_kprobe.
Given a regex, this patch avoided attaching these functions
if their func addresses are outside [_stext, _etext] range.

Signed-off-by: Yonghong Song <[email protected]>
CrackerCat pushed a commit to CrackerCat/bcc that referenced this issue Jul 31, 2024
Fix issue iovisor#1634.

When kernel is about to attach a kprobe, the following functions
are called:
  register_kprobe
    check_kprobe_address_safe
      kernel_text_address
        core_kernel_text

In core_kernel_text, we have:
        if (addr >= (unsigned long)_stext &&
            addr < (unsigned long)_etext)
                return 1;

Basically, any address outside of [_stext, _etext] will
be rejected.

The functions marked as __init are outside [_stext, _etext].
That is why vfs_caches_init_early and vfs_caches_init
are rejected by trace_kprobe.
Given a regex, this patch avoided attaching these functions
if their func addresses are outside [_stext, _etext] range.

Signed-off-by: Yonghong Song <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants