CO-RE requires BTF information describing the kernel's types in order to perform relocations. The kernel itself usually provides this information when it is built with CONFIG_DEBUG_INFO_BTF. However, this option is not enabled in every distribution and is not available on older kernels.
It's possible to use CO-RE on kernels without CONFIG_DEBUG_INFO_BTF support by providing the BTF information from an external source. BTFHUB contains BTF files for each released kernel that does not support BTF, for the most popular distributions.
Providing this BTF file for a given kernel has some challenges:
- Each BTF file is a few MB in size, so it's not possible to ship the eBPF program with all the BTF files needed to run on different kernels. (The BTF files add up to GBs if you want to support a large number of kernels.)
- Downloading the BTF file for the current kernel at runtime delays the start of the program, and it isn't always possible to reach an external host to download such a file.
Providing a BTF file that describes every data type of the kernel just to run an eBPF program is overkill in many cases. Usually, eBPF programs access only a few kernel fields.
Our proposal is to extend libbpf with an API that generates a BTF file containing only the types needed by an eBPF object. These generated files are very small compared to the ones that contain all the kernel types. This makes it possible to ship an eBPF program together with the BTF information it needs to run on many different kernels.
This idea was discussed during the Towards truly portable eBPF presentation at Linux Plumbers 2021. We prepared a BTFGen repository with an example of how this API can be used. Our plan is to include this support in bpftool once it's merged into libbpf.
There is also a good example of how to use BTFGen and BTFHub together to generate multiple BTF files, one for each existing/supported kernel, tailored to one application. For example, a complex BPF object might support nearly 400 kernels with BTF files totaling only 1.5 MB.
When you have some time, watch the Linux Plumbers 2021 presentation that explains the difficulties of writing portable eBPF code: Towards truly portable eBPF.
At the end of the presentation, after we demonstrate all the blockers to making an eBPF application support multiple kernels, there is a demonstration of how BTFHUB can be used and a discussion about future steps.
The btfgen tool was only created thanks to the effort of:
Mauricio Vasquez Bernal (Kinvolk/Microsoft) - main author
Rafael David Tinoco (Aqua Security) - fixes and review
Lorenzo Fontana (Elastic) - fixes and review
Itay Shakury (Aqua Security) - ideas, support and management
Marga Manterola (Kinvolk/Microsoft) - support and management
This document was created and reviewed by:
Rafael David Tinoco (Aqua Security) - main author
Mauricio Vasquez Bernal (Kinvolk/Microsoft) - review and fixes
Lorenzo Fontana (Elastic) - review and fixes
Yaniv Agman (Aqua Security) - review
The code has not been upstreamed yet and it is being developed at:
https://github.com/kinvolk/libbpf/tree/btfgen
https://github.com/kinvolk/btfgen
The intent was to upstream the libbpf changes and to add btfgen as a sub-function of bpftool.
As you might have read in the documents referenced above, or already knew, eBPF portability depends heavily on code relocation. Other architectures have their relocations resolved at link time or load time, and so does the BPF architecture.
If you would like to revisit concepts about "linkers & loaders and relocations", go ahead and visit this post about it.
A nice quote from Brendan Gregg to explain why relocations are needed for eBPF:
It's not just a matter of saving the BPF bytecode in ELF and then sending it to any other kernel. Many BPF programs walk kernel structs that can change from one kernel version to another. Your BPF bytecode may still execute on different kernels, but it may be reading the wrong struct offsets and printing garbage output!
This is an issue of relocation, and both BTF and CO-RE solve this for BPF binaries. BTF provides type information so that struct offsets and other details can be queried as needed, and CO-RE records which parts of a BPF program need to be rewritten, and how.
The eBPF ELF object is not organized the same way as a regular ELF object (it is not an executable, so it does not contain program header table entries, as explained in the previous section). In summary: eBPF ELF files have different sections than the ones created by GCC/Clang when targeting other architectures.
A single eBPF object file may contain multiple eBPF programs. Each non-inlined function becomes a different program. Each eBPF program has its own ELF section (TEXT), unlike an ELF executable, which has a single .text section, as well as its own relocation table section (REL). All the maps declared in your eBPF object are placed in a .maps (DATA) section, and so on.
Look at the following example:
The image above shows what the eBPF object of the BTFHUB example looks like. Taking a look at the source code, you will see that we have 2 inlined functions and 1 non-inlined one. As the reader probably knows, an inlined function becomes part of its caller (the compiler won't set up a new stack frame for it), so in the end we have only 1 eBPF program:
- a kprobe eBPF program triggered by the do_sys_openat2 kernel function.
Note that this was done for educational purposes only, and a single call to open will likely trigger the eBPF program execution. The intent here was to show different program types and what their ELF sections would look like.
For each eBPF program ELF section, just one in this simple example, we have a corresponding section for all its local relocation info:
The information about the types, functions (and needed dynamic relocations) used in this eBPF object is contained in two ELF sections: .BTF and .BTF.ext, both explained in detail later on.
After this introduction, this document will concentrate most, if not all, of its efforts on those structures and explain what the BTF GENERATOR tool is, and how eBPF programmers seeking code portability through eBPF CO-RE can benefit from it.
Perhaps the best document out there describing BTF is Andrii's: BTF deduplication and Linux Kernel BTF. It is worth mentioning here that, without BTF, eBPF CO-RE would be very hard (or impossible) to achieve.
In there you will find the following diagram:
illustrating the BTF type graph. As you can see, BTF consists of type descriptors that describe, using BTF types, all the types used in your eBPF object. Each BTF type has a certain KIND and may or may not point to another BTF type.
You can imagine BTF as a memory chunk with:
- a header
- a chunk of null terminated strings
- concatenated BTF types
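The layout above can be sketched in C. This is a minimal sketch, not libbpf code: the struct mirrors the field layout of the kernel's struct btf_header (from linux/btf.h), while the names btf_header_sketch, btf_read_header() and btf_str() are hypothetical helpers introduced only for illustration. It assumes the blob is in host byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BTF_MAGIC 0xeB9F /* magic value defined in linux/btf.h */

/* Local mirror of the kernel's struct btf_header (24 bytes, no padding). */
struct btf_header_sketch {
    uint16_t magic;
    uint8_t  version;
    uint8_t  flags;
    uint32_t hdr_len;
    uint32_t type_off; /* type section offset, relative to the end of the header */
    uint32_t type_len;
    uint32_t str_off;  /* string section offset, relative to the end of the header */
    uint32_t str_len;
};

/* Returns 0 and fills *hdr when buf starts with a plausible BTF header,
 * -1 otherwise. */
int btf_read_header(const uint8_t *buf, size_t len, struct btf_header_sketch *hdr)
{
    assert(buf != NULL && hdr != NULL);
    if (len < sizeof(*hdr))
        return -1;
    memcpy(hdr, buf, sizeof(*hdr));
    if (hdr->magic != BTF_MAGIC || hdr->hdr_len < sizeof(*hdr))
        return -1;
    /* Both sections must fit inside the blob. */
    if ((uint64_t)hdr->hdr_len + hdr->type_off + hdr->type_len > len)
        return -1;
    if ((uint64_t)hdr->hdr_len + hdr->str_off + hdr->str_len > len)
        return -1;
    return 0;
}

/* Returns the null-terminated string stored at offset off of the string
 * section, or NULL when off is out of range. */
const char *btf_str(const uint8_t *buf, const struct btf_header_sketch *hdr, uint32_t off)
{
    if (off >= hdr->str_len)
        return NULL;
    return (const char *)buf + hdr->hdr_len + hdr->str_off + off;
}
```

A caller would read a raw BTF file into memory, pass it to btf_read_header(), and then resolve type names through btf_str() using the name offsets stored in each type record.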
At the time this document was written, the following BTF kinds existed:
- BTF_KIND_INT: integer.
- BTF_KIND_FLOAT: float.
- BTF_KIND_PTR: points to another type.
- BTF_KIND_ARRAY: array of a certain type, using same/other type as index.
- BTF_KIND_STRUCT: has members of a certain type.
- BTF_KIND_UNION: same as struct.
- BTF_KIND_ENUM: enumerator of a certain type.
- BTF_KIND_FWD: forward-declaration to another type.
- BTF_KIND_TYPEDEF: typedef to another type.
- BTF_KIND_VOLATILE: volatile.
- BTF_KIND_CONST: constant.
- BTF_KIND_RESTRICT: restrict.
- BTF_KIND_FUNC: function. not a type. defines a subprogram/function.
- BTF_KIND_FUNC_PROTO: function prototype/signature type.
- BTF_KIND_VAR: variable. (points to variable type).
- BTF_KIND_DATASEC: elf data section.
BTF types are encoded in binary form. They're placed one after another and, depending on its kind, each one may or may not be followed by an additional structure containing more information (a STRUCT, for instance, is followed by one entry for each of its members).
Example of memory organization for the most important BTF type kinds:
Each existing variable, or function, in your eBPF object will have its type described in the ELF section .BTF, by using one of the BTF kinds described here.
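To make the "common record plus kind-specific trailing data" encoding more concrete, here is a small sketch. The kind numbers and the bit layout of the info word follow linux/btf.h; the names btf_type_sketch and btf_trailing_bytes() are made up for this illustration, and only the kinds listed in this document are covered (newer kinds are ignored).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Kind numbers as defined in linux/btf.h. */
enum {
    BTF_KIND_INT = 1, BTF_KIND_PTR = 2, BTF_KIND_ARRAY = 3, BTF_KIND_STRUCT = 4,
    BTF_KIND_UNION = 5, BTF_KIND_ENUM = 6, BTF_KIND_FWD = 7, BTF_KIND_TYPEDEF = 8,
    BTF_KIND_VOLATILE = 9, BTF_KIND_CONST = 10, BTF_KIND_RESTRICT = 11,
    BTF_KIND_FUNC = 12, BTF_KIND_FUNC_PROTO = 13, BTF_KIND_VAR = 14,
    BTF_KIND_DATASEC = 15, BTF_KIND_FLOAT = 16,
};

/* Local mirror of struct btf_type: the 12-byte record every type starts with. */
struct btf_type_sketch {
    uint32_t name_off;     /* offset into the string section */
    uint32_t info;         /* bits 0-15: vlen, bits 24-28: kind, bit 31: kind_flag */
    uint32_t size_or_type; /* size for INT/STRUCT/..., type ID for PTR/TYPEDEF/... */
};

static inline uint32_t btf_kind(uint32_t info) { return (info >> 24) & 0x1f; }
static inline uint32_t btf_vlen(uint32_t info) { return info & 0xffff; }

/* Number of bytes of kind-specific data following the common record. */
uint32_t btf_trailing_bytes(const struct btf_type_sketch *t)
{
    uint32_t vlen;

    assert(t != NULL);
    vlen = btf_vlen(t->info);
    switch (btf_kind(t->info)) {
    case BTF_KIND_INT:        return 4;         /* one u32: encoding, offset, bits */
    case BTF_KIND_ARRAY:      return 12;        /* struct btf_array */
    case BTF_KIND_STRUCT:
    case BTF_KIND_UNION:      return vlen * 12; /* one struct btf_member per member */
    case BTF_KIND_ENUM:       return vlen * 8;  /* one struct btf_enum per enumerator */
    case BTF_KIND_FUNC_PROTO: return vlen * 8;  /* one struct btf_param per parameter */
    case BTF_KIND_VAR:        return 4;         /* struct btf_var */
    case BTF_KIND_DATASEC:    return vlen * 12; /* one struct btf_var_secinfo per variable */
    default:                  return 0;         /* PTR, FWD, TYPEDEF, modifiers, FUNC, FLOAT */
    }
}
```

Walking the type section then amounts to reading one 12-byte record, skipping btf_trailing_bytes() more bytes, and repeating until type_len bytes have been consumed.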
The .BTF and .BTF.ext ELF sections can be encoded in 2 different ways:
- The pahole tool (dwarves package): using the DWARF debug data (https://en.wikipedia.org/wiki/DWARF) of an existing non-stripped ELF file. The ELF file can be the kernel (check next topic) or a regular ELF file. Two extra ELF sections will be added to the same ELF file used as input OR, more recently, the BTF data can be written to an external RAW BTF file (to feed libbpf).
- LLVM: The 2 ELF sections are created automatically by LLVM (since it added support for BTF generation) when compiling eBPF programs, to provide BTF for the types the eBPF programs use.
As said previously, the difference between these 2 ELF sections is:
- .BTF - contains information about all BTF types used within this object
- .BTF.ext - contains debug information about function prototypes, line numbers and, when encoded by LLVM, information about needed relocations to load the ELF object.
And you're able to visualize that information by executing bpftool (using the object generated by the BTFHUB example):
$ bpftool btf dump file ./example.bpf.o
[1] PTR '(anon)' type_id=2
[2] STRUCT 'pt_regs' size=168 vlen=21
'r15' type_id=3 bits_offset=0
'r14' type_id=3 bits_offset=64
'r13' type_id=3 bits_offset=128
'r12' type_id=3 bits_offset=192
'bp' type_id=3 bits_offset=256
'bx' type_id=3 bits_offset=320
'r11' type_id=3 bits_offset=384
'r10' type_id=3 bits_offset=448
'r9' type_id=3 bits_offset=512
'r8' type_id=3 bits_offset=576
'ax' type_id=3 bits_offset=640
'cx' type_id=3 bits_offset=704
'dx' type_id=3 bits_offset=768
'si' type_id=3 bits_offset=832
'di' type_id=3 bits_offset=896
'orig_ax' type_id=3 bits_offset=960
'ip' type_id=3 bits_offset=1024
'cs' type_id=3 bits_offset=1088
'flags' type_id=3 bits_offset=1152
'sp' type_id=3 bits_offset=1216
'ss' type_id=3 bits_offset=1280
[3] INT 'long unsigned int' size=8 bits_offset=0 nr_bits=64 encoding=(none)
[4] FUNC_PROTO '(anon)' ret_type_id=5 vlen=1
'ctx' type_id=1
[5] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
[6] FUNC 'do_sys_openat2' type_id=4 linkage=global
[7] STRUCT 'task_struct' size=9472 vlen=229
'thread_info' type_id=8 bits_offset=0
'state' type_id=12 bits_offset=192
'stack' type_id=14 bits_offset=256
'usage' type_id=15 bits_offset=320
'flags' type_id=11 bits_offset=352
'ptrace' type_id=11 bits_offset=384
'on_cpu' type_id=5 bits_offset=416
'wake_entry' type_id=19 bits_offset=448
'cpu' type_id=11 bits_offset=576
...
[8] STRUCT 'thread_info' size=24 vlen=3
'flags' type_id=3 bits_offset=0
'syscall_work' type_id=3 bits_offset=64
'status' type_id=9 bits_offset=128
[9] TYPEDEF 'u32' type_id=10
[10] TYPEDEF '__u32' type_id=11
[11] INT 'unsigned int' size=4 bits_offset=0 nr_bits=32 encoding=(none)
[12] VOLATILE '(anon)' type_id=13
[13] INT 'long int' size=8 bits_offset=0 nr_bits=64 encoding=SIGNED
[14] PTR '(anon)' type_id=0
[15] TYPEDEF 'refcount_t' type_id=16
[16] STRUCT 'refcount_struct' size=4 vlen=1
'refs' type_id=17 bits_offset=0
[17] TYPEDEF 'atomic_t' type_id=18
[18] STRUCT '(anon)' size=4 vlen=1
'counter' type_id=5 bits_offset=0
[19] STRUCT '__call_single_node' size=16 vlen=4
'llist' type_id=20 bits_offset=0
'(anon)' type_id=22 bits_offset=64
'src' type_id=23 bits_offset=96
'dst' type_id=23 bits_offset=112
[20] STRUCT 'llist_node' size=8 vlen=1
'next' type_id=21 bits_offset=0
...
The bpftool command displays the .BTF ELF section information about all types used by a given eBPF object. It reads all the BTF types and extracts their data and the needed strings from the string chunk (since some BTF types have a "name_offset" field that is an offset into the string buffer).
If you remember the graph shown earlier in this document, it can be obtained by following a specific BTF type declaration until its final resolution. One example, focusing on a specific initial type (a struct):
The BTF_TYPE id 311 is a BTF_KIND_STRUCT and describes a STRUCT called "swregs_state":
[311] STRUCT 'swregs_state' size=136 vlen=16
'cwd' type_id=9 bits_offset=0
'swd' type_id=9 bits_offset=32
'twd' type_id=9 bits_offset=64
'fip' type_id=9 bits_offset=96
'fcs' type_id=9 bits_offset=128
'foo' type_id=9 bits_offset=160
'fos' type_id=9 bits_offset=192
'st_space' type_id=302 bits_offset=224
'ftop' type_id=58 bits_offset=864
'changed' type_id=58 bits_offset=872
'lookahead' type_id=58 bits_offset=880
'no_update' type_id=58 bits_offset=888
'rm' type_id=58 bits_offset=896
'alimit' type_id=58 bits_offset=904
'info' type_id=312 bits_offset=960
'entry_eip' type_id=9 bits_offset=1024
The struct 'swregs_state' has a field 'entry_eip'. This field
is a BTF_MEMBER of the BTF_TYPE id 311 (struct). The member
points to BTF_TYPE id 9.
[9] TYPEDEF 'u32' type_id=10
The BTF_TYPE id 9 is a BTF_KIND_TYPEDEF and describes a typedef
called 'u32'. It points to the BTF_TYPE id 10.
[10] TYPEDEF '__u32' type_id=11
The BTF_TYPE id 10 is a BTF_KIND_TYPEDEF and describes a typedef
called '__u32'. It points to the BTF_TYPE id 11.
[11] INT 'unsigned int' size=4 bits_offset=0 nr_bits=32 encoding=(none)
The BTF_TYPE id 11 is a BTF_KIND_INT and describes a 32-bit unsigned integer
called "unsigned int". It is not a modifier BTF type, so it does not point to
anything else.
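The walk above (9 -> 10 -> 11) can be expressed as a short loop. The sketch below uses a tiny hand-written table holding only the three types copied from the dump; type_sketch, dump_types and skip_typedefs() are hypothetical names, and real BTF would be parsed from the binary sections rather than from a C array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum kind_sketch { K_INT, K_TYPEDEF };

struct type_sketch {
    uint32_t id;
    enum kind_sketch kind;
    const char *name;
    uint32_t ref_id; /* referenced type ID; 0 when the kind references nothing */
};

/* The three types visited in the walk above, copied from the bpftool dump. */
static const struct type_sketch dump_types[] = {
    { 9,  K_TYPEDEF, "u32",          10 },
    { 10, K_TYPEDEF, "__u32",        11 },
    { 11, K_INT,     "unsigned int", 0  },
};
#define DUMP_NTYPES (sizeof(dump_types) / sizeof(dump_types[0]))

/* Linear lookup of a type by its ID; returns NULL when it is not in the table. */
static const struct type_sketch *type_by_id(const struct type_sketch *types,
                                            size_t n, uint32_t id)
{
    assert(types != NULL);
    for (size_t i = 0; i < n; i++)
        if (types[i].id == id)
            return &types[i];
    return NULL;
}

/* Follows TYPEDEF links until a non-typedef type (or a dangling ID) is reached. */
const struct type_sketch *skip_typedefs(const struct type_sketch *types,
                                        size_t n, uint32_t id)
{
    const struct type_sketch *t = type_by_id(types, n, id);
    int depth = 0;

    /* The depth bound guards against reference loops in malformed data. */
    while (t && t->kind == K_TYPEDEF && depth++ < 32)
        t = type_by_id(types, n, t->ref_id);
    return t;
}
```

Calling skip_typedefs(dump_types, DUMP_NTYPES, 9) ends at the entry for type ID 11, which is the same resolution described in the text.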
As previously said, the pahole tool (dwarves package) is able to encode BTF information for an ELF file. This is how the Linux kernel gets its BTF ELF sections nowadays, since the GCC compiler isn't able to generate them.
If you are wondering why the Linux kernel ELF file also has BTF information, then we have reached an important point in this document. The eBPF relocations, done by libbpf when loading an eBPF object into the kernel, use both BTFs (from the kernel and from the eBPF object) to calculate the relocations needed for the eBPF programs contained in the object to run in the kernel eBPF VM.
After this change, BTF information is also used to create eBPF maps. The BPF object contains enough information about the maps (ELF section .maps) and their types (ELF section .BTF) so that the eBPF maps can be created and used by both eBPF programs and userland code.
Before this commit, pahole only supported encoding this information into the ELF sections described. This commit added support for encoding BTF information into a detached file (which we will refer to from now on as an external BTF file, or raw BTF file).
The reasoning behind having files external to the ELF was that libbpf needed to be able to load external BTF files describing a running kernel that did not have BTF information available. Unfortunately, some older distros might not be able to ship kernels (even recent ones) with BTF information, so being able to generate BTF information from existing DWARF symbols, which is what pahole does, is essential in those cases.
Now that the concept of external BTF files is established, we stop here to understand what BTFHUB is and why it was created: external BTF files should be used for kernels that do not support embedded BTF info (within their ELF sections) but keep debug information available somewhere (to be converted from DWARF format to BTF format).
BTFHUB was created from that need. The hub contains BTF files for each existing kernel of the supported distributions. Its README.md file describes the process of creating external BTF files, and lists some of the most used Linux distributions along with whether their supported (or some EOL) kernels support BTF.
Currently there is a big need to use external BTF files for: CentOS 7 (7.5.1804, kernel: 3.10.0-862), CentOS 8 (8.1.1911, kernel: 4.18.0-147), Fedora 29 (kernel: 4.18), Fedora 30 (kernel: 5.0), Fedora 31 (kernel: 5.3), Ubuntu Bionic (with HWE kernels: 5.4 and 5.8) and Ubuntu Focal (kernel: 5.4 and HWE kernels: 5.8 and 5.11). If you're using newer versions of those distributions, there is a high chance you don't need external BTF files, as your kernel might already have its ELF .BTF section information. You can check that by executing:
bpftool btf dump file /sys/kernel/btf/vmlinux format raw
and checking whether it dumps the kernel's BTF types instead of failing.
Note: BTFHUB is open to contributions so, if your project requires external BTF files, you can always submit suggestions to BTFHUB for it to include your BTF files.
If you would like to try using external BTF files, you can do the following:
$ git clone --recurse-submodules git@github.com:aquasecurity/btfhub.git
$ cd ./btfhub/example
$ make
Then you can execute the example-c-static binary either using the existing kernel BTF information (from /sys/kernel/btf/vmlinux) or by giving it an external BTF file, so that libbpf (used by the example code) can calculate the relocations needed to run the eBPF object/programs on the current kernel.
First, let's show how to give an external BTF file to the binary. Just to test it, I'm using the file provided by the running kernel, which is what libbpf uses by default:
$ sudo EXAMPLE_BTF_FILE=/sys/kernel/btf/vmlinux ./example-c-static
Foreground mode...<Ctrl-C> or or SIG_TERM to end it.
libbpf: loading example.bpf.o
...
And now, on another, older kernel from Ubuntu Bionic, I can run the exact same binary if I provide a BTF file from the BTFHUB repository:
$ uname -a
Linux bionic 5.4.0-87-generic #98~18.04.1-Ubuntu SMP Wed Sep 22 10:45:04 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
# Copy compressed BTF file and uncompress it:
$ cp /sources/ebpf/aquasec-btfhub/ubuntu/bionic/x86_64/$(uname -r).btf.tar.xz .
$ tar xvfJ ./$(uname -r).btf.tar.xz
5.4.0-87-generic.btf
# Execute ./example-c-static with the BTF for the current kernel
$ sudo EXAMPLE_BTF_FILE=./5.4.0-87-generic.btf ./example-c-static
Foreground mode...<Ctrl-C> or or SIG_TERM to end it.
libbpf: loading example.bpf.o
..
A perceptive reader might have already asked: Okay, isn't that too expensive? What about the size of the BTF files? How many external BTF files does my project need to be fully portable? What if I need to package my eBPF application: would I have to include ALL existing BTF files from the hub for the binary to be able to run on any existing kernel?
That brings us back to the Linux Plumbers 2021 presentation, Towards truly portable eBPF, and specifically to the end of it: the discussion about next steps.
And, yes, that was the biggest problem and one of the main reasons why btfgen, explained in detail in this document, was created.
[user@aquasec-btfhub]$ ls -1 ./ubuntu/bionic/x86_64 | head -10
4.15.0-20-generic.btf.tar.xz
4.15.0-22-generic.btf.tar.xz
4.15.0-23-generic.btf.tar.xz
4.15.0-24-generic.btf.tar.xz
4.15.0-29-generic.btf.tar.xz
4.15.0-30-generic.btf.tar.xz
4.15.0-32-generic.btf.tar.xz
4.15.0-33-generic.btf.tar.xz
4.15.0-34-generic.btf.tar.xz
4.15.0-36-generic.btf.tar.xz
[user@aquasec-btfhub]$ du -sh .
1.5G .
The entire BTFHUB currently holds 1.5GB of compressed BTF files (1MB on average). Unfortunately, including all those files in packaged eBPF-based software is not feasible. Our idea, discussed at the Linux Plumbers conference, was to minimize the size of each existing BTF file by making it contain ONLY the types needed by the relocations of a certain eBPF application.
If that is not clear: instead of having ALL the kernel types, each external BTF file would contain only those types that are REALLY needed for libbpf to calculate the eBPF relocations during the code execution.
After our presentation, Mauricio, from Kinvolk/Microsoft, contacted us, the Aqua Security Open Source Team, to tell us he had already started working on that idea and to invite us to collaborate on the amazing work he had already put together.
The results from btfgen, reducing the size needs to create a portable eBPF application, will be presented in the next sections.
But, first, let's understand BPF relocations.
Quotes within this section are from Andrii's blog (bpf-core-reference-guide).
Now that you're familiar with how BTF information is organized, we can continue our quest to understand BPF LLVM relocations. For eBPF objects, libbpf is the linker, acting like a dynamic linker. With BTF information from both sides, the eBPF object and the target kernel, libbpf resolves all the relocations before loading the eBPF object into the kernel.
Andrii has added CO-RE (Compile Once - Run Everywhere) support in libbpf in this patchset. As presented in 2019, BPF-CORE overview is a sum of:
- Self-describing kernel (BTF)
- Clang w/ emitted relocations (__builtin_preserve_access_index() feature)
- libbpf as relocating loader
While this document was being written, Andrii created an incredible reference to eBPF CO-RE that you should visit and read together with this document. It will improve your understanding of the eBPF CO-RE feature and help you see why BTF generator was created.
So, with CO-RE, the same eBPF object can run in multiple target kernels without recompilation.
Like regular ELF relocations, eBPF bytecode also needs relocations to its TEXT (instructions) and DATA segments before it can be JIT-compiled by the in-kernel BPF JIT compiler and finally executed as native instructions of the architecture your kernel is running on.
There are different kinds of relocations currently supported by libbpf:
- Local Relocations
- Field Based Relocations
- Type Based Relocations
- ENUM Value Based Relocations
Do not confuse the different kinds of relocations discussed here with the different types of relocations supported by an architecture (for the x64 arch: R_X86_64_NONE, R_X86_64_64, R_X86_64_PC32, and so on).
Local relocations are inherent to how the compiler/architecture works. Those relocations are the ones dealing with global variables (including eBPF maps) and function symbol names, for example. They are the ones using the different types of relocations supported by the architecture (and not the different kinds of relocations supported by libbpf).
eBPF architecture supports the following 6 relocation types:
Enum  ELF Reloc Type     Description      BitSize  Offset        Calculation
0     R_BPF_NONE         None
1     R_BPF_64_64        ld_imm64 insn    32       r_offset + 4  S + A
2     R_BPF_64_ABS64     normal data      64       r_offset      S + A
3     R_BPF_64_ABS32     normal data      32       r_offset      S + A
4     R_BPF_64_NODYLD32  .BTF[.ext] data  32       r_offset      S + A
10    R_BPF_64_32        call insn        32       r_offset + 4  (S + A) / 8 - 1
and those relocation types are used in the eBPF relocation tables to tell the loader how to relocate symbols that are local to the object.
In our BTFHUB code example, we can see these local relocations by doing:
$ llvm-readelf -r ./example.bpf.o
Relocation section '.relkprobe/do_sys_openat2' at offset 0x6f88 contains 1 entries:
Offset Info Type Symbol's Value Symbol's Name
0000000000000290 0000000500000001 R_BPF_64_64 0000000000000000 events
Relocation section '.rel.BTF' at offset 0x6f98 contains 2 entries:
Offset Info Type Symbol's Value Symbol's Name
0000000000003a40 0000000300000000 R_BPF_NONE 0000000000000000 LICENSE
0000000000003a58 0000000500000000 R_BPF_NONE 0000000000000000 events
Relocation section '.rel.BTF.ext' at offset 0x6fb8 contains 42 entries:
Offset Info Type Symbol's Value Symbol's Name
000000000000002c 0000000200000000 R_BPF_NONE 0000000000000000 kprobe/do_sys_openat2
0000000000000040 0000000200000000 R_BPF_NONE 0000000000000000 kprobe/do_sys_openat2
0000000000000050 0000000200000000 R_BPF_NONE 0000000000000000 kprobe/do_sys_openat2
0000000000000060 0000000200000000 R_BPF_NONE 0000000000000000 kprobe/do_sys_openat2
...
This is very similar to regular ELF load-time/link-time relocations, which can be resolved either at build time or at runtime by a linker reading the symbol tables:
$ llvm-readelf --symbols ./example.bpf.o
Symbol table '.symtab' contains 6 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 00000000000002c8 0 NOTYPE LOCAL DEFAULT 2 LBB0_2
2: 0000000000000000 0 SECTION LOCAL DEFAULT 2 kprobe/do_sys_openat2
3: 0000000000000000 4 OBJECT GLOBAL DEFAULT 4 LICENSE
4: 0000000000000000 728 FUNC GLOBAL DEFAULT 2 do_sys_openat2
5: 0000000000000000 20 OBJECT GLOBAL DEFAULT 3 events
The difference is that, in a regular ELF file all the relocations are usually placed in ELF section .rela.text, while in the eBPF object ELF file they are placed in different sections (with names starting with .relXXXX).
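As a small worked example, the Info column printed by llvm-readelf can be decoded by hand. The sketch below relies only on the standard Elf64 r_info layout (symbol index in the high 32 bits, relocation type in the low 32 bits) and on the fact that BPF instructions are 8 bytes long; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Relocation type numbers, as listed in the eBPF relocation types table. */
enum {
    R_BPF_NONE = 0, R_BPF_64_64 = 1, R_BPF_64_ABS64 = 2,
    R_BPF_64_ABS32 = 3, R_BPF_64_NODYLD32 = 4, R_BPF_64_32 = 10,
};

/* Elf64 r_info packs the symbol index (high 32 bits) and the
 * relocation type (low 32 bits). */
static inline uint32_t rel_sym(uint64_t r_info)  { return (uint32_t)(r_info >> 32); }
static inline uint32_t rel_type(uint64_t r_info) { return (uint32_t)(r_info & 0xffffffffu); }

/* Converts the byte offset of a relocated instruction into an
 * instruction index within its program section. */
static inline uint64_t insn_index(uint64_t r_offset)
{
    assert(r_offset % 8 == 0); /* BPF instructions are 8 bytes long */
    return r_offset / 8;
}
```

For the first entry in the output above (Offset 0x290, Info 0x0000000500000001), rel_sym() gives 5, which is the events symbol in the symbol table, rel_type() gives R_BPF_64_64, and insn_index() gives instruction 82 of the kprobe/do_sys_openat2 section.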
Now that we have talked about local relocations, it is good to clarify that the remaining relocation kinds are specific to eBPF, use BTF information, and are done by libbpf before loading the eBPF object. They are load-time relocations with peculiarities specific to eBPF objects. While traversing the eBPF ELF object, libbpf can change instructions and data according to those relocations, depending on their kind.
After local relocations are done, the eBPF object isn't ready to be loaded yet. That is because there are other relocations to be resolved: field based relocations. The programmer explicitly marks them in the source code. There are macros that help use eBPF helper functions together with the __builtin_preserve_access_index feature.
When compiling CO-RE (Compile Once - Run Everywhere) BPF objects, the LLVM BPF backend records each relocation in a structure containing only relocation information. Those structures are placed in the corresponding ELF section (.BTF.ext) so libbpf can use them at load time.
This is only possible thanks to the __builtin_preserve_access_index feature (used by the bpf_core_read() helper). By using it when accessing a kernel pointer, you instruct LLVM to keep the relocation information in the generated ELF file, so that libbpf knows how to relocate the accesses for the kernel where the BPF object will run.
Example of a header file:
#pragma clang attribute push (__attribute__((preserve_access_index)), apply_to = record)
/* Minimal local definition: only the fields the program reads. */
struct task_struct {
    pid_t pid;
    pid_t tgid;
};
#pragma clang attribute pop
And how to use it and force LLVM to save relocation information:
pid_t pid = __builtin_preserve_access_index(({ task->pid; }));
The basic idea is this: you tell the compiler which types and fields you want to access through the helper function. It uses the feature internal to LLVM to keep track of everything that can be relocated (as BTF information) so that, whenever the generated object is loaded, the relocations are resolved before the code runs.
One of the very common problems BPF applications must deal with is the need to perform feature detection. I.e., detecting if a particular host kernel supports some new and optional feature, which BPF application can use to get more information or improve the efficiency.
Type based relocations are meant for the eBPF programs to discover more about the running environment. With this type of relocation, the running eBPF code is able to:
- BPF_TYPE_ID_LOCAL - get BTF type ID of specified type using local BTF information.
- BPF_TYPE_ID_TARGET - get BTF type id of specified type using target BTF information.
- BPF_TYPE_EXISTS - Check if provided named type (struct/union/enum/typedef) exists in target.
- BPF_TYPE_SIZE - Get the byte size of a provided named type (struct/union/enum/typedef) in a target kernel.
BTF generator currently does not support this kind of relocation.
One interesting challenge that some BPF applications run into is the need to work with "unstable" internal kernel enums. That is, enums which don't have a fixed set of constants and/or integer values assigned to them.
Enum based relocations allow eBPF programs to get the exact ENUM value from the running kernel (through an eBPF helper function).
BTF generator currently does not support this kind of relocation.
It's not unusual for some fields to be missing on some kernels. If a BPF program attempts to read a missing field with BPF_CORE_READ(), it will result in an error during BPF verification. Similarly, CO-RE relocations will fail when getting enum value (or type size) of an enumerator (or a type) that doesn't exist in the host kernel.
Whenever a relocation can't be resolved at load time, libbpf poisons the instruction containing the bad relocation. The result will be something like:
1: (85) call unknown#195896080
invalid func unknown#195896080
That 195896080 is 0xbad2310 in hex (for "bad relo") and is a constant that libbpf uses to mark instructions that failed CO-RE relocation. The reason libbpf doesn't just report such errors immediately is because missing field/type/enum and corresponding failing CO-RE relocation can be handled by the BPF application gracefully, if desired. This makes it possible to accommodate very drastic changes in kernel types with just a single BPF application (which is a crucial goal of "Compile Once – Run Everywhere" philosophy).
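The poisoning described above can be sketched as follows. This is an illustration modeled on the behavior described in the quote, not libbpf's actual code: insn_sketch is a simplified stand-in for struct bpf_insn (the real one packs the registers into bitfields), and poison_insn() and insn_is_poisoned() are hypothetical names. The opcode values BPF_JMP (0x05) and BPF_CALL (0x80) are the standard BPF ones, which is why the verifier log shows "(85) call".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BPF_JMP      0x05
#define BPF_CALL     0x80
#define BAD_RELO_IMM 195896080 /* 0xbad2310, reads as "bad relo" */

/* Simplified stand-in for struct bpf_insn. */
struct insn_sketch {
    uint8_t code;
    uint8_t dst_reg;
    uint8_t src_reg;
    int16_t off;
    int32_t imm;
};

/* Replaces the instruction with a call to a helper that does not exist,
 * so the verifier rejects the program only if the instruction is reachable. */
void poison_insn(struct insn_sketch *insn)
{
    assert(insn != NULL);
    insn->code = BPF_JMP | BPF_CALL; /* 0x85 */
    insn->dst_reg = 0;
    insn->src_reg = 0;
    insn->off = 0;
    insn->imm = BAD_RELO_IMM;
}

/* Returns 1 when the instruction carries the "bad relo" marker. */
int insn_is_poisoned(const struct insn_sketch *insn)
{
    return insn->code == (BPF_JMP | BPF_CALL) && insn->imm == BAD_RELO_IMM;
}
```

Because the poisoned instruction is only a problem when the verifier can reach it, a program can guard the access (for example with a field-existence check) and still load on kernels where the field is missing.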
With BPF CO-RE relocations there are always two BTF types involved. One is the BPF program's local expectation of the type definition (e.g., vmlinux.h types or types defined manually with preserve_access_index attribute). This local BTF type provides the means for libbpf to know what to search for in the kernel BTF. As such, it can be a minimal definition of the type/field/enum with only a necessary subset of fields and enumerators.
Libbpf then can use local BTF type definition to find a matching actual complete kernel BTF type. The above helpers allow capturing BTF type IDs for both types involved in a CO-RE relocation. They could be useful for distinguishing different kernel or local types at runtime, for debugging and logging purposes, or potentially for future BPF APIs that would accept BTF type IDs as input arguments. Such APIs don't exist yet, but they are coming for sure soon.
Here I'd like to point out important ideas about how libbpf does the relocations. This will help the reader understand how BTF generator was implemented using the logic that already exists in libbpf.
When loading an eBPF object, the execution path up to the CO-RE relocation logic is:
bpf_prog_load_xattr() or bpf_object__load_skeleton()
bpf_object__load_xattr()
bpf_object__relocate()
bpf_object__relocate_core()
bpf_core_apply_relo()
The function bpf_object__relocate_core() is the one responsible for walking the .BTF.ext section and applying the relocations that were flagged by the LLVM compiler within the eBPF object's BTF data.
The basic relocation information (bpf_core_relo) is:
- insn_off: instruction offset (in bytes) within a BPF program that needs its insn->imm field to be relocated.
- type_id: BTF type ID of the root entity of a relocatable type or field.
- access_str_off: offset into the .BTF string section (string interpretation depends on the relocation kind: field-based, type-based or enum value-based).
This information is crucial to the BTF generator logic, as will be shown shortly.
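To illustrate the record described above, the sketch below mirrors its fields and parses an access string. For field-based relocations the access string is a colon-separated list of indices such as "0:1" (with the two-field task_struct definition used earlier, that would be element 0 of the root pointer and then member 1, tgid). The names core_relo_sketch and parse_access_str() are hypothetical; the kind field is included because the real structure also records which relocation kind applies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Local mirror of the CO-RE relocation record stored in .BTF.ext. */
struct core_relo_sketch {
    uint32_t insn_off;       /* byte offset of the instruction to patch */
    uint32_t type_id;        /* BTF type ID of the root type */
    uint32_t access_str_off; /* offset of the access string in the .BTF strings */
    uint32_t kind;           /* relocation kind (field, type or enum value based) */
};

/* Parses a colon-separated access string such as "0:1" into idx[].
 * Returns the number of indices, or -1 on malformed input or overflow. */
int parse_access_str(const char *s, uint32_t *idx, int max)
{
    int n = 0;

    assert(idx != NULL && max > 0);
    if (!s || !*s)
        return -1;
    while (*s) {
        char *end;
        unsigned long v;

        if (*s < '0' || *s > '9')
            return -1;
        v = strtoul(s, &end, 10);
        if (n == max || v > UINT32_MAX)
            return -1;
        idx[n++] = (uint32_t)v;
        if (*end == ':') {
            end++;
            if (!*end) /* a trailing ':' is malformed */
                return -1;
        } else if (*end != '\0') {
            return -1;
        }
        s = end;
    }
    return n;
}
```

The parsed indices are what a loader walks, starting from type_id, to find the exact member that the instruction at insn_off accesses.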
bpf_object__relocate_core() tries to apply each relocation that was placed into .BTF.ext, one by one: it loops over the .BTF.ext CO-RE relocation structures and applies each relocation to its instruction.
This takes us to the second most important function (or pair of functions) for understanding BTF generator: bpf_core_apply_relo() and its sister function bpf_core_apply_relo_insn(). Both are called with the bpf_core_relo structure given as an argument.
Remember:
- local = the eBPF object you will load into the kernel
- target = the kernel we are trying to load the eBPF object into
The first function, bpf_core_apply_relo(), gets the local type ID and size and initializes a cache of target types that could satisfy the given relocation. This cache is called cand_cache (candidates cache).
The second function, bpf_core_apply_relo_insn(), takes care of the following:
- Turn bpf_core_relo into a low-level and a high-level representation of a specification, named local_spec.
- Check whether each relocation candidate from cand_cache really satisfies the relocation needs AND, if it does, generate a low-level and a high-level representation of the target specification, named targ_spec.
- Call the bpf_core_calc_relo() function to calculate the relocation for a given local_spec (local specification) and targ_spec (target specification). Depending on the kind of relocation being worked on, it calls the appropriate handling function:
  - bpf_core_calc_field_relo()
  - bpf_core_calc_type_relo()
  - bpf_core_calc_enumval_relo()
  This results in a structure called targ_res, containing the target resolution for the local and target specifications. It is a bpf_core_relo_res, which represents the resolution of the relocation between the two specs.
- Patch the instruction (bpf_core_patch_insn()) related to the given relocation by using local_spec AND targ_res information.
By now the reader has a clear (hopefully) picture of:
- eBPF CO-RE and how BTF information is generated and used
- BTF information structures: how types are linked to each other
- eBPF relocation kinds: field, type and enum-val based.
- eBPF relocation specifications based on local (eBPF object) and target (kernel) BTF information.
Now, let's pause for a moment and think again about BTFHUB and its biggest issue: size!
The main idea is:
- You have an eBPF program and it can run in multiple recent kernels (local BTF and relocations).
- You have BTFHUB with tons of BTFs for old kernels (target BTFs)
Why not filter the target BTFs (the ones representing the kernels) so that they keep only the types used by your eBPF object? This way, to fully support old kernels, you don't need a 1.5MB file for each of them.
When Mauricio (Kinvolk/Microsoft) approached us with his proof-of-concept code, he had already solved a problem we were about to start dealing with. Our main intent was to take a TARGET BTF file and make it small by keeping only the types used by OUR eBPF object.
Unfortunately, simply getting existing BTF types from the local BTF and trying to create a target BTF with those won't work. You need to calculate relocations first and generate a target BTF with the result of those relocations. This will make sure that all types you need at the end (when running in the old kernel that does not have a BTF) exist.
In summary, given a range of external BTF files for different kernels and a range of eBPF object files from different eBPF based applications as arguments, the BTF generator is responsible for:
- Calculating all relocations from the eBPF object file against each given kernel BTF file
- Generating a partial BTF file for each given kernel, containing only the types used by an eBPF object (this way one can distribute an application with a bundle of BTF files and make it support all old kernels for multiple distros).
How does the BTF generator do this?
- By patching libbpf to add specific BTF generation code (based on relocations)
- By patching bpftool to add a specific sub-function that generates BTF files
Libbpf already provides functions to easily manipulate BTF information. If you think about how BTF types are organized, based on the previous picture examples, you will see that creating a BTF file is simply a matter of creating an empty BTF information structure (btf__new_empty()) and adding BTF types to it.
There are different ways to add a BTF type to an existing empty BTF structure. You might choose to add the type through specific BTF type kind functions:
- btf__add_int()
- btf__add_float()
- btf__add_ref_kind() (PTR, TYPEDEF, CONST/VOLATILE/RESTRICT)
- btf__add_ptr()
- btf__add_array()
- btf__add_composite() (STRUCT/UNION by providing existing fields)
- btf__add_struct() (STRUCT with no fields)
- btf__add_union() (UNION with no fields)
- btf__add_enum() (ENUM with no enum values)
- btf__add_fwd() (FWD declaration to STRUCT, UNION or ENUM)
- btf__add_typedef()
- btf__add_volatile()
- btf__add_const()
- btf__add_restrict()
- btf__add_func()
- btf__add_func_proto() (FUNCTION prototype with no arguments)
- btf__add_var()
- btf__add_datasec()
and by populating those with:
- btf__add_field() (STRUCT/UNION new field)
- btf__add_enum_value() (ENUM new value)
- btf__add_func_param() (FUNCTION arguments)
- btf__add_datasec_var_info()
But there is also a generic way of adding types to a BTF in-memory structure:
btf__add_type()
As said previously, we need to construct a BTF file containing only the types that result from the eBPF relocations. We have everything we need right before libbpf applies the relocation to the instruction:
- A bpf_core_spec that represents one relocation from the origin (the eBPF object) = local_spec
- A bpf_core_spec that represents one type candidate matching the relocation type = targ_spec
- A bpf_core_relo_res that represents the resolution of the two specifications = targ_res
All we need is to walk the relocation, type by type, member/field by member/field, and add the types we find (and their field/member relationships) to a newly created in-memory BTF file. This way, at the end, our resulting BTF file will be a small subset of the original big BTF file. That is exactly what we wanted in order to solve the BTFHUB sizing issue.
So, let's do this: let's walk all the relocations. For each relocation, let's walk the BTF types represented by it. And let's understand the BTF generator internals.
If we run the BTF generator tool with debug messages enabled, giving it an external BTF file for kernel 5.4.0-87 (Ubuntu Bionic) and a complex eBPF object (Tracee), we see a list of all the relocations calculated for this object to run in a 5.4.0-87 kernel:
RELOCATION: [26219] struct bpf_raw_tracepoint_args.args[1] (0:0:1 @ offset 8)
RELOCATION: [28233] struct task_struct.real_parent (0:68 @ offset 2256)
RELOCATION: [28233] struct task_struct.pid (0:65 @ offset 2240)
RELOCATION: [28233] struct task_struct.nsproxy (0:105 @ offset 2760)
RELOCATION: [28421] struct nsproxy.pid_ns_for_children (0:4 @ offset 32)
RELOCATION: [28409] struct pid_namespace.level (0:6 @ offset 72)
RELOCATION: [28233] struct task_struct.thread_pid (0:75 @ offset 2344)
RELOCATION: [28411] struct pid.numbers (0:5 @ offset 80)
RELOCATION: [28408] struct upid.nr (0:0 @ offset 0)
RELOCATION: [28421] struct nsproxy.pid_ns_for_children (0:4 @ offset 32)
...
RELOCATION: [198] struct pt_regs.di (0:14 @ offset 112)
RELOCATION: [198] struct pt_regs.si (0:13 @ offset 104)
RELOCATION: [3484] struct socket.sk (0:4 @ offset 24)
RELOCATION: [3012] struct sock.__sk_common.skc_family (0:0:3 @ offset 16)
RELOCATION: [47942] struct inet_sock.sk.__sk_common.skc_rcv_saddr (0:0:0:0:1:1 @ offset 4)
RELOCATION: [47942] struct inet_sock.sk.__sk_common.skc_num (0:0:0:2:1:1 @ offset 14)
RELOCATION: [47942] struct inet_sock.sk.__sk_common.skc_daddr (0:0:0:0:1:0 @ offset 0)
RELOCATION: [47942] struct inet_sock.sk.__sk_common.skc_dport (0:0:0:2:1:0 @ offset 12)
RELOCATION: [47875] struct sockaddr_in.sin_family (0:0 @ offset 0)
RELOCATION: [47875] struct sockaddr_in.sin_port (0:1 @ offset 2)
RELOCATION: [47875] struct sockaddr_in.sin_addr.s_addr (0:2:0 @ offset 4)
RELOCATION: [49718] struct unix_sock.addr (0:1 @ offset 760)
RELOCATION: [49716] struct unix_address.len (0:1 @ offset 4)
RELOCATION: [49716] struct unix_address.name (0:3 @ offset 12)
RELOCATION: [3012] struct sock.__sk_common.skc_state (0:0:4 @ offset 18)
RELOCATION: [47942] struct inet_sock.pinet6 (0:1 @ offset 760)
...
Picking one relocation as example:
RELOCATION: [47942] struct inet_sock.sk.__sk_common.skc_num (0:0:0:2:1:1 @ offset 14)
We can see that this relocation happens because of struct inet_sock. This is the root entity of the relocation. All the rest are either members or fields for the relocation. The struct inet_sock is a BTF_TYPE with id == 47942.
By executing bpftool we're able to follow that:
$ bpftool btf dump file ./btfs/5.4.0-87-generic.btf format raw
...
[47942] STRUCT 'inet_sock' size=968 vlen=30
'sk' type_id=3012 bits_offset=0
'pinet6' type_id=47944 bits_offset=6080
'inet_saddr' type_id=2996 bits_offset=6144
'uc_ttl' type_id=16 bits_offset=6176
'cmsg_flags' type_id=18 bits_offset=6192
'inet_sport' type_id=2995 bits_offset=6208
'inet_id' type_id=18 bits_offset=6224
'inet_opt' type_id=47938 bits_offset=6272
'rx_dst_ifindex' type_id=21 bits_offset=6336
'tos' type_id=13 bits_offset=6368
'min_ttl' type_id=13 bits_offset=6376
'mc_ttl' type_id=13 bits_offset=6384
'pmtudisc' type_id=13 bits_offset=6392
'recverr' type_id=13 bits_offset=6400 bitfield_size=1
'is_icsk' type_id=13 bits_offset=6401 bitfield_size=1
'freebind' type_id=13 bits_offset=6402 bitfield_size=1
'hdrincl' type_id=13 bits_offset=6403 bitfield_size=1
'mc_loop' type_id=13 bits_offset=6404 bitfield_size=1
'transparent' type_id=13 bits_offset=6405 bitfield_size=1
'mc_all' type_id=13 bits_offset=6406 bitfield_size=1
'nodefrag' type_id=13 bits_offset=6407 bitfield_size=1
'bind_address_no_port' type_id=13 bits_offset=6408 bitfield_size=1
'defer_connect' type_id=13 bits_offset=6409 bitfield_size=1
'rcv_tos' type_id=13 bits_offset=6416
'convert_csum' type_id=13 bits_offset=6424
'uc_index' type_id=21 bits_offset=6432
'mc_index' type_id=21 bits_offset=6464
'mc_addr' type_id=2996 bits_offset=6496
'mc_list' type_id=47945 bits_offset=6528
'cork' type_id=47941 bits_offset=6592
...
And this is the moment you realize the external BTF file 5.4.0-87-generic.btf is HUGE, as it contains all types used by the kernel image. Let's continue. In our relocation we had the string sk as a member of struct inet_sock. In the BTF dump we can find the member sk and check which BTF_TYPE id it points to:
'sk' type_id=3012 bits_offset=0
Continuing, now we must find BTF type id == 3012:
[3012] STRUCT 'sock' size=760 vlen=88
'__sk_common' type_id=4507 bits_offset=0
'sk_lock' type_id=4492 bits_offset=1088
'sk_drops' type_id=79 bits_offset=1344
'sk_rcvlowat' type_id=21 bits_offset=1376
'sk_error_queue' type_id=3582 bits_offset=1408
'sk_rx_skb_cache' type_id=3247 bits_offset=1600
'sk_receive_queue' type_id=3582 bits_offset=1664
'sk_backlog' type_id=4510 bits_offset=1856
'sk_forward_alloc' type_id=21 bits_offset=2048
'sk_ll_usec' type_id=9 bits_offset=2080
'sk_napi_id' type_id=9 bits_offset=2112
'sk_rcvbuf' type_id=21 bits_offset=2144
'sk_filter' type_id=4514 bits_offset=2176
'(anon)' type_id=4511 bits_offset=2240
'sk_policy' type_id=4515 bits_offset=2304
'sk_rx_dst' type_id=3382 bits_offset=2432
'sk_dst_cache' type_id=3382 bits_offset=2496
'sk_omem_alloc' type_id=79 bits_offset=2560
'sk_sndbuf' type_id=21 bits_offset=2592
'sk_wmem_queued' type_id=21 bits_offset=2624
'sk_wmem_alloc' type_id=790 bits_offset=2656
...
And it's a struct sock. So, we currently know we had a relocation for struct inet_sock -> struct sock -> XXX. Now, the 3rd member of this relocation is __sk_common, and we can find it right at the beginning of the struct sock BTF information:
'__sk_common' type_id=4507 bits_offset=0
The BTF type id 4507 is:
[4507] STRUCT 'sock_common' size=136 vlen=25
'(anon)' type_id=4496 bits_offset=0
'(anon)' type_id=4497 bits_offset=64
'(anon)' type_id=4500 bits_offset=96
'skc_family' type_id=19 bits_offset=128
'skc_state' type_id=2991 bits_offset=144
'skc_reuse' type_id=14 bits_offset=152 bitfield_size=4
'skc_reuseport' type_id=14 bits_offset=156 bitfield_size=1
'skc_ipv6only' type_id=14 bits_offset=157 bitfield_size=1
'skc_net_refcnt' type_id=14 bits_offset=158 bitfield_size=1
'skc_bound_dev_if' type_id=21 bits_offset=160
'(anon)' type_id=4501 bits_offset=192
'skc_prot' type_id=4509 bits_offset=320
'skc_net' type_id=3613 bits_offset=384
'skc_v6_daddr' type_id=3289 bits_offset=448
'skc_v6_rcv_saddr' type_id=3289 bits_offset=576
'skc_cookie' type_id=81 bits_offset=704
'(anon)' type_id=4502 bits_offset=768
'skc_dontcopy_begin' type_id=106 bits_offset=832
'(anon)' type_id=4504 bits_offset=832
'skc_tx_queue_mapping' type_id=19 bits_offset=960
'skc_rx_queue_mapping' type_id=19 bits_offset=976
'(anon)' type_id=4505 bits_offset=992
'skc_refcnt' type_id=790 bits_offset=1024
'skc_dontcopy_end' type_id=106 bits_offset=1056
'(anon)' type_id=4506 bits_offset=1056
So far our relocation inet_sock.sk.__sk_common.skc_num (0:0:0:2:1:1 @ offset 14) only had named structs as members. The next field is skc_num, but there is a catch: we won't find skc_num as a direct member of type 4507, because it sits inside anonymous types. Now it's time to pay attention to the fields, not only the member types. In the relocation
inet_sock.sk.__sk_common.skc_num (0:0:0:2:1:1 @ offset 14)
the numbers at the end (2:1:1) tell us which fields to use when they are unnamed (which is the case here). Field numbers are zero-based, so the 4th member, field #2, is the third member listed for type 4507:
'(anon)' type_id=4500 bits_offset=96
Checking the BTF type id it points to:
[4500] UNION '(anon)' size=4 vlen=2
'skc_portpair' type_id=4493 bits_offset=0
'(anon)' type_id=4499 bits_offset=0
We must also rely on the 5th member, field #1, to find the relocation field:
'(anon)' type_id=4499 bits_offset=0
Checking BTF type id 4499, and this time using the 6th member, field #1:
[4499] STRUCT '(anon)' size=4 vlen=2
'skc_dport' type_id=2995 bits_offset=0
'skc_num' type_id=18 bits_offset=16
We will find the member named skc_num.
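The manual walk can be reproduced with a short Python sketch. The type table is copied from the bpftool dumps in this section (only the members needed for this relocation are listed), while walk_access is a hypothetical helper and not a libbpf function. It relies on one property of CO-RE access strings: the first number indexes the root type as if it were an array element, so the member indices start at the second number.

```python
# Members as (name, type_id, bits_offset), copied from the dumps in this section.
TYPES = {
    47942: ("inet_sock", [("sk", 3012, 0)]),
    3012: ("sock", [("__sk_common", 4507, 0)]),
    4507: ("sock_common", [("(anon)", 4496, 0), ("(anon)", 4497, 64), ("(anon)", 4500, 96)]),
    4500: ("(anon)", [("skc_portpair", 4493, 0), ("(anon)", 4499, 0)]),
    4499: ("(anon)", [("skc_dport", 2995, 0), ("skc_num", 18, 16)]),
}


def walk_access(root_id, access):
    """Follow an access string and return (last member name, byte offset, visited type ids)."""
    indices = [int(part) for part in access.split(":")]
    if indices[0] != 0:
        raise ValueError("this sketch only handles element 0 of the root type")
    type_id, bits, visited = root_id, 0, [root_id]
    name = TYPES[root_id][0]
    for index in indices[1:]:
        members = TYPES[type_id][1]
        name, type_id, bits_offset = members[index]
        bits += bits_offset
        visited.append(type_id)
    return name, bits // 8, visited


name, byte_offset, visited = walk_access(47942, "0:0:0:2:1:1")
print(name, byte_offset, visited)
```

The accumulated bit offsets (0 + 0 + 96 + 0 + 16 = 112 bits) give the 14 bytes shown in the debug output, and the list of visited type ids is exactly the set of types a reduced BTF file has to keep for this one relocation.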
So, as shown, for each given relocation we must use the information it contains (types, root entities, members and fields) to construct another BTF file with the resulting relocations. This other BTF file is a subset of the given external BTF file for kernel 5.4.0-87, containing only the BTF types we need. We can then give this generated BTF to libbpf as the external BTF file and load our eBPF program into a 5.4.0-87 kernel with a very small file (no need to have the big one).
Of course, this external BTF file is tailor-made for this eBPF object, and other eBPF objects won't work with it. The idea is exactly that: your eBPF application can ship, together with its binaries, a bundle of generated BTF files representing all supported kernels, and be able to run your code EVERYWHERE (as the RE in CO-RE means Run Everywhere).
The BTF generator organizes all its information in three structs called btf_reloc_info, btf_reloc_type and btf_reloc_member. It also uses the already existing btf_type and btf_member structures.
Look at how the data is organized:
A single BTF_RELOC_INFO structure is created, and it contains:
- SRC_BTF: a pointer to the source (eBPF object) BTF file
- TYPES: a hashmap of all BTF_RELOC_TYPE structures
- IDS_MAP: a hashmap containing a NEW TYPE ID value for each existing OLD TYPE ID value.
Focusing on BTF_RELOC_TYPE for now, it contains:
- BTF_TYPE: a pointer to the BTF type of the root entity of a relocation (inet_sock from the previous example). It can be any BTF type kind (STRUCT/UNION, INT, FLOAT, PTR, ...).
- ID: the BTF type id of the root entity of the relocation (47942 from the previous example)
- MEMBERS: a hashmap of all BTF_RELOC_MEMBERS (one per existing BTF type member)
So, if the BTF_RELOC_TYPE represents a UNION or a STRUCT, we will have a BTF_RELOC_MEMBER for each existing member within the relocation. The BTF_RELOC_MEMBER contains:
- A pointer to the BTF_MEMBER structure representing the member from that relocation.
In the end, all relocations are resolved, so each type, from each field or member of each relocation, is added as a new type of the final BTF. Each existing type has its own BTF_RELOC_TYPE structure, which may or may not contain BTF_RELOC_MEMBERs.
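The organization described above can be modelled in a few lines of Python. This is only an illustrative model of the three structures, not the C code of the generator: the class names, the record_type and record_member helpers, and the use of plain dicts in place of hashmaps and pointers are all assumptions made for the sketch. The two members come from the inet_sock dump shown earlier.

```python
from dataclasses import dataclass, field


@dataclass
class RelocMember:
    member: tuple  # stands in for a pointer to a btf_member


@dataclass
class RelocType:
    btf_type: str  # stands in for a pointer to a btf_type
    id: int
    members: dict = field(default_factory=dict)  # member index -> RelocMember


@dataclass
class RelocInfo:
    src_btf: dict  # type id -> (name, [(member name, type id, bits offset), ...])
    types: dict = field(default_factory=dict)    # type id -> RelocType
    ids_map: dict = field(default_factory=dict)  # old type id -> new type id


def record_type(info, type_id):
    """Register a type once, no matter how many relocations mention it."""
    if type_id not in info.types:
        name, _ = info.src_btf[type_id]
        info.types[type_id] = RelocType(btf_type=name, id=type_id)
    return info.types[type_id]


def record_member(info, type_id, index):
    """Register one member of a STRUCT/UNION that a relocation goes through."""
    reloc_type = record_type(info, type_id)
    member = info.src_btf[type_id][1][index]
    reloc_type.members.setdefault(index, RelocMember(member))
    return member


src_btf = {47942: ("inet_sock", [("sk", 3012, 0), ("pinet6", 47944, 6080)])}
info = RelocInfo(src_btf=src_btf)
record_member(info, 47942, 0)  # inet_sock.sk
record_member(info, 47942, 1)  # inet_sock.pinet6
record_member(info, 47942, 0)  # a repeated relocation adds nothing new
print(info.types[47942])
```

Keying both maps by id is what keeps the structure small: a root entity touched by many relocations is stored once, and only the members actually accessed are attached to it.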
It is important to keep the "root entity type" => "members" relationship because that is what gives us the final BTF graph. If the types are simple, no members or other structures need to be appended to the root entity BTF type. If the types are complex, we would first have to add the types and then add the fields/members, one by one, to each complex type (like structs and unions) added.
Fortunately, libbpf allows us to add a complex BTF type with its members already in place (through btf__add_type()). This is what the function bpf_reloc_info__get_btf() does in its first pass through all existing types from the BTF_RELOC_INFO structure.
Unfortunately, whenever we add a BTF type to a new BTF file it also gets a new BTF type ID. This means that the relationships between the BTF types (root entities) and their existing fields and members are broken. Check it out:
$ bpftool btf dump file ./generated/5.4.0-87-generic.btf format raw
[1] PTR '(anon)' type_id=28278
[2] TYPEDEF 'u32' type_id=23
[3] TYPEDEF '__be16' type_id=18
[4] PTR '(anon)' type_id=47943
[5] TYPEDEF '__u8' type_id=14
[6] PTR '(anon)' type_id=49716
[7] STRUCT 'mnt_namespace' size=120 vlen=1
'ns' type_id=1949 bits_offset=64
[8] TYPEDEF '__kernel_gid32_t' type_id=9
[9] STRUCT 'iovec' size=16 vlen=2
'iov_base' type_id=103 bits_offset=0
'iov_len' type_id=48 bits_offset=64
[10] PTR '(anon)' type_id=28881
[11] STRUCT '(anon)' size=8 vlen=2
'skc_daddr' type_id=2996 bits_offset=0
'skc_rcv_saddr' type_id=2996 bits_offset=32
[12] TYPEDEF '__u64' type_id=27
[13] PTR '(anon)' type_id=36422
[14] TYPEDEF 'pid_t' type_id=45
[15] PTR '(anon)' type_id=28284
[16] PTR '(anon)' type_id=8
...
So, despite having all BTF types being used by our eBPF object, the complex types point to non-existent types. For example:
'skc_daddr' type_id=2996 bits_offset=0
There is no BTF type id == 2996 in the generated BTF file. That is the reason why, in the data organization picture, you will find a hashmap for OLD and NEW TYPE IDs. All BTF type ids being pointed to are fixed by using this hashmap, and this is done by the second pass of the function bpf_reloc_info__get_btf().
Here is what the generated file looks like after this second pass:
[1] PTR '(anon)' type_id=97
[2] TYPEDEF 'u32' type_id=35
[3] TYPEDEF '__be16' type_id=22
[4] PTR '(anon)' type_id=51
[5] TYPEDEF '__u8' type_id=82
[6] PTR '(anon)' type_id=29
[7] STRUCT 'mnt_namespace' size=120 vlen=1
'ns' type_id=71 bits_offset=64
[8] TYPEDEF '__kernel_gid32_t' type_id=74
[9] STRUCT 'iovec' size=16 vlen=2
'iov_base' type_id=16 bits_offset=0
'iov_len' type_id=84 bits_offset=64
[10] PTR '(anon)' type_id=57
[11] STRUCT '(anon)' size=8 vlen=2
'skc_daddr' type_id=80 bits_offset=0
'skc_rcv_saddr' type_id=80 bits_offset=32
[12] TYPEDEF '__u64' type_id=87
[13] PTR '(anon)' type_id=7
[14] TYPEDEF 'pid_t' type_id=104
[15] PTR '(anon)' type_id=62
[16] PTR '(anon)' type_id=0
...
You will realize that the same example is now fixed and pointing to the correct BTF type id:
'skc_daddr' type_id=80 bits_offset=0
BTF type id == 80 is:
[80] TYPEDEF '__be32' type_id=35
And it points to btf type id == 35:
[35] TYPEDEF '__u32' type_id=74
Which points to type id 74:
[74] INT 'unsigned int' size=4 bits_offset=0 nr_bits=32 encoding=(none)
Which is a simple type and does not need to point anywhere.
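The two passes can be illustrated with a small Python sketch. The three old type ids (2996, 23 and 9) are the ones that appear in the first-pass dump for the __be32 -> __u32 -> unsigned int chain; build_subset and the tuple layout are hypothetical. The new ids produced here (1, 2, 3) differ from the 80, 35 and 74 of the real generated file simply because this toy BTF contains only three types.

```python
# Old BTF subset: old type id -> (kind, name, referenced old type id or None).
old_types = {
    2996: ("TYPEDEF", "__be32", 23),
    23: ("TYPEDEF", "__u32", 9),
    9: ("INT", "unsigned int", None),
}


def build_subset(old_types):
    """Copy the types into a new id space, then repair the references."""
    ids_map = {0: 0}  # BTF id 0 is void and keeps its id
    new_types = {}
    # First pass: every added type gets the next free id; references still use old ids.
    for new_id, (old_id, btf_type) in enumerate(old_types.items(), start=1):
        ids_map[old_id] = new_id
        new_types[new_id] = btf_type
    # Second pass: rewrite every reference through the old -> new map.
    for new_id, (kind, name, ref) in list(new_types.items()):
        if ref is not None:
            new_types[new_id] = (kind, name, ids_map[ref])
    return new_types, ids_map


new_types, ids_map = build_subset(old_types)
for type_id, (kind, name, ref) in new_types.items():
    print(f"[{type_id}] {kind} '{name}' type_id={ref}")
```

The fix-up has to be a separate pass because a type may reference another type that is added later, so its new id is not known at the moment the referencing type is copied.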
The reader can opt to use BTFHUB in different ways:
- Download an ENTIRE BTF file for a specific kernel version and use it as an external BTF file when loading your app using libbpf. For each different kernel you will have to download the corresponding BTF file available in BTFHUB. This is the old, problematic (big), way of doing CO-RE for kernels that don't support BTF.
or
- Clone BTFHUB and use btfgen to generate ALL BTF files for the kernels you would like to support for your app. Example:
[user@host:~/.../aquasec-btfhub/tools][main]$ ./btfgen.sh ~/aquasec-tracee/tracee-ebpf/dist/tracee.bpf.core.o
If you visit the tools directory within BTFHUB you will see instructions on how to use a non-upstreamed (and statically compiled) version of btfgen (the BTF generator).
The btfgen.sh script generates multiple BTF files and symlinks at tools/output/{centos,fedora,ubuntu}/ containing ONLY the types being used by the given eBPF object (tracee.bpf.core.o in this example), with the relocations for each specific kernel version already recalculated.
Then you can execute your application by loading the partial generated BTF file that corresponds to your running kernel.
Example:
$ uname -a
Linux bionic 5.4.0-87-generic #98~18.04.1-Ubuntu SMP Wed Sep 22 10:45:04 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 18.04.6 LTS
Release: 18.04
Codename: bionic
$ bpftool btf dump file ./generated/5.4.0-87-generic.btf format raw | grep "^\[" | wc -l
122
[user@host:~/.../aquasec-tracee/tracee-ebpf]$ sudo TRACEE_BTF_FILE=~/aquasec-btfhub/tools/output/ubuntu/18.04/x86_64/5.4.0-87-generic.btf ./dist/tracee-ebpf --debug --trace event=execve,execveat,uname
OSInfo: VERSION: "18.04.6 LTS (Bionic Beaver)"
OSInfo: ID: ubuntu
OSInfo: ID_LIKE: debian
OSInfo: PRETTY_NAME: "Ubuntu 18.04.6 LTS"
OSInfo: VERSION_ID: "18.04"
OSInfo: VERSION_CODENAME: bionic
OSInfo: KERNEL_RELEASE: 5.4.0-87-generic
BTF: bpfenv = false, btfenv = true, vmlinux = false
BPF: using embedded BPF object
BTF: using BTF file from environment: ~/aquasec-btfhub/tools/output/ubuntu/18.04/x86_64/5.4.0-87-generic.btf
unpacked CO:RE bpf object file into memory
TIME UID COMM PID TID RET EVENT ARGS
05:08:45:175699 1000 bash 5176 5176 0 execve pathname: /bin/ls, argv: [ls --color=auto]
05:08:45:188780 1000 bash 5180 5180 0 execve pathname: /usr/bin/git, argv: [git branch]
05:08:45:189986 1000 bash 5181 5181 0 execve pathname: /bin/sed, argv: [sed -e /^[^*]/d -e s/* \(.*\)/\1/]
05:08:45:971635 1000 bash 5183 5183 0 execve pathname: /bin/ps, argv: [ps]
05:08:46:015186 1000 bash 5186 5186 0 execve pathname: /usr/bin/git, argv: [git branch]
05:08:46:015415 1000 bash 5187 5187 0 execve pathname: /bin/sed, argv: [sed -e /^[^*]/d -e s/* \(.*\)/\1/]
End of events stream
Stats: {EventCount:6 ErrorCount:0 LostEvCount:0 LostWrCount:0 LostNtCount:0}
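An application that ships such a bundle has to pick the file matching the running kernel. The Python sketch below shows one possible way to do it. It assumes the directory layout seen in the example path above (distribution ID, version, architecture, then the kernel release with a .btf suffix), which may not hold for every distribution in the output directory, and both helper names are hypothetical.

```python
from pathlib import Path


def parse_os_release(text):
    """Parse KEY=value lines in the style of /etc/os-release."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            info[key.strip()] = value.strip().strip('"')
    return info


def btf_path(output_dir, os_release, arch, kernel_release):
    """Build <output_dir>/<ID>/<VERSION_ID>/<arch>/<kernel_release>.btf (assumed layout)."""
    return (
        Path(output_dir)
        / os_release["ID"]
        / os_release["VERSION_ID"]
        / arch
        / f"{kernel_release}.btf"
    )


# Values taken from the example session above.
sample = 'ID=ubuntu\nVERSION_ID="18.04"\n'
path = btf_path("tools/output", parse_os_release(sample), "x86_64", "5.4.0-87-generic")
print(path.as_posix())
```

On a real host the inputs would come from /etc/os-release and from platform.uname() (its machine and release fields), and the resulting path could be passed to the application, for instance through the TRACEE_BTF_FILE variable used in the example.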
Here you will find some other sources of information about eBPF CO-RE:
- https://ebpf.io/what-is-ebpf
- https://github.com/libbpf/libbpf
- https://nakryiko.com/posts/bpf-portability-and-co-re/
- https://nakryiko.com/posts/bpf-portability-and-co-re/#btf
Other great links that might be worth reading are:
- Introduction to eBPF (from: ebpf.io/what-is-ebpf)
- Development Toolchains (from: ebpf.io/what-is-ebpf)
- BCC to libbpf conversion guide (if you're coming from BCC)
- Building BPF applications with libbpf-bootstrap
- BPF Design FAQ
- eBPF features per kernel version
- BTFHUB code example
- BCC's libbpf-tools directory
Now, some links about formats, eBPF and BTF internals: