can't instantiate NetObserv eBPF Agent on kernel 5.4 #369

Open
matijavizintin opened this issue Jul 22, 2024 · 10 comments
@matijavizintin

I'm trying to run it on kernel 5.4 (Ubuntu 20, 5.4.0-189-generic) and it fails with
FATA[0000] can't instantiate NetObserv eBPF Agent error="loading and assigning BPF objects: field EgressFlowParse: program egress_flow_parse: map direct_flows: map create without BTF: invalid argument"

Is there any way to make it work? Upgrading all the servers is not realistic at the moment.

@matijavizintin
Author

Well, it turns out it's not that hard. I had to replace the RingBuffer with a PerfEvent Array in the eBPF code, remove the fentry program, which is not supported there yet, and make a few changes to use the perf reader.
If anyone is interested I can share more details.

@jotak
Member

jotak commented Jul 24, 2024

Thanks @matijavizintin for reporting the issue. We do indeed test and validate the agent on more recent kernel versions; sorry for the quirks you ran into with an older one.

cc @msherif1234
I'm actually wondering why we claim to support kernels 4.18+ (this is in the readme), as the ring buffer was only added in 5.8, and iirc the very first version of the agent was already using the ring buffer. Maybe some historical knowledge that we lost... (or were there some Red Hat specific backports?)

We do some specific reassignments for kernels older than 5.14 but that's not related to the ring buffer.
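For illustration only, a kernel-version gate of the kind discussed here could look like the sketch below. The function name `kernelAtLeast` is hypothetical and is not taken from the agent's code; the only fact it relies on from this thread is that the ring buffer map type arrived in kernel 5.8.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// kernelAtLeast (hypothetical helper) reports whether a uname-style release
// string such as "5.4.0-189-generic" is at least major.minor.
func kernelAtLeast(release string, major, minor int) (bool, error) {
	// Keep only the numeric prefix, dropping distro suffixes like "-189-generic".
	core := release
	if i := strings.IndexAny(core, "-+ "); i >= 0 {
		core = core[:i]
	}
	parts := strings.Split(core, ".")
	if len(parts) < 2 {
		return false, fmt.Errorf("unexpected kernel release %q", release)
	}
	maj, err := strconv.Atoi(parts[0])
	if err != nil {
		return false, fmt.Errorf("parsing major version of %q: %w", release, err)
	}
	mnr, err := strconv.Atoi(parts[1])
	if err != nil {
		return false, fmt.Errorf("parsing minor version of %q: %w", release, err)
	}
	if maj != major {
		return maj > major, nil
	}
	return mnr >= minor, nil
}

func main() {
	// The first release string is the one reported in this issue; the second is made up.
	for _, rel := range []string{"5.4.0-189-generic", "5.15.0-100-generic"} {
		ok, err := kernelAtLeast(rel, 5, 8)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s has BPF_MAP_TYPE_RINGBUF (5.8+): %t\n", rel, ok)
	}
}
```

A version check like this is only a heuristic: distributions sometimes backport features, so probing for the feature itself is generally more reliable than comparing version numbers.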

@matijavizintin , would your changes be straightforward to integrate upstream?

On the fentry side, a quick look suggests that we fall back to using kprobe when fentry fails. Is there something missing here?

@matijavizintin
Author

It was a fun one and I learned something new :)

I would say yes. I did it in a hackish way because I needed to prove that it works, but in general I added a perf reader to FlowFetcher so that it reads from BPF_MAP_TYPE_PERF_EVENT_ARRAY instead of BPF_MAP_TYPE_RINGBUF.
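As a rough sketch of the idea (not the attached patch, and not the agent's actual FlowFetcher API), the two transports could sit behind one small interface so the rest of the userspace code does not care which map type backs it. Every type and function name below is hypothetical, and `fakeReader` only replays canned bytes; a real implementation would wrap a ring buffer or perf event reader instead.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

type transport string

const (
	ringBuf        transport = "BPF_MAP_TYPE_RINGBUF"
	perfEventArray transport = "BPF_MAP_TYPE_PERF_EVENT_ARRAY"
)

// flowReader is a hypothetical interface hiding which map type delivers records.
type flowReader interface {
	Transport() transport
	ReadRecord() ([]byte, error)
}

// fakeReader replays canned records so the sketch runs without a kernel.
type fakeReader struct {
	kind    transport
	pending [][]byte
}

func (r *fakeReader) Transport() transport { return r.kind }

// ReadRecord returns the next canned record, or io.EOF when none are left.
func (r *fakeReader) ReadRecord() ([]byte, error) {
	if len(r.pending) == 0 {
		return nil, io.EOF
	}
	rec := r.pending[0]
	r.pending = r.pending[1:]
	return rec, nil
}

// newFlowReader picks the transport: ring buffer when available,
// perf event array otherwise (for example on a 5.4 kernel).
func newFlowReader(ringbufSupported bool, canned [][]byte) flowReader {
	kind := perfEventArray
	if ringbufSupported {
		kind = ringBuf
	}
	return &fakeReader{kind: kind, pending: canned}
}

// drain reads until io.EOF and returns how many records were consumed.
func drain(r flowReader) (int, error) {
	n := 0
	for {
		_, err := r.ReadRecord()
		if errors.Is(err, io.EOF) {
			return n, nil
		}
		if err != nil {
			return n, err
		}
		n++
	}
}

func main() {
	r := newFlowReader(false, [][]byte{{0x01}, {0x02}})
	n, err := drain(r)
	fmt.Println(r.Transport(), n, err)
}
```

One thing such an abstraction would still need to account for is that a perf event array is per-CPU, so records can be dropped or arrive out of order across CPUs, which a ring buffer largely avoids.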

Regarding fentry, that's true; however, the code already fails earlier, when loading the TCPRcvFentry eBPF program. So I commented that out and used kprobe.

I'm attaching the patch so you can see what I did. As I said, very hackish.
support_for_older_kernels.patch

@msherif1234
Contributor

msherif1234 commented Jul 24, 2024

We already fall back to kprobe if fentry isn't supported or available. In fact, I remember fentry not being available on the s390 arch even with a recent kernel, see #265.

The readme probably needs some updates; I think we need to set the minimum kernel version to 5.8.

We can't replace our ringbuf map with perf events in production, as it is less efficient for our application:
https://nakryiko.com/posts/bpf-ringbuf/

Thanks @matijavizintin

@jotak
Member

jotak commented Jul 24, 2024

@msherif1234 if there is a demand (upstream) to keep support for 5.4 / 5.8, etc. , we could create one or more dedicated branches. Which also allows us to clean up the main branch and get rid of the compatibility code, wdyt?

@matijavizintin
Author

@msherif1234 That's true; however, the code will already throw an error here https://github.com/netobserv/netobserv-ebpf-agent/blob/main/pkg/ebpf/tracer.go#L738 even before reaching the fallback at https://github.com/netobserv/netobserv-ebpf-agent/blob/main/pkg/ebpf/tracer.go#L172. At least that's the case for x86.
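To make the ordering problem concrete, here is a toy model of it: if all programs are loaded in one step, an unsupported fentry program fails the whole load before any attach-time fallback can run. The sketch retries the load without the fentry programs. It is not the agent's code and not necessarily what #374 does; `loadAll`, `loadWithFallback`, the attach-kind strings, and the program name `TCPRcvKprobe` are all hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

var errFentryUnsupported = errors.New("fentry not supported by this kernel")

// loadAll stands in for loading every program in a single call: one
// unsupported program makes the whole load fail.
func loadAll(programs map[string]string, kernelHasFentry bool) error {
	for name, kind := range programs {
		if kind == "fentry" && !kernelHasFentry {
			return fmt.Errorf("program %s: %w", name, errFentryUnsupported)
		}
	}
	return nil
}

// loadWithFallback retries without fentry programs when the first load fails
// for that reason, leaving the kprobe variant to cover the same hook.
// It returns the sorted names of the programs that were loaded.
func loadWithFallback(programs map[string]string, kernelHasFentry bool) ([]string, error) {
	err := loadAll(programs, kernelHasFentry)
	if errors.Is(err, errFentryUnsupported) {
		trimmed := make(map[string]string, len(programs))
		for name, kind := range programs {
			if kind != "fentry" {
				trimmed[name] = kind
			}
		}
		programs = trimmed
		err = loadAll(programs, kernelHasFentry)
	}
	if err != nil {
		return nil, err
	}
	loaded := make([]string, 0, len(programs))
	for name := range programs {
		loaded = append(loaded, name)
	}
	sort.Strings(loaded)
	return loaded, nil
}

func main() {
	// Program names and attach kinds are illustrative only.
	programs := map[string]string{
		"EgressFlowParse": "tc",
		"TCPRcvFentry":    "fentry",
		"TCPRcvKprobe":    "kprobe", // hypothetical name
	}
	loaded, err := loadWithFallback(programs, false)
	fmt.Println(loaded, err)
}
```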

Yeah, I saw the performance impact in the docs. I plan to test it in prod soon, since it's not feasible to upgrade all the servers. My plan is to install your latest release on Ubuntu 22+ servers (kernel > 5.8) and the patched code using the perf event array on the older OSes. I can report the difference in performance. I also plan to make the code a bit nicer than the current hackish patch :)

@msherif1234
Contributor

msherif1234 commented Jul 25, 2024

@matijavizintin it's a warning, not an error (https://github.com/netobserv/netobserv-ebpf-agent/blob/main/pkg/ebpf/tracer.go#L175), though that applies to more recent kernels, and we do use this logic in production. For the older kernel check, this is a bug IMO, and #374 should help. Would you be able to see if that helps with your kernel?
We didn't see it because the older kernel we tested with seems to have fentry support.

@msherif1234
Contributor

> @msherif1234 if there is a demand (upstream) to keep support for 5.4 / 5.8, etc. , we could create one or more dedicated branches. Which also allows us to clean up the main branch and get rid of the compatibility code, wdyt?

@jotak if we want to support older kernels without ringbuf support, we might have to do a fair amount of work, in both the eBPF code and userspace, to replace it with the perf event map type. But it's possible if there is a pressing need.

@msherif1234 msherif1234 self-assigned this Jul 25, 2024
@matijavizintin
Author

@msherif1234 sorry for the late reply. I applied your changes from #374 to my branch and it works well, thanks!

@msherif1234
Contributor

Thank you @matijavizintin !!
