-
Notifications
You must be signed in to change notification settings - Fork 190
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Fixed eBPF Feature Detection #1443
Conversation
e5a2371
to
e37a3c2
Compare
pls share the NPE, and steps to reproduce |
NPE:
As far as I can tell the NPE was caused by the following chain of events:
In which circumstances would "bpf_net_tx_irq" not exist then?
So the only way this can occur is when I'm not sure what I managed to do with the cilium/ebpf port and with this patch to make that issue worse... but there is definitely something fishy happening 🐟 Still trying to untangle this to find out what's going on. |
on which commit? does |
got it. |
That will reproduce it (for now). I'm only able to reproduce it in CI so using this PR to help debug it. |
859120e
to
a0ebf8c
Compare
This PR is ready to go. As for the root cause, I ended up hooking up a debugger and I found out that the issue was that in #1413 I'd forgotten to attach one of the perf_events - a software event called ( EDIT: It still doesn't explain why the Either way the issue is basically because we have 2 competing definitions of "what's supported by ebpf". This PR fixes that. |
645ee20
to
b79cf26
Compare
On working to migrate to cilium/ebpf I started to hit issues with nil pointer dereferences that indicated that something odd was happening between eBPF feature detection and metrics export. I discovered that there were additional global variables in many of the packages whose values are set before eBPF feature detection has happened. It appears that in these cases - through the various layers of indirection - we may try and read/write metrics that do not exist causing a nil pointer dereference. There appear to be several places where we're accessing map keys without checking they are nil - that will need to be fixed in a follow up. This PR attempts to at least ensure that from the BPF side that the keys we expect to exist, do actually exist in the maps that matter. eBPF feature detection how produces a bpf.SupportedMetrics struct, which is now consumed in various places in the codebase to ensure that the metrics we're exporting match those supported by the eBPF exporter. Signed-off-by: Dave Tucker <[email protected]>
} | ||
} | ||
|
||
if !config.ExposeHardwareCounterMetrics { | ||
klog.Infof("Hardware counter metrics are disabled") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: How about we Warn?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
On working to migrate to cilium/ebpf I started to hit issues with
nil pointer dereferences that indicated that something odd was happening
between eBPF feature detection and metrics export.
I discovered that there were additional global variables in many of the
packages whose values are set before eBPF feature detection has
happened.
It appears that in these cases - through the various layers of
indirection - we may try and read/write metrics that do not exist
causing a nil pointer dereference.
There appear to be several places where we're accessing map keys
without checking they are nil - that will need to be fixed in a
follow up. This PR attempts to at least ensure that from the BPF
side that the keys we expect to exist, do actually exist in the
maps that matter.
eBPF feature detection how produces a bpf.SupportedMetrics
struct, which is now consumed in various places in the codebase
to ensure that the metrics we're exporting match those supported
by the eBPF exporter.