Initial fix for non-detection of OmniPath cards #387
…eus#341) Currently these cards aren't picked up correctly because the driver doesn't provide the hca_type file the parse function expects. This patch stops the parse function treating that as an error. Signed-off-by: Ian Kirker <[email protected]>
This adds fixtures and a test for the OmniPath detection and metric reading added in a previous commit. This could use a check from someone better acquainted with the driver and relevant metrics, but it should do some approximation of the job for now. It might even be sensible to specifically detect major versions of devices, since the detailed metric file layout might even vary between mlx4 and mlx5. There's also a factor of 4 included in the Mellanox metric pickup that I'm not sure about: might need to detect device to tell whether we need that, and that's a bit more work. Signed-off-by: Ian Kirker <[email protected]>
Ah. I had to do some finicky rebasing + merging, and I think that's where the test failures come from -- I tested before but not after, foolishly. I'll fix it up.
Whoops. Signed-off-by: Ian Kirker <[email protected]>
LGTM, thanks!
LGTM
* Fixes InfiniBand fileset incompatibility with OmniPath cards (prometheus#341)
  Currently these cards aren't picked up correctly because the driver doesn't provide the hca_type file the parse function expects. This patch stops the parse function treating that as an error.
  Signed-off-by: Ian Kirker <[email protected]>
* Adds test for OmniPath device detection
  This adds fixtures and a test for the OmniPath detection and metric reading added in a previous commit. This could use a check from someone better acquainted with the driver and relevant metrics, but it should do some approximation of the job for now. It might even be sensible to specifically detect major versions of devices, since the detailed metric file layout might even vary between mlx4 and mlx5. There's also a factor of 4 included in the Mellanox metric pickup that I'm not sure about: might need to detect device to tell whether we need that, and that's a bit more work.
  Signed-off-by: Ian Kirker <[email protected]>
* Fixes some mistakes that slipped in during merge/rebase
  Whoops.
  Signed-off-by: Ian Kirker <[email protected]>
Mellanox drivers provide a particular file (hca_type) in the /sys pseudo-filesystem that not all InfiniBand drivers do, and OmniPath drivers don't. This means that the InfiniBand detection fails to detect these cards.
This naïve fix makes OmniPath cards detectable by letting the InfiniBand detection carry on when that file is missing, rather than treating its absence as an error.
It could probably use a look in future from someone with more expertise (or at least, more documentation) on the specific info provided by the driver.
This fixes #341 and I think also the downstream issue in the node_exporter: prometheus/node_exporter#2023.