
Initial fix for non-detection of OmniPath cards #387

Merged: 3 commits merged into prometheus:master on Jul 5, 2021


@ikirker (Contributor) commented Jun 17, 2021

Mellanox drivers provide an hca_type file in the /sys pseudo-filesystem that not all InfiniBand drivers provide, and the OmniPath drivers don't. Because the InfiniBand parsing treats a missing hca_type file as an error, these cards are not detected.

This naïve fix makes OmniPath cards detectable by having the InfiniBand detection tolerate a missing hca_type file.

It could probably use a look in future from someone with more expertise (or at least, more documentation) on the specific info provided by the driver.

This fixes #341 and I think also the downstream issue in the node_exporter: prometheus/node_exporter#2023.

ikirker added 2 commits June 17, 2021 00:46
…eus#341)

Currently these cards aren't picked up correctly because the driver
doesn't provide the hca_type file the parse function expects.

This patch stops the parse function treating that as an error.

Signed-off-by: Ian Kirker <[email protected]>

This adds fixtures and a test for the OmniPath detection and metric
reading added in a previous commit.

This could use a check from someone better acquainted with the driver
and relevant metrics, but it should be a reasonable approximation for now.

It might be sensible to detect major device versions specifically, since
the detailed metric file layout might vary even between mlx4 and mlx5.

There's also a factor of 4 included in the Mellanox metric pickup that
I'm not sure about: we might need to detect the device type to tell whether
it applies, and that's a bit more work.

Signed-off-by: Ian Kirker <[email protected]>
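On the factor of 4: the InfiniBand Architecture Specification defines the PortXmitData and PortRcvData counters in units of four octets, which is presumably where the multiplier in the Mellanox pickup comes from. The sketch below makes that multiplier depend on the driver family, assuming the family can be inferred from the sysfs device name prefix (mlx4_0, mlx5_0, hfi1_0, ...). The names driverFamily, dataWordSize and dataCounterBytes are hypothetical, and the OmniPath unit is intentionally left unknown rather than guessed.

```go
package main

import (
	"fmt"
	"strings"
)

// driverFamily guesses the driver family from a device name such as "mlx5_0"
// by dropping the trailing "_<index>". This is a heuristic assumption, not
// something sysfs guarantees.
func driverFamily(device string) string {
	if i := strings.LastIndex(device, "_"); i > 0 {
		return device[:i]
	}
	return device
}

// dataWordSize maps driver families to the size, in bytes, of one unit of the
// port_xmit_data / port_rcv_data counters. Only the Mellanox families, assumed
// to follow the IBA four-octet unit, are filled in; others stay unknown.
var dataWordSize = map[string]uint64{
	"mlx4": 4,
	"mlx5": 4,
}

// dataCounterBytes converts a raw data counter to bytes. The bool is false
// when the unit for this driver family is not known.
func dataCounterBytes(device string, raw uint64) (uint64, bool) {
	size, ok := dataWordSize[driverFamily(device)]
	if !ok {
		return 0, false
	}
	return raw * size, true
}

func main() {
	for _, dev := range []string{"mlx4_0", "mlx5_1", "hfi1_0"} {
		if b, ok := dataCounterBytes(dev, 1000); ok {
			fmt.Printf("%s: %d bytes\n", dev, b)
		} else {
			fmt.Printf("%s: unit unknown, not scaled\n", dev)
		}
	}
}
```

Returning false for unknown families lets a caller skip or flag those counters instead of reporting values that may be off by a constant factor.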
@ikirker (Contributor, Author) commented Jun 17, 2021

Ah. I had to do some finicky rebasing + merging, and I think that's where the test failures come from -- I tested before but not after, foolishly. I'll fix it up.

@discordianfish (Member) left a comment

LGTM, thanks!

@discordianfish requested a review from SuperQ on July 1, 2021 09:14

@SuperQ (Member) left a comment

LGTM

@SuperQ merged commit 6299d7d into prometheus:master on Jul 5, 2021
remijouannet pushed a commit to remijouannet/procfs that referenced this pull request Oct 20, 2022
* Fixes InfiniBand fileset incompatibility with OmniPath cards (prometheus#341)

Currently these cards aren't picked up correctly because the driver
doesn't provide the hca_type file the parse function expects.

This patch stops the parse function treating that as an error.

Signed-off-by: Ian Kirker <[email protected]>

* Adds test for OmniPath device detection

This adds fixtures and a test for the OmniPath detection and metric
reading added in a previous commit.

This could use a check from someone better acquainted with the driver
and relevant metrics, but it should be a reasonable approximation for now.

It might be sensible to detect major device versions specifically, since
the detailed metric file layout might vary even between mlx4 and mlx5.

There's also a factor of 4 included in the Mellanox metric pickup that
I'm not sure about: we might need to detect the device type to tell whether
it applies, and that's a bit more work.

Signed-off-by: Ian Kirker <[email protected]>

* Fixes some mistakes that slipped in during merge/rebase

Whoops.

Signed-off-by: Ian Kirker <[email protected]>
jritter pushed a commit to jritter/procfs that referenced this pull request Jul 15, 2024
Linked issue (successfully merging this pull request may close it):
parseInfinibandDevice fails with Intel Omni-Path
3 participants