ENG-19808 mlxfwreset before reboot in SR-IOV operator #8

Open: wants to merge 2 commits into base: clark/config-daemon-ib-unbind-fix

punkerpunker

Overall, it seems that something is wrong at the firmware level when enabling the SR-IOV capability; from the kernel's perspective, things look okay to me.

I see that the sriov-network-operator has an option to run mstfwreset after an mlxconfig change. I think that's something we can try leveraging:
https://github.com/openshift/sriov-network-operator/blob/79cb3c6ae721220754189300539a38c63e38e66c/pkg/plugins/mellanox/mellanox_plugin.go#L215

I think this will resolve the reboot loop we're getting when running config-daemon, and I'll try it tomorrow.
All paths so far lead to the firmware; I don't think I'll find anything more in the kernel, tbh.

Infra PR to enable the featureGate: https://github.com/togethercomputer/infra/pull/4044


Thanks for your PR.
To run the vendor CIs, maintainers can use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendor CIs, maintainers can use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.

Best regards.
