Update known issue about lmod hook in host-injection #183
docs/known_issues/eessi-2023.06.md
Outdated
@@ -26,3 +26,17 @@ export OMPI_MCA_pml='ucx'
export OMPI_MCA_mtl='^ofi'
```
</div>

### `Bug in EESSI initialization and priority mechanisms: site OpenMPI or UCX not loaded`
I don't think these are two separate issues, right? I mean, #456 was about site-specific tuning in general, but it only became an 'issue' when people hit the Failed-to-modify error. Shouldn't we just add to the "Failed to modify UD QP to INIT... etc" known issue that the `OMPI_MCA_*` environment variables can be set in `host_injections/...`?
I was browsing the docs, and I think we should have a general explanation of `host_injections` under Advanced usage, i.e. that this directory is used for site-specific tuning. Then, in a paragraph there, we can add how it can be used to execute Lmod hooks for site-specific tuning. We can give an example of a hook there (might as well use the example of setting these three OpenMPI environment variables). It's a good point: I wanted to write this documentation, but didn't get to it before my leave. Writing it down as a todo for this afternoon... We can then refer to that from the "Failed to modify UD QP to INIT" known issue and give the code for the Lmod hook that can be used as a workaround for this particular issue.
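For reference, a minimal shell sketch of the workaround settings under discussion. It uses only the two `OMPI_MCA_*` exports visible in the diff context above; the comment mentions three variables, but the third is not shown in this excerpt, so it is left out here. This sets the variables directly in the shell and is not the Lmod hook itself; the explanatory comments are my reading of the values, not text from the docs.

```shell
# Workaround settings as they appear in the diff context of this PR.
# Select UCX as the point-to-point messaging layer (PML).
export OMPI_MCA_pml='ucx'
# Exclude the OFI matching transport layer (MTL); '^' means "all except".
export OMPI_MCA_mtl='^ofi'

# Show which OMPI_MCA_* settings Open MPI will pick up from the environment.
env | grep '^OMPI_MCA_' | sort
```

An Lmod hook placed under `host_injections/...` would apply the same values whenever an OpenMPI module is loaded, so users would not need to export them by hand.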
Docs PR for Lmod hooks made here. I'd simply put an example hook to resolve this issue under the existing `Failed to modify UD QP to INIT on mlx5_0: Operation not permitted` header, and reference the new docs on Lmod hooks for further information.
@casparvl Adjusted on the known issues pages. Could you review it and, if everything is okay, merge it?
@xinan1911 could you also look at |
Adjusted! Those pages are removed in the mkdocs |
@Xin the CI is failing because you are behind on main
Co-authored-by: Caspar van Leeuwen <[email protected]>
@laraPPr Caspar has already fixed these in his PR https://github.com/EESSI/docs/pull/188/files#diff-679f410211edd9a310548400e25680c354062d32ab8a7df0c963d42ac85d2da8, which hasn't been merged yet. Maybe his PR can be merged before this one.
I was referring to these warnings
|
Branch is behind
Looks good to me!
No description provided.