Ability to inject host site versions of libfabric/UCX #252
Comments
The issue with making those libraries available is that we don't control the ELF header of (an injected) The only thing I can think of right now is to use LD_LIBRARY_PATH pointing to the same location as the overrides and keep a copy of the necessary library/libraries there. This sounds like another good reason to have the init scripts be a symlink.
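As a rough illustration of the LD_LIBRARY_PATH idea, the sketch below stages host libraries in an override directory and builds an environment that lists that directory first. It is only a sketch under assumptions: the function names `stage_libraries` and `env_with_override` are made up for this example, the directory and library paths are hypothetical, and it uses symlinks where the comment above suggests a copy. Note also that, per the glibc loader's documented search order, LD_LIBRARY_PATH is consulted before DT_RUNPATH but after DT_RPATH (when no DT_RUNPATH is present), so this only helps for binaries whose ELF header does not already pin the library via RPATH.

```python
import os
from pathlib import Path


def stage_libraries(host_libs, override_dir):
    """Symlink each host library into override_dir and return the new links.

    Both arguments are hypothetical placeholders; nothing here is EESSI API.
    """
    override_dir = Path(override_dir)
    override_dir.mkdir(parents=True, exist_ok=True)
    links = []
    for lib in map(Path, host_libs):
        link = override_dir / lib.name
        if link.is_symlink() or link.exists():
            link.unlink()  # replace a stale entry from an earlier run
        link.symlink_to(lib)
        links.append(link)
    return links


def env_with_override(override_dir, base_env=None):
    """Return a copy of the environment with override_dir first in LD_LIBRARY_PATH."""
    env = dict(os.environ if base_env is None else base_env)
    current = env.get("LD_LIBRARY_PATH", "")
    env["LD_LIBRARY_PATH"] = (
        f"{override_dir}{os.pathsep}{current}" if current else str(override_dir)
    )
    return env
```

The returned dictionary can be passed as `env=` to `subprocess.run` so that only the launched job sees the modified search path, rather than changing it for the whole shell session.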
One solution may be to use the Gentoo Prefix equivalent of
Another option would be to ask AWS to build libfabric with RUNPATH support for
In the older EESSI versions we saw some performance issues from GROMACS when injecting
So, by injecting the libraries we get about a 5% performance improvement.
LD_PRELOAD is clumsy and I would prefer to figure out another way to do the injection. I wonder if I can use
With those changes I was able to run
This means if the modified Due to our EasyBuild hook for rpath, we don't need LD_PRELOAD to be able to force the executable to find the library:
and we can then see that EESSI resolves libmpi to the injected library with
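One possible way to confirm which file a library name such as libmpi resolves to (not necessarily the command used in the comment above) is to parse `ldd` output. The sketch below is an assumption-laden illustration: `parse_ldd` and `resolved_libraries` are hypothetical helper names, and it assumes the `ldd` found on PATH is the one paired with the dynamic loader the binary actually uses, which may not hold when mixing host and EESSI compatibility-layer tools.

```python
import re
import subprocess

# Matches lines such as "libmpi.so.40 => /some/dir/libmpi.so.40 (0x00007f...)"
_LDD_LINE = re.compile(r"^\s*(\S+)\s+=>\s+(\S+)")


def parse_ldd(output):
    """Map each library name in ldd output to the path it resolved to.

    Unresolved entries ("=> not found") and entries without a path map to None.
    """
    resolved = {}
    for line in output.splitlines():
        match = _LDD_LINE.match(line)
        if match:
            name, path = match.groups()
            if path == "not" or path.startswith("("):
                path = None
            resolved[name] = path
    return resolved


def resolved_libraries(binary, prefix):
    """Run ldd on binary and keep only libraries whose name starts with prefix."""
    result = subprocess.run(
        ["ldd", binary], capture_output=True, text=True, check=True
    )
    return {
        name: path
        for name, path in parse_ldd(result.stdout).items()
        if name.startswith(prefix)
    }
```

For example, `resolved_libraries("/path/to/some_mpi_binary", "libmpi")` (a placeholder path) would return the libmpi entries only, making it easy to see whether the path points at the override location or at the library shipped in the stack.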
remove obsolete test files, correct cpupath for skylake
We were looking into the case of the EFA fabric at AWS. What we provide in EESSI works with the fabric, but it is true that you get better performance with the libfabric version that they ship with the OS (Amazon Linux 2 in the case we investigated).
You can check this with:
and compare that to
As things stand, we've only built in capabilities to switch out the MPI library, but it may be better/easier to switch out the UCX/libfabric libraries.