EESSI CUDA hook prevents loading even local (non-EESSI) CUDA module #523

Closed
casparvl opened this issue Mar 29, 2024 · 0 comments · Fixed by #530
Labels
2023.06-software.eessi.io 2023.06 version of software.eessi.io accel:nvidia bug Something isn't working

Comments

Collaborator

casparvl commented Mar 29, 2024

{EESSI 2023.06} (eessi_test_venv) [casparl@tcn1 PyTorch]$ module load torchvision/0.13.1-foss-2022a-CUDA-11.7.0
Lmod has detected the following error:
You requested to load CUDA  but while the module file exists, the actual software is not entirely shipped with EESSI due to licencing. You will need to install a full copy of the CUDA SDK where EESSI can find it.
For more information on how to do this, see https://www.eessi.io/docs/gpu/.

While processing the following module(s):
    Module fullname                            Module Filename
    ---------------                            ---------------
    CUDA/11.7.0                                /sw/arch/RHEL8/EB_production/2022/modulefiles/system/CUDA/11.7.0.lua
    torchvision/0.13.1-foss-2022a-CUDA-11.7.0  /sw/arch/RHEL8/EB_production/2022/modulefiles/vis/torchvision/0.13.1-foss-2022a-CUDA-11.7.0.lua

_mlstatus = False

There is no reason to prevent this load, since this is a local module. Especially if we want to support building on top of EESSI, this should just work.

It's (probably) quite easy to fix: the if condition in the hook should be more specific, so that it also checks whether the module being loaded is an EESSI CUDA module. Checking whether the module file lives somewhere under the /cvmfs/$EESSI_CVMFS_REPO/ prefix is probably enough.
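To make the proposed check concrete, here is a minimal sketch of the logic in Python. It is not the actual hook, which is Lua code run by Lmod. The function names, the `sdk_installed` flag, and the example prefix `/cvmfs/software.eessi.io` are assumptions made for illustration only.

```python
from pathlib import PurePosixPath


def is_under_prefix(module_file: str, prefix: str) -> bool:
    """Return True if module_file lies inside prefix.

    The comparison is done per path component, so a sibling directory
    such as '/cvmfs/software.eessi.io-other' does not match.
    """
    mod = PurePosixPath(module_file)
    pre = PurePosixPath(prefix)
    return pre == mod or pre in mod.parents


def should_block_cuda_load(module_name: str, module_file: str,
                           eessi_prefix: str, sdk_installed: bool) -> bool:
    """Hypothetical decision function: block only EESSI-provided CUDA modules.

    sdk_installed stands in for whatever test the real hook uses to decide
    whether a full CUDA SDK is available to EESSI (an assumption here).
    """
    is_cuda = module_name.split("/")[0] == "CUDA"
    is_eessi = is_under_prefix(module_file, eessi_prefix)
    return is_cuda and is_eessi and not sdk_installed


# Assumed value; in the issue the prefix is written /cvmfs/$EESSI_CVMFS_REPO/.
EESSI_PREFIX = "/cvmfs/software.eessi.io"

# The local module from the error message above is no longer blocked.
local_file = "/sw/arch/RHEL8/EB_production/2022/modulefiles/system/CUDA/11.7.0.lua"
print(should_block_cuda_load("CUDA/11.7.0", local_file, EESSI_PREFIX, False))
```

With this condition, a local CUDA module such as the one under /sw/arch/RHEL8/... loads normally, while a CUDA module file under the EESSI prefix still triggers the error when no full SDK is present.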

casparvl added the bug (Something isn't working) and accel:nvidia labels Mar 29, 2024
boegel added the 2023.06-software.eessi.io (2023.06 version of software.eessi.io) label Apr 2, 2024
casparvl pushed a commit to casparvl/software-layer that referenced this issue Apr 3, 2024
…'t want to be prevented from loading local CUDA modules because of the EESSI hook. See EESSI#523
trz42 closed this as completed in #530 May 7, 2024