consider running "unset MODULEPATH" as part of EESSI initialization procedure #775

Open
boegel opened this issue Oct 4, 2024 · 6 comments
Labels
2023.06-software.eessi.io 2023.06 version of software.eessi.io

Comments

@boegel
Contributor

boegel commented Oct 4, 2024

On some systems, for example the Vienna Scientific Cluster (VSC) that was used for the EESSI introductory webinar on 4 Oct 2024, keeping the existing $MODULEPATH causes problems after initializing the EESSI environment.

We should consider running unset MODULEPATH as part of our initialization procedure, but that's a pretty aggressive move, and it may cause problems on some systems, in particular those where one or more modules are loaded by default at login, since unsetting $MODULEPATH would result in unloading those modules.

That's the case on our systems at UGent:

$ echo $MODULEPATH
/apps/gent/RHEL8/zen2-ib/modules/all:/etc/modulefiles/vsc

$ module list

Currently Loaded Modules:
  1) env/vsc/doduo (S)   2) env/slurm/doduo (S)   3) env/software/doduo (S)   4) cluster/doduo (S)

  Where:
   S:  Module is Sticky, requires --force to unload or purge

$ module av cluster/ env/

----------------------------------------------------------------------------------------------------------------------------- /etc/modulefiles/vsc -----------------------------------------------------------------------------------------------------------------------------
   cluster/accelgor (S)      cluster/gallade (S)      env/slurm/accelgor (S)      env/slurm/gallade (S)      env/software/accelgor (S)      env/software/gallade (S)      env/vsc/accelgor (S)      env/vsc/gallade (S)
   cluster/default           cluster/joltik  (S)      env/slurm/default           env/slurm/joltik  (S)      env/software/default           env/software/joltik  (S)      env/vsc/default           env/vsc/joltik  (S)
   cluster/doduo    (S,L)    cluster/shinx   (S)      env/slurm/doduo    (S,L)    env/slurm/shinx   (S)      env/software/doduo    (S,L)    env/software/shinx   (S)      env/vsc/doduo    (S,L)    env/vsc/shinx   (S)
   cluster/donphan  (S)      cluster/skitty  (S,D)    env/slurm/donphan  (S)      env/slurm/skitty  (S,D)    env/software/donphan  (S)      env/software/skitty  (S,D)    env/vsc/donphan  (S)      env/vsc/skitty  (S,D)

  Where:
   S:  Module is Sticky, requires --force to unload or purge
   L:  Module is loaded
   D:  Default Module
@boegel boegel added the 2023.06-software.eessi.io 2023.06 version of software.eessi.io label Oct 4, 2024
@casparvl
Collaborator

casparvl commented Oct 7, 2024

I'd vote for not doing this by default, but making it configurable through an environment variable. Something like

export EESSI_UNSET_MODULEPATH_ON_INIT=1

That way, you can opt to use this only on systems where it is really needed. That's probably the minority, plus it means we don't change the current (default) behavior. Also, as you said, cleaning the modulepath is quite aggressive; as a user, I wouldn't like it if this happened by default, even if one could argue that it might not be a great idea to mix local & EESSI modules (except for EESSI-extend-like stuff).
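
A minimal sketch of how an init script could honour such an opt-in switch. The variable name is the one suggested above; the function name `eessi_maybe_unset_modulepath` is made up for illustration and is not part of the existing EESSI init scripts.

```shell
# Hypothetical helper for an init script (bash). Only the variable name
# EESSI_UNSET_MODULEPATH_ON_INIT comes from the suggestion above.
eessi_maybe_unset_modulepath() {
    if [ "${EESSI_UNSET_MODULEPATH_ON_INIT:-0}" = "1" ]; then
        # Opt-in: drop whatever the site put in $MODULEPATH
        unset MODULEPATH
    fi
    # Default (variable unset or not "1"): leave $MODULEPATH untouched
}
```

With this shape the current behaviour stays the default, and a site only sees a change after explicitly exporting the variable.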

@ocaisa
Member

ocaisa commented Oct 7, 2024

I also don't think it should be necessary to clean out the MODULEPATH, but to avoid bad user experiences that we have already seen I think it should be done by default. We can provide an environment variable like EESSI_RETAIN_MODULEPATH_ON_INIT for the sites that don't want that behaviour.

The general approach though would be to work with sites supporting EESSI to figure out best practice. My suggestion with respect to that would be:

  • The site stack MODULEPATH gets set via a module file which has the Lmod family software-stack. This module file can be loaded by default for the system (the idea being they retain their existing behaviour, but now with a minimal hierarchy). I implemented this kind of behaviour at JSC for many years (where we had stages for software stacks, with the default user view moving with the times).
  • In the same file system location as this set of module files, the site makes a symlink to /cvmfs/software.eessi.io/init/modules/EESSI so the EESSI stack modules are also accessible. The EESSI modules also set the Lmod family software-stack so you cannot have both site and EESSI stacks loaded at once.

A negative side of this is that we are forcing people in the direction of adopting Lmod, but of course people can always contribute to help us figure out a way around even that.

@ocaisa
Member

ocaisa commented Oct 7, 2024

Actually, for Lmod MODULEPATH is just another environment variable, so you can overwrite it with pushenv so that it is returned to its previous value once you unload EESSI. The hard question there is do you do a purge() or not before pushenv?

@boegel
Contributor Author

boegel commented Oct 7, 2024

> Actually, for Lmod MODULEPATH is just another environment variable, so you can overwrite it with pushenv so that it is returned to its previous value once you unload EESSI. The hard question there is do you do a purge() or not before setting it?

I wouldn't purge, since then at least Lmod will print a warning that modules that were loaded are no longer active after doing a hard set of $MODULEPATH with only the EESSI paths.

In our case, it would have to be a module --force purge anyway, since our cluster and env modules are sticky, and that would be a pretty big hammer.

> I also don't think it should be necessary to clean out the MODULEPATH, but to avoid bad user experiences that we have already seen I think it should be done by default. We can provide an environment variable like EESSI_RETAIN_MODULEPATH_ON_INIT for the sites that don't want that behaviour.

I guess you could argue both ways w.r.t. hard setting $MODULEPATH or not.
I'm not sure what the most common case would be.

I guess we should aim for what is most likely to produce the best user experience (i.e. a working EESSI environment).

For the HPC-UGent systems, it would be a problem: /etc/modulefiles/vsc should always remain in $MODULEPATH, even though it doesn't provide any software modules. Otherwise, communication with Slurm wouldn't work anymore, and a bunch of important environment variables like $VSC_SCRATCH (which points to your personal scratch directory on that system) would be unset.
The /apps/gent/.../modules/all path(s) can be removed, as long as no software modules were loaded already.

So for this particular use case, the proposed $EESSI_RETAIN_MODULEPATH_ON_INIT would have to support specifying which paths should be kept, like:

export EESSI_RETAIN_MODULEPATH_ON_INIT=/etc/modulefiles/vsc

When the EESSI init script or module hard sets $MODULEPATH, it could print a warning that mentions $EESSI_RETAIN_MODULEPATH_ON_INIT and provides a pointer to documentation with more info?
That warning would basically always be printed on HPC systems though, since I think it's rare that $MODULEPATH would not be set...
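
As a rough sketch of what this could look like in bash, assuming $EESSI_RETAIN_MODULEPATH_ON_INIT holds one or more colon-separated paths: the two function names, the warning text, and the choice to put retained paths before the EESSI path are all assumptions made for illustration, not existing EESSI behaviour.

```shell
# Hypothetical helper: print the entries of $MODULEPATH that are also listed
# in $EESSI_RETAIN_MODULEPATH_ON_INIT, keeping their original order.
eessi_filter_modulepath() {
    local retain=":${EESSI_RETAIN_MODULEPATH_ON_INIT:-}:"
    local kept="" entry
    local IFS=':'   # split $MODULEPATH on colons, only inside this function
    for entry in ${MODULEPATH:-}; do
        [ -n "$entry" ] || continue
        case "$retain" in
            # Exact match of a whole entry, not a prefix match
            *":$entry:"*) kept="${kept:+$kept:}$entry" ;;
        esac
    done
    printf '%s\n' "$kept"
}

# Hypothetical helper: hard set $MODULEPATH to the retained entries plus the
# EESSI module path given as $1, warning if an existing value is replaced.
eessi_set_modulepath() {
    local eessi_path="$1"
    local kept
    kept="$(eessi_filter_modulepath)"
    if [ -n "${MODULEPATH:-}" ]; then
        echo "WARNING: resetting \$MODULEPATH; set \$EESSI_RETAIN_MODULEPATH_ON_INIT to keep specific paths" >&2
    fi
    export MODULEPATH="${kept:+$kept:}$eessi_path"
}
```

For the HPC-UGent example, exporting EESSI_RETAIN_MODULEPATH_ON_INIT=/etc/modulefiles/vsc would keep that entry and drop the /apps/gent/.../modules/all path(s). Note that this only rewrites the variable; how Lmod reacts to already loaded modules is a separate question.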

There may be a feature request here for Lmod: if the path to loaded sticky modules disappears from $MODULEPATH, maybe it should just restore those locations in $MODULEPATH, unless you tell it not to. You could argue that modules aren't really sticky if it doesn't do that.
This would help us, but that's clearly only going to help us in the long term...

@ocaisa
Member

ocaisa commented Oct 7, 2024

I can see

export EESSI_RETAIN_MODULEPATH_ON_INIT=/etc/modulefiles/vsc:/some/other/path

working and being quite flexible. It's also an active setting, so as an admin you are going to read up on the consequences.

Wherever the EESSI module itself is would also need to remain in view (but you can figure that out through introspection functions; hopefully this will also work in the case of symlinks). So at UGent, if you symlinked /cvmfs/software.eessi.io/init/modules/EESSI under /etc/modulefiles/vsc you probably wouldn't even need the setting.
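
The symlink case can be probed from the shell as well. The sketch below is not the Lmod introspection mentioned above; it is a plain bash check, with a made-up function name, that prints every $MODULEPATH entry whose EESSI subdirectory resolves to a given real directory. It assumes a `readlink` that supports `-f` (GNU coreutils).

```shell
# Hypothetical helper: print each $MODULEPATH entry whose "EESSI" subdirectory
# resolves (through symlinks) to the real directory given as $1.
eessi_find_module_entry() {
    local target entry resolved
    target="$(readlink -f "$1")" || return 1
    local IFS=':'   # split $MODULEPATH on colons, only inside this function
    for entry in ${MODULEPATH:-}; do
        [ -e "$entry/EESSI" ] || continue
        resolved="$(readlink -f "$entry/EESSI")"
        if [ "$resolved" = "$target" ]; then
            printf '%s\n' "$entry"
        fi
    done
}
```

Calling it with /cvmfs/software.eessi.io/init/modules/EESSI as the argument would list the entries that need to stay in view for the EESSI module to remain visible.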

@mboisson

mboisson commented Oct 8, 2024

What about where Lmod itself is coming from? I am assuming EESSI provides its own Lmod? The version of Lmod can be something that needs to be handled. We have many modules that have features supported only since specific versions of Lmod, and which are shielded by if/else on the Lmod version.

Also, assuming a site uses Lmod (regardless of EESSI), Lmod has a "priority" feature which may be useful here. I would say MODULEPATH should not be messed with directly, but rather through Lmod, with the priority feature. We have things like this in our init scripts (https://github.com/ComputeCanada/software-stack-config/blob/main/profile.d/z-20-lmod.sh#L33)

if [[ -d /opt/software/modulefiles ]]; then
	module -q use --priority 10 /opt/software/modulefiles
fi
if [[ -d "$HOME/modulefiles" ]]; then
	module -q use --priority 100 "$HOME/modulefiles"
fi
