RCC Computing Guide
Here we provide the link for signing up for an account, along with appropriate answers to each question on the application.
- Obtain a new account. Visit this link: RCC Website Link
- Use the following responses to answer the application questions:
- Principal Investigator account name: `pi-nord`
- Software and system tools that you anticipate using for computational research at the RCC: We will use scientific Python and deep learning codebases.
- A brief summary of your work that will use RCC resources: We will perform research at the intersection of physics, cosmology, and artificial intelligence.
If you want multiple accounts, apply as above and then submit a second application that lists the first account (there's a space to list existing account affiliations).
RCC access does not require a VPN, but using one can make remote notebook access (see here) easier.
- Download Cisco AnyConnect Secure Mobility Client here
- Log in to the VPN using the address `vpn.uchicago.edu` in the Cisco AnyConnect Secure Mobility Client dialog box
- Authenticate with your Duo multi-factor authentication application
- Log in with SSH at the command line: `ssh <cnetid>@midway2.rcc.uchicago.edu`
- Authenticate your ID with the Duo multi-factor authentication application.
- Create an alias on your local machine to simplify your login:
  - Open `~/.bash_profile` locally (on your home machine).
  - Add this line: `alias sshrcc='> ~/.ssh/known_hosts; ssh midway2.rcc.uchicago.edu'`
  - Save and exit the file.
  - Test at the command line: `sshrcc`
- RCC's login nodes sometimes change IP addresses, which can cause host-key errors on your local machine. This is why we recommend the alias: it clears `~/.ssh/known_hosts` before each connection.
- `<cnetid>` is your UChicago username.
- If the username on your computer is the same as your `<cnetid>`, you can use `ssh midway2.rcc.uchicago.edu` instead; otherwise, include `<cnetid>@` explicitly (see the sketch below).
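For reference, here is a minimal sketch of the local `~/.bash_profile`, assuming your local username differs from your `<cnetid>` (so the alias includes it explicitly); the alias name `sshrcc` is the one used above:

```bash
# Local ~/.bash_profile (on your home machine)
# Clear known_hosts first to avoid host-key errors when RCC's IPs change,
# then SSH to the default Midway2 login node.
alias sshrcc='> ~/.ssh/known_hosts; ssh <cnetid>@midway2.rcc.uchicago.edu'
```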
- Login nodes are where you land -- you always log in to a login node.
- Compute nodes can be accessed from the login nodes -- use these for memory-intensive computations.
- Functioning:
  - Default: `ssh midway2.rcc.uchicago.edu`. This is the most robust option because it appears to assign you to the least-used login node when both are up, or to the live one when one is down (this is our best interpretation; we haven't confirmed with RCC staff).
  - login1 (if you know you want to be on this particular node): `ssh midway2-login1.rcc.uchicago.edu`
  - login2 (if you know you want to be on this particular node): `ssh midway2-login2.rcc.uchicago.edu`
- Non-functioning:
  - `midway2-login3.rcc.uchicago.edu` is not typically available
  - `midway.rcc.uchicago.edu` is decommissioned
- The most common (and recommended) way to access compute nodes is by running `sinteractive` (with optional flags, as described on the RCC website here and in this guide here).
- If you want to run a job for longer than you wish to be actively logged in, you can use `tmux` (see the sketch after this list).
- If you want to submit many jobs at once and don't want a separate `tmux` session for each, consider `batch` computing instead, as described here.
- `tmux` functionality is described in the user guide here.
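For example, here is a minimal sketch of a `tmux` workflow for keeping an interactive job alive after you disconnect; the session name `mysession` is an arbitrary placeholder, and the `sinteractive` flags are the KICP/A&A ones shown below:

```bash
# On a login node: load tmux and start a named session
module load tmux
tmux new -s mysession

# Inside the tmux session: request an interactive compute node
sinteractive -A kicpaa -p kicpaa

# Detach with Ctrl-b d, then log out; the session keeps running.
# Note: tmux sessions live on the specific login node where you started them,
# so log back in to that same node to reattach:
tmux attach -t mysession
```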
- RCC recommends `sinteractive` for most use cases to select a node for compute.
- If `kicpaa` is your primary affiliation (rather than `pinord`), it might work without the flags.
- There is a separate partition for the KICP/A&A GPU allotment -- to access it, use a different partition than above but the same account name:
  - To use a CPU: `sinteractive -A kicpaa -p kicpaa`
  - To use a GPU: `sinteractive -p kicpaa-gpu -A kicpaa`
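As a hedged sketch (not verified against RCC defaults), `sinteractive` also accepts common Slurm-style resource flags, so you can request a walltime, memory, and a GPU explicitly; the values here are arbitrary examples:

```bash
# Example: 2-hour interactive session with 16 GB of memory and one GPU
# on the KICP/A&A GPU partition (values are placeholders, not defaults)
sinteractive -A kicpaa -p kicpaa-gpu --time=02:00:00 --mem=16G --gres=gpu:1
```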
- Create a `.bash_profile` in your home directory on RCC and add any commands you want to run by default immediately upon logging in (a minimal sketch follows this list).
- I always load tmux in case I wind up running a job for longer than I want to remain actively logged in, so I include `module load tmux`.
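A minimal sketch of such an RCC `~/.bash_profile`; the Anaconda module is the one loaded later in this guide, and both lines are optional:

```bash
# ~/.bash_profile in your RCC home directory -- runs on every login
module load tmux                       # keep long-running jobs alive across logouts
module load python/anaconda-2021.05    # Anaconda python (see the environments section)
```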
- Activate a virtual environment (described more here): `source activate <env_name>`
- RCC uses `conda` for package management.
- RCC provides many prebuilt conda modules.
- For guidance on creating virtual environments, see the second warning in this list of "mistakes to avoid".
- After creating a virtual environment named `env_name`, RCC prefers `source activate <env_name>`, as described here.
- Load the latest Anaconda python (as of March 2023): `module load python/anaconda-2021.05`
- Use `source activate` (see the sketch after this list).
- Never use `conda init` (it has been known to break things like ThinLinc).
- Never use `conda activate`.
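Putting these together, a minimal end-to-end sketch; the environment name `env_name` and the packages are placeholders:

```bash
module load python/anaconda-2021.05     # load Anaconda python
conda create -n env_name numpy scipy    # create the environment (example packages)
source activate env_name                # activate it -- never conda init / conda activate
source deactivate                       # deactivate when finished
```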
There is an RCC guide for running Jupyter notebooks available here. Usage depends on whether or not you're on VPN.
The basic approach is as follows:
- Write a script: `mybatchscript.sh`
- Submit the script: `sbatch mybatchscript.sh <arguments>`
The batch file can use different flags. For instance, to run on the `kicpaa` partition with the `kicpaa` account, passing two arguments to a script `mypyscript.py`, the contents would look like:
```bash
#!/bin/bash
#SBATCH --job-name=Hruns
#SBATCH --time=02:00:00
#SBATCH --account=kicpaa
#SBATCH --partition=kicpaa
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=12G
# $1 and $2 are the two arguments passed on the sbatch command line
python mypyscript.py $1 $2
```
and the two arguments get passed to `mypyscript.py` via `sys.argv` (for example). With these tools, you can iterate over the two arguments from 10 to 15 and 0 to 5, respectively, with

```bash
for i in {0..5}; do for j in {10..15}; do sbatch <file name>.sh $j $i; done; done
```
This might be unnecessarily slow if SLURM doesn't want to dispatch that many independent jobs. An alternative would be to use the `array` flag of `sbatch`, so that `mybatchscript.sh` instead looks like:
```bash
#!/bin/bash
#SBATCH --job-name=Hruns
#SBATCH --time=02:00:00
#SBATCH --array=10-15
#SBATCH --account=kicpaa
#SBATCH --partition=kicpaa
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=12G
# $SLURM_ARRAY_TASK_ID ranges over 10-15; $1 is the argument passed to sbatch
python mypyscript.py $SLURM_ARRAY_TASK_ID $1
```
The `array` flag will ensure that `mypyscript.py` is run with the first argument taking on values from 10-15 (inclusive) for any value of the second argument. To iterate over the second argument from 0-5 (inclusive), use the following at the CLI:

```bash
for i in {0..5}; do sbatch mybatchscript.sh $i; done
```
These two sets of scripts and commands will result in the same output, but they are handled by the scheduler differently (e.g., the job names will have suffixes according to their `$SLURM_ARRAY_TASK_ID` value, so the output files will be `slurm-123456789_10.out`, `slurm-123456789_11.out`, etc., rather than `slurm-123456789.out`, `slurm-123456790.out`, `slurm-123456791.out`, etc.).
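To check on submitted jobs and their output, the standard Slurm commands (not RCC-specific) suffice:

```bash
squeue -u $USER     # list your queued and running jobs (array tasks show as jobid_taskid)
ls slurm-*.out      # default Slurm output files appear in the submission directory
```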
Compute nodes aren't connected to the internet, so if you want to clone a GitHub repository hosted at `https://github.com/<myrepo>`, do the following on the login node:
- Log in to a login node
- At the CLI: `git clone https://github.com/<myrepo>.git`
- Provide your username and either your password (which you will need to re-enter every time) or a token (as documented by git here)
- If you need compilers to install a development package, RCC has `gcc`, though it must be loaded with `module load gcc` (see the sketch below).
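For instance, a hedged sketch of cloning and installing a package that needs compilation; the repository placeholder `<myrepo>` is from above, and the `pip` invocation is one common choice, not an RCC requirement:

```bash
# On a login node (compute nodes have no internet access)
module load gcc                            # compilers for building extensions
git clone https://github.com/<myrepo>.git
cd <myrepo>
pip install --user .                       # install into your user site-packages
```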
- DeepBench
- DeepGotData
- DeepUtils
- Google Colaboratory
- Elastic Analysis Facility (EAF; Fermilab)
- Research Computing Center (UChicago)
- coming soon.