Sriharsha Kandala edited this page Dec 1, 2023 · 10 revisions

clima.gps.caltech.edu is a GPU node with 8x NVIDIA A100 GPUs.

Getting access

Email [email protected] and request access

Setting up

Unlike central, clima has only a handful of modules available. The recommended approach is to install software in your home directory.

SSH config

Add the following to your local ~/.ssh/config file:

Host clima
  HostName clima.gps.caltech.edu
  User [username]

To access clima from outside the Caltech network, either use the Caltech VPN, or add the following so that SSH automatically proxies through ssh.caltech.edu:

Match final host !ssh.caltech.edu,*.caltech.edu !exec "nc -z -G 1 login.hpc.caltech.edu 22"
  ProxyJump ssh.caltech.edu
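
With the config above in place, the Host alias works with the usual SSH tools. A minimal sketch (the file name is illustrative):

```shell
# Connect to clima using the alias defined in ~/.ssh/config
ssh clima

# Copy a file to your home directory on clima
scp results.tar.gz clima:~/
```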

About the machine

Storage

  • /home/[username] (capped at 1TB): mounted from sampo, and is backed up
  • /net/sampo/data1 (200TB): mounted from sampo. Not backed up, but somewhat protected by redundant RAID partition
  • /scratch (70TB): fast SSD, not backed up and no RAID redundancy
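
A quick way to check the storage areas above is with the standard df and du tools; a minimal sketch:

```shell
# Free space on each storage area listed above
df -h /home/$USER /net/sampo/data1 /scratch

# Total size of your home directory, against the 1TB cap
du -sh "$HOME"
```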

CPU usage

  • top
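
top is interactive by default; two variants that are often handy on a shared node (flags are standard procps options):

```shell
# Interactive view, filtered to your own processes
top -u $USER

# One-shot batch snapshot, useful for logging
top -b -n 1 | head -n 15
```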

GPUs

clima has 8× NVIDIA A100 GPUs (80GB each), connected via NVLink.

  • nvidia-smi gives a summary of all the GPUs
    • nvidia-smi topo -m shows the connections between GPUs and CPUs
  • nvtop gives you a live-refresh of current GPU usage
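
For scripting, nvidia-smi also supports a machine-readable query mode; a sketch of the commands above plus a per-GPU memory query:

```shell
# Summary of all the GPUs
nvidia-smi

# Per-GPU memory usage in CSV form, convenient for scripts
nvidia-smi --query-gpu=index,name,memory.used,memory.total --format=csv

# Interconnect topology; NVLink pairs show up as NV# entries
nvidia-smi topo -m
```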

While the GPUs can be used directly, it is recommended to always schedule jobs using Slurm. This prevents multiple jobs from being allocated to the same GPU, which can cause significant performance degradation.
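
A minimal sketch of a Slurm batch script requesting a single GPU (the job name and time limit are illustrative; check sinfo on clima for any partition names):

```shell
#!/bin/bash
#SBATCH --job-name=my-gpu-job
#SBATCH --gres=gpu:1          # request one A100; Slurm picks a free one
#SBATCH --time=01:00:00
#SBATCH --output=%x-%j.out

# Confirm which GPU was allocated to this job
nvidia-smi
```

Submit with `sbatch job.sh`; for interactive work, `srun --gres=gpu:1 --pty bash` requests a GPU and drops you into a shell on it.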

Software

clima has a single-node installation of Slurm.

We have set up a common environment, which you can load with

module load common

This will set the appropriate Julia preferences, so you should not need to, e.g., call MPIPreferences.use_system_binary().
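
Putting the pieces together, a hedged sketch of running a Julia MPI job under Slurm with the common environment loaded (the script name and task counts are illustrative):

```shell
# Load the common environment (sets Julia preferences, MPI, etc.)
module load common

# Launch a Julia script across 4 MPI tasks with 4 GPUs under Slurm;
# no MPIPreferences.use_system_binary() call is needed in the script,
# since the module already sets the preferences.
srun --ntasks=4 --gres=gpu:4 julia --project myscript.jl
```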
