The hub and its HTTP proxy are run by a non-root user in a rootless container, which is managed on the host system by a systemd service using podman.
Notebooks are launched remotely, on the compute nodes of our HPC cluster. Hardware resources for each notebook are allocated on demand by the resource manager Slurm. Users can select the resources for their notebooks from the JupyterHub interface thanks to the JupyterHub MOdular Slurm Spawner, which leverages batchspawner to submit jobs to Slurm on the user's behalf to launch the single-user server.
The distinctive feature of our setup is that such jobs are not submitted to Slurm from the host running JupyterHub, but from the login nodes of the HPC cluster via an SSH connection. This approach has the advantage that the system running JupyterHub can be kept very minimal, avoiding the need for local users, special file-system mounts and the complexity of provisioning a Slurm installation capable of submitting jobs to the HPC cluster.
JupyterHub is run by a non-root user in a rootless container. Setting up a rootless container is well described in the podman rootless tutorial.
We use a custom systemd service to start the container with podman as the non-root user jupyterhub (aka the JupyterHub operator). On every (re)start, the service replaces any running container with a new one created from the container image. This approach ensures a clean state of the container and makes it easy to recover from any runtime issues with it.
The root filesystem in the container is read-only. The only writable space is the home directory of the non-root user running the container. We also add a few read-only bind mounts with sensitive configuration files for JupyterHub, SSL certificates for the web server and SSH keys to connect to the login nodes. Provisioning these files in the container through bind mounts keeps the container images free of secrets and allows us to seamlessly deploy updates to the configuration of the hub.
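For illustration, the bind-mounted files can be referenced directly from `jupyterhub_config.py`. The following is a minimal sketch; the paths are placeholders, not the ones used in our deployment.

```python
# Sketch of jupyterhub_config.py entries pointing at bind-mounted files.
# All paths below are illustrative placeholders.
from pathlib import Path

# SSL certificate and key used by the proxy for the HTTPS endpoint
c.JupyterHub.ssl_cert = "/srv/jupyterhub/secrets/jupyterhub.crt"
c.JupyterHub.ssl_key = "/srv/jupyterhub/secrets/jupyterhub.key"

# OAuth client secret read at startup from a mounted file,
# keeping the container image itself free of secrets
c.GenericOAuthenticator.client_secret = (
    Path("/srv/jupyterhub/secrets/oauth_client_secret").read_text().strip()
)
```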
The network setup for JupyterHub is rather simple. Rootless containers do not have a routable IP address, so they rely on the network interfaces of the host system. The hub must be able to talk to the notebooks running on the compute nodes of the internal network, as well as serve HTTPS requests (through its proxy) from users on the external network. Therefore, ports 8000 (HTTP proxy) and 8081 (REST API) in the container are forwarded to the host system.
The firewall on the host system blocks all connections through the external network interface and forwards port 8000 on the internal interface (HTTP proxy) to port 443 on the external one. This setup makes the web interface of the hub and notebooks accessible from both the internal and external networks, while the REST API of the hub is only reachable on port 8081 of the internal network.
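The corresponding listener configuration in JupyterHub could look as follows; this is a sketch based on the port layout above, and the internal hostname is a placeholder.

```python
# Sketch of the port layout in jupyterhub_config.py (values are illustrative).

# Proxy: user-facing traffic on port 8000, forwarded by the host firewall
c.JupyterHub.bind_url = "https://0.0.0.0:8000"

# Hub REST API: port 8081, only reachable on the internal network
c.JupyterHub.hub_bind_url = "http://0.0.0.0:8081"

# Address advertised to single-user servers on the compute nodes so they can
# call back to the hub; the hostname stands for the host's internal interface
c.JupyterHub.hub_connect_url = "http://jupyterhub-host.internal:8081"
```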
User authentication is handled by delegating it to the OAuth service of the VSC accounts used by our users.
We use the GenericOAuthenticator from JupyterHub (a configuration sketch follows the list):

- authentication is carried out as a standard OAuth delegation with the VSC account page
- the URLs of the VSC OAuth endpoints are defined in the environment of the container
- the OAuth secrets are defined in JupyterHub's configuration file
- no local users are needed beyond the non-root user running JupyterHub
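A minimal sketch of this configuration is shown below. The environment variable names, client ID and username claim are assumptions for illustration; the client secret is loaded from a bind-mounted file as in the earlier sketch.

```python
# Sketch of the authenticator setup in jupyterhub_config.py.
import os

from oauthenticator.generic import GenericOAuthenticator

c.JupyterHub.authenticator_class = GenericOAuthenticator

# OAuth endpoints of the VSC account page, taken from the container environment
c.GenericOAuthenticator.authorize_url = os.environ["OAUTH2_AUTHORIZE_URL"]
c.GenericOAuthenticator.token_url = os.environ["OAUTH2_TOKEN_URL"]
c.GenericOAuthenticator.userdata_url = os.environ["OAUTH2_USERDATA_URL"]

# Client ID registered with the VSC OAuth service (placeholder value)
c.GenericOAuthenticator.client_id = "jupyterhub"

# Claim in the userdata response that holds the VSC username (assumption)
c.GenericOAuthenticator.username_claim = "id"
```

Since no LocalAuthenticator variant is involved, authenticated users do not need accounts on the system running the hub.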
Integration with Slurm is achieved through a custom spawner called VSCSlurmSpawner, based on MOSlurmSpawner. VSCSlurmSpawner allows JupyterHub to generate the environment needed to spawn the user's single-user server without any local users on the hub system. All user settings are taken from vsc-config.
We modified the submission command to execute sbatch on the login nodes of the HPC cluster through SSH. The login nodes already run Slurm and are the sole systems handling job submission in our cluster. Delegating job submission to them avoids having to install and configure Slurm in the container running JupyterHub. The hub environment is passed over SSH with strict control over the variables that are sent and accepted on both ends.
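The sketch below shows one way to express this redirection with batchspawner's configurable commands; the host name, key path and sudo invocation are illustrative assumptions rather than our exact implementation.

```python
# Sketch in jupyterhub_config.py: run Slurm commands on a login node over SSH.

# batchspawner prepends exec_prefix to the Slurm commands it runs (sbatch,
# squeue, scancel), so routing it through SSH delegates all Slurm interaction
# to the login node; sudo then acts on behalf of the target user there.
c.VSCSlurmSpawner.exec_prefix = (
    "ssh -i /home/jupyterhub/.ssh/id_ed25519 jupyterhub@login.hpc.example.org "
    "sudo -E -u {username} "
)

# The submission command itself is unchanged and executes on the login node
c.VSCSlurmSpawner.batch_submit_cmd = "sbatch --parsable"
```

The variables that cross the SSH connection are constrained on both ends, typically with SendEnv in the client's ssh_config and AcceptEnv in the sshd_config of the login nodes.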
The SSH connection is established by the non-root user running JupyterHub (the hub container does not have other local users). This jupyterhub user has special sudo permissions on the login nodes to submit jobs to Slurm as other users. The specific group of users and the list of commands allowed to the jupyterhub user are defined in the sudoers file.
Single-user server spawn process:

- the user selects computational resources for the notebook in the web interface of the hub
- VSCSlurmSpawner generates the environment for the user without any local users on the hub system
- the jupyterhub user connects to a login node with SSH, passing the hub environment over the wire
- the jupyterhub user submits a new job to the Slurm cluster as the target user, keeping the hub environment
- the job script of the single-user server fully resets the environment before any other step is taken, to minimize tampering from the user's own environment
- the single-user server is launched without the mediation of srun, so that software relying on MPI can be used in the notebook, as srun-in-srun is not possible (see the sketch after this list)
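In terms of batchspawner settings, the last step corresponds roughly to the sketch below; it illustrates the mechanism rather than reproducing our exact job script.

```python
# Sketch in jupyterhub_config.py: start the single-user server without srun.
# An empty req_srun drops the srun prefix from the generated batch script, so
# the notebook can later launch MPI software with srun inside the allocation
# (nesting srun within srun is not possible).
c.VSCSlurmSpawner.req_srun = ""
```

The environment reset of the previous step happens at the top of the generated job script, i.e. in the batch_script template used by the spawner, before the single-user server command is executed.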