Install [Nvidia Driver, CUDA Toolkit, cuDNN] and [PyTorch, TensorFlow, JAX] with pip on Ubuntu 20.04, and train a demo CNN
This tutorial
- starts with a fresh Ubuntu 20.04 install,
- goes through the installation of the Nvidia Driver (520),
- CUDA Toolkit, which includes nvcc (11.8),
- cuDNN library (8.5.0),
- python3.9 and python3.9-venv,
- and PyTorch (1.13.0+cu117), TensorFlow (2.10.0), and JAX (cuda11_cudnn82).
- and ends with training a CIFAR-10 classifier (PyTorch, TF) and an MNIST classifier (JAX).
The versions of PyTorch, TensorFlow, and JAX are the latest as of 12th November 2022; you should replace them with the latest versions available at installation time.
Note: PyTorch does not need the CUDA Toolkit or cuDNN if it is installed via Conda or pip. See the PyTorch forum here, here, and this issue. (The forum links seem to work in Chrome but not in Firefox!)
1. Nvidia Driver
Here, we start with a newly installed Ubuntu 20.04, which ships with Python 3.8. The installation procedure is described here, and we assume that you selected "Install third-party software for graphics [...]" at this step of the installation. This additional software includes the Nvidia Driver.
If you didn't install the third-party software (including the Nvidia Driver) during the Ubuntu installation, you can follow one of the many online tutorials like this one, or install the driver during the CUDA Toolkit installation described below. In the best case, the driver is the latest one available from Software & Updates -> Additional Drivers, and it is proprietary and tested, e.g. "Using NVIDIA driver metapackage from nvidia-driver-520 (proprietary, tested)". You know that you are done when
nvidia-smi
returns something like
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.56.06 Driver Version: 520.56.06 CUDA Version: 11.8 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:01:00.0 On | N/A |
| N/A 43C P8 7W / N/A | 329MiB / 8192MiB | 12% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1759 G /usr/lib/xorg/Xorg 20MiB |
| 0 N/A N/A 2688 G /usr/lib/xorg/Xorg 79MiB |
| 0 N/A N/A 4204 C+G ...156811687088679943,131072 227MiB |
+-----------------------------------------------------------------------------+
Important: the CUDA version here (top right) should be as high as possible and at least as high as the CUDA Toolkit version you install below.
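This compatibility rule can be sketched as a small numeric version comparison (a minimal illustrative helper, not part of any Nvidia tooling; the version strings below are examples):

```python
def cuda_version_ok(driver_cuda: str, toolkit_cuda: str) -> bool:
    """True if the driver's supported CUDA version (top right of
    nvidia-smi) is at least the CUDA Toolkit version."""
    # Compare dotted version strings numerically, so "11.10" > "11.8".
    to_tuple = lambda v: tuple(int(x) for x in v.split("."))
    return to_tuple(driver_cuda) >= to_tuple(toolkit_cuda)

print(cuda_version_ok("11.8", "11.8"))  # True: equal versions are fine
print(cuda_version_ok("11.8", "12.0"))  # False: toolkit newer than the driver supports
```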
2. CUDA Toolkit
The toolkit contains (almost) everything we need for ML applications, including the nvcc compiler driver. We need to select the proper Toolkit version. Here, we demonstrate how to set up the latest CUDA 11.8 from the Nvidia archive. After specifying the system, choose the runfile (local) installation. In our case:
wget https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installers/cuda_11.8.0_520.61.05_linux.run
sudo sh cuda_11.8.0_520.61.05_linux.run
When prompted by the CUDA Installer, if you don't already have the Nvidia Driver, you should install it now. Of the other suggested options, only CUDA Toolkit 11.8 is necessary.
We need to tell the command line where to find the toolkit installation. This is done by adding two lines to the ~/.bashrc file. In our case:
export PATH=$PATH:/usr/local/cuda-11.8/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-11.8/lib64
You know that the installation is done when you open a new terminal and
nvcc -V
returns something like
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
3. cuDNN
This library efficiently implements deep learning primitives like convolution layers. Which cuDNN version to select? As long as its CUDA version matches the Toolkit version, the higher the cuDNN version, the better. Then, download the cuDNN Library for Linux (x86_64), which should be a .tgz or .tar.xz file.
Once downloaded, extract the contents and copy them to the corresponding include and lib64 folders of the CUDA installation in /usr/local/cuda-11.8 (see the steps, including chmod, here). In our case:
tar xvf cudnn-linux-x86_64-8.5.0.96_cuda11-archive.tar.xz
cd cudnn-linux-x86_64-8.5.0.96_cuda11-archive
sudo cp include/cudnn*.h /usr/local/cuda-11.8/include
sudo cp -P lib/libcudnn* /usr/local/cuda-11.8/lib64
sudo chmod a+r /usr/local/cuda-11.8/include/cudnn*.h /usr/local/cuda-11.8/lib64/libcudnn*
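To double-check which cuDNN version actually landed in the include folder, you can parse the version defines from cudnn_version.h (present since cuDNN 8; the path below assumes the CUDA 11.8 install location used above). A minimal sketch:

```python
import re
from pathlib import Path

def parse_cudnn_version(header_text: str) -> tuple:
    """Extract (major, minor, patch) from the contents of cudnn_version.h."""
    version = {}
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(rf"#define\s+{name}\s+(\d+)", header_text)
        version[name] = int(match.group(1))
    return (version["CUDNN_MAJOR"], version["CUDNN_MINOR"], version["CUDNN_PATCHLEVEL"])

# Path assumed from the install steps above; skipped if cuDNN is elsewhere.
header = Path("/usr/local/cuda-11.8/include/cudnn_version.h")
if header.exists():
    print(parse_cudnn_version(header.read_text()))  # e.g. (8, 5, 0)
```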
4. Python
Here, we
- demonstrate how to install an arbitrary python 3.x version
sudo apt install python3.9
- and how to create virtual environments using that version.
sudo apt install python3.9-venv
python3.9 -m venv ./venv # create empty virtual env
source venv/bin/activate # begin working inside the environment
pip install --upgrade pip
In the following, we assume that you have created and activated a new environment.
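A quick sanity check that the venv is actually active: inside a virtual environment, sys.prefix differs from sys.base_prefix (Python 3.3+). A minimal sketch:

```python
import sys

def in_virtualenv() -> bool:
    # Inside a venv, sys.prefix points at the venv directory while
    # sys.base_prefix still points at the system Python install.
    return sys.prefix != sys.base_prefix

print(in_virtualenv())  # True when run inside an activated venv
```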
5. PyTorch, TensorFlow, JAX
Here, you see how to install the latest versions of the three libraries as of 12th November 2022.
- PyTorch installation. The only requirement is the Nvidia Driver (no CUDA Toolkit or cuDNN is required, as they ship with PyTorch), whose supported CUDA version has to be >= the CUDA version below.
pip3 install torch==1.13.0 torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu117
- TensorFlow installation. TensorFlow depends on the CUDA Toolkit and cuDNN.
pip install tensorflow==2.10.0
Also, for TF we need to add the following line to ~/.bashrc:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-11.8/extras/CUPTI/lib64
- JAX installation. JAX requires the CUDA Toolkit and cuDNN, but is flexible about the versions: the Toolkit has to support CUDA >= 11.1, and cuDNN has to be >= 8.0.5. For deep learning we need additional libraries, which are installed in the second line below.
pip install "jax[cuda11_cudnn82]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
pip install -U dm-haiku # additional dependency for ML
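After installing, a rough GPU-visibility check for all three libraries can look like the sketch below. It only uses the standard device-query calls (torch.cuda.is_available(), tf.config.list_physical_devices, jax.devices()) and reports "missing" if a library is not installed:

```python
def gpu_status(library: str) -> str:
    """Report whether the given library can see a GPU; 'missing' if not installed."""
    try:
        if library == "torch":
            import torch
            found = torch.cuda.is_available()
        elif library == "tensorflow":
            import tensorflow as tf
            found = len(tf.config.list_physical_devices("GPU")) > 0
        elif library == "jax":
            import jax
            found = any(d.platform == "gpu" for d in jax.devices())
        else:
            return "unknown library"
    except Exception:
        return "missing"
    return "gpu found" if found else "no gpu"

for lib in ("torch", "tensorflow", "jax"):
    print(lib, "->", gpu_status(lib))
```

If any library reports "no gpu" here, fix that before running the training scripts below.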
6. Train a demo CNN
We recommend installing Torch, TF, and JAX in a venv, next to which we clone the repo with the training scripts:
git clone https://github.com/arturtoshev/test_torch_cuda.git
cd test_torch_cuda
Now, test the CUDA installation by running the training scripts as shown below. Each script trains a neural network for 2 epochs on the GPU and for 2 epochs on the CPU. The PyTorch code is adapted from this tutorial, the TensorFlow code from here, and the JAX code from here. The total runtime should be around one minute. Note that the datasets will be downloaded to .data/ (PyTorch and JAX) and ~/.keras/datasets/ (TensorFlow) and can be deleted afterwards.
python main_torch.py
python main_tf.py
python main_jax.py
Everything works fine if the runs finish in under two minutes without errors. Also, the GPU runs should be at least 2x faster than the CPU runs.
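The 2x speedup criterion is just a ratio of the timings the scripts print; as a trivial illustration (a helper for this check only, not part of the repo's scripts):

```python
def speedup_ok(cpu_seconds: float, gpu_seconds: float, factor: float = 2.0) -> bool:
    """True if the GPU run is at least `factor` times faster than the CPU run."""
    return cpu_seconds >= factor * gpu_seconds

print(speedup_ok(30.0, 10.0))  # True: 3x speedup passes
print(speedup_ok(12.0, 10.0))  # False: only 1.2x, GPU is barely used
```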