Nvidia CUDA¶

With the switch from Ubuntu to Debian on our managed Linux workstations, we also re-evaluated how to install and manage Nvidia GPU drivers and CUDA. Instead of letting setups vary from one host to another, we keep every system updated to the latest upstream version and provide a central installation with a fixed set of older CUDA library versions.

This page is primarily intended for developers who want to compile CUDA software from source or who need to run software linked against older major CUDA versions.

Python CUDA¶

To use CUDA-capable Python packages such as PyTorch or TensorFlow, please refer to our separate guide for Python on Managed Linux Workstations. We provide a set of ready-to-use packages in a central installation that is compatible with our Nvidia drivers and CUDA installation.

The information on this page is not relevant in that case.

Nvidia drivers¶

All Debian workstations run the same driver version, taken from the production branch or the LTS branch. We update the driver periodically (every 3 months) or whenever a newer CUDA version requires it, and roll out each update simultaneously across the workstation fleet.

A driver update always requires a reboot, which will be announced in the Matrix room #linux:phys.ethz.ch some days in advance. Note that security updates may require an immediate reboot without time for announcements in advance.

Refer to the data center driver life-cycle for details.

Legacy GPUs¶

Support for older GPUs (about 10 years old, requiring the 470 or 390 legacy drivers) was dropped by Nvidia for Debian 12. If your Nvidia GPU does not support the latest driver version, we will install the Debian-packaged 470 Tesla driver where the GPU supports it. For anything older we will install the open-source nouveau driver, which may lead to degraded performance or instability. To avoid that, a GPU upgrade is required.

Please refer to the list of supported GPUs on the Nvidia unix drivers page and navigate to: Latest Production Branch Version: 5xx.xx.xx > Supported Products.
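To look up your GPU in that list you first need its exact model name. The following is a minimal sketch, assuming the proprietary driver and its nvidia-smi tool are installed. The function name gpu_info is hypothetical and only used for illustration.

```shell
# Hypothetical helper: print GPU model and driver version, one line per GPU.
gpu_info() {
    if command -v nvidia-smi >/dev/null 2>&1; then
        # --query-gpu and --format are regular nvidia-smi options.
        nvidia-smi --query-gpu=name,driver_version --format=csv,noheader
    else
        echo "nvidia-smi not found: the proprietary Nvidia driver is probably not installed" >&2
        return 1
    fi
}

# Do not abort the surrounding script if no Nvidia driver is present.
gpu_info || true
```

If nvidia-smi is missing, the host is most likely running the nouveau driver, and lspci can be used to identify the card instead.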

We are also happy to help if you are unsure what to do; just contact us.

System CUDA version¶

All Debian workstations have the same (latest possible) CUDA version from the upstream Nvidia repo installed.

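You can check the currently installed system CUDA version using ls -la /etc/alternatives/cuda or nvidia-smi. Note that nvidia-smi reports the highest CUDA version supported by the installed driver rather than the version of the installed toolkit.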
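As a small sketch of the first check, the alternatives symlink can be resolved to the directory it finally points to. The helper name system_cuda_path and its optional argument are assumptions made for illustration; only the path /etc/alternatives/cuda comes from this page.

```shell
# Hypothetical helper: resolve the alternatives symlink to the real install directory.
system_cuda_path() {
    local link="${1:-/etc/alternatives/cuda}"
    if [ -e "$link" ]; then
        # readlink -f follows every symlink in the chain and prints the final target.
        readlink -f "$link"
    else
        echo "no CUDA alternative found at $link" >&2
        return 1
    fi
}

# Do not abort the surrounding script on hosts without a system CUDA installation.
system_cuda_path || true
```

The name of the printed directory usually contains the toolkit version, which makes it easy to compare against the version a project requires.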

System CUDA Environment¶

You have to set up your environment to use the CUDA Toolkit:

export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
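The ${VAR:+:${VAR}} expansion in these lines appends a colon and the old value only when the variable is already set and non-empty. This avoids a trailing colon, which matters for LD_LIBRARY_PATH because the dynamic linker treats an empty entry as the current working directory. The sketch below isolates the idiom; the function name prepend_path is hypothetical.

```shell
# Hypothetical helper: prepend a directory to a colon-separated search path.
prepend_path() {
    # $1: directory to prepend, $2: current value of the variable (may be empty)
    printf '%s\n' "$1${2:+:$2}"
}

prepend_path /usr/local/cuda/lib64 ""        # prints /usr/local/cuda/lib64
prepend_path /usr/local/cuda/bin /usr/bin    # prints /usr/local/cuda/bin:/usr/bin
```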

System CUDA Updates¶

We will periodically update to a new major (X) or minor (X.Y) version once it is stable for production use and a compatible production branch driver is available. Patch releases (X.Y.Z) are updated automatically once per day. The installation always includes a compatible cuDNN library version.

Older CUDA versions¶

We manage a central installation with a selection of older CUDA library versions in our /opt/software/ NFS mount available on all managed Linux workstations.

To use one of them, source the environment file that matches your required CUDA version (X.Y.Z):

source /opt/software/cuda/env/X[.Y[.Z]]

This will set up your environment variables to point to the libraries and tools of that specific CUDA version. The set of libraries also includes the latest compatible cuDNN version.
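In scripts it can help to check that an environment file exists before sourcing it. The following is a sketch only: the function load_cuda and the variable CUDA_ENV_DIR are hypothetical names introduced for illustration and are not part of the central installation.

```shell
# CUDA_ENV_DIR is an illustrative variable; it defaults to the directory named on this page.
CUDA_ENV_DIR="${CUDA_ENV_DIR:-/opt/software/cuda/env}"

# Hypothetical helper: source the environment file for the requested version.
load_cuda() {
    local version="$1"
    local env_file="$CUDA_ENV_DIR/$version"
    if [ ! -r "$env_file" ]; then
        echo "no environment file for CUDA $version in $CUDA_ENV_DIR" >&2
        # Show what is available so the caller can pick a valid version.
        echo "available: $(ls "$CUDA_ENV_DIR" 2>/dev/null | tr '\n' ' ')" >&2
        return 1
    fi
    # Source in the current shell so the exported variables persist.
    . "$env_file"
}
```

A call such as load_cuda X.Y, with X.Y replaced by one of the listed versions, then behaves like the source command above but fails with a readable message when the version is not installed.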

To check the currently loaded CUDA version you can use:

nvcc --version

After that you may compile your CUDA software from source.
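As a quick end-to-end test of the loaded toolchain, a tiny program can be compiled with nvcc. This is a minimal sketch, assuming nvcc is on your PATH and a supported GPU is present. The file name hello.cu and the temporary directory are arbitrary choices.

```shell
# Write a minimal CUDA program into a temporary directory.
workdir=$(mktemp -d)
cat > "$workdir/hello.cu" <<'EOF'
#include <cstdio>

// Each of the four GPU threads prints its own index.
__global__ void hello() { printf("hello from thread %d\n", threadIdx.x); }

int main() {
    hello<<<1, 4>>>();
    cudaDeviceSynchronize();  // wait for the kernel and flush its output
    return 0;
}
EOF

if command -v nvcc >/dev/null 2>&1; then
    # Compile and run; report a failure without aborting the script.
    { nvcc -o "$workdir/hello" "$workdir/hello.cu" && "$workdir/hello"; } \
        || echo "compilation or execution failed" >&2
else
    echo "nvcc not found; set up the CUDA environment first" >&2
fi
```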

cuDNN¶

You may manually load a different cuDNN version by including a specific /opt/software/cuda/cudnn* directory in your environment.
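The sketch below shows one way this could look for the runtime library path. It assumes that each cuDNN directory contains a lib64 or lib subdirectory, which is common for cuDNN archives but not confirmed for this installation. The function name use_cudnn is hypothetical.

```shell
# Hypothetical helper: put the library directory of one cuDNN version first in LD_LIBRARY_PATH.
# The lib64/ or lib/ layout is an assumption; inspect the directory before relying on it.
use_cudnn() {
    local dir="$1" sub
    for sub in lib64 lib; do
        if [ -d "$dir/$sub" ]; then
            export LD_LIBRARY_PATH="$dir/$sub${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
            return 0
        fi
    done
    echo "no lib64 or lib directory under $dir" >&2
    return 1
}

# List the cuDNN directories that are available (prints nothing if none match).
ls -d /opt/software/cuda/cudnn* 2>/dev/null || true
```

Pass one of the listed directories to use_cudnn. Compiling against that cuDNN version would additionally need its header directory on the compiler's include path.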

CUDA version compatibility¶

There is currently only a small selection of CUDA library versions inside /opt/software/cuda/, which should be enough for most use cases thanks to CUDA's backward compatibility.

CUDA X.Y versions are binary backward compatible, but source compatibility might be broken. We encourage you to keep your sources up to date with recent CUDA changes, but we are also ready to install other CUDA versions on request. Please contact us in that case.
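To find out which CUDA version an existing binary needs, you can inspect the shared libraries it is linked against. The following is a rough sketch; the function name cuda_libs_of is hypothetical and the list of library names is not exhaustive.

```shell
# Hypothetical helper: list the CUDA-related shared libraries a binary is linked against.
cuda_libs_of() {
    # ldd prints one library per line; keep the common CUDA runtime and math libraries.
    ldd "$1" 2>/dev/null \
        | grep -E 'lib(cuda|cudart|cublas|cufft|curand|cusolver|cusparse|cudnn)' \
        || true
}

cuda_libs_of /bin/sh   # prints nothing, since a shell does not link against CUDA
```

For a CUDA program, the version suffix of entries such as libcudart.so typically indicates the CUDA major version the binary was built for, and "not found" entries point to a version that is not in the current environment.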

Please refer to the official Nvidia CUDA documentation for detailed information.

Contact

ETH Zurich

Physics Department

IT Services Group

HPT H 6 – H 9

Auguste-Piccard-Hof 1

8093 Zürich

Switzerland