Linux Workstations: Migration to Debian

In Q3 2024 we will migrate the operating system of our managed Linux workstations from Ubuntu to Debian.

Why Debian

For the last 10+ years our workstations have been running Ubuntu, while our servers have always run Debian. In the past, Ubuntu brought additional features and software for workstations that were missing in Debian. This is no longer the case, and we believe Debian is fully capable of providing the same or a better workstation experience. Maintaining two distributions also costs us a lot of engineering effort, and in the last couple of years Ubuntu introduced changes that make it difficult for us to continue supporting it. This led us to the decision to fully switch to Debian and say bye to Ubuntu👋.

Over the last 12+ months we have worked hard on engineering and testing our new Debian workstation setup. It comprises over 500 commits to our Ansible repository and brings many changes and improvements; the ones to be aware of are listed below. We are also very proud that from now on every workstation will come 100% reproducibly out of our automated Ansible deployment, and after the migration we will be able to fully reinstall a workstation within 10-20 minutes (depending on the hardware), keeping downtime as low as possible for our users.

Required steps before the migration

In close collaboration with the owner and/or user of the respective workstation, the following steps are required before the migration:

Possibly required hardware upgrades

Some old or under-dimensioned hardware might not be compatible with the new OS and setup and may need to be replaced. Please order replacements in time for the migration. This affects the following hardware components:

  • Old Nvidia cards requiring the 470 or 390 legacy drivers, see changes: Nvidia
  • System disks smaller than 256GiB will result in essentially no /scratch space (1GiB only), see changes: System disk, Scratch

Preparation steps for users

These steps should be done before the migration:

  • Scratch: please clean up /scratch or let us know that we may wipe it, see changes: Scratch
  • Cronjobs: you may want to prepare the switch to systemd timers in advance, see changes: Cronjobs

Changes

The migration includes the following notable (breaking) changes to be aware of:

  • Matrix room: please join the #linux:phys.ethz.ch matrix room, which will be used for future announcements and support. Important announcements will be sent with an @room ping, so for the best experience set the room notification level to Mentions & keywords to reduce noise.
  • Ansible: 100% reproducible automated installation and setup
  • Installer: a custom Debian installer component allows fast reinstallation (~15min of downtime with an SSD). Together with our new LVM setup, this will allow us to keep local data (/scratch*) and wipe only the OS parts in the future. Due to size and filesystem changes, the whole system disk needs to be wiped once during the initial migration to Debian.
  • System disk: we require a minimum of 256GiB (SSD) for the system disk of new workstations
  • System disk: 256GiB of the system disk space is reserved for ISG use (OS, software and space reserved for future use)
  • Scratch: /scratch is a logical volume on the system disk and is allocated the remaining unreserved space. Due to the reservation above, it is possible that after the migration there will be no more /scratch space (just 1GiB): for example, a 256GiB system disk leaves nothing beyond the reserved space, while a 1TiB disk leaves roughly 768GiB for /scratch. To avoid this, please organize a bigger SSD (>256GiB) as the system disk.
  • Scratch: during the migration we will have to backup/restore /scratch. Please clean it up in advance to speed up the migration. The estimated restore speed is ~0.5-10h/TiB depending on the number and size of the files, so restoring e.g. 2TiB of many small files can take up to 20 hours. Let us know if we can wipe /scratch without backup/restore, which would be ideal.
  • Reboots: we reserve the right to reboot machines at any time without announcement due to security updates. We also plan periodic reboots at least every 3 months due to driver updates (mainly for Nvidia drivers). Such reboots will only be announced in the #linux:phys.ethz.ch matrix room.
  • Python: a new major Python version typically requires users to re-create all virtual environments (see the sketch at the end of this section). In addition, we opted for a single Python installation shared across all workstations, with a variety of packages pre-installed. Note that you must activate this Python in your shell in order to use it, so please have a look at our detailed Python readme.
  • Firefox: due to the switch from Ubuntu to Debian, all Firefox profiles need to be migrated once. This is a manual process and ideally happens during the workstation reinstallation; we will assist in the process. If you would like to give it a go yourself, read firefox profile for instructions on how to migrate your profile and preserve your history and settings. By default only Firefox ESR (Extended Support Release) is installed now; previously (with Ubuntu) two versions of Firefox were installed (ESR and Rapid Release).
  • NFS: local filesystems (such as /scratch) will no longer be exported via NFS by default, due to the lack of access restrictions for NFS exports (all D-PHYS users may write to them from anywhere). Exports are only available on request by the hardware/storage-device owner, if they agree with these limitations.
  • Nvidia: the latest 5xx-series (550+) driver is the only officially installable driver for Debian 12 bookworm. Nvidia dropped support for older GPUs (roughly 10 years old, requiring the 470 or 390 legacy drivers) on Debian 12. If your Nvidia GPU does not support the latest driver version, we will install the Debian-packaged 470 Tesla driver where supported; for everything older we will install the open-source nouveau driver, which may lead to degraded performance or instability. To avoid that, a GPU upgrade is required. Refer to the list of supported GPUs on the Nvidia Unix drivers page and navigate to: Latest Production Branch Version: 5xx.xx.xx > Supported Products. To find out which GPU you have, see the example at the end of this section. We are also happy to help if you are unsure what to do, just contact us. Refer to Nvidia drivers for details.
  • Cuda: the default CUDA version installed on the system will always be the latest stable minor version compatible with our currently deployed production-branch driver; currently this is CUDA 12.x. Refer to System CUDA version for details.
  • Cuda: to use an older CUDA and/or cuDNN version, use our central installation in /opt/software/cuda (see the sketch at the end of this section). Refer to Older CUDA versions for details.
  • Rust: we provide a central installation of the stable upstream rustc compiler and cargo package manager and automatically activate it for you via environment variables. Have a look at our detailed Rust readme.
  • Software: software previously installed locally in /opt is now provided via an NFS mount at /opt/software. Your user environment variables (PATH and others) will be pre-set during logon so that you can start software as usual from your terminal or graphical session (for instance, just type matlab).
  • Resource control: to ensure system stability and a fair distribution of compute resources (CPU, memory and IO), we use kernel resource control features (cgroup v2) and a userspace OOM killer (systemd-oomd), which uses pressure stall information (PSI) metrics to monitor user cgroups and take corrective action before an OOM occurs in kernel space (see the example at the end of this section). Refer to resource control for details.
  • Cronjobs: in the past some users had permission to configure cronjobs. This is no longer possible, and existing cronjobs will not be migrated to the new setup. Instead, all users may configure recurring jobs using systemd user timers and services; please migrate your jobs accordingly (see the sketch at the end of this section). Refer to recurring jobs for details.
  • Cache: cache files will now be stored in /scratch/.cache/${USER} for programs that adhere to the XDG specification (see the example at the end of this section). You may safely remove your old cache in your home directory (rm -r ~/.cache) after the migration.
  • Privileges: some actions on Linux, such as rebooting or managing external storage (e.g. formatting USB drives), require elevated privileges. Refer to user privileges for details.
  • Xrdp: for security reasons, xrdp will only be installed on demand if requested by the hardware owner, and the network service will only listen on the loopback interface, therefore requiring an SSH tunnel (see the example at the end of this section). Refer to ssh tunnel for details.
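
Examples

The sketches below illustrate some of the changes above. They are hedged examples, not authoritative instructions; the linked readmes remain the reference documentation.

Python: a minimal sketch of re-creating a virtual environment after the migration. The venv path and requirements file are hypothetical; the exact activation mechanism of the shared Python is described in the Python readme.

    # activate the shared Python first (see the Python readme), then:
    python3 -m venv ~/venvs/myproject          # hypothetical venv location
    source ~/venvs/myproject/bin/activate
    pip install -r requirements.txt            # reinstall the packages your project needs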
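
Nvidia: to identify the installed GPU, and, on a system already running the proprietary driver, to query the driver version:

    # identify the installed GPU
    lspci | grep -iE 'vga|3d'
    # query GPU model and driver version (works only with the proprietary driver)
    nvidia-smi --query-gpu=name,driver_version --format=csv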
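
Cuda: a sketch of switching a shell to an older CUDA version from the central installation. The version directory 11.8 and the lib64 layout are assumptions; list /opt/software/cuda to see what is actually available.

    export CUDA_HOME=/opt/software/cuda/11.8   # hypothetical version directory
    export PATH="$CUDA_HOME/bin:$PATH"
    export LD_LIBRARY_PATH="$CUDA_HOME/lib64:${LD_LIBRARY_PATH:-}"
    nvcc --version                             # should now report the selected version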
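
Resource control: the standard systemd and kernel tools can be used to inspect how your processes are grouped and how much pressure the system is under:

    # show the cgroup tree of all login sessions
    systemd-cgls /user.slice
    # live per-cgroup CPU, memory and IO usage
    systemd-cgtop
    # kernel pressure stall information (PSI) metrics, as used by systemd-oomd
    cat /proc/pressure/memory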
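
Cronjobs: a minimal sketch of a systemd user timer replacing a nightly cronjob. The unit name and script path are hypothetical; refer to recurring jobs for details.

    # ~/.config/systemd/user/nightly-job.service
    [Unit]
    Description=My nightly job (hypothetical example)

    [Service]
    Type=oneshot
    ExecStart=%h/bin/nightly-job.sh

    # ~/.config/systemd/user/nightly-job.timer
    [Unit]
    Description=Run nightly-job.service every night at 02:00

    [Timer]
    OnCalendar=*-*-* 02:00:00
    Persistent=true

    [Install]
    WantedBy=timers.target

Enable and verify the timer with:

    systemctl --user daemon-reload
    systemctl --user enable --now nightly-job.timer
    systemctl --user list-timers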
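
Cache: assuming the new setup points XDG_CACHE_HOME at the new location (our reading of the change above), you can verify it and clean up the old cache after the migration:

    echo "$XDG_CACHE_HOME"    # expected: /scratch/.cache/<username>
    rm -r ~/.cache            # remove the old home-directory cache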
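
Xrdp: since the service only listens on the loopback interface, an RDP client must connect through an SSH tunnel. The hostname below is a placeholder for your workstation:

    # forward local port 3389 to xrdp on the workstation's loopback interface
    ssh -L 3389:localhost:3389 username@workstation.phys.ethz.ch
    # then point your RDP client at localhost:3389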