Slurm

Slurm (originally an acronym for Simple Linux Utility for Resource Management) is a workload manager that orchestrates jobs on HPC clusters. Its main tasks can be summarized as follows:

Slurm is an open-source, fault-tolerant, and highly scalable cluster management and job scheduling system for large and small Linux clusters. It has three key functions:

1. Managing access and allocating resources (compute nodes) to users for some duration of time so they can perform work.
2. Starting, executing, and monitoring work (normally parallel jobs) on the set of allocated nodes.
3. Arbitrating contention for resources by managing work queues and scheduling jobs.
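To make the three functions more concrete, here is a minimal sketch in Python that assembles a Slurm batch script. The `#SBATCH` lines request resources (function 1), the `srun` line starts the work on the allocated nodes (function 2), and submitting the script with `sbatch` places it in the queue (function 3). The helper names `build_batch_script` and `submit`, the job name, and all resource values are illustrative assumptions; they do not describe the D-PHYS cluster configuration.

```python
import shutil
import subprocess


def build_batch_script(job_name="demo", nodes=1, ntasks=4,
                       time_limit="00:10:00", command="hostname"):
    """Return the text of a small Slurm batch script.

    All default values are placeholders chosen for illustration only.
    """
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",  # label shown in the queue
        f"#SBATCH --nodes={nodes}",        # number of compute nodes requested
        f"#SBATCH --ntasks={ntasks}",      # total number of parallel tasks
        f"#SBATCH --time={time_limit}",    # wall-clock limit (HH:MM:SS)
        f"srun {command}",                 # launch the work on the allocation
    ]
    return "\n".join(lines) + "\n"


def submit(script_text):
    """Hand the script to sbatch on stdin if sbatch is installed.

    Returns sbatch's output, or None when sbatch is not on the PATH
    (for example on a machine that is not a cluster login node).
    """
    if shutil.which("sbatch") is None:
        return None
    result = subprocess.run(["sbatch"], input=script_text,
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()


if __name__ == "__main__":
    # Only print the script here; call submit() explicitly to queue a job.
    print(build_batch_script())
```

On a login node with Slurm installed, `submit(build_batch_script())` would queue the job, and `squeue` could then be used to watch it while it waits for resources or runs.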

At the moment, D-PHYS maintains a 64-node Slurm cluster with restricted access.

External docs: slurm.schedmd.com/quickstart