# srun
The simplest (and not advised) method to run jobs is `srun program`. This creates a job allocation and spawns the application (`program`) within it, i.e. Slurm finds a suitable node in the cluster and runs `program` on that node. `srun` runs interactively and waits for job completion. Apart from testing, there is little use in running `srun` directly; instead, call `srun` within your Slurm batch script and submit it with `sbatch`.
`srun` has many options; the most useful ones are explained below. Also see `man srun` for a complete list of options and more examples.
The executable `program` has to be available on the nodes it is supposed to be executed on; Slurm will not distribute the executable across the nodes by itself. The easiest approach is to use shared storage (e.g. `$HOME`), however this can have a serious performance impact. See the broadcast tool `sbcast` to facilitate the distribution of data across nodes.
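A minimal sketch of that approach inside a batch script (the file name `my_program` and the resource requests are placeholders):

```sh
#!/bin/bash
#SBATCH -ptest -N2 -n4
# copy the executable to node-local storage on every allocated node
sbcast my_program /tmp/my_program
# run the local copy as a job step on the allocated nodes
srun /tmp/my_program
```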
Apart from small test jobs, there is no use case for running `srun` directly. It is meant to be run inside an existing job allocation created by `sbatch` or `salloc`. An invocation of `srun` inside a job allocation is called a _job step_.
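As a rough sketch (the partition `test` and the task counts are just example values), an allocation can be created interactively with `salloc` and job steps started inside it:

```sh
# create an allocation; salloc opens a shell with the allocation active
salloc -ptest -N2 -n4
# each srun below is a job step inside that allocation
srun hostname
srun /usr/bin/uptime
# leave the shell to release the allocation
exit
```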
## Examples
### Basic Example
Run 4 identical tasks (`-n4`) consisting of `/usr/bin/hostname` on two (`-N2`) different nodes of the partition `test` (`-ptest`), and label the output with task numbers (`-l`):
```sh
srun -l -n4 -N2 -ptest /usr/bin/hostname
2: guenther62
3: guenther63
1: guenther62
0: guenther62
```
Tasks 0, 1, and 2 are run on guenther62 and task 3 is run on guenther63. The exact task distribution is up to Slurm, and the tasks may finish in a different order.
You can pass an executable with parameters to `srun`, but nothing too fancy like ad hoc shell constructs. Anything more complicated needs to be wrapped into a single executable.
```sh
srun -ptest echo $HOSTNAME
guenther1
```
`$HOSTNAME` is expanded by `bash` on the node where `srun` is invoked; afterwards `echo guenther1` is scheduled for execution on some compute node. This is probably not what you want.
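If the variable should instead be expanded on the compute node, one possible approach (a sketch, not the only way) is to keep it inside single quotes so that a shell started on the compute node expands it:

```sh
# the single quotes prevent expansion on the submitting node;
# bash on the compute node expands $HOSTNAME there
srun -ptest bash -c 'echo $HOSTNAME'
```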
### Starting multiple jobs
It is possible to start multiple jobs using a single `srun` command; every job has its own set of parameters.
```sh
srun -ptest -n10 sleep 5 : -ptest -n2 hostname
srun: job 12024 queued and waiting for resources
srun: job 12024 has been allocated resources
guenther63
guenther63
^Csrun: interrupt (one more within 1 sec to abort)
srun: StepId=12024.0 tasks 0-9: running
srun: StepId=12025.0 tasks 0-1: exited
```
While `srun` is running, `Ctrl-c` will report the state of all tasks. Two consecutive `Ctrl-c` within one second will terminate all tasks. Three `Ctrl-c` will do so forcefully.
### Use wrapper scripts
Instead of specifying everything on the command line, it is advisable to wrap everything in a small shell script. This makes it easier to keep track of all command-line options and allows the use of `sbatch`, see below.
The `example.sh` shell script:
```sh
#!/bin/bash
srun -ptest python -c 'import socket; print(f"I am {socket.gethostname()}")' &
srun -ptest python -c 'import socket; print(f"I am {socket.gethostname()}")' &
wait
```
Run the script with `bash example.sh`.
`srun` is put into the background using `&`. This way the `srun` commands are started in parallel; without the ampersand, `bash` would wait for the first command to finish before starting the second. Slurm makes sure to find a suitable resource allocation to run both in parallel. The `wait` at the end tells `bash` to wait for all background tasks to complete before continuing.
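As mentioned above, wrapping the commands in a script also allows submitting them through `sbatch`. A minimal sketch (the resource requests are assumptions and depend on the cluster):

```sh
#!/bin/bash
#SBATCH -ptest
#SBATCH -n2
# two job steps started in parallel inside the allocation created by sbatch
srun -n1 python -c 'import socket; print(f"I am {socket.gethostname()}")' &
srun -n1 python -c 'import socket; print(f"I am {socket.gethostname()}")' &
wait
```

Submit it with `sbatch example.sh`.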
## Options
| Option | Effect |
|---|---|
| `-J name`, `--job-name=name` | Specify a job name, see `squeue` |
| `-N N`, `--nodes=N` | Allocate (at least) N nodes for this job |
| `-n N`, `--ntasks=N` | Number of tasks to run |
| `-l`, `--label` | Prepend the task number to each line of output |
| `--mail-type=ALL` | Send mail notifications about job events |
| `--mail-user=user` | Send mails to `user` |
| `--mem=size[units]` | Required (real) memory per node |
| `--mem-per-cpu=size[units]` | Required (real) memory per CPU |
| `--ntasks-per-core=N` | Maximum number of tasks per core |
| `--ntasks-per-node=N` | Maximum number of tasks per node |
| `--ntasks-per-socket=N` | Maximum number of tasks per socket |
| `-c N`, `--cpus-per-task=N` | Allocate N CPUs per task |
| `--cpus-per-gpu=N` | Allocate N CPUs per allocated GPU |
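For illustration, several of these options can be combined in a single call; the values below are arbitrary examples, not recommendations:

```sh
# 8 tasks on 2 nodes of partition test, 1 GB of memory per node,
# output lines labelled with the task number
srun -J hostname-test -ptest -N2 -n8 --mem=1G -l hostname
```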