Torque is a resource manager and queueing system. It is a fork of OpenPBS, which is no longer actively maintained. The name stands for Terascale Open-source Resource and QUEue Manager.
Any executable script can be submitted to Torque. The code does not need to be linked against any specific library. A job is configured through PBS directives at the beginning of the script. Consider the following example:
#!/bin/bash
#PBS -M email@example.com
#PBS -N myjobname
#PBS -l walltime=02:30:00
#PBS -l nodes=1:ppn=24
#PBS -l mem=2gb
#PBS -l vmem=8gb
cd $PBS_O_WORKDIR
echo "Sleeping"
sleep 10
exit 0
The job named myjobname is expected to run for at most two and a half hours on 24 cores of a single node. It requests 2GB of physical memory (mem) and up to 8GB of virtual memory (vmem). Mail about the job is sent to the address given with -M. All it does is print a line of text to standard output and sleep for 10 seconds before exiting successfully.
cd $PBS_O_WORKDIR makes sure that the job output is saved in the directory from which the job was submitted, instead of the home directory of the user.
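Outside the queue, PBS_O_WORKDIR is not set, so a plain cd $PBS_O_WORKDIR would move to the home directory when the script is started by hand. A minimal sketch of a more defensive variant, assuming you sometimes test the script locally; the helper name enter_workdir is hypothetical and not part of Torque:

```shell
#!/bin/bash
# enter_workdir is a hypothetical helper, not a Torque command.
# Inside a Torque job, PBS_O_WORKDIR holds the submission directory.
# Outside the queue it is unset, so fall back to the current directory.
enter_workdir() {
    target="${PBS_O_WORKDIR:-$PWD}"
    cd "$target" || { echo "cannot enter $target" >&2; return 1; }
}

enter_workdir
echo "Running in $(pwd)"
```

The same script can then be run with qsub or directly from a shell without changing where its files end up.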
The queue delays the start of the job until all required resources are available. In the above case, the queue waits until 24 cores and at least 2GB of memory are available.
If the running job exceeds the announced requirements, it can be killed by the queue. This happens, for instance, if the job is still running after two and a half hours, or if its memory needs exceed 8GB. This makes sure that jobs with potential memory leaks get killed before they block other jobs or, even worse, the whole system.
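When a script needs to reason about its own time limit, for example to stop cleanly before the queue kills it, the walltime string has to be turned into seconds. A small sketch of that arithmetic; the function name walltime_to_seconds is hypothetical, and the input is assumed to be in the HH:MM:SS form used in the example above:

```shell
#!/bin/bash
# walltime_to_seconds is a hypothetical helper, not a Torque command.
# It converts a walltime in HH:MM:SS form into a number of seconds.
walltime_to_seconds() {
    # awk does the arithmetic, which avoids the shell rejecting
    # zero-padded fields such as "08" as invalid octal numbers.
    printf '%s\n' "$1" | awk -F: 'NF == 3 { print $1 * 3600 + $2 * 60 + $3 }'
}

limit=$(walltime_to_seconds 02:30:00)
echo "The job may run for $limit seconds"   # 2*3600 + 30*60 = 9000
```

Input that does not have exactly three colon-separated fields produces no output, so callers should check for an empty result.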
qsub testjob.sh    # submit job to the queue
qstat -a           # list jobs in the queue
qdel <jobID>       # remove job numbered <jobID> from the queue
(the torque binaries can be found in
The jobs in the queue are flagged
R when they are running,
Q while queued, and
C for 15 minutes after they have completed. After that time they disappear from the queue.
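The flags listed above can be translated into readable text in scripts that post-process the queue listing. A minimal sketch; the function name describe_state is hypothetical, and only the three flags described here are handled. Torque defines further state letters, which this sketch simply reports as "other":

```shell
#!/bin/bash
# describe_state is a hypothetical helper, not a Torque command.
# It maps a one-letter job state flag to a short description.
describe_state() {
    case "$1" in
        R) echo "running" ;;
        Q) echo "queued" ;;
        C) echo "completed" ;;
        *) echo "other" ;;   # any flag not covered in the text above
    esac
}

for flag in R Q C; do
    echo "$flag: $(describe_state "$flag")"
done
```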
If you get an error message related to a wrong ssh passphrase, try submitting your job as follows:
qsub -v SSH_AUTH_SOCK testjob.sh
When submitting MPI jobs, you may add an option so that the termination signals received by mpirun are forwarded to the computing processes. This ensures that all processes are properly killed when the job is removed from the queue using qdel:
mpirun -mca orte_forward_job_control 1 -np 2 a.out
After completion of the job, two files are created for it:
myjobname.o<jobID>: contains the standard output of your job
myjobname.e<jobID>: contains the error output of your job
By default, these files are stored at the root of your home folder. If you include the cd $PBS_O_WORKDIR command in your script, they are saved in the directory from which the job was submitted.
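The names of the two files can be derived from the job name and the identifier that qsub prints on submission. A rough sketch, assuming the identifier has the form <number>.<server> and that the numeric part is what appears in the file names; the function name output_files and the server name headnode are hypothetical:

```shell
#!/bin/bash
# output_files is a hypothetical helper, not a Torque command.
# $1: job name (as set with #PBS -N)
# $2: identifier printed by qsub, assumed to look like 12345.headnode
output_files() {
    number="${2%%.*}"   # keep only the part before the first dot
    printf '%s.o%s\n%s.e%s\n' "$1" "$number" "$1" "$number"
}

output_files myjobname 12345.headnode
```

In a real session the identifier would come from the submission itself, for example jobid=$(qsub testjob.sh), and could then be passed as the second argument.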