How to handle long-running jobs
If you want to run long-running jobs on our Linux workstations, you have to abide by a few rules.
- Do not bother anyone with your jobs. Keep the machine usable. Use nice or renice with a nice level of at least 15, preferably 18.
- Do not run RAM- or CPU-intensive jobs at all on our terminal servers phd-ltsp1.ethz.ch or phd-ltsp2.ethz.ch. Dozens of users are usually logged in to those machines and trying to do their daily work, and clogging the machines up leaves many of them unable to work. No long-running or heavily resource-consuming computing jobs are therefore allowed on these two machines. Please use plumpy.ethz.ch instead, e.g. via SSH with X forwarding.
- If you see other long-running jobs (more than 1 h of CPU time or nice level > 0) on the machine, contact the job owner before you run a job on the same machine. You can use ps, top or htop to find out whether other long-running compute jobs are running. In case of a dispute, please contact Prof. Gianni Blatter.
- Do not run more than one job at the same time on the same machine. Running jobs in parallel usually takes more time than running them one after another.
- Do not use more than 20 % of the machine's physical memory. If that's not enough, choose a machine with more RAM. Use ps, top or htop to observe the memory usage of your processes.
- Do not open more than 50 files per minute on the file servers (NFS mounts). Do not use /tmp for files bigger than 10 % of the machine's memory. For such files, use the local disks (e.g. in /scratch/) instead.
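As a rough sketch of how the rules above might be applied in practice, the following shell helpers wrap the tools mentioned in the list. The function names are made up for illustration and are not tools provided on our machines. The nice level 18, the 20 % memory limit and /scratch come from the rules above, while the per-user directory naming is only an assumption.

```shell
# Start a command at nice level 18 (the preferred level from the rules above).
run_niced() {
    nice -n 18 "$@"
}

# Lower the priority of a job that is already running (argument: its PID).
renice_job() {
    renice -n 18 -p "$1"
}

# List processes with a nice level above 0, showing owner, PID, nice level,
# accumulated CPU time, memory share and command name. The header is kept.
list_niced_jobs() {
    ps -eo user,pid,ni,time,pmem,comm | awk 'NR == 1 || $3 > 0'
}

# Succeed if a resident set size (kB) is at most 20 % of total memory (kB).
within_mem_limit() {
    rss_kb=$1
    total_kb=$2
    [ $((rss_kb * 5)) -le "$total_kb" ]
}

# Create a private working directory on the local disk instead of /tmp or NFS.
# The base directory defaults to /scratch; the naming scheme is an assumption.
make_scratch_dir() {
    base=${1:-/scratch}
    mktemp -d "$base/${USER:-job}.XXXXXX"
}
```

For example, `run_niced ./my_job &` would start a hypothetical program `./my_job` in the background at nice level 18, and `list_niced_jobs` shows whether someone else is already running a niced job on the machine. The CPU time column of that listing helps to spot jobs with more than 1 h of CPU time.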
If you run long-running jobs on your personal workstation, you do not need to worry about keeping the machine usable for other users. However:
- you must not stress your machine so much that it runs out of memory or disk space, or that system services stop working. If it starts swapping, it will become much slower and less efficient. Use free and vmstat to check swap usage.
- you should use nice anyway, but nice level 1 is already sufficient. Otherwise the jobs may be reniced to nice level 10 automatically.
- you should not run several jobs at once unless you know what you are doing. In nearly every case, it's less efficient than running jobs in sequence.
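The points above could be sketched in shell as follows. The two helper names are hypothetical. Stopping at the first failing job is a choice made in this sketch, not a rule from this page.

```shell
# Extract the used swap (third field of the "Swap:" line) from the output
# of "free -k" read on standard input.
swap_used_kb() {
    awk '/^Swap:/ { print $3 }'
}

# Run the given command lines one after another at nice level 1 and
# stop at the first one that fails (stopping early is a choice made here).
run_in_sequence() {
    for cmd in "$@"; do
        nice -n 1 sh -c "$cmd" || return 1
    done
}
```

A possible use is `LC_ALL=C free -k | swap_used_kb` to print the swap currently in use in kB, and `run_in_sequence './job_a' './job_b'` to run two hypothetical jobs back to back. To watch for ongoing swapping, `vmstat 5 3` prints three reports at 5 second intervals. Sustained non-zero values in its si and so columns mean the machine is swapping.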
Student and Seminar Workstations
- Long-running jobs must not run on the student or seminar machines during courses in the corresponding room. You can find the room reservations on every student and seminar machine in the file /etc/motd, which is also shown at every login.