Counting Jobs

From SysadminWiki

You want to know how many jobs are running on your system and which groups they belong to. Occasionally you also want the numbers broken down per group and per user. The task is simple because it is just a matter of parsing the output of qstat. I've been doing it so often that I wrote a small script to summarise the results. The script count_vo_jobs.sh (http://www.sysadmin.hep.ac.uk/svn/fabric-management/torque/jobs/count_vo_jobs.sh) can be downloaded from the repository.

How to use it

It can be run either on the Torque server machine or remotely, as long as the Torque server allows connections from the host you run it on. Without any options it returns only the number of jobs and the list of groups with running jobs.

count_vo_jobs.sh -H hostname.domain 

Connecting to PBS server: hostname.domain
Running VOs: atlas babar biomed calice hone lhcb pheno
Total number of jobs:     631
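
As a rough illustration of the kind of qstat parsing involved (this is not the contents of count_vo_jobs.sh), the sketch below reads text in the layout of `qstat -f` from standard input and prints the distinct groups and the number of jobs. The function name summarise_jobs and the output labels are hypothetical, and it assumes each job record starts with a "Job Id:" line and carries an "egroup = name" line. It counts jobs in every state, which may differ from what the real script reports.

```shell
# Hypothetical helper, not part of count_vo_jobs.sh.
# Reads `qstat -f`-style text on stdin; assumes "Job Id:" opens each record
# and that the group appears as "egroup = <name>".
summarise_jobs() (
    input=$(cat)
    # Distinct group names, joined on one line.
    groups=$(printf '%s\n' "$input" |
        awk '$1 == "egroup" { print $3 }' | sort -u | tr '\n' ' ')
    # One "Job Id:" line per job, whatever its state.
    total=$(printf '%s\n' "$input" |
        awk '/^Job Id:/ { n++ } END { print n + 0 }')
    printf 'Groups: %s\n' "${groups% }"
    printf 'Total number of jobs: %d\n' "$total"
)
```

It could be fed with something like `qstat -f @hostname.domain | summarise_jobs`, where hostname.domain stands for your Torque server.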

With -v and/or -u it returns the numbers split by group or by user respectively. (In the grid world a group is a VO, which is where -v comes from.) For example, a breakdown per group might report:

count_vo_jobs.sh -H ce02.tier2.hep.manchester.ac.uk -v

Connecting to PBS server: hostname.domain
Running VOs: atlas babar
Total number of jobs:     62

Listing jobs per VO
===================

atlas has 60 Running jobs
atlas 0 Queued jobs 

babar has 2 Running jobs
babar 0 Queued jobs
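
A per-group or per-user breakdown like the one above can be sketched with a short awk program. This is only an illustration under assumptions, not the script's actual code: the function name count_jobs_by and its output format are made up here, and it assumes `qstat -f`-style input in which each record starts with "Job Id:" and contains "job_state = R" or "job_state = Q" plus an "egroup = name" or "euser = name" line.

```shell
# Hypothetical helper, not part of count_vo_jobs.sh.
# Usage: qstat -f @server | count_jobs_by egroup   (or: count_jobs_by euser)
count_jobs_by() (
    field=${1:-egroup}
    awk -v field="$field" '
        # Record the job just read under its group or user, then reset.
        function tally() {
            if (key != "") {
                seen[key] = 1
                if (state == "R") running[key]++
                else if (state == "Q") queued[key]++
            }
            key = ""; state = ""
        }
        /^Job Id:/        { tally() }      # a new record closes the previous one
        $1 == "job_state" { state = $3 }
        $1 == field       { key = $3 }
        END {
            tally()                        # do not lose the last record
            for (k in seen)
                printf "%s running=%d queued=%d\n", k, running[k], queued[k]
        }
    ' | sort
)
```

Jobs in states other than R and Q are ignored by this sketch, although their group or user still appears in the list with zero counts.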

Notes

The script exits if qstat cannot be found or if the server doesn't respond.
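
Checks of this kind might look like the sketch below. It is an assumption about how such a guard could be written, not the code in count_vo_jobs.sh: the function name check_pbs, the messages and the return codes are illustrative, and it uses `qstat -B` (server status) simply as a cheap way to see whether the server answers.

```shell
# Hypothetical pre-flight check, not taken from count_vo_jobs.sh.
# Returns 1 if qstat is missing, 2 if the server does not answer, 0 otherwise.
check_pbs() {
    server=$1
    if ! command -v qstat >/dev/null 2>&1; then
        echo "qstat not found in PATH" >&2
        return 1
    fi
    # qstat -B asks the named server for its status; a failure is taken
    # to mean the server is unreachable or refuses the connection.
    if ! qstat -B "$server" >/dev/null 2>&1; then
        echo "PBS server $server did not respond" >&2
        return 2
    fi
}
```

A caller would then stop early with something like `check_pbs hostname.domain || exit 1` before parsing any output.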

Requires

Torque: qstat