Ganglia

From SysadminWiki

Ganglia is a very popular tool for displaying graphs of system metrics over time. In particular, it is often used to monitor load on clusters; see http://www-pnp.physics.ox.ac.uk/ganglia/ for an example.

The software can be downloaded from http://ganglia.sourceforge.net/

You need a web server to display the Ganglia graphs; this is usually the same machine that runs the gmetad server. gmetad stores the data using rrdtool. The data is collected from the gmond daemon, which runs on each client that you wish to monitor. PBS statistics can also be queried and displayed by Ganglia, for example http://www-pnp.physics.ox.ac.uk/ganglia/specials/pbs.php?h=Oxford+PP+Cluster/ppslgen.physics.ox.ac.uk&q=[all]
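To make the data flow concrete, the following is a minimal sketch of reading the metrics that a client daemon publishes. It assumes the daemon on the client is gmond, that it accepts TCP connections on port 8649 (the usual default, but configurable in gmond.conf), and that its XML dump uses HOST and METRIC elements with NAME and VAL attributes. The function names read_gmond_xml and parse_metrics are made up for this illustration; check the output of your own installation before relying on it.

```python
import socket
import xml.etree.ElementTree as ET


def read_gmond_xml(host="localhost", port=8649, timeout=5.0):
    """Read the XML dump a gmond daemon sends to a TCP client.

    Assumption: gmond listens on TCP port 8649 and closes the
    connection once the whole document has been sent.
    """
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(4096)
            if not data:  # empty read: the peer closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)


def parse_metrics(xml_bytes):
    """Return {host_name: {metric_name: value_string}} from an XML dump.

    Assumption: hosts appear as <HOST NAME="..."> elements containing
    <METRIC NAME="..." VAL="..."/> children.
    """
    root = ET.fromstring(xml_bytes)
    result = {}
    for host in root.iter("HOST"):
        metrics = {}
        for metric in host.iter("METRIC"):
            metrics[metric.get("NAME")] = metric.get("VAL")
        result[host.get("NAME")] = metrics
    return result
```

On a machine where gmond is running, parse_metrics(read_gmond_xml()) should give one dictionary entry per monitored host. Values are kept as strings because metrics can be numeric or textual.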

Other monitoring utilities, such as MonAMI, can also be used to output data to Ganglia.
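MonAMI's own configuration is not covered here. As a generic illustration of how an external tool can feed a value into Ganglia, the sketch below shells out to the gmetric command-line utility that ships with Ganglia, using its --name, --value, --type and --units options. The metric name queued_jobs and the helper names build_gmetric_command and publish are hypothetical, and the sketch assumes gmetric is installed and can reach a running gmond.

```python
import shutil
import subprocess


def build_gmetric_command(name, value, metric_type="float", units=""):
    """Build the argument list for one gmetric call.

    metric_type should be one of the types gmetric accepts,
    for example "string", "uint32", "float" or "double".
    """
    cmd = ["gmetric", "--name", name, "--value", str(value),
           "--type", metric_type]
    if units:
        cmd += ["--units", units]
    return cmd


def publish(name, value, metric_type="float", units=""):
    """Send one metric via gmetric; raise if gmetric is missing or fails."""
    if shutil.which("gmetric") is None:
        raise RuntimeError("gmetric not found on PATH")
    subprocess.run(build_gmetric_command(name, value, metric_type, units),
                   check=True)
```

For example, publish("queued_jobs", 12, "uint32", "jobs") would announce a custom metric, which should then show up as an extra graph for the sending host.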


Ganglia Torque/PBS job monitoring

With an addon it is also possible to monitor the batch system and all of its jobs, whether running or queued.

To monitor the active jobs in the batch system, you can use the JobMonarch addon. For an example, see:

* http://ganglia.sara.nl/addons/job_monarch/?c=LISA%20Cluster
* http://ganglia.sara.nl/addons/job_monarch/?c=GINA%20Cluster
* http://ganglia.sara.nl/addons/job_monarch/?c=Matrix%20Cluster

The JobMonarch addon is developed at SARA. More information is available at https://subtrac.sara.nl/oss/jobmonarch/