Following two presentations given by Ian Bird at the Fall 2006 HEPiX meeting and at the WLCG Management Board last October, we have set up a System Management Working Group (SMWG) of system administrators from HEPiX and grid sites to address the fabric management problems that HEP sites might have. The goal of the group is to set up a repository of management tools and monitoring sensors, and to produce documentation in the form of a cookbook and HOWTOs for the benefit of the HEP community. The group will cooperate with the Grid Services Monitoring Working Group being set up within the WLCG/EGEE projects to build a comprehensive monitoring framework that will improve the robustness of grid sites in particular and help HEP sites in general. (For completeness, there is also a System Analysis Working Group that will cover the other side of the monitoring spectrum, the application side.) More information about the three groups can be found at: WLCG Monitoring Working Groups. The SMWG mandate specifically can be found here: SMWG Mandate
As stated in the mandate, the goal of the SMWG is not to implement new tools but to share what is already in use at sites, following existing best practices. We are aware that some sites already share their tools and sensors publicly, and that others write very good documentation and share it. The aim is to turn this into a general practice, carried out in a more organised way, and to avoid the duplication of effort that occurs when system administrators solve largely the same problems over and over.
This site has been set up to collect the information and tools that system administrators want to share. It provides a wiki, a Subversion repository and ordinary web pages. Access to the site is via X.509 certificates, with GridSite used in the background to control the ACLs.
Support: If you experience any problems or need access to the repositories, please open a GGUS ticket for UKI-NORTHGRID-MAN-HEP.
|System Administration for HEP|