Standing reservation for ops VO
The following snippet reserves one CPU for the ops and dteam groups. It reserves 1 CPU for 1 task, taken from the HOSTLIST list (usually a single node). The reservation lasts all day and its PERIOD is INFINITY, so it does not repeat (it could instead be made periodic, for example lasting all day with a weekly periodicity, running every Monday). The ops and dteam groups have access to the reservation, but only when no other resources are available, as indicated by the - after the group name in the GROUPLIST field (see the affinity paragraph in the standing reservation chapter of the documentation: http://www.clusterresources.com/products/maui/docs/7.1.5managingreservations.shtml).
SRCFG[sam] HOSTLIST=<node-name>
SRCFG[sam] TASKCOUNT=1 RESOURCES=PROCS:1
SRCFG[sam] PERIOD=INFINITY
SRCFG[sam] STARTTIME=00:00:00 ENDTIME=24:00:00
SRCFG[sam] GROUPLIST=ops-,dteam-
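As a sketch of the weekly periodicity mentioned above, the same reservation could instead be made to recur every Monday, using Maui's PERIOD and DAYS standing-reservation attributes (a hypothetical variant, not part of the original configuration):

```
SRCFG[sam] HOSTLIST=<node-name>
SRCFG[sam] TASKCOUNT=1 RESOURCES=PROCS:1
SRCFG[sam] PERIOD=WEEK DAYS=Mon
SRCFG[sam] STARTTIME=00:00:00 ENDTIME=24:00:00
SRCFG[sam] GROUPLIST=ops-,dteam-
```

With PERIOD=WEEK the reservation is re-created each week on the listed days instead of being held indefinitely.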
Another example of how to create a standing reservation can be found here (http://wiki.egee-see.org/index.php/SEE-GRID_OPS_role). It uses a combination of GROUPS and CLASSES and the SPACEFLEX flag, instead of the - after the group name, to allow the scheduler to choose from free resources without explicitly reserving one node when all resources are occupied.
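A minimal sketch of the SPACEFLEX approach could look like the following (the reservation name and class name are assumptions for illustration, not taken from the linked page):

```
SRCFG[opsres] CLASSLIST=ops TASKCOUNT=1 RESOURCES=PROCS:1
SRCFG[opsres] PERIOD=INFINITY FLAGS=SPACEFLEX
SRCFG[opsres] GROUPLIST=ops,dteam
```

The SPACEFLEX flag lets Maui move the reservation onto whatever resources are currently free, rather than pinning it to a fixed node.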
Both approaches are effective.
Maui documentation on standing reservations can be found here (http://www.clusterresources.com/products/maui/docs/7.1.5managingreservations.shtml)
Assign jobs of a SGM group on a partition of WNs
Warning: the experience described in this section dates from the early days of LCG/EGEE and the suggested scenario probably is not deployable at any production site today.
In INFN Grid (http://grid.infn.it/), as at many sites abroad, the experiment software is normally located on a server in the /opt/exp_soft directory and exported over the network with the Linux NFSv3 server. All the SLC3 (http://linux.web.cern.ch/linux/scientific3/) WNs access that directory, which at my site, INFN Grid Roma2 (http://grid.roma2.infn.it), is about 100 GB, from the NFSv3 server. During the life of the cluster, Software Grid Manager (SGM) jobs arrive that add, modify or delete subdirectories of /opt/exp_soft to update their experiment's software installation.
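For reference, such an export could look like the following /etc/exports fragment on the NFSv3 server (the host pattern and options are my assumptions, not the site's actual configuration):

```
# /etc/exports on the NFS server: export /opt/exp_soft read-write to the WNs
/opt/exp_soft  grid*.roma2.infn.it(rw,sync)
```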
I believe this is a problem for at least 4 reasons:
1) the NFSv3 server is a single point of failure for the Linux cluster;
2) the same data generates constant network traffic over time;
3) you never know what/when/why something changed on your NFSv3 server, since you are not the grid software administrator;
4) the WN disk space is used only for the OS + middleware, about 4 GB, while every WN actually ships with at least 100 GB of disk, so most of this space is normally wasted.
While waiting for more common and secure deployment systems, such as signed RPMs (http://www.rpm.org/), to manage the experiment software at INFN Grid Roma2 (http://grid.roma2.infn.it), I created a simple MAUI, tripwire (http://www.tripwire.com/products/enterprise/ost/), rsync (http://samba.anu.edu.au/rsync/) "chain" to mirror /opt/exp_soft on each WN that has room for it and to monitor filesystem changes in that directory. The MAUI configuration below shows how to assign the jobs of the Linux groups alicesgm, argosgm, etc. to a partition of WNs, grid004 and grid008, that mount in write mode the same remote NFS directory /opt/exp_soft from a secure (but never 100% secure!) Linux NAS (http://en.wikipedia.org/wiki/Network-attached_storage), atlas2. Note the use of the symbol ^ to enforce the WN partition in MAUI.
A tripwire (http://www.tripwire.com/products/enterprise/ost/) cron job on atlas2 then checks for changes in the directory and sends me an e-mail report. When I notice important changes, I launch a global rsync (http://samba.anu.edu.au/rsync/) synchronization between the WNs and atlas2. As a result, all my 16 WNs have their own mirror of /opt/exp_soft, and data access is faster because it is now disk-based; if the NFSv3 server atlas2 goes down, all the grid computations can continue (though that is not true for grid004 and grid008). At my site /opt/exp_soft changes about once every 15 days, sometimes once a month, so the number of jobs that fail during the synchronization operations is, in my opinion, tolerable.
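The synchronization step can be sketched as a single rsync invocation run from atlas2 toward each WN (the WN host name and the exact options are my assumptions; only the paths follow the text):

```
# Push /opt/exp_soft from atlas2 to one WN; --delete also removes
# subdirectories that the SGM jobs have deleted on the master copy
rsync -a --delete /opt/exp_soft/ grid001:/opt/exp_soft/
```

Looping this command over all the WNs (except grid004 and grid008, which mount the directory directly) gives the global synchronization described above.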
I'm quite sure that global and parallel filesystems like GPFS (http://www-03.ibm.com/systems/clusters/software/gpfs.html) or Lustre (http://www.clusterfs.com/) can handle this scenario better, and I'm currently gathering information about them. But to quickly and easily gain security and performance at your site without changing your filesystem, you can apply the "chain" described above and finally put to use the tens of GB currently wasted on your WNs.
...
NODECFG[grid004] PARTITION=nfs
NODECFG[grid008] PARTITION=nfs

GROUPCFG[dteamsgm]     PLIST=nfs
GROUPCFG[infngridsgm]  PLIST=nfs
GROUPCFG[opssgm]       PLIST=nfs

GROUPCFG[alicesgm]     PLIST=nfs^
GROUPCFG[argosgm]      PLIST=nfs^
GROUPCFG[atlassgm]     PLIST=nfs^
GROUPCFG[babarsgm]     PLIST=nfs^
GROUPCFG[biomedsgm]    PLIST=nfs^
GROUPCFG[biosgm]       PLIST=nfs^
GROUPCFG[cdfsgm]       PLIST=nfs^
GROUPCFG[cmssgm]       PLIST=nfs^
GROUPCFG[compassitsgm] PLIST=nfs^
GROUPCFG[compchemsgm]  PLIST=nfs^
GROUPCFG[egridsgm]     PLIST=nfs^
GROUPCFG[eneasgm]      PLIST=nfs^
GROUPCFG[esrsgm]       PLIST=nfs^
GROUPCFG[griditsgm]    PLIST=nfs^
GROUPCFG[ingvsgm]      PLIST=nfs^
GROUPCFG[lhcbsgm]      PLIST=nfs^
GROUPCFG[libisgm]      PLIST=nfs^
GROUPCFG[magicsgm]     PLIST=nfs^
GROUPCFG[pamelasgm]    PLIST=nfs^
GROUPCFG[theophyssgm]  PLIST=nfs^
...