Musings on multicluster LoadLeveler
Jul. 12th, 2012 08:29 pm
Contact with my muse successfully re-established, and much accomplished in consequence. I was particularly pleased to get the basics of multi-cluster LoadLeveler sorted, but the filtering threw me slightly: the filter seems to get run both locally and remotely, but only the remote filter is able to change the contents of the job.
I'm also not entirely sure how to design a cluster metric that correctly reflects the load of a particular system. Is it enough simply to look at the number of idle steps? Or should this be scaled to either the size of the cluster or, possibly, the number of currently unreserved machines in the cluster? And should jobs pending reservations be included in the idle count? Surely not if they're waiting for a reservation that starts next week. But what if they're waiting for a reservation that starts in ten minutes? Maybe the decision to include or exclude reserved steps should depend on whether the reservation will go active in a time window that matches the expected wall clock of the job?
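To make that last idea a bit more concrete, here is a rough Python sketch of what such a metric might look like. None of it uses a real LoadLeveler interface: `IdleStep`, `counts_as_idle` and `cluster_load` are names I have made up for illustration, and the choices to scale by unreserved machines and to use the incoming job's wall clock as the window are just one possible answer to the questions above, not a recommendation.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class IdleStep:
    """Hypothetical record for one idle job step (not a LoadLeveler type)."""
    wall_clock_s: float
    # Seconds until the step's reservation goes active; None if the step
    # is not waiting on a reservation at all.
    reservation_start_s: Optional[float] = None


def counts_as_idle(step: IdleStep, incoming_wall_clock_s: float) -> bool:
    """Decide whether an idle step competes with an incoming job.

    Assumption: a reserved step only matters if its reservation goes
    active while the incoming job would still be running.
    """
    if step.reservation_start_s is None:
        return True
    return step.reservation_start_s <= incoming_wall_clock_s


def cluster_load(steps: Sequence[IdleStep],
                 unreserved_machines: int,
                 incoming_wall_clock_s: float) -> float:
    """Competing idle steps per unreserved machine (lower means less loaded)."""
    competing = sum(1 for s in steps if counts_as_idle(s, incoming_wall_clock_s))
    # Guard against division by zero when every machine is reserved.
    return competing / max(unreserved_machines, 1)


if __name__ == "__main__":
    queue = [
        IdleStep(wall_clock_s=3600),                                   # no reservation
        IdleStep(wall_clock_s=3600, reservation_start_s=600),          # ten minutes away
        IdleStep(wall_clock_s=3600, reservation_start_s=7 * 24 * 3600) # next week
    ]
    print(cluster_load(queue, unreserved_machines=4, incoming_wall_clock_s=3600))
```

With a one-hour incoming job, the step whose reservation starts in ten minutes is counted and the one waiting for next week is not. Whether dividing by unreserved machines is the right scaling is exactly the open question.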