LoadLeveler, preemption and reservations
Dec. 11th, 2009 03:36 pm
- Setting PREEMPTION_SUPPORT = full;
- Setting DEFAULT_PREEMPT_METHOD = vc;
- Setting RESERVATION_PRIORITY = high;
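Put together, the three settings above would sit in the global configuration file as something like the following. This is a sketch, not a verified fragment; check the keyword spellings and legal values against your local LoadL_config before using it.

```
# Hypothetical LoadL_config fragment combining the three settings.
PREEMPTION_SUPPORT = full
DEFAULT_PREEMPT_METHOD = vc
RESERVATION_PRIORITY = high
```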
This gives reservations a higher priority than running work, so if there are insufficient resources when a reservation goes into the SETUP state, the jobs running on the nodes assigned to the reservation will be cleared out using the default preempt method. In this case I opted for vacate, which kills the job and requeues it, rather than hold or suspend, because we don't have enough paging space to allow two full-memory-sized jobs to coexist on a node.
Although I think that preemption may well fix most of our problems, I don't think it is a magic bullet. For one thing it doesn't, by itself, grant us any control over which jobs are preempted — something our current script does for us. But it might be possible to deal with this using another suggestion: that we split the system into two separate pools and use dummy jobs to limit the reservations to a single pool, while allowing the rest of the work to run in either pool.
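A dummy job for this purpose might look something like the job command file below. The class name, pool number, and sleep duration are all invented for illustration; the point is just that pinning the dummy to one pool via the # @ pool keyword keeps the reservation's footprint there.

```
# Hypothetical dummy-job command file; class name and pool number
# are placeholders for whatever the local setup actually uses.
# @ job_type   = serial
# @ class      = dummy
# @ pool       = 0
# @ executable = /bin/sleep
# @ arguments  = 3600
# @ queue
```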
I think we could probably improve on this by allowing all lower-priority work to run in either pool, while restricting high-priority work to the non-reservation pool. We could probably do that by checking the priority of the job in the submit filter and adding a # @ pool = parameter as it passes through.
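A submit filter along those lines might be sketched as below. This assumes the filter receives the job command file on stdin and writes the (possibly modified) file to stdout; the class names treated as "high priority" and the pool number are invented placeholders, and in practice "priority" might be judged from a different keyword than # @ class.

```python
import sys

# Hypothetical policy: jobs in these classes are treated as high priority
# and pinned to the non-reservation pool; everything else is left alone
# so it can run in either pool. Names and numbers are illustrative only.
HIGH_PRIORITY_CLASSES = {"express", "urgent"}
NON_RESERVATION_POOL = 1


def filter_job(lines):
    """Return the job command file lines, inserting a '# @ pool' line
    before each '# @ queue' when the job's class is high priority."""
    out = []
    job_class = None
    for line in lines:
        stripped = line.strip()
        if stripped.startswith("# @ class") and "=" in stripped:
            job_class = stripped.split("=", 1)[1].strip()
        if stripped.startswith("# @ queue") and job_class in HIGH_PRIORITY_CLASSES:
            out.append("# @ pool = %d\n" % NON_RESERVATION_POOL)
        out.append(line)
    return out


# Wired up as a filter this would be roughly:
#   sys.stdout.writelines(filter_job(sys.stdin.readlines()))
```

Jobs in other classes pass through unchanged, so only the high-priority work gets pinned away from the reservation pool.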