Reservations: an unfortunate catch
Jul. 9th, 2009 08:35 pm
I've discovered a nasty little problem with LoadLeveler reservations. It's a problem which means that sometimes, just sometimes, the reservation actually prevents the job from running, but which is kind of hard to describe.
Imagine a pair of reservations. Both reservations have at least one node in common between them, but because the first reservation is due to finish before the second is due to start, this is not a problem.
Now, consider what happens when RESERVATION_CAN_BE_EXCEEDED is set to true in the LoadL_config file and a job is submitted into the first reservation. If the job has a wallclock time that is longer than the reservation, but it will still finish before the second reservation becomes active, there is no problem. But if the wallclock time of the job means that it will end after the second reservation is due to start (i.e. if the current time plus the wallclock time of the job is larger than the start time of the second reservation), the two resource requests will clash and, consequently, the job will not run.

Thus, under certain circumstances, running a job in a reservation may actually hamper its execution, rather than assist it. And, worse still, because the placement of reservations is dependent on the state of the system, the problem does not necessarily occur repeatably...
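
To make the clash condition a little more concrete, here is a minimal Python sketch of the check as I understand it. It is not LoadLeveler code and does not call any LoadLeveler API; the function name and the example times are purely hypothetical, and the only rule it encodes is the one described above: the job is in trouble when the current time plus its wallclock time is later than the start of the second reservation.

```python
from datetime import datetime, timedelta


def clashes_with_next_reservation(now, wallclock_limit, next_reservation_start):
    """Return True if a job started at `now` would still be running when the
    next reservation on a shared node is due to start.

    Hypothetical helper: it only models the rule described in the post
    (now + wallclock time > start of the second reservation).
    """
    return now + wallclock_limit > next_reservation_start


# Made-up example times: the second reservation starts two hours from "now".
now = datetime(2009, 7, 9, 12, 0)
second_reservation_start = datetime(2009, 7, 9, 14, 0)

# A 90-minute job finishes before the second reservation starts.
short_job = clashes_with_next_reservation(
    now, timedelta(minutes=90), second_reservation_start
)

# A 3-hour job would overlap the second reservation, so it would not run.
long_job = clashes_with_next_reservation(
    now, timedelta(hours=3), second_reservation_start
)

print(short_job, long_job)
```

Because the check depends on the current time, the same job can be fine if it starts early in the first reservation and blocked if it starts late, which fits with the problem not showing up repeatably.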