Of prologs and epilogs
Jul. 1st, 2009 09:44 pm
After months of talking about it, I've finally implemented a set of LoadLeveler prolog and epilog programs. In the process I've discovered:
- any output generated by the prolog gets stamped on by the job itself
- the epilog has to be run as the job user in order to update the output files
- information about prolog or epilog problems is written to the StarterLog of the node where the master task executed
- an easy solution to the epilog output problem is to write the program in perl or C and dup stdout and stderr to $LOADL_STEP_OUT and $LOADL_STEP_ERR, because this also redirects any child processes (see the sketch below)
- not all LOADL variables are available on all nodes of a multi-node job (the important ones are usually only set on the master)
- the programs are run by the job starter daemon and are unaffected by environment changes in the job script
- that the precise details of writing prologs and epilogs don't seem to be terribly well documented
And, most importantly, that it doesn't matter so much if the epilog programs don't work, but if the prolog doesn't work, the system will start to trash the jobs until the problems are fixed...
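
The dup trick from the list above looks roughly like this as a C sketch. The redirect() helper name and the cleanup message are only illustrative, and it quietly skips the redirection if the variable isn't set on that node (see the point about LOADL variables above):

    /* Epilog preamble: point stdout and stderr at the job's own output
     * files so anything the epilog (or its children) prints ends up there. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <fcntl.h>
    #include <unistd.h>

    static void redirect(const char *envvar, int target_fd)
    {
        const char *path = getenv(envvar);
        if (path == NULL)
            return;                      /* variable not set on this node */
        int fd = open(path, O_WRONLY | O_APPEND | O_CREAT, 0644);
        if (fd < 0)
            return;                      /* leave the original stream alone */
        dup2(fd, target_fd);             /* stdout/stderr now go to the job's file */
        close(fd);
    }

    int main(void)
    {
        redirect("LOADL_STEP_OUT", STDOUT_FILENO);
        redirect("LOADL_STEP_ERR", STDERR_FILENO);

        printf("epilog: cleaning up\n");
        /* any child processes started from here inherit the redirection too */
        return 0;
    }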
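
Because of that last point, it seems safest to write the prolog defensively: only a check that genuinely has to stop the job returns non-zero, everything else just logs and lets the job run. A skeleton along those lines (the /scratch check and the log path are only examples, and it assumes LOADL_STEP_ID is among the variables set for the prolog):

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/stat.h>

    int main(void)
    {
        /* Hard requirement: the job can't run without its scratch filesystem. */
        struct stat sb;
        if (stat("/scratch", &sb) != 0 || !S_ISDIR(sb.st_mode)) {
            fprintf(stderr, "prolog: /scratch missing, refusing to start job\n");
            return 1;                    /* genuinely fatal: stop the job */
        }

        /* Nice-to-have bookkeeping: not worth losing the job over if it fails. */
        const char *step = getenv("LOADL_STEP_ID");
        FILE *log = fopen("/var/tmp/prolog.log", "a");
        if (log != NULL) {
            fprintf(log, "prolog ran for step %s\n", step ? step : "(unknown)");
            fclose(log);
        }

        return 0;                        /* anything non-fatal: let the job run */
    }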