Of prologs and epilogs
Jul. 1st, 2009 09:44 pm
After months of talking about it, I've finally implemented a set of LoadLeveler prolog and epilog programs. In the process I've discovered:
- any output generated by the prolog gets stamped on by the job itself
- the epilog has to be run as the job user in order to update the output files
- information about prolog or epilog problems is written to the StarterLog of the node where the master task executed
- an easy solution to the epilog output problem is to write the program in Perl or C and dup stdout and stderr to $LOADL_STEP_OUT and $LOADL_STEP_ERR, because this also redirects the output of any child processes
- not all LOADL variables are available on all nodes of a multi-node job (the important ones are usually only set on the master)
- the programs are run by the job starter daemon and are unaffected by environment changes in the job script
- the precise details of writing prologs and epilogs don't seem to be terribly well documented
And, most importantly, that it doesn't matter if the epilog programs don't work but if the prolog doesn't work, the system will start to trash the jobs until the problems are fixed...
Perl example of job user epilog?
Date: 2010-02-16 09:18 am (UTC)
I have managed to write a user epilog as a shell script which adds some stuff to the user's output. This works, but I would rather do it in Perl. However, I can't write to the user's output file from the Perl script. Have you got a Perl example which you could post?
Thanks,
Loris
Re: Perl example of job user epilog?
Date: 2010-02-17 10:44 pm (UTC)
If the script runs interactively but fails when run from LoadLeveler, it might be worth checking the StarterLog on the batch node to see if there are any errors. I remember one of my early scripts failed silently because the compute node environment was slightly different to the interactive node where I'd done the initial development, causing the interpreter to fail to compile the script.
I also found that in some cases, when the user had specified "# @ initialdir", the output redirect didn't work correctly. In the end, I forced an absolute path by doing something like the following:
# Use the step's output file; prefix the initial directory if the path is relative
my $out = $ENV{LOADL_STEP_OUT};
$out = "$ENV{LOADL_STEP_INITDIR}/$out" if substr($out, 0, 1) ne "/";
And then using this to duplicate the output file descriptor:
open(OUT, ">>", $out) or die "Cannot append to $out: $!";
open(STDOUT, ">&", \*OUT) or die "Cannot dup STDOUT: $!";
Other than that, the only thing I can suggest is to check that the epilog script is being run from JOB_USER_EPILOG and not JOB_EPILOG. I'm sure this isn't the case, since you've managed to get a shell script working, but I mention it because I initially set the scripts up to run as the LoadLeveler user and then couldn't understand why the output files weren't being updated.