[personal profile] sawyl
After months of talking about it, I've finally implemented a set of LoadLeveler prolog and epilog programs. In the process I've discovered:

  • any output generated by the prolog gets stamped on by the job itself
  • the epilog has to be run as the job user in order to update the output files
  • information about prolog or epilog problems is written to the StarterLog of the node where the master task executed
  • an easy solution to the epilog output problem is to write the program in Perl or C and dup stdout and stderr to $LOADL_STEP_OUT and $LOADL_STEP_ERR, because this also redirects the output of any child processes
  • not all LOADL variables are available on all nodes of a multi-node job (the important ones are usually only set on the master)
  • the programs are run by the job starter daemon and are unaffected by environment changes in the job script
  • the precise details of writing prologs and epilogs don't seem to be terribly well documented

And, most importantly, that it doesn't matter if the epilog programs don't work but if the prolog doesn't work, the system will start to trash the jobs until the problems are fixed...
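
For what it's worth, the dup trick from the list above might look roughly like the following in Perl. This is only a minimal sketch rather than the scripts I actually run: the redirect_output helper is a made-up name, the uptime call just stands in for whatever child command the epilog runs, and it assumes that both $LOADL_STEP_OUT and $LOADL_STEP_ERR are set and point somewhere the job user can write.

```perl
#!/usr/bin/perl
# Sketch of a user epilog that appends to the job's own output files.
use strict;
use warnings;
use IO::Handle;

# Point STDOUT and STDERR at the given files, appending rather than
# truncating. Child processes started afterwards inherit the redirected
# descriptors. (redirect_output is a hypothetical helper name.)
sub redirect_output {
    my ($out, $err) = @_;
    open(my $out_fh, '>>', $out) or die "cannot append to $out: $!";
    open(my $err_fh, '>>', $err) or die "cannot append to $err: $!";
    open(STDOUT, '>&', $out_fh) or die "cannot dup stdout: $!";
    open(STDERR, '>&', $err_fh) or die "cannot dup stderr: $!";
    STDOUT->autoflush(1);    # keep our lines ordered with the children's
    STDERR->autoflush(1);
    return;
}

# Only act when LoadLeveler has supplied both variables; on some nodes of
# a multi-node job they may be missing.
if (defined $ENV{LOADL_STEP_OUT} && defined $ENV{LOADL_STEP_ERR}) {
    redirect_output($ENV{LOADL_STEP_OUT}, $ENV{LOADL_STEP_ERR});
    print "epilog: step finished at ", scalar(localtime), "\n";
    system('uptime');        # placeholder child; its output follows ours
}
```

Opening the files in append mode matters here, since the job has already written to them by the time the epilog runs.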

Perl example of job user epilog?

Date: 2010-02-16 09:18 am (UTC)
From: [identity profile] https://www.google.com/accounts/o8/id?id=AItOawljoV9ywK9uiLVYawXlotiMVEyCqjnvSC8 (from livejournal.com)
Hi,

I have managed to write a user epilog as a shell script which adds some stuff to the user's output. This works, but I would rather do it in Perl. However, I can't write to the user's output file from the Perl script. Have you got a Perl example which you could post?

Thanks,

Loris

Re: Perl example of job user epilog?

Date: 2010-02-17 10:44 pm (UTC)
From: [identity profile] sawyl.livejournal.com
Although I don't have the examples to hand, I remember having a few minor problems while trying to get the prolog/epilog scripts running. Here are a few possibilities that immediately spring to mind.

If the script runs interactively but fails when run from LoadLeveler, it might be worth checking the StarterLog on the batch node to see if there are any errors. I remember one of my early scripts failed silently because the compute node environment was slightly different from that of the interactive node where I'd done the initial development, which caused the interpreter to fail to compile the script.

I also found that in some cases, when the user had specified "# @ initialdir", the output redirect didn't work correctly. In the end, I forced an absolute path by doing something like the following:

# Use the step's output file, anchoring relative paths at the initial directory
$out = $ENV{LOADL_STEP_OUT};
$out = "$ENV{LOADL_STEP_INITDIR}/$out" if substr($out, 0, 1) ne "/";

And then using this to duplicate the output file descriptor:

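# Append to the job's output file, then point STDOUT at the same descriptor
open(OUT, ">> $out") or die "cannot append to $out: $!";
open(STDOUT, ">&OUT") or die "cannot dup STDOUT: $!";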

Other than that, the only thing I can suggest is to check that the epilog script is being run from JOB_USER_EPILOG and not JOB_EPILOG. I'm sure this isn't the case, since you've managed to get a shell script working, but I mention it because I initially set the scripts up to run as the LoadLeveler user and then couldn't understand why the output files weren't being updated.
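
If you want the script to check this for itself, something along these lines might help. It's a rough sketch only: running_as is a made-up helper, and I'm assuming that LOADL_STEP_OWNER is set in the epilog's environment and holds the job owner's user name, so check that on your system before relying on it.

```perl
#!/usr/bin/perl
# Sketch: confirm the epilog is running as the job owner.
use strict;
use warnings;

# True if the effective user name matches $expected.
# (running_as is a hypothetical helper name.)
sub running_as {
    my ($expected) = @_;
    my $actual = getpwuid($>);    # name for the effective uid, if any
    return (defined $actual && defined $expected && $actual eq $expected) ? 1 : 0;
}

# Assumption: LOADL_STEP_OWNER holds the job owner's user name.
my $owner = $ENV{LOADL_STEP_OWNER};
if (defined $owner && !running_as($owner)) {
    my $me = getpwuid($>);
    $me = 'uid ' . $> unless defined $me;
    warn "epilog is running as $me, not as job owner $owner\n";
}
```

I can't say for certain where that warning ends up, so treat it as a debugging aid rather than something to leave in place.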
