sawyl: (Default)
[personal profile] sawyl
After months of talking about it, I've finally implemented a set of LoadLeveler prolog and epilog programs. In the process I've discovered:

  • any output generated by the prolog gets stamped on by the job itself
  • the epilog has to be run as the job user in order to update the output files
  • information about prolog or epilog problems is written to the StarterLog of the node where the master task executed
  • an easy solution to the epilog output problem is to write the program in perl or C and dup stdout and stderr to $LOADL_STEP_OUT and $LOADL_STEP_ERR, because this also redirects the output of any child processes
  • not all LOADL variables are available on all nodes of a multi-node job (the important ones are usually only set on the master)
  • the programs are run by the job starter daemon and are unaffected by environment changes in the job script
  • the precise details of writing prologs and epilogs don't seem to be terribly well documented
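
To make the dup trick concrete, here is a minimal sketch of an epilog that points its own stdout and stderr at the job's output files. It is written in Python purely for illustration rather than the perl or C mentioned above; os.dup2 wraps the same dup2 call, so the idea carries over directly. The function name redirect_to_step_files and the message it prints are my own inventions, and I'm assuming that both $LOADL_STEP_OUT and $LOADL_STEP_ERR are set and writable by the user the epilog runs as. Since the variables may be missing on non-master nodes, the sketch simply does nothing when they are absent.

```python
import os
import subprocess
import sys


def redirect_to_step_files(environ=os.environ):
    """Point fds 1 and 2 at the job's output files (hypothetical helper).

    Returns False, changing nothing, if either LOADL variable is unset,
    which may well be the case on the non-master nodes of a job.
    """
    out_path = environ.get("LOADL_STEP_OUT")
    err_path = environ.get("LOADL_STEP_ERR")
    if not out_path or not err_path:
        return False

    # Flush anything buffered before the descriptors are swapped.
    sys.stdout.flush()
    sys.stderr.flush()

    for path, target_fd in ((out_path, 1), (err_path, 2)):
        # Append, so the job's own output is not overwritten.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
        os.dup2(fd, target_fd)
        if fd != target_fd:
            os.close(fd)
    return True


if __name__ == "__main__":
    if redirect_to_step_files():
        print("epilog: job step finished", flush=True)
        # Child processes inherit fds 1 and 2, so their output
        # lands in the same files without any extra plumbing.
        subprocess.call(["date"])
```

Because the redirection happens at the file descriptor level rather than by reassigning a language-level stream, anything the epilog spawns afterwards writes to the job's files as well.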

And, most importantly, that it doesn't much matter if the epilog programs fail, but if the prolog fails, the system will start to trash jobs until the problem is fixed...
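
One way to limit the damage from a misbehaving prolog is to wrap each piece of work so that a failure is logged rather than allowed to abort the program. The following is only a sketch, again in Python for illustration: run_prolog_safely and example_check are hypothetical names, and I'm assuming that it is an abnormal or non-zero exit from the prolog that causes jobs to be rejected.

```python
import sys
import traceback


def run_prolog_safely(checks, log=sys.stderr):
    """Run each check in turn; log failures instead of propagating them.

    Returns the number of checks that raised an exception.
    """
    failures = 0
    for check in checks:
        try:
            check()
        except Exception:
            failures += 1
            traceback.print_exc(file=log)
    return failures


def example_check():
    # Hypothetical check that always fails, to show the failure path.
    raise RuntimeError("scratch directory missing")


if __name__ == "__main__":
    # Failures are reported on stderr, but the script still falls off
    # the end normally and so exits with status 0.
    run_prolog_safely([example_check])
```

Whether swallowing every failure is wise depends on the check: letting jobs start on a node that is genuinely broken may be worse than rejecting them.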
