sawyl: (Default)
[personal profile] sawyl
Tried to run the UM for the first time in a long while, only for my initial attempts to end in ignominy when my HadGEM3 job failed to submit correctly. Fortunately I was able to call on the expertise of doctor_squale, who quickly noticed that I'd failed to set my DATADIR correctly and that this caused the initial copy to the supercomputer to fail.

Once we'd fixed this, I was able to get the job to copy but unable to get it to submit itself to LoadLeveler. This, it transpired, was entirely my fault: I'd deliberately written my bash start-up scripts to set a minimal path (one that inevitably excluded the /usr/lpp/loadl/full/bin directory) for non-interactive remote sessions. With this fixed, I was able to submit jobs into LoadLeveler, where they immediately failed, complaining that they were unable to locate the fcm command. Despite messing with my login scripts I was unable to fix the problem and simply cheated, shamelessly running the compile script interactively.
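
For illustration, the path fix could look something like the fragment below. This is a sketch rather than the exact change made: the helper name path_append is made up, and it assumes the fragment runs after the minimal path has already been set in the start-up script. Only the LoadLeveler directory comes from the account above.

```shell
# Hypothetical fragment for a bash start-up script (e.g. ~/.bashrc).
# path_append is an illustrative helper, not part of bash or LoadLeveler.

path_append() {
    # Append directory $1 to PATH unless it is already listed.
    case ":${PATH}:" in
        *":$1:"*) ;;                          # already present: do nothing
        *) PATH="${PATH:+${PATH}:}$1" ;;      # otherwise add it at the end
    esac
}

# "$-" contains "i" only in interactive shells, so this branch covers the
# non-interactive remote sessions that were getting the minimal path.
case "$-" in
    *i*) ;;                                   # interactive: leave PATH alone
    *) path_append /usr/lpp/loadl/full/bin ;; # make the LoadLeveler commands visible
esac
export PATH
```

The same helper could be called again for whichever directory holds fcm, once that has been tracked down; because it checks before appending, sourcing the script more than once does not pile up duplicate entries.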

Which means that I should now have a working executable which, with any luck, should exhibit a pathological memory access pattern, which should cause all the paging space on the host nodes to become exhausted, which should cause the system to hang, which should give me the chance to generate a system dump, which should let me run KDB, which might just give me an idea of what exactly is pathological about the job. But it probably won't. It will probably just return a script error and fail to run, just like the compile job.

Page generated Feb. 8th, 2026 12:43 pm