Run model run
May. 13th, 2010 08:30 pm

Tried to run the UM for the first time in a long while, only for my initial attempts to end in ignominy when my HadGEM3 job failed to submit correctly. Fortunately I was able to call on the expertise of doctor_squale, who quickly noticed that I'd failed to set my
DATADIR correctly and that this had caused the initial copy to the supercomputer to fail.

Once we'd fixed this, I was able to get the job to copy but not to submit itself to LoadLeveler. This, it transpired, was entirely my fault: I'd deliberately configured my bash start-up scripts to set a minimal path for non-interactive remote sessions, one that inevitably excluded the /usr/lpp/loadl/full/bin directory. With this fixed, I was able to submit jobs to LoadLeveler, where they immediately failed, complaining that they were unable to locate the fcm command. Despite messing with my login scripts, I was unable to fix the problem and simply cheated, shamelessly running the compile script interactively.

Which means that I should now have a working executable which, with any luck, should exhibit a pathological memory access pattern, which should cause all the paging space on the host nodes to become exhausted, which should cause the system to hang, which should give me the chance to generate a system dump, which should let me run KDB, which might just give me an idea of what exactly is pathological about the job. But it probably won't. It will probably just return a script error and fail to run, just like the compile job.
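For anyone tripped up by the same kind of minimal non-interactive path, one quick check is to ask the non-interactive session itself what it can find. The Python sketch below is illustrative only: it assumes the job needs `llsubmit` (LoadLeveler's submission command) and `fcm`, and it takes /usr/lpp/loadl/full/bin from above as the directory to look for. Neither list comes from the actual HadGEM3 job scripts, and the helper names are hypothetical.

```python
import os
import shutil

# Assumption: commands the batch job needs to find on its PATH.
# llsubmit is LoadLeveler's job-submission command; fcm is the FCM build tool.
REQUIRED_COMMANDS = ["llsubmit", "fcm"]

# Directory named in the post; other sites may install LoadLeveler elsewhere.
LOADL_BIN = "/usr/lpp/loadl/full/bin"


def dir_on_path(directory, path):
    """Return True if `directory` appears as an entry in the PATH string `path`."""
    target = os.path.normpath(directory)
    return any(os.path.normpath(entry) == target
               for entry in path.split(os.pathsep) if entry)


def missing_commands(commands, path=None):
    """Return the commands that shutil.which cannot find on `path`.

    With path=None the current process's PATH is used, so running this
    script in a non-interactive session reports what that session sees.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    return [cmd for cmd in commands if shutil.which(cmd, path=path) is None]


if __name__ == "__main__":
    current = os.environ.get("PATH", "")
    print("PATH =", current)
    print(LOADL_BIN, "on PATH:", dir_on_path(LOADL_BIN, current))
    for cmd in missing_commands(REQUIRED_COMMANDS, current):
        print("not found:", cmd)
```

The point is to run it the same way the batch system would, for instance as a command over a non-interactive ssh session, rather than from a login shell, since it is the non-interactive start-up path that differs.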