sawyl: (Default)
[personal profile] sawyl
One of the perpetual nuisances of being a Big Iron shop — especially one ruled by bureaucracy and governed by the counting of beans — is the need to maintain a vast and baroque accounting suite.

Standard Unix accounting, needless to say, doesn't even come close. The process accounting records only save a handful of possible chargeables and there's no way to charge different commands to different projects. In fact, the base accounting tools don't even support the concept of projects. Super-UX, at least, provides support for projects but lacks the consolidation and reporting facilities of CSA under UNICOS.

And CSA, for all its flexibility, wasn't exactly robust and reliable. When the accounting cycle broke, as it regularly did, the problems would take hours to fix, and if they weren't fixed by the time of the next run at midnight, the suite would garble two days' worth of data together, doubling the size of the mess. CSA also suffered from a particularly annoying bug whereby the accounting suite would repeatedly crash if the system uptime exceeded six months — not a big problem with early T3Es — and the only way to clear the problem was to reboot the system. Yes, the only way to fix the accounting suite, the thing that allowed every microsecond of CPU to be precisely counted, was to take the machine out of production for an hour while it was IPLed.

Given the shortcomings in the Super-UX accounting suite, our current solution to the problem is to use Perl to unpack the process accounting records from each machine, consolidate them into a single CSA-style front-end format (FEF) record, archive the raw data and post-process the FEF information into something for the end users of the system. Typically, this involves consolidating all the process records for a single batch job — across one or more nodes — into a single line, with each entry containing details of both consumed and requested resources. These entries are then used to generate an efficiency rating for each job — typically, the ratio of CPU time consumed against the amount of time the CPUs were assigned to the job — which allows the scientists to pick up on poorly performing jobs and apply peer pressure to encourage their colleagues to stop wasting resources.
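To make the consolidation and efficiency rating concrete, here's a minimal Python sketch. The record layout (`cpu_secs`, `elapsed_secs`, `ncpus`) is entirely hypothetical — the real FEF records carry many more fields — but the arithmetic is the one described above: CPU time consumed divided by CPU time assigned.

```python
# Hypothetical per-process record fields; the real FEF layout differs.

def job_efficiency(cpu_secs, elapsed_secs, ncpus):
    """Ratio of CPU time consumed to CPU time assigned to the job."""
    assigned = elapsed_secs * ncpus
    if assigned <= 0:
        return 0.0
    return cpu_secs / assigned

def consolidate(process_records):
    """Merge the per-process records of one batch job into a single entry."""
    total_cpu = sum(r["cpu_secs"] for r in process_records)
    elapsed = max(r["elapsed_secs"] for r in process_records)
    ncpus = sum(r["ncpus"] for r in process_records)
    return {"cpu_secs": total_cpu,
            "elapsed_secs": elapsed,
            "ncpus": ncpus,
            "efficiency": job_efficiency(total_cpu, elapsed, ncpus)}
```

A job that burns 50 CPU-seconds while holding two CPUs for 100 seconds scores 0.25 — exactly the sort of figure that's meant to shame its owner into fixing it.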

Or that's how things should be. In practice, it seems, most users don't really have the time to churn through the list of jobs every day and pick out their own worst offenders. I can't really say that I blame them — after all, it's pretty dispiriting to be confronted with hard evidence of your own shortcomings — but it means that the system isn't working and the number of idle cycles is starting to creep up.

So, in an attempt to fix the situation, I've knocked up a piece of code using mod_python to respond to accounting queries. So far it's pretty basic, but it allows a user to filter out their own jobs and to restrict the results to a particular work class, and seems to work pretty well.
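The filtering itself is simple enough to sketch. The field names (`user`, `wclass`) and the mod_python wiring in the trailing comment are assumptions rather than the real code, but they show the shape of the query logic:

```python
def filter_jobs(jobs, user=None, wclass=None):
    """Return the jobs matching the given user and/or work class.
    A parameter of None means "don't filter on this field"."""
    result = jobs
    if user is not None:
        result = [j for j in result if j["user"] == user]
    if wclass is not None:
        result = [j for j in result if j["wclass"] == wclass]
    return result

# Under mod_python's publisher handler this might be exposed as
# something like:
#
#   def query(req, user=None, wclass=None):
#       req.content_type = "text/plain"
#       for job in filter_jobs(load_jobs(), user, wclass):
#           req.write("%(jobid)s %(user)s %(wclass)s\n" % job)
```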

Now all I have to do is decide what to do about the look and feel. Most of my previous web-based scripts have been written in Perl and have relied on CGI.pm to do the bulk of the formatting. Not having found a Python module with the same degree of functionality, it looks like the best course of action might be to use something like CheetahTemplate to take care of the output formatting.
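Cheetah's basic idea — `$placeholder` substitution into a template — can be illustrated with the standard library's `string.Template`, which shares the syntax; Cheetah layers loops, conditionals and compiled templates on top. The row layout here is invented for the example:

```python
from string import Template

# A hypothetical one-line-per-job report row; Cheetah templates use
# the same $name placeholder style, plus #for/#if directives.
row = Template("$jobid  $user  $eff%")
line = row.substitute(jobid="job42", user="sawyl", eff="87")
```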

Powered by Dreamwidth Studios