Lazy data parallelism
Sep. 7th, 2012 04:58 pm
Fortunately, Python's multiprocessing module makes this trivial. Essentially, all I needed to do was the following:
import os
from multiprocessing import Pool
from subprocess import call

def compress(target):
    call(["bzip2", target])

if __name__ == "__main__":
    Pool(4).map(compress, os.listdir("."))
This creates a four-process pool of worker tasks, generates a list of files in the current directory and passes them via the map() call to the compress() function, allowing the Pool object to handle the assignment of work to the tasks. Obviously, my final code wasn't really as simple as the fragment above. I added a couple of extra features to allow me to exclude already compressed files from my list of targets and to limit the maximum number of files processed in any one run to a reasonable number.
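In outline, the target-selection logic was something like the sketch below; the targets() helper, the .bz2 extension check and the MAX_FILES value are illustrative rather than the exact code I ran:

import os
from multiprocessing import Pool
from subprocess import call

MAX_FILES = 1000  # illustrative cap on the number of files handled per run

def compress(target):
    call(["bzip2", target])

def targets(directory="."):
    # Skip anything that is already compressed...
    candidates = [os.path.join(directory, f)
                  for f in os.listdir(directory)
                  if not f.endswith(".bz2")]
    # ...ignore anything that isn't a regular file, and cap the batch size.
    candidates = [f for f in candidates if os.path.isfile(f)]
    return candidates[:MAX_FILES]

if __name__ == "__main__":
    Pool(4).map(compress, targets())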
Once I'd done this, I set up a multi-step LoadLeveler job to traverse a subdirectory and process a thousand files before resubmitting itself to handle the next chunk. Once the compression program is no longer able to find files to compress, the job advances to the next directory and resubmits itself. (My first guess at a base case, stopping when no more uncompressed files remained in the directory, turned out to be too naive: having decided not to use the --force option to bzip2, my script was unable to replace the partially compressed files left over by early batch jobs that had been killed off after hitting their wall-clock limit, causing it to go into a resubmission loop. In addition to adding a force option, I changed the job to compare the contents of the current directory before and after running the compression script, advancing to the next directory whenever the two listings were identical.)
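The termination check itself is simple; here's a rough sketch of the idea in Python (in practice it lives in the LoadLeveler job script, and the function names here are made up):

import os

def nothing_left_to_compress(directory, run_compression):
    # Snapshot the directory listing, run one compression pass,
    # then take a second snapshot and compare.
    before = sorted(os.listdir(directory))
    run_compression(directory)
    after = sorted(os.listdir(directory))
    # Compressing a file renames it (foo -> foo.bz2), so an unchanged
    # listing means nothing was compressed and it's time to move on
    # to the next directory instead of resubmitting for this one.
    return before == after

If this check comes back true, the job steps on to the next directory; otherwise it resubmits itself to chew through another thousand files.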
Fortunately, the space savings seem to have more than justified the investment of my time: I've managed to shave 4TB off my total disk usage in the last 24 hours.