GPFS: day 2
Nov. 14th, 2008 02:47 pm

Expanding on yesterday's discussion of multi-cluster GPFS, we talked about the ability to differentiate the underlying network transport used for file system traffic. If two clusters are linked together with gigabit Ethernet but one of them can use InfiniBand internally, GPFS will not use the faster network unless explicitly told to do so with a subnet pattern.
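As a rough sketch of how that's done (the subnet and cluster name here are invented for illustration), the preference for an internal IP-over-InfiniBand network is declared through the subnets configuration parameter:

    # tell GPFS to prefer the 10.10.0.0 (InfiniBand) subnet for daemon traffic;
    # nodes without an address on that subnet fall back to the gigabit network
    mmchconfig subnets="10.10.0.0/ib.cluster.example.com"

GPFS then matches each node's addresses against the listed subnets in order and uses the first one that both ends of a connection share.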
We then talked about snapshots, the creation of which apparently causes IO to freeze until the image has been created, and then moved on to GPFS and Tivoli Storage Manager. The trick to backups, apparently, is to put the TSM server software on a system with fibre channel access to a tape silo and the TSM clients on the GPFS IO nodes, with the data to be backed up sent over the network to the TSM server. It may also be possible to use TSM as the backing store for an HSM by defining the tape storage as the lowest-level storage pool in GPFS and creating a policy that migrates old data down the tiers to tape, freeing the corresponding blocks on disk.
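A minimal sketch of those two pieces, with invented file system and pool names, invented thresholds, and a placeholder path for the external-pool interface script:

    # freeze IO briefly and capture a point-in-time image of the file system
    mmcrsnapshot gpfs0 nightly-2008-11-14

    # policy.rules: define an HSM-backed external pool and migrate cold data into it
    RULE 'hsm'     EXTERNAL POOL 'hsm' EXEC '/path/to/hsm-interface-script'
    RULE 'migrate' MIGRATE FROM POOL 'system' THRESHOLD(85,70)
                   WEIGHT(CURRENT_TIMESTAMP - ACCESS_TIME)
                   TO POOL 'hsm'
                   WHERE FILE_SIZE > 1048576

    # evaluate the rules and carry out the migration
    mmapplypolicy gpfs0 -P policy.rules

The THRESHOLD clause only kicks migration off once the disk pool passes 85% full and keeps draining it until occupancy drops to 70%, which seems to be the usual way of stopping the fast tier from filling up.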
We then wound the day up with a discussion of GPFS filesets, which allow hierarchies of quotas to be defined, followed by a general discussion of the steps required to dead-start an HPC cluster. I found the whole thing intriguing. I'm used to a few large and balky entities that, after the withdrawal of power, scheduled or otherwise, require close attention from a site engineer before they can be brought up. So the cut-over to a system built from relatively simple, and hopefully reliable, components joined by a series of complex interdependencies is going to take some time to get used to. But at least it promises an interesting life...
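On the fileset side, a hedged sketch (file system and fileset names invented) of carving out a project area and putting a quota on it:

    # create a fileset, attach it to the namespace, and make sure quota enforcement is on
    mmcrfileset gpfs0 projA
    mmlinkfileset gpfs0 projA -J /gpfs/gpfs0/projA
    mmchfs gpfs0 -Q yes

    # edit the block and inode limits for the fileset (opens an editor)
    mmedquota -j gpfs0:projA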