Needles in data haystacks
Jul. 28th, 2008 09:22 pm
Disc scrubbing is a total pain, especially if you've got thousands of discs to clean. But in some ways, if you've got a lot of discs and a relatively small amount of sensitive data, the problem appears to become less severe.
Consider a situation where a file system is made up of eight logical stripes, with each stripe made up of five discs running in a RAID5 configuration. When an item of data is written to the file system, it is split into chunks across the eight arrays and then split blockwise across the discs within each array.
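To make that layout concrete, here is a minimal sketch of where a given chunk might end up. It assumes chunks are dealt round-robin across the eight arrays and that each array rotates its parity block one disc per row; both are illustrative choices, not a description of any particular controller, and the names `locate` and `parity_disc` are made up for the example.

```python
ARRAYS = 8                # logical stripes in the file system
DISCS_PER_ARRAY = 5       # discs in each RAID5 array
DATA_PER_ROW = DISCS_PER_ARRAY - 1   # one block per row holds parity


def parity_disc(row):
    """Disc holding parity for a row (assumed rotation, one disc per row)."""
    return (DISCS_PER_ARRAY - 1) - (row % DISCS_PER_ARRAY)


def locate(chunk):
    """Map a logical chunk number to (array, disc, row).

    Assumes round-robin placement across arrays, then blockwise
    placement across the data discs of that array.
    """
    array = chunk % ARRAYS
    k = chunk // ARRAYS            # index of this chunk within its array
    row = k // DATA_PER_ROW
    pos = k % DATA_PER_ROW         # position among the row's data blocks
    p = parity_disc(row)
    disc = pos if pos < p else pos + 1   # skip over the parity disc
    return array, disc, row


if __name__ == "__main__":
    for chunk in (0, 1, 8, 32, 56):
        print(chunk, locate(chunk))
```

Even in this simplified form, consecutive chunks of one file land on different arrays, and neighbouring blocks on a single disc belong to chunks that are far apart in the file.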
In order to recover any information from this file system once it has been overwritten, an attacker would need to read each disc using, say, a magnetic force microscope, rebuild the blocks in the correct order and reassemble the stripes. Then, if only a minority of the data on the file system is sensitive, the attacker must winnow the data to separate the restricted wheat from the unclassified chaff. This seems to be a fairly daunting task, even if the attacker already knows the form of the sensitive data on disc.
But how much harder would it be for an attacker when confronted, not with 30 or 40 unlabelled discs, but with 1,500? And what if the amount of sensitive data does not scale in line with the number of discs but remains constant?
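A rough back-of-the-envelope sketch shows how the dilution scales. The figures used here (10 GB of sensitive data, 500 GB discs) are invented purely for illustration, and the function `sensitive_fraction` is a hypothetical helper; it only measures what share of usable capacity the sensitive data occupies, not how hard the recovery itself would be.

```python
def sensitive_fraction(sensitive_gb, discs, disc_gb, discs_per_array=5):
    """Share of usable capacity taken up by sensitive data.

    Assumes every array is RAID5, so one disc's worth of space per
    array goes to parity and the rest is usable.
    """
    usable_gb = discs * disc_gb * (discs_per_array - 1) / discs_per_array
    return sensitive_gb / usable_gb


if __name__ == "__main__":
    # Hypothetical figures: the amount of sensitive data stays constant
    # while the number of discs grows.
    for discs in (40, 1500):
        share = sensitive_fraction(sensitive_gb=10, discs=discs, disc_gb=500)
        print(f"{discs} discs: {share:.6%} of usable capacity is sensitive")
```

Under these assumptions the sensitive share falls in direct proportion to the number of discs, so going from 40 discs to 1,500 makes the needle 37.5 times smaller relative to the haystack.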