A longtime favorite of mine, Chris Evans (aka the StorageArchitect), has a new post on thin storage. After mapping out how thin provisioning systems virtualize disks, he writes about storage utilization and then opens a brief discussion of storage reclamation, a topic he has been championing in the storage blogosphere for the last year or so.
He begins the discussion of reclamation by showing how file systems consume storage space and the inefficient utilization that typically follows. He ends it by stating:
In a thin provisioned environment, storage would have been requested only for the blocks with valid data and in this way, a LUN can be less than 100% allocated.
Some readers might be tempted to think that LUN allocation percentage has an inverse relationship to disk utilization; in other words, the lower the LUN allocation, the higher the disk utilization is likely to be. While that may appear to be the case, too many variables are involved to claim a mathematical (linear or otherwise) or causal relationship between them. Filing software, such as file systems, databases and virtualization platforms, that allocates capacity from storage tends to consume more storage over time as users and applications create new, incremental data and files. It follows that the allocated percentage of a thinly provisioned LUN is likely to keep increasing over time. Think of it as storage entropy, which might be some sort of new digital law (the amount of consumed storage in the universe is always increasing), enforced by the extremely high probability that we will keep creating lots more data all the time.
And this is where 3PAR's thin reclamation, persistence and conversion technologies come into play. They are designed to deal with the ongoing life cycle problems of storage entropy, such as deleted files and, in a virtual systems world, deleted systems. But more on that in posts to come.
Returning to Chris' post, it's fitting that he chose a Sesame Street analogy, considering that Sesame Street celebrated its 40th anniversary last week. He makes the point that storage can share capacity (cookies) among filing systems through reclamation and suggests four kinds of storage cookie monsters: greedy, selfish, nice and saintly.
As much as I love the Sesame Street gang (especially Grover), I'd suggest that the real Cookie Monsters are the filing systems (file systems, databases and virtual system platforms), and that the cookies are storage capacity. Filing systems have historically been bad at sharing the storage cookies they have been given, and Symantec's Storage Foundation is truly a breakthrough product, designed to share its capacity cookies with any other filing system in the neighborhood. Saintly behavior, indeed!
But the filing systems are not completely to blame here because storage hasn't provided the means for sharing allocated capacity. As in any system, the hardware and software have to move in something resembling unison. The way storage cookies are shared is by taking capacity that was previously allocated to a filing system and returning it to some form of unallocated storage plate. Using the Sesame Street theme, capacity reclamation is the process of putting storage cookies back on the array's plate where they can be allocated all over again by the monsters in your digital neighborhood.
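For readers who like to see the plate and the cookies in motion, here is a minimal sketch of the two mechanisms discussed above: a thin LUN that takes physical capacity from a shared pool only when a block is first written, and reclamation that puts a block back on the pool's plate once the filing system reports it no longer holds valid data. The class and method names (`ThinPool`, `ThinLUN`, `write`, `reclaim`) are hypothetical, and the fixed block granularity is an assumption for illustration; this is not how 3PAR, Symantec or any other vendor actually implements thin provisioning or reclamation.

```python
class ThinPool:
    """Hypothetical shared pool of physical blocks (the array's 'plate')."""

    def __init__(self, physical_blocks: int) -> None:
        self.free = list(range(physical_blocks))  # unallocated physical blocks

    def allocate(self) -> int:
        if not self.free:
            raise RuntimeError("thin pool exhausted")
        return self.free.pop()

    def release(self, block: int) -> None:
        self.free.append(block)  # the cookie goes back on the plate


class ThinLUN:
    """Hypothetical thin LUN: the virtual size is promised, physical blocks are mapped on first write."""

    def __init__(self, pool: ThinPool, virtual_blocks: int) -> None:
        self.pool = pool
        self.virtual_blocks = virtual_blocks
        self.mapping: dict[int, int] = {}  # virtual block -> physical block

    def write(self, vblock: int) -> None:
        if not 0 <= vblock < self.virtual_blocks:
            raise IndexError("write beyond LUN size")
        if vblock not in self.mapping:  # allocate only on the first write to a block
            self.mapping[vblock] = self.pool.allocate()

    def reclaim(self, vblock: int) -> None:
        """The filing system says this block no longer holds valid data; return it to the pool."""
        pblock = self.mapping.pop(vblock, None)
        if pblock is not None:
            self.pool.release(pblock)

    def allocated_pct(self) -> float:
        return 100.0 * len(self.mapping) / self.virtual_blocks


pool = ThinPool(physical_blocks=100)
lun_a = ThinLUN(pool, virtual_blocks=80)
lun_b = ThinLUN(pool, virtual_blocks=80)  # 160 blocks promised from a 100-block pool

for b in range(30):
    lun_a.write(b)
print(lun_a.allocated_pct(), len(pool.free))  # 37.5 70

for b in range(10, 30):  # files deleted, their blocks reclaimed
    lun_a.reclaim(b)
print(lun_a.allocated_pct(), len(pool.free))  # 12.5 90

for b in range(75):  # would not have fit before reclamation (only 70 free)
    lun_b.write(b)
print(lun_b.allocated_pct(), len(pool.free))  # 93.75 15
```

Note that without the `reclaim` calls, the 20 blocks of deleted data would stay mapped to `lun_a` forever, and `lun_b` would exhaust the pool: the cookies would still be in the first monster's hand rather than back on the plate.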
Did I forget to mention that I LOVE data storage! (munch munch munch munch....)
Interesting. I wonder about the implications of thin provisioning at multiple layers... if your storage itself is thin provisioned, and your virtual machine disks are thin provisioned... suppose something interesting happens, like a bug in your configuration management that accidentally inflates every virtual machine in your infrastructure by a GB. That could really wreak havoc on your storage infrastructure.
Then, of course, there's your backup infrastructure. I hope you've got dedupe!
Posted by: Matt Simmons | November 18, 2009 at 11:52 AM
A configuration management bug would not be catastrophic, because thin provisioning doesn't actually allocate physical capacity until a write occurs. The pathological behavior that causes problems for thin provisioning is a situation where a huge, unexpected amount of data is created. Systems and applications don't do this, but people sometimes do, for example when they use corporate storage to hold their media files. I wouldn't necessarily call that a bug, and corporate usage policies can eliminate these sorts of problems. 3PAR's new thin persistence technology was designed to help manage this sort of scenario, so I should get a blog post up on it soon.
Posted by: marc farley | November 18, 2009 at 12:46 PM
Marc
Thanks for the reference! I have to be honest and admit the cookie reference came out of a conversation with Foskett and Rich Brambley. Stephen suggested the cookie analogy, which morphed into Cookie Monster!
Chris
Posted by: Chris M Evans | November 18, 2009 at 03:43 PM