(Federation Square, Melbourne)
There have been some excellent discussions recently in the storage blogosphere and on Twitter about the concept of Storage Federation. The participants, known by their Twitter IDs, include @stuiesav, @storageanarchy, @rootwyrm, @davegraham, @bwhyte, @ianhf, @esignoretti, me (@3parfarley) and others - and the interest continues to increase.
There are two aspects of the discussion that I think are fascinating. The first is the role of social media as a means to include customers, vendors and others in an open discussion that is typically conducted privately by a vendor preparing to release a new product or feature. The second is the challenge of defining a storage capability with sufficient focus and vendor independence that it is meaningful. There has been some skepticism about this effort, suggesting that we are predestined to end up with ambiguous terms that can be interpreted (spun) by anybody (any vendor) to mean anything (our product does it). I'm hopeful the results can be better than that, but subjectivity is the only constant of social media and it's not likely that everyone will like whatever happens.
In general, the language of digital storage has many overlapping meanings, which causes a fair amount of confusion in our industry. I've been dealing with this problem for many years, going back to when I wrote Building Storage Networks and faced the challenge of trying to invent generic terms for functions that had been coined by vendors and tied to specific products. My interest in defining Storage Federation goes beyond my role as an employee of 3PAR.
The notion of federated storage has been around for several years, but it recently came back into the spotlight when Pat Gelsinger of EMC referred to it during a press briefing on March 11. EMC blogger Chuck Hollis wrote about it afterward and there was some chatter, culminating in a blog post by @Stuiesav on April 2, which argued that the discussion about Storage Federation was just marketing hype attempting to rebrand storage virtualization. EMC's Barry Burke (@Storageanarchy) and I (@3parfarley) both countered that this was not the case this time around, and in the last couple of days the discussion fired up again. The problem with Twitter is the limit of 140 characters per tweet; it's surprising what can be done with so few characters, but it does have its limits.
This weekend, @ianhf posted on his Grumpy Storage blog, echoing @Stuiesav's skepticism and expressing his perspective (as a customer) on what he would like to see when new technologies - in this case Storage Federation - are introduced. Here are some of the items from his list (some of them don't fit our definition exercise because they assume a new product is being introduced, which is not the case here).
- The specific customer requirements & problems this addresses & justify how
- The use cases this feature / function applies to, and those that it doesn't
- Why & how this feature is different from that vendor's own previous method for solving this problem
- Provide clarity over the non-functional impacts of the feature before, during & after its use - i.e. impact on resilience, impact on performance, concurrency of usage etc. (including up-front details of constraints)
- Naturally you'll also expect me to require TCO & ROI of the feature, and any changes to the models as a result of this feature
The definition of Storage Federation that was kicked around on Twitter is something like: "the transparent, dynamic and non-disruptive distribution of storage resources across self-governing, discrete, peer storage systems." (And yes, I did elaborate a bit on this while I was writing based on bits and pieces from comments I read and further thought of my own.)
The idea is to have multiple storage systems cooperating as a team (as opposed to under the direction of an external entity) to place data in the aggregated storage resources (LUNs or volumes) of all participating members. An example of Storage Federation is how Dell/EqualLogic arrays distribute their volumes over multiple systems. When a new EqualLogic array is added to an iSCSI SAN, the administrator is asked if the array should be placed in the same group as other arrays in the SAN. If this is done, the arrays start splitting their volumes (and workloads) across both arrays.
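To make the idea concrete, here is a minimal Python sketch of capacity-proportional volume placement in a peer group. The names and the algorithm are my own illustration, not Dell/EqualLogic's actual implementation; the point is simply that when a new member joins, every volume's placement is recomputed across all peers.

```python
# Hypothetical sketch: spread each volume across group members in
# proportion to each member's capacity. Illustrative only.

def rebalance(arrays, volumes):
    """Return {volume: {array: GB}} placing slices by capacity share."""
    total = sum(a["capacity_gb"] for a in arrays)
    return {
        vol: {a["name"]: round(size * a["capacity_gb"] / total)
              for a in arrays}
        for vol, size in volumes.items()
    }

group = [{"name": "array1", "capacity_gb": 4000}]
volumes = {"vol_a": 500, "vol_b": 1200}
print(rebalance(group, volumes))   # everything lands on array1

# A new peer joins the group; the same volumes are now split.
group.append({"name": "array2", "capacity_gb": 4000})
print(rebalance(group, volumes))   # slices split 50/50
```

A real implementation would move data incrementally and track per-slice ownership, but the capacity-weighted placement idea is the same.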
Five examples of storage federation capabilities are:
Storage expansion: You want to increase the storage capacity of an existing storage system that cannot accommodate the total amount of capacity desired. Storage Federation allows you to add additional storage capacity by adding a whole new system.
Storage migration: You want to migrate from an aging storage system to a new one. Storage Federation allows the two systems to be joined and storage resources to be evacuated from the first onto the second, after which the first system is removed.
Safe system upgrades: System upgrades can be problematic for a number of reasons. Storage Federation allows a system to be removed from the federation and be re-inserted again after the successful completion of the upgrade.
Load balancing: Similar to storage expansion, but on the performance axis, you might want to add additional storage systems to a Storage Federation in order to spread the workload across multiple systems.
Storage tiering: In a similar light, storage systems in a Storage Federation could have different capacity/performance ratios that you could use for tiering data. This is similar to the idea of dynamically re-striping data across the disk drives within a single storage system, such as with 3PAR's Dynamic Optimization software, but extends the concept to cross storage system boundaries.
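The migration and safe-upgrade cases above share one primitive: draining a member's volumes onto its peers before removing it. Here is a rough sketch of that operation, with invented names (no vendor exposes an API like this to my knowledge):

```python
# Illustrative sketch: evacuate a member's volume slices onto the
# least-loaded remaining peer, then drop it from the federation.

def evacuate(federation, leaving, placement):
    """Move every volume slice off `leaving`; mutates both arguments."""
    peers = [m for m in federation if m != leaving]
    load = {m: sum(v.get(m, 0) for v in placement.values()) for m in peers}
    for slices in placement.values():
        moved = slices.pop(leaving, 0)
        if moved:
            target = min(load, key=load.get)   # least-loaded peer
            slices[target] = slices.get(target, 0) + moved
            load[target] += moved
    federation.remove(leaving)

members = ["old_array", "new_array"]
placement = {"vol_a": {"old_array": 500}, "vol_b": {"old_array": 1200}}
evacuate(members, "old_array", placement)
print(placement)   # all slices now on new_array; old_array can be retired
```

The same drain step would run before a system upgrade, with the member rejoining (and rebalancing) afterward.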
These are all examples of Storage Federation that Dell/EqualLogic storage systems are capable of today. This is not intended as an endorsement of their distributed volume manager; I am simply using it as an example of Storage Federation. That said, if you search for "Storage Federation" on the Dell/EqualLogic web site, you probably won't find it today because they don't describe it that way, but that doesn't mean the capability is not in the product.
That's not it
In-band storage virtualization systems like IBM's SVC can provide all the capabilities listed above for Storage Federation. However, in-band virtualization products govern the behavior of the other storage systems networked to it and Storage Federation involves self-governing, peer systems. Another way of saying it is that Storage Federation does not have functionality in the network between host systems and storage.
File, object and data distribution technologies like EMC's Atmos certainly provide a type of federation insofar as multiple computer systems can access the same data objects from multiple locations that may be separated geographically. However, this capability primarily migrates files (data objects) - and is functionally orthogonal to Storage Federation, which works on volumes. A term like Data Federation is probably more precise than Storage Federation for this sort of capability.
Clustered storage systems, like 3PAR's InServ storage systems or clustered NAS systems are not examples of Storage Federation because they function as single, scalable storage systems, not as discrete storage systems that are networked together. Clustered storage systems can work together in Storage Federations with other clustered or non-clustered storage systems. In that case you could have a Storage Federation of clustered storage systems.
How is this different?
I thought I'd make a little effort to respond to @ianhf's requests.
The main problem Storage Federation addresses is the set of limitations presented by a single storage system, such as capacity, performance and maintenance availability.
Currently, when a new storage system is added to an existing environment, there are server administration tasks that need to be done in order to migrate volumes from an existing storage system to the new one. In addition, there is usually some amount of downtime and/or performance degradation associated with the migration and configuration of new data paths. By contrast, with Storage Federation, a new storage system could be added without having to reconfigure host data paths and the migration would be processed transparently and dynamically, without any downtime or loss of path redundancy while the migration is in process.
Increasing performance for a particular volume in a Storage Federation is somewhat less obvious because the other federated storage systems might not have more performance resources available to boost performance. For instance, if a new storage system inserted in a Storage Federation does not have more disk drives (spindles) or more flash SSD capacity, there is no guarantee that any volumes moved to it would perform faster. There is some likelihood that performance could be improved by moving a volume to a storage system running at lower utilization levels, but maintaining lower utilization levels is neither realistic nor very cost-effective. All that said, it is possible for a Storage Federation to use the aggregate resources of multiple storage systems to increase the performance of a single volume - such as by spanning all the disk drives in the federation and by using the cache of all participating storage systems. One example of products that provide this capability today is the Dell/EqualLogic storage array family.
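One way to picture spanning for performance is simple striping across federation members, so sequential I/O rotates over every system's spindles and cache. The stripe size and member names below are assumptions for illustration, not any vendor's actual layout:

```python
# Illustrative sketch: map a logical offset in a striped volume to the
# federation member that serves it. STRIPE_KB is an assumed value.

STRIPE_KB = 64

def owner(offset_kb, members):
    """Which member serves the stripe unit containing this offset."""
    return members[(offset_kb // STRIPE_KB) % len(members)]

members = ["sys1", "sys2", "sys3"]
# Sequential I/O rotates across all three systems:
print([owner(kb, members) for kb in (0, 64, 128, 192)])
# -> ['sys1', 'sys2', 'sys3', 'sys1']
```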
The availability of storage systems needing maintenance could be greatly improved by Storage Federation if an existing storage system can be removed from the federation after having its volumes evacuated (think v-Motion, but for storage systems) to other storage systems in the federation. After being removed from the federation, the storage system could be taken down for any sort of maintenance without risking the availability of any of the volumes that had previously been on it. Obviously, all of this would take time and planning to ensure the Storage Federation is not overloaded with data and workloads, but there are probably many customers who would prefer to do maintenance on storage systems while they are offline.
One of the possible exposures of Storage Federation is the increased exposure to system failures. For instance, a Storage Federation that distributes a single volume over three separate storage systems is roughly three times more likely to have that volume affected by a failure than a Storage Federation that does not allow volumes to span storage system boundaries. FWIW, this is the main weakness of the Dell/EqualLogic implementation of Storage Federation.
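The "roughly three times" figure follows from basic probability: a spanning volume fails if any of its systems fails. A quick back-of-envelope check, with an assumed per-system failure probability:

```python
# If each storage system independently fails with probability f over
# some period, a volume spanning n systems is lost if ANY fails.

def span_failure_prob(f, n):
    return 1 - (1 - f) ** n

f = 0.01  # assumed 1% per-system failure probability, for illustration
print(span_failure_prob(f, 1))  # ~0.01
print(span_failure_prob(f, 3))  # ~0.0297, just under 3x the single-system risk
```

For small f the spanning-volume risk is approximately n * f, which is where the "3 times more likely" rule of thumb comes from.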
That's all for now
It's late and there is plenty of material here for the grinder. Please feel free to comment in any way; agree, disagree, correct mistakes and ask questions. Thanks for reading.