Lee Johns is one of my new co-workers at HP and he gets himself into all sorts of interesting things - like the upper left corner of this video, where he talks about data sprawl, dedupe and HP's StoreOnce technology.
He does an excellent job highlighting how StoreOnce technology is portable and can be located anywhere in the infrastructure, with the ability to transfer deduped data without having to rehydrate and re-dedupe it.
SNW in Dallas was very educational and fun - an excellent show and there are some informative Infosmack interviews in the works that people might want to check out.
The Cloud Storage Initiative is developing the means to create and transfer metadata for data stored in the cloud. This is a huge deal because it promises to alleviate one of the largest concerns about cloud storage, which is portability of data among different cloud storage service and IAAS providers.
The Green Storage Initiative introduced a new power efficiency program called SNIA Emerald, which is providing power consumption measurements for storage. There are many challenges involved with this sort of work and I give the folks working on this at SNIA a lot of credit for making progress through such a thorny topic. SNIA Emerald is an excellent example of how SNIA is providing leadership for the entire storage industry.
This is an older video I shot late last year with Chakri Avala from Symantec and Karl Swarz from 3PAR demoing how thin reclamation for Symantec Storage Foundation works. Storage demos like this are a bit like watching grass grow, but storage admins will get the idea of how file system-integrated reclamation works.
Derek Seaman posted in his blog yesterday about capacity thinning, zero reclamation technologies and included a test of 3PAR's Thin Persistence software. In his post he lists the steps he took to run the test, including the setup and final results. Please go read what he wrote, but I thought I'd post his summary:
This test proved that the 3PAR zero reclaim feature worked as advertised, happens in real time, and take very little effort to use. The same process would work for a virtual machine as well. If I was using the Veritas Storage Foundation I would not have to use the sdelete command and it would be fully automated. Hopefully they will work with Microsoft and VMware to support a fully automatic and native method to reclaim the deleted space. Until then, you can run sdelete from time to time to drop those extra pounds from your fat LUNs.
3PAR sees thinning technology as a strategic advantage that we are committed to advancing in the industry with software partnerships. Examples of progress on these fronts include the automated thinning for Oracle ASM and implementing VMware's VAAI Block Zeroing, which turns a thick volume into a thin one on 3PAR storage with Thin Persistence.
I wrote about the fact that we already had zero detect technology in our product, which is useful for the new Full Copy command because it allows customers to remove zeroed data from clones when they are created and return them to array free space.
The discussion became a bit confused when Chad interpreted what I was saying as pertaining to Block Zeroing.
Block Zeroing and Full Copy are different aspects of the VAAI API. The intent of Block Zeroing is to reduce the amount of CPU effort and storage traffic required to write zeroes across an entire EagerZeroThick (EZT) VMDK when it is created. The intent of Full Copy is to make clones of VMs quickly without consuming I/O bandwidth. Things get interesting when you start thinking about making a full copy of an EZT VMDK that was created using VAAI with Block Zeroing - but I'll discuss that later.
I also want to clarify what zero detection technology is. 3PAR T and F class arrays have zero detection technology, enabled by Thin Persistence software, that recognizes zeroed blocks as they arrive at the array and returns them to the array's free pool. Any read requests made to these block addresses will return a zero value. In essence it is dedupe for zeroes.
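To make the "dedupe for zeroes" idea concrete, here is a minimal sketch of a thin volume that refuses to store all-zero blocks. It is a toy model, not 3PAR's implementation: the `ThinVolume` class, its method names and the 16 KB block size are all assumptions made for illustration.

```python
BLOCK_SIZE = 16 * 1024  # assumed allocation unit, chosen only for illustration


class ThinVolume:
    """Toy thin volume that never stores all-zero blocks (hypothetical model)."""

    def __init__(self):
        self.allocated = {}  # block address -> stored bytes

    def write(self, address, data):
        if not any(data):
            # Zero block detected on ingestion: release any backing space.
            self.allocated.pop(address, None)
        else:
            self.allocated[address] = data

    def read(self, address):
        # Addresses with no backing space read back as zeroes.
        return self.allocated.get(address, bytes(BLOCK_SIZE))

    def used_bytes(self):
        return len(self.allocated) * BLOCK_SIZE


vol = ThinVolume()
vol.write(0, b"\x01" * BLOCK_SIZE)  # real data consumes a block
used_after_data = vol.used_bytes()
vol.write(1, bytes(BLOCK_SIZE))     # zeroes consume nothing
vol.write(0, bytes(BLOCK_SIZE))     # zeroing a block returns it to the free pool
print(used_after_data, vol.used_bytes())
```

The point of the sketch is that the host cannot tell the difference: reads of the zeroed addresses still return zeroes, but no capacity is held for them.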
However, zero detection is not needed when an EZT VMDK is created using the VAAI plug-in because the array will recognize the intent of the command and not write the zeroes. In other words, the VMDK will only contain a very small amount of reserved space when it is created. Again, any attempts to read blocks in those ranges will return zero values. Zero detection is effectively bypassed during the creation of the EZT VMDK.
The exception to this behavior is when the EZT VMDK being created is written to a thick volume - in that case the array will write zeroes across the entire VMDK.
The remaining cases for the creation of EZT VMDKs on 3PAR arrays occur when VAAI is not used. For a thick volume, the entire VMDK has zeroes written to it. Thin volumes not using zero detect also have zeroes written over the entire VMDK. Thin volumes with zero detect will not have zeroes written to them and will contain only a small amount of reserved space.
FWIW, the reserved space is used as instantly-available capacity that can be allocated on-demand when writes start coming into the volume. 3PAR arrays always "read ahead" free space to improve the performance of thin provisioning.
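The five EZT creation cases described above collapse into a small decision function. This is just a restatement of the cases in code form; the function and parameter names are made up for the example and do not correspond to any 3PAR or VMware interface.

```python
def zeroes_written(volume_is_thin, vaai_block_zeroing, zero_detect):
    """Return True if creating an EZT VMDK writes zeroes across the whole VMDK."""
    if not volume_is_thin:
        return True   # thick volume: zeroes are written with or without VAAI
    if vaai_block_zeroing:
        return False  # the array recognizes the intent and skips the zeroes
    return not zero_detect  # without VAAI, only zero detect avoids the writes


cases = [
    ("thick, VAAI", False, True, False),
    ("thin, VAAI", True, True, False),
    ("thick, no VAAI", False, False, False),
    ("thin, no VAAI, no zero detect", True, False, False),
    ("thin, no VAAI, zero detect", True, False, True),
]
for label, thin, vaai, detect in cases:
    result = "zeroes written" if zeroes_written(thin, vaai, detect) else "reserved space only"
    print(f"{label}: {result}")
```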
The next bit here could be a bit thorny, so clear your head. The matter of making a Full Copy of an EZT VMDK to a thinly provisioned volume was something Chad said was not allowed. My assumption here is that the type of thin provisioning used makes a big difference.
For instance, if you are using TP from VMware, I could see where they would not allow a full copy to be made. The problem is that the full copy will return all the zero values for the source VMDK, whether or not those zeroes were ever actually written - and write them to the target TP volume. In other words, the target could be much larger than the source. In the VMware TP scheme, this could make for problems in a hurry if you were making a bunch of clones this way.
In contrast, if you were using a 3PAR array with zero detection, the Full Copy of the source VMDK would return zeroes for the entire VMDK, but the zero detection would strip them out again as the target was being written. You could make as many clones as you wanted this way, knowing that the physical capacity they consume would be a multiple of the physical capacity consumed by the source VMDK. In other words, you wouldn't have to worry about virtual zero bloat making a mess of your VMFS volume.
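Here is a rough sketch of that difference, counting the blocks a Full Copy would consume on a thin target with and without zero detection. The block counts and the `clone_physical_blocks` helper are hypothetical; the model simply assumes that every address in the source is read and anything not stored comes back as zeroes.

```python
def clone_physical_blocks(source_blocks, virtual_size, zero_detect):
    """Count the blocks a Full Copy consumes on a thin target.

    source_blocks: dict of address -> bytes for data actually stored in the source
    virtual_size:  number of block addresses in the source VMDK
    """
    consumed = 0
    for address in range(virtual_size):
        data = source_blocks.get(address)  # None means the read returns zeroes
        is_zero = data is None or not any(data)
        if is_zero and zero_detect:
            continue  # stripped out as the target is written
        consumed += 1
    return consumed


# Assumed source: 1000-block EZT VMDK with only 50 blocks of real data.
source = {address: b"\x07" * 4 for address in range(50)}
without_detect = clone_physical_blocks(source, 1000, zero_detect=False)
with_detect = clone_physical_blocks(source, 1000, zero_detect=True)
print(without_detect, with_detect)
```

Under these assumptions the clone made without zero detection consumes the full virtual size, while the clone made with it consumes only what the source physically holds.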
One of the big differences between 3PAR's zero detection technology and other vendors' zero-reclaim technology is that 3PAR's process is real-time-on-ingestion as data comes into the array, whereas zero-reclaim works in a post-processing fashion after the zeroes have already consumed disk space. This could be a significant difference in many cases because the post-processing method has the potential to create unexpected capacity-full conditions before the zero-reclamation process even has a chance to start.
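The capacity-full risk is easiest to see by tracking peak usage rather than final usage. The sketch below is a simplified model with made-up numbers: a stream of incoming blocks, some of them all zeroes, handled either inline or by a reclaim pass that runs every `reclaim_every` writes.

```python
def peak_usage(writes, inline, reclaim_every=None):
    """Return the peak number of blocks on disk.

    writes: iterable of booleans, True when the incoming block is all zeroes
    inline: True to strip zeroes on ingestion, False for post-process reclaim
    """
    used = 0
    zero_blocks_on_disk = 0
    peak = 0
    for count, is_zero in enumerate(writes, start=1):
        if is_zero and inline:
            continue  # never lands on disk
        used += 1
        if is_zero:
            zero_blocks_on_disk += 1
        peak = max(peak, used)
        if not inline and reclaim_every and count % reclaim_every == 0:
            # Post-process pass: give back the zero blocks written so far.
            used -= zero_blocks_on_disk
            zero_blocks_on_disk = 0
    return peak


# Assumed workload: 900 zero blocks followed by 100 blocks of real data.
workload = [True] * 900 + [False] * 100
inline_peak = peak_usage(workload, inline=True)
post_peak = peak_usage(workload, inline=False, reclaim_every=1000)
print(inline_peak, post_peak)
```

Both approaches end up holding the same 100 blocks, but in this example the post-process case briefly needs ten times that, which is where an unexpected capacity-full condition could come from.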
Here's a video that TechTarget produced for us with one of our customers, Priceline.com.
Here are a few highlights from the video:
Priceline.com was one of the first e-commerce players to adopt virtualization. That may account for why the company's IT organization is known for its high availability and ability to adapt quickly to changes in the market. Given the fact that their business has a broad value-based appeal, their IT organization works very hard to get the best rate of return for their capital expenditures.
3PAR storage allowed them to increase their storage capacity by over 400% over the last four years while reducing the administrative load required to manage it all. Ron Rose, ex-CIO at Priceline (now on the Sr. Management Team at Dell) said that they were able to decrease the data center footprint 50% during that time. Mr. Rose estimated that they were able to avoid deploying approximately 100 physical servers and their associated footprint costs, which were equivalent to 106 acres of trees or 310 tons of hydrocarbons per year.
I caught up with Mark Cravotta from Datapipe recently at a 3PAR event in Las Vegas. He's a high energy person who is having a lot of fun growing Datapipe's hosting and cloud computing services as well as helping to manage its expansion around the globe.
Datapipe is a 3PAR Cloud Agile partner and customer who uses our products throughout their line for primary multi-tenant storage, data snapshots, remote replication and all aspects of disaster recovery.
In addition to being customer-driven, Datapipe is also committed to being a leader in green utility computing by reducing the carbon footprint of its data centers through power purchases from green power producer Constellation NewEnergy.
London-based Ultraspeed has succeeded in the managed hosting business for 13 years by being smart, opportunistic and service-oriented. A lot has changed during that time, especially customer expectations for uptime and how much customers rely on their hosting providers to respond quickly when needed. Web sites that are lucky enough to "go viral" can be a disaster if the hosting company's infrastructure is unable to adjust rapidly enough to meet demand.
In March, Ultraspeed opened their second data center in Amsterdam implementing a modular infrastructure design including multi-tenant 3PAR storage, VMware, Extreme Networks switches and customized servers. The highlight of their Amsterdam site is the ability to offer bi-directional DR services between London and Amsterdam using 3PAR's Remote Copy software. Ultraspeed is a member of 3PAR's Cloud-Agile program.
In this interview, conducted in February 2010, Jordon Gross, CEO of Ultraspeed, and Michael Shanks, CTO, joined us for coffee near their offices and talked about their company's history, its technology, the challenges they face and how they expect things to shape up in the years to come.
A couple weeks ago, one of the major storage vendors had two major problems to resolve after one of their arrays suffered a firmware bug-induced failure at one of their cloud (email) service provider customers. They had to:
Help the customer get back to normal service levels after they had become unacceptable.
Confront a public relations problem after it was exposed by a leading storage publisher.
Meanwhile, their service provider customer had four major problems to resolve:
Get service levels back to acceptable levels.
Communicate to their customers what the problem was and how it was being addressed.
Re-engineer a solution to avoid the same happening again.
Credit customers for not delivering against SLA terms.
A vendor employee tried to address their public relations problem this way in his blog:
"OK, I'll take the blame for this -- sort of. We pride ourselves in putting a lot of thought into our customer designs. I'd argue that we're really, really good at it as well.
But not everyone is 100% sure of how their application will grow over time -- unfortunately, we're not psychics. And, let's be honest, not everyone necessarily wants to pay for redundancy we like to put into our designs.
We don't always get to directly engage all the time, either -- with products such as the (blanked out), most of this stuff moves through the channel. Somebody calls up one of our partners, says that they want to buy one of our products, and one gets sold -- and a lot of product gets sold that way."
I understand the desire to explain how messes become messy, but I'm not sure why he felt the need to speculate that his company's business partners or that their customer's budget were key elements of the problem. That is tantamount to saying, "All of our (blanked out) customers could have the same thing happen to them too." Anybody who has ever been close to one of these melt-downs knows there are many variables involved - including vendors underbidding each other and shaving elements from their bid in order to win the business.
From a distance, it looks like the vendor's response to the customer was good, although there apparently were some issues with failure notification from the array when the event occurred. I wouldn't call these sorts of things "Perfect Storms", but there are unfortunate scenarios where multiple things go awry. All vendors have these sorts of bad days, which serve as painful learning experiences. Unfortunately for customers, it's one of the ways vendors improve their customer support processes.
The customer also wrote in his blog, explaining the situation to their customers:
"Our SAN vendor analyzed the system logs for the event and determined that the service processor failure occurred due to a unique bug in the specific version of firmware on the system. Our vendor performed an emergency upgrade. The newer version of firmware includes a fix for the bug. We are taking additional corrective actions to make certain that there is enough spare capacity on the SAN. This will assure it performs without performance degradation in the event of a single hardware failure."
The reparation sounds reasonable, but it's not what I would call best of breed either. I'll explain why in the remainder of this post.
The old trusted dual controller just can't keep up
The explanation the service provider gave to their customers was only half correct. Yes, the failure in one controller was due to a firmware bug - and yes, all vendors find out about some of them at customer sites - but the inability of the surviving controller to handle the workload was another matter altogether.
The major defect of all dual controller designs for service provider applications is the uselessness of write cache when operating in degraded mode on a single controller.
When a dual controller array has a controller failure, all traffic is failed over to the surviving controller. However, this controller can't afford to place writes in cache because, if this controller also fails, any un-flushed writes in cache would be lost - making the recovery process all the more painful. As a result, the throughput of the controller degrades significantly because writes now take several orders of magnitude longer to process as each write must be completed at the physical disk level, instead of in fast cache memory. When you consider the sort of read/write ratios involved with an email application (heavy writes), it's not surprising to hear that it took 32 hours for the system to get caught up. I suspect that if the surviving controller had been able to use write cache, the customer might have experienced some amount of service level problems, but not nearly as bad as they suffered.
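A back-of-the-envelope model shows why the catch-up time stretches out so badly once writes go straight to disk. Everything here is an assumption for illustration: the backlog, the arrival rate and both service rates are placeholder numbers, not measurements from the incident, and the model simply drains a backlog at whatever rate is left over after new writes are served.

```python
def catch_up_hours(backlog_writes, arrival_per_sec, service_per_sec):
    """Hours needed to drain a write backlog while new writes keep arriving."""
    spare = service_per_sec - arrival_per_sec
    if spare <= 0:
        return float("inf")  # the system never catches up
    return backlog_writes / spare / 3600


# Placeholder numbers: 3.6 million queued writes, 900 new writes per second.
backlog = 3_600_000
arrivals = 900

with_write_cache = catch_up_hours(backlog, arrivals, service_per_sec=5000)
write_through = catch_up_hours(backlog, arrivals, service_per_sec=1000)
print(f"cached: {with_write_cache:.2f} h, write-through: {write_through:.2f} h")
```

The catch-up time depends on the headroom above the incoming load, not on the raw service rate, so a controller that can only just keep up with new writes takes a very long time to clear what has piled up.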
Write performance during array component failures is an important point that many customers give insufficient weight to when making their purchases. Public service providers certainly need to understand this. The exact same scenario - controller failure and subsequent drop in service levels - could certainly happen to a traditional data center customer, but the ramifications of this scenario are not as ugly as they are for a multi-tenant public service provider.
This case is a perfect example of how an older architecture is incapable of meeting the requirements of the new cloud service business model. If you are a cloud service provider reading this and wondering if you might have a similar exposure to a controller failure (including 3PAR customers with dual-controller arrays), my advice is to review what you have and start thinking about what you should expect if you have a controller failure and how you might want to deal with it on both a short-term and long-term basis. Best of breed cloud storage should not include dual controller arrays.
Their solution is to buy more and utilize it less
One of the identified corrective actions is having "enough spare capacity on the SAN", which in this case involves installing a second array. Without knowing the inside scoop, it looks like the idea is to split the workload across the two arrays so that if a controller failure occurs in either array, the performance drop won't be as noticeable. The array that doesn't suffer the failure will keep working as expected and the array that has the failure will only have half the load to deal with.
There are two primary problems with this "fix":
Performance will still suffer on the array with the controller failure
The I/O load will continue to increase over time
You are always going to have performance degradation of some sort when you can't use write caching, unless you are only reading data - which isn't the case here. It is flat out wrong to assume that a performance problem will not occur. Regardless, with the new two-array SAN, whichever system has the controller failure should be able to get caught up much faster than the 32 hours this customer had to wait. Of course, the customer's capacity and I/O load will almost certainly increase over time, and as that happens, the strategy of splitting the load between two arrays loses its effectiveness.
Along with adding the controllers, they are also certainly adding disk drives, and some notion of what "reasonable" utilization limits should be for them. The problem with limiting utilization as a best practice is that it puts the stamp of approval on inefficiency - not only for capacity utilization but also for the power and cooling required to support all those underutilized drives. Most legacy arrays have built-in inefficiencies in the way data is laid out on disks, making it virtually impossible to achieve uniform utilization across all disk resources. The result is uneven consumption of disk capacity, as well as uneven I/O service levels among different disk groups, which is another variable in how much performance degrades following a controller failure in a dual controller array.
Finally, the customer now has two arrays to manage, including multipath connections, SAN zones, and all other aspects of the configuration, which all contribute down the road to change management complexities. The result is a net drag on administrator effort and an increased TCO.
How many do you need?
A true best of breed solution would address the root-cause deficiency in the array's design, without creating additional management and cost burdens to the customer. Obviously, more than two controllers are needed. But how many controllers does a cloud service provider need in an array? The answer is at least three. Why? Because when a single controller fails, there can still be two surviving controllers working together, mirroring their cache contents, and performing fast writes to cache memory. That said, controllers are usually packaged in pairs for redundancy purposes, which means that the most likely configurations will have four controllers.
If you compare a single quad controller array with two dual controller arrays there are some key advantages that immediately jump out:
No or limited loss of performance after a controller failure
All drives and cache can be used to service all workloads
Managing a single array significantly reduces cost and complexity
A better recipe for maintaining performance levels
The next question is: "Is there a suitable quad controller array that the customer could have used instead of the two dual controller arrays they have?" Yes, 3PAR's F400 and T400 arrays are both quad controller arrays. The disk drives in these arrays can be either SATA or FC, or a mix of both types if the customer wanted to implement tiering. Product information of the F400 can be found here, and the T400 here.
However, simply putting four controllers in an array does not necessarily guarantee that they will be able to sustain write caching if one of them fails. The array must have the ability to remap and re-mirror the write cache contents of all four controllers to the surviving controllers following the loss of a controller. It's an interesting geometric sort of problem: There are four controllers, each with their own cache and cache that is mirrored from the other controllers in the array. All cache contents, including mirrors, need to be distributed evenly across all controllers to avoid congestion and load imbalances. All cache content, including mirrors, needs to be accounted for within the array so that if a controller fails, the other controllers will be able to identify all the surviving original and mirrored copies of data. For cache data that has lost either a primary or mirrored copy, a second (new) copy needs to be made. Finally, the amount of data in cache may need to be re-leveled (decreased) to fit into the degraded cache capacity (3 controllers instead of 4).
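The re-mirroring step can be sketched as a small placement problem. This is only an illustration of the idea under simple assumptions (one primary and one mirror per cache page, least-loaded placement); it is not how Persistent Cache is actually implemented, the `remirror` function and its data layout are hypothetical, and the final re-leveling step is left out.

```python
from collections import Counter


def remirror(pages, failed, controllers):
    """Re-protect cache pages after one controller fails.

    pages: dict of page id -> (primary, mirror) controller ids
    Returns a new dict in which every page has two copies on live controllers.
    """
    survivors = [c for c in controllers if c != failed]
    if len(survivors) < 2:
        raise ValueError("mirrored write cache needs at least two live controllers")

    # Count how many cache copies each survivor already holds.
    load = Counter({c: 0 for c in survivors})
    for primary, mirror in pages.values():
        for controller in (primary, mirror):
            if controller != failed:
                load[controller] += 1

    placement = {}
    for page_id, (primary, mirror) in pages.items():
        if failed not in (primary, mirror):
            placement[page_id] = (primary, mirror)  # both copies survived
            continue
        keeper = mirror if primary == failed else primary
        # Make the new copy on the least-loaded survivor other than the keeper.
        target = min((c for c in survivors if c != keeper), key=lambda c: load[c])
        load[target] += 1
        placement[page_id] = (keeper, target)
    return placement


pages = {"a": (0, 1), "b": (1, 2), "c": (2, 3), "d": (3, 0)}
new_pages = remirror(pages, failed=3, controllers=[0, 1, 2, 3])
print(new_pages)
```

With only two controllers the function raises an error as soon as one fails, which is the dual controller problem in a nutshell: there is nowhere left to put the second copy.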
The software for doing this in a 3PAR array is Persistent Cache. Product information on Persistent Cache is here (PDF)
I made a 9-minute video last year describing how Persistent Cache works. Here it is again. Thanks for watching.
(A quote from Dieter Rams - former Chief of Design at Braun)
It's hard to think of a company that has had more success with its product designs than Apple. When you look into how Apple did it, you find out about Jonathan Ive - Apple's lead industrial designer - and how his designs have followed the philosophy outlined by Dieter Rams, who was the lead designer for many years at Braun. When you compare photos of their designs, it is obvious that Ive has a strong appreciation for Rams' work.
What Ive and others have found compelling in Rams' work is nicely summarized in the design principles Rams used at Braun for many years.
Good design is innovative
Good design makes a product useful
Good design is aesthetic
Good design makes a product understandable
Good design is unobtrusive
Good design is honest
Good design is long-lasting
Good design is thorough down to the last detail
Good design is environmentally friendly
Good design is as little design as possible
The design goals for consumer products differ considerably from those for industrial products. For example, aesthetics and innovation tend to be less important than reliability and ROI - two characteristics that didn't even make it onto Rams' list of design principles. But there are also principles that certainly belong to both, such as making a product useful and unobtrusive. So, what should the 10 design principles be for information infrastructures products? Here's my list:
Good design makes a product useful (it solves customer problems)
Good design has recognized limitations
Good design is unobtrusive (needs minimal management)
Good design has an attainable ROI
Good design makes efficient use of resources
Good design is scalable (capacity, performance & management)
Good design is resilient (sustains performance through sub-optimal conditions)
Good design makes a product understandable (facilitates planning & changes)
Good design is long-lasting (and accommodates future innovations)
Good design is environmentally friendly
Producing this list was much more interesting than I thought it would be. For starters, it took me some time to get settled in a customer's perspective - as opposed to my usual vendor employee perspective (I have this wonderful hammer you need). Also, to clarify a point, the idea of management scalability involves the number of people who can effectively manage and control a system simultaneously. That might not be a concern for smaller IT systems, but it certainly is for large-scale systems.
What would you change? Would you reduce or expand this list?
There have been some excellent discussions recently in the storage blogosphere and on Twitter about the concept of Storage Federation among a number of storage people known by their Twitter IDs as @stuiesav, @storageanarchy, @rootwyrm, @davegraham, @bwhyte, @ianhf, @esignoretti, me (@3parfarley) and others, and the interest continues to increase.
There are two aspects of the discussion that I think are fascinating: first is the role of social media as the means to include customers, vendors and others in an open discussion that typically is conducted privately by a vendor preparing to release a new product or feature; second is the challenge of defining a storage capability with sufficient focus and vendor independence so that it is meaningful. There has been some amount of skepticism about this effort, suggesting that we are predetermined to end up with ambiguous terms that can be interpreted (spun) by anybody (any vendor) to mean anything (our product does it). I'm hopeful the results can be better than that, but subjectivity is the only constant of social media and it's not likely that everyone will like whatever happens.
In general, the language of digital storage has many overlapping meanings, which causes a fair amount of confusion in our industry. I've been dealing with this problem for many years, going back to when I wrote Building Storage Networks and had the challenge of trying to invent generic terms for functions that had been coined by vendors and tied to specific products. My interest in defining Storage Federation goes beyond my role as an employee for 3PAR.
The notion of federated storage has been around for several years, but recently it came to light when Pat Gelsinger of EMC referred to it during a press briefing on March 11. EMC blogger Chuck Hollis wrote about it afterward and there was some chatter, culminating in a blog post by @Stuiesav on April 2, which proposed that the discussion about Storage Federation was just marketing hype attempting to rebrand storage virtualization. EMC's Barry Burke (@Storageanarchy) and I (@3parfarley) both agreed that this was not the case this time around, and then in the last couple of days the discussion fired up again. The problem with Twitter is the limit of 140 characters per tweet. It's surprising what can be done with so few characters, but it does have limitations.
Skeptics Allowed
This weekend, @ianhf posted in his Grumpy Storage blog, echoing @Stuiesav's skepticism and expressing his perspective (as a customer) on the things he would like to see when new technologies - in this case Storage Federation - are introduced. Here are some of the items from his list (some of them don't fit our definition exercise because they assume there is a new product being introduced, which is not the case here).
The specific customer requirements & problems this addresses & justify how
The use cases this feature / function applies to, and those that it doesn't
Why & how this feature is different to that own vendor's previous method for solving this problem
Provide clarity over the non-functional impacts of the feature before, during & after it's use - ie impact on resilience, impact on performance, concurrency of usage etc (including provide up-front details of constraints)
Naturally you'll also expect me to require TCO & ROI of the feature, and any changes to the models as a result of this feature
So let's get started (already!):
The definition of Storage Federation that was kicked around on Twitter is something like: "the transparent, dynamic and non-disruptive distribution of storage resources across self-governing, discrete, peer storage systems." (And yes, I did elaborate a bit on this while I was writing based on bits and pieces from comments I read and further thought of my own.)
The idea is to have multiple storage systems cooperating as a team (as opposed to under the direction of an external entity) to place data in the aggregated storage resources (LUNs or volumes) of all participating members. An example of Storage Federation is how Dell/EqualLogic arrays distribute their volumes over multiple systems. When a new EqualLogic array is added to an iSCSI SAN, the administrator is asked if the array should be placed in the same group as other arrays in the SAN. If this is done, the arrays start splitting their volumes (and workloads) across both arrays.
Five examples of storage federation capabilities are:
Storage expansion: You want to increase the storage capacity of an existing storage system that cannot accommodate the total amount of capacity desired. Storage Federation allows you to add additional storage capacity by adding a whole new system.
Storage migration: You want to migrate from an aging storage system to a new one. Storage Federation allows the joining of the two systems and the evacuation from storage resources on the first onto the second and then the first system is removed.
Safe system upgrades: System upgrades can be problematic for a number of reasons. Storage Federation allows a system to be removed from the federation and be re-inserted again after the successful completion of the upgrade.
Load balancing: Similar to storage expansion, but on the performance axis, you might want to add additional storage systems to a Storage Federation in order to spread the workload across multiple systems.
Storage tiering: In a similar light, storage systems in a Storage Federation could have different capacity/performance ratios that you could use for tiering data. This is similar to the idea of dynamically re-striping data across the disk drives within a single storage system, such as with 3PAR's Dynamic Optimization software, but extends the concept to cross storage system boundaries.
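A few of these capabilities (expansion, migration and load balancing) can be sketched with a toy model of peer systems that share volume extents. None of this reflects a real product's behavior: the `Federation` class, its methods and the extent-count balancing rule are all assumptions made for the example.

```python
class Federation:
    """Toy model of peer storage systems sharing volume extents (hypothetical)."""

    def __init__(self):
        self.members = {}  # system name -> set of extent ids it holds

    def add_member(self, name):
        # Expansion / load balancing: a new peer joins and takes its share.
        self.members[name] = set()
        self.rebalance()

    def place(self, extent):
        # New extents go to the member currently holding the fewest.
        target = min(self.members, key=lambda m: len(self.members[m]))
        self.members[target].add(extent)

    def rebalance(self):
        # Move extents from the fullest member to the emptiest until level.
        while True:
            fullest = max(self.members, key=lambda m: len(self.members[m]))
            emptiest = min(self.members, key=lambda m: len(self.members[m]))
            if len(self.members[fullest]) - len(self.members[emptiest]) <= 1:
                return
            self.members[emptiest].add(self.members[fullest].pop())

    def evacuate_and_remove(self, name):
        # Migration / safe upgrades: drain a member, then take it out.
        if len(self.members) < 2:
            raise ValueError("cannot evacuate the last member")
        for extent in self.members.pop(name):
            self.place(extent)


fed = Federation()
fed.add_member("old-array")
for i in range(10):
    fed.place(("vol1", i))

fed.add_member("new-array")  # volume now spans both systems
after_expansion = {name: len(extents) for name, extents in fed.members.items()}

fed.evacuate_and_remove("old-array")  # old system can now be retired
after_migration = {name: len(extents) for name, extents in fed.members.items()}
print(after_expansion, after_migration)
```

What the sketch leaves out is the hard part: doing all of this transparently to hosts, without changing data paths, while I/O continues.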
These are all examples of Storage Federation that Dell/EqualLogic storage systems are capable of today. It's not intended to be an endorsement of their distributed volume manager, I am simply using it as an example of Storage Federation. Saying that, if you look for "Storage Federation" on the Dell/EqualLogic web site, you probably won't find it today because they don't describe it that way, but that doesn't mean that it is not in the product.
That's not it
People also want help understanding what the definition of Storage Federation should not include. Here are my thoughts:
In-band storage virtualization systems like IBM's SVC can provide all the capabilities listed above for Storage Federation. However, in-band virtualization products govern the behavior of the other storage systems networked to it and Storage Federation involves self-governing, peer systems. Another way of saying it is that Storage Federation does not have functionality in the network between host systems and storage.
File, object and data distribution technologies like EMC's Atmos certainly provide a type of federation insofar as multiple computer systems can access the same data objects from multiple locations that may be separated geographically. However, this capability primarily migrates files (data objects) - and is functionally orthogonal to Storage Federation, which works on volumes. A term like Data Federation is probably more precise than Storage Federation for this sort of capability.
Clustered storage systems, like 3PAR's InServ storage systems or clustered NAS systems are not examples of Storage Federation because they function as single, scalable storage systems, not as discrete storage systems that are networked together. Clustered storage systems can work together in Storage Federations with other clustered or non-clustered storage systems. In that case you could have a Storage Federation of clustered storage systems.
How is this different?
I thought I'd make a little effort to respond to @ianhf's requests.
The main problem Storage Federation addresses is the limitations presented by a single storage system, such as capacity, performance and maintenance availability.
Currently, when a new storage system is added to an existing environment, there are server administration tasks that need to be done in order to migrate volumes from an existing storage system to the new one. In addition, there is usually some amount of downtime and/or performance degradation associated with the migration and configuration of new data paths. By contrast, with Storage Federation, a new storage system could be added without having to reconfigure host data paths and the migration would be processed transparently and dynamically, without any downtime or loss of path redundancy while the migration is in process.
Increasing performance for a particular volume in a Storage Federation is somewhat less obvious because the other federated storage systems might not have more performance resources available to boost performance. For instance, if a new storage system inserted in a Storage Federation does not have more disk drives (spindles) or more flash SSD capacity, there is no guarantee that any volumes moved to it would provide faster performance. There is some likelihood that performance could be improved by moving a volume to a storage system running at lower utilization levels, but maintaining lower utilization levels is not realistic or very cost-effective. All this said, it is possible for a Storage Federation to use the aggregate resources of multiple storage systems to increase performance of a single volume - such as by spanning all the disk drives in the federation and by using the cache of all participating storage systems. The Dell/EqualLogic storage arrays are an example of products that provide this capability today.
The availability of storage systems needing maintenance could be greatly improved by Storage Federation if an existing storage system can be removed from the federation after having its volumes evacuated (think vMotion but for storage systems) to other storage systems in the federation. After being removed from the federation, the storage system could be downed to have any sort of maintenance operation done to it without risking the availability of any of the volumes that had previously been on the system undergoing maintenance. Obviously, all of this would take time to plan, to ensure the Storage Federation would not be overloaded with data and workloads, but there are probably many customers that would prefer to do maintenance on storage systems when they are offline.
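The planning step can be sketched as a simple capacity check before evacuating a system. This is only an illustration under assumptions of my own: the StorageSystem class, the can_evacuate function and the 20% headroom figure are hypothetical, and the check looks at aggregate capacity only, ignoring performance load and per-volume placement.

```python
from dataclasses import dataclass


@dataclass
class StorageSystem:
    """Hypothetical model of one storage system in a federation."""
    name: str
    capacity_tb: float
    used_tb: float


def can_evacuate(federation, name, headroom=0.2):
    """Return True if the other systems can absorb the named system's data.

    headroom is the fraction of each remaining system's capacity that must
    stay free after the evacuation (an assumed policy, not a vendor rule).
    """
    targets = [s for s in federation if s.name == name]
    if not targets:
        raise ValueError(f"no storage system named {name!r}")
    target = targets[0]
    others = [s for s in federation if s.name != name]
    # Space each remaining system can accept without eating into headroom.
    usable_free = sum(
        max(0.0, s.capacity_tb * (1.0 - headroom) - s.used_tb) for s in others
    )
    return target.used_tb <= usable_free


federation = [
    StorageSystem("array-a", 100.0, 60.0),
    StorageSystem("array-b", 100.0, 50.0),
    StorageSystem("array-c", 100.0, 40.0),
]
print(can_evacuate(federation, "array-a"))
```

A real plan would also have to account for workload, not just capacity, which is the "overloaded with data and workloads" concern above.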
One of the possible downsides of Storage Federation is increased exposure to system failures. For instance, a Storage Federation that distributes a single volume over three separate storage systems is roughly 3 times more likely to have that volume hit by a system failure than a Storage Federation that does not allow volumes to span across storage system boundaries. FWIW, this is the main weakness of the Dell/EqualLogic implementation of Storage Federation.
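The arithmetic behind the "3 times" figure can be sketched as follows. The assumptions are mine, not measurements: each storage system fails independently with the same probability over some period, and a failure of any one system takes down a volume that spans it. The function names and the 1% figure are illustrative only.

```python
def volume_failure_probability(p_system: float, n_systems: int) -> float:
    """Probability that at least one of n independent systems fails."""
    if not 0.0 <= p_system <= 1.0:
        raise ValueError("p_system must be between 0 and 1")
    if n_systems < 1:
        raise ValueError("n_systems must be at least 1")
    # The volume survives only if every system it spans survives.
    return 1.0 - (1.0 - p_system) ** n_systems


def relative_risk(p_system: float, n_systems: int) -> float:
    """Risk of a spanned volume relative to a volume on a single system."""
    if p_system <= 0.0:
        raise ValueError("p_system must be greater than 0")
    return volume_failure_probability(p_system, n_systems) / p_system


# With an assumed 1% chance of failure per system, spanning three systems
# gives a ratio just under 3.
print(relative_risk(0.01, 3))
```

For small failure probabilities the ratio is close to the number of systems spanned, so "3 times" is a fair rule of thumb; it drifts below 3 as the per-system failure probability grows.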
That's all for now
It's late and there is plenty of material here for the grinder. Please feel free to comment in any way; agree, disagree, correct mistakes and ask questions. Thanks for reading.
There's been a dysfunctional discussion of capacity guarantee programs over on Chuck's blog. There had been more sensible, independent discussions on the Storage Architect's blog, but that apparently wasn't good enough for EMC - a company without a capacity guarantee program of their own. Unfortunately, Chuck decided to shut down comments on his post, citing an overload of vendor hash - which could continue to go on as long as there is breath left in any bloggers from Netapp.
Chuck's post poses the question - do you want to buy from a doctor or a used car salesman? The suggestion he makes is that EMC treats you the way a doctor would, while 3PAR, HDS and Netapp treat you the way used car salesmen would.
The doctor picture he used was this one:
Which reminded me of Scrubs - but of course there are other doctor images he could have used:
I'd suggest Chuck is using classic used car sales tactics: "Who loves ya baby? The warranty them guys offer don't protect you from nuthin'. Your engine will blow up the day after the warranty expires. All they want is your munny!"
Still, seeing as how he was linking this image to 3PAR (in one way or another), I'd have hoped he would have used a picture like this instead:
You might not end up buying that car, but you should at least check it out.
Chuck characterizes capacity guarantee programs as not being in the customer's best interests. That would be true if 3PAR, HDS and Netapp wanted to increase the number of unhappy customers they have, but that is just CRAZY EMC thought diarrheaship:
Instead, I'm pretty sure we all want our customers to be very happy with their storage solution:
Yes, 3PAR's capacity guarantee is a way to attract customers, but it's much more than that - it's a way to back up our efficiency claims by putting our money where our mouths are:
RecoveryMonkey had a post recently about FUD and the ridiculous corner case claims storage vendors sometimes make about each other. 3PAR has been telling customers for years that our products are more efficient than theirs and we are now backing it up with our capacity guarantee. It's not FUD, it's not spin and it's definitely not a corner case.
The rumor is that Copan is shutting down. If so, I suspect MAID storage will quickly become an afterthought now, except for a small number of customers and applications that will keep the technology on life support. The problem with MAID is that there aren't enough applications for selectively-spinning disks. Selectively-spinning disk drives are more expensive than tape for archiving and are more problematic than standard disk systems for backup. That leaves applications such as video on demand, which is not a large enough market to float a serious startup these days. Thin provisioning for primary storage and dedupe for backup have become the technologies of choice for customers looking to increase the efficiency of storage.
The outcome of Copan's failure means that tape will continue to be the most prevalent technology for archiving. In turn, that means archiving software will continue to be tied to tape for some time, including all the problems inherent in maintaining metadata for it. If you are in the legal industry and hoping to see leapfrog improvements in digital discovery processes, don't hold your breath. Discovery from data on tape will continue to be a laborious task. SSDs for archiving would appear to be a slam-dunk at some distant time in the future, but the cost has to come down a long, long way first.