A screencast discussing the bid EMC is making for Isilon. Covers the topics EMC highlighted in their announcement, including "big data", Atmos integration and the EMC effect. Contrasts the differences between EMC's divergent NAS strategies and HP's Converged Infrastructure NAS strategy with the X9000 platform.
I was somewhat familiar with Ibrix as a software product that powered NAS clusters, but the new ESG Labs report helped me grasp HP's vision for the X9000 storage appliances much better.
Interested readers should view the report to see the results as well as the methodology that was used. There were three test beds covering throughput, content delivery and file creation metrics culled from a mix of X9000 configurations. The X9320 is a storage appliance with internal disks, and the X9300 is a gateway version of the product that connects to external SAN storage. Another model, the X9720, a super-sized version of the X9320 (a full 42U rack), was not used in the tests.
3PAR customers will be familiar with the processing architecture of the X9000. The granular "head unit" of the X9000 system is called a couplet, and is a pair of fault-tolerant NAS heads. This is similar to 3PAR's storage system architecture where nodes are added in pairs.
But the surprising thing about scalability for the X9000 is not necessarily how large it can grow, but how effectively it can also be employed in much smaller environments. As the ESG Labs report concludes:
Who would have guessed that companies overwhelmed by Word and PowerPoint archives could benefit from the same solution as those burdened by 100-TB annual growth of genome sequencing data? Who knew that a NAS file system developed for high-performance computing could evolve into a graceful, cost-effective scale-out solution with predictable and near-linear performance for small and large files and exotic and everyday applications? The challenges that scale-out NAS solves are much more “everyday” than “lunatic fringe,” and the X9000 makes it consumable by almost anyone. If you are facing file system growth and complexity challenges, you should consider the X9000. It’s affordable, includes commercial features like snapshots and replication, and lets NFS and CIFS work on the same file system. You can buy a scale-out architecture that will grow with you and meet the needs of your business without interruption. The Fusion segmented file system, combined with HP’s servers and storage (not to mention HP’s buying power and supply-chain advantage), brings what started as a niche solution to the masses.
Cloud infrastructures need to be efficient if they are going to compete. Money saved on operations is an annuity to cloud service providers that goes to the bottom line every day.
Terremark is a leading cloud service provider that delivers just-in-time infrastructure services, as described in this excellent post on the Boxed Ice blog today titled "The New Server Density Infrastructure." This post, written by a Terremark customer, shows that Terremark is not only interested in driving down its own costs, but also in helping its customers be more efficient.
If you are interested in a behind-the-scenes look at how Terremark does it, you owe it to yourself to check out this case study on Wikibon today. This chart from the Wikibon study shows how Terremark derives its operational savings.
Savings from storage were projected to be 86%.
Guess who their storage vendor is? (The report says what equipment they are using). But one big clue is that it isn't EMC, as EMC's Chuck Hollis suggested back in July.
Cloud industry insiders know this stuff. We don't have all the cloud customers, but we have some of the largest and most influential ones. Nobody really knows what changes cloud computing will bring to our industry, but 3PAR is very well prepared to be a key component of those infrastructures.
To a lot of people, especially those who are unfamiliar with the storage industry, one of the obvious questions is "Who are these people and where did they come from?"
The answer is that the company was formed by a group of server-cluster engineers from Sun and has been around for over a decade developing and selling large scale storage products designed for something that used to be called "utility computing" seven years ago, but today is just called "the cloud".
We've been very successful with our cloud strategy and count 7 of the top 10 IaaS (infrastructure-as-a-service) providers as customers. 3PAR products work very hard in the background for a lot of household-name customers. Most people don't know or care.
However, cloud industry vendors know 3PAR because they are also very heavily involved with those same customers, competing with their own products. They see our storage systems in those large data centers and our customers tell them that they need to make sure they work with us. There's nothing unusual about that sort of thing, but we definitely are a player.
Here's what we do very well:
Handle mixed workloads in multi-tenant environments. That means we can service a wide range of applications from multiple servers (or virtual servers) concurrently. If you are a cloud service provider and you need to have the confidence that your infrastructure will deliver consistently - even if your business changes radically - 3PAR is a very safe choice.
Operate efficiently at high utilizations. Most storage in the world today is underutilized, which gets under people's skin because there is a high cost associated with that. 3PAR has led the industry with thin storage technologies, starting with our implementation of thin provisioning many years ago and recently enhanced with thin reclamation, thin conversion and thin persistence software features. Again, cloud service providers need to keep the cost of their operations under tight control.
Give administrators the highest levels of automation. Administrative work on storage has historically been difficult and time consuming, which has been problematic for cloud service providers who have to be able to increase and reconfigure resources to meet fast-changing business demands. 3PAR storage lets system administrators make adjustments more quickly and confidently.
The thing that we didn't completely understand at 3PAR was how quickly the onset of the virtualized data center was going to tilt the storage world in our direction. 3PAR storage systems are based on a highly advanced, granular storage architecture. It's not always the easiest thing for people to understand because it is so different from any other vendor's architecture. However, people familiar with virtualized server features have a much easier time understanding how our technology works. There is nothing like a terrific, relevant analogy for explaining how your different widget works.
3PAR is a relatively small company, competing with much larger companies who use the benefits of their size, global reach and service organizations against us every day in sales opportunities. It hasn't been easy, but we've continued to grow our business in a very hotly contested arena where our competitors like to position us as the "small, new company." Storage purchases in this market are high stakes and careers can be made or lost on the right decision. We certainly don't win all the deals we are in, but we very seldom lose on technical merit. Usually it's because we are lesser known or because we can't match the service offerings of our larger competitors.
It appears that some of those variables will be changing for us relatively soon.
Yesterday I posted a demo of our new, updated InForm Management Console 4.1 and so I thought today I'd re-post a two-part video showing our VMware vCenter plug-in that was made by 3PAR architect Maneesh Jain. Make sure to pay attention to the Recovery Manager section of the demo that shows how easy it is to recover VMs, directories and files.
Virtualized storage from 3PAR flexibly adapts to mid-range up through enterprise VMware environments because our single software architecture runs the same code on both platforms. The skills used to manage one platform are preserved when switching to the other.
3PAR designs its systems to provide huge time savings for storage administrators. Below is a video of our new InForm Management Console (IMC) 4.1, announced today, showing how incredibly easy it is to configure and operate 3PAR's Remote Copy application.
Things that the demo didn't show that are advantages of 3PAR's single software architecture are:
A single console can manage both Mid-Range and Enterprise arrays
A single console can manage both local and remote systems
A single console can manage all array software elements
Customers can mix and match systems for replication; for instance, they can use enterprise T-Class storage at their primary site and mid-range F-Class arrays at the secondary site. That way, arrays used primarily to comply with regulations can be much less expensive.
Replicated volumes have to be the same size (capacity) as the primary volume they are protecting, but they can be any class of service (disk type combined with RAID type). This means replicated capacity can be optimized for capacity efficiency while the primary storage can be optimized for performance. Again, this results in operating savings compared to competitive arrays that require replicated volumes to be the same configuration.
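A rough capacity calculation shows why this matters. The sketch below assumes generic RAID overheads (two copies for RAID 1, one parity slice per three data slices for RAID 5 3+1, two per six for RAID 6 6+2) rather than any published 3PAR figures, ignores spares and metadata, and uses a hypothetical 1,200 GB volume.

```python
from fractions import Fraction

# Raw-to-usable ratio (data + redundancy) / data for a few generic RAID
# layouts. Textbook ratios, not vendor-published figures.
RAID_OVERHEAD = {
    "RAID 1": Fraction(2, 1),      # every block mirrored
    "RAID 5 3+1": Fraction(4, 3),  # 1 parity slice per 3 data slices
    "RAID 6 6+2": Fraction(8, 6),  # 2 parity slices per 6 data slices
}


def raw_capacity_gb(volume_gb: int, raid: str) -> Fraction:
    """Raw disk capacity consumed by a fully allocated volume."""
    return volume_gb * RAID_OVERHEAD[raid]


# Hypothetical example: same-size primary and replica volumes that differ
# only in class of service (RAID 1 at the primary, RAID 6 6+2 at the
# secondary site).
primary = raw_capacity_gb(1200, "RAID 1")
replica = raw_capacity_gb(1200, "RAID 6 6+2")
print(f"primary: {primary} GB raw, replica: {replica} GB raw")
```

Under these assumptions the replica protects the same 1,200 GB with a third less raw disk than the primary (1,600 GB versus 2,400 GB), and that raw disk can also be a cheaper drive type.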
Here is a brief description of all the software functions available through IMC 4.1. As you can see, it's a pretty comprehensive list of features:
System Manager: used for viewing performance and utilization stats and configuring hardware elements.
Host Manager: used for configuring hosts that access 3PAR arrays - including Autonomic Groups, which are used to configure groups of servers to storage simultaneously and Virtual Domains, which restrict access between servers and storage resources according to group membership.
Provisioning Manager: used for provisioning both thin and thick volumes as well as setting up Virtual Copy (snapshot) volumes for them. Changing the class of service for volumes by re-striping them over additional drives or different types of drives and/or RAID levels through Dynamic Optimization is also done through the Provisioning Manager.
Event Manager: used to display logs, events and alerts.
Hardware Inventory Manager: does what it sounds like - reports on what is in your array.
It's getting very hard to keep up with all the crazy social media stunts coming out of Hopkinton, but they seem to have done it to themselves again. First came the questionable spamming for viewers so they could claim they had a viral video, and then today they "leaked" a 3PAR sales "kill sheet" - and also apparently established a "secret" site with the URL Notapp.com, where they compared their own guarantee program to NetApp's. According to Simon Sharwood at Search Storage Australia, the site was removed and accessing the URL directed browsers to EMC's site.
Perhaps it is all part of a new marketing strategy by newcomer Jeremy Burton, who joined EMC as Chief Marketing Officer back in March. As best I can tell, Burton's new marketing strategy for the company is that people will believe anything. Maybe he doesn't think there are enough new products coming out of EMC - or that the delays in getting their ballyhooed FAST out the door are too embarrassing - but instead of trying to promote EMC on its own merits, it looks like he is doing his utmost to mud wrestle. Is that what EMC is paying him the big bucks for?
EMC suddenly is taking a bigger interest in 3PAR. That's good. Search Storage Australia just published parts of a competitive document that EMC was circulating to its partners about 3PAR. It certainly wasn't a surprise because we'd seen it previously, but I was sorry to see it published because it made EMC look ridiculous, which was working pretty well for us. But now that it's been outed, here is what we have to say about it (in the guise of Ineption's lead character, the CRO).
The messaging is not built in, but our zero detection technology for optimizing capacity is. The host SW commands to do this are short and do not require "careful coordination". Veritas, Oracle, Windows Server and Linux software all work with minimal operator effort. For instance, this document from Oracle describes the whole process, with the sole operator command being this: bash ASRU LDATA.
Can EMC provide online reclamation of zeroed space without risking capacity overruns and with tolerable performance? 3PAR can. Does EMC have these capabilities in both mid-range and enterprise storage arrays? 3PAR does.
3PAR has both Flash and 1 TB SATA drives. We also have Adaptive Optimization software that uses Flash SSDs for storage tiering. EMC still doesn't have it after they made such a big deal about it last year. They like to tell customers that their size gives them development advantages, but their track record doesn't support their claim.
3PAR arrays allow users to create many tiers, but without the need for disk pools. Tiers are constructed from the combination of drive type plus RAID level. For instance, you can have separate tiers for SATA, FC and Flash SSD drives with the RAID level you select. Our Dynamic Optimization software allows admins to move data from one tier to another. You can "dial in" the performance and protection you want.
All systems have a peak output; ours just happens to have a lot more throughput than theirs - and at higher disk utilizations. We have published benchmarks that show how our systems perform. They don't. Adding disk drives to a system and utilizing those drives is far easier with a 3PAR system than with either VMAX or CLARiiON, where you have to wrestle with putting drives in the pools you want to use them for.
There are no disk pools in 3PAR storage. Pools trap resources so you can't use them. Work isolation in pools leads to hot spots and storage admin nightmares. Wide striping does not mean you can't have tiers. That is an idiotic statement.
VMAX can configure large pools - and all the drives in them have to be at the same RAID level, meaning you can't create multiple tiers within those pools. If you want multiple tiers, you need multiple pools and all the headaches that involves. Change management in an environment with multiple pools is complicated. You also need to consider the pools needed for snapshots and remote replication. Are those easy to provision and change on EMC storage? Most would say "no".
3PAR uses all disk spindles all the time for delivering IOPS, and proactive sparing is done using reserved space on those drives. Rebuilds also complete quickly. Would EMC have you believe they never have to perform drive rebuilds? Really?
The RAID6 thing really makes me sad. They look so stupid when they say it. We're all sorry to say goodbye to that piece of FUD.
Our front-end architecture was designed for large-scale parallel connectivity to match the massive bandwidth capabilities of our wide-striped back end. Our benchmarks and the cost per IOPS in those benchmarks speak for themselves. Our customers also tend to run 3PAR systems at much higher disk utilizations than they run other vendors' arrays.
We support a huge number of ports on our systems with full active/active data access across all controllers. All controller nodes can be used to access all data volumes. We have a number of customers that run fairly sizable SANs without switches because their arrays have enough ports that they don't need to consolidate access through switches.
Five 9s? We're there. Our systems get pounded on every day in some of the largest private and public data centers in the world. They are designed with complete redundancy in all components and have advanced capabilities such as Persistent Cache to maintain high levels of performance even after the loss of a controller.
The delays in bringing their FAST tiering software - a product they were hyping in April of 2009 - to market have shown that size doesn't matter much when it comes to delivering technology on time. I'm not saying 3PAR always delivers on time, but EMC is far from immune to these problems. In fact, the need for them to coordinate across multiple product lines creates certain disadvantages for them.
As to their comments on our support, they are pure FUD and grasping at straws. We would not be able to maintain the customers we have if it were not for our efforts at supporting them.
* * * * * *
The following content was added on July 30th by Rusty Walther, 3PAR's Vice President of Customer Services & Support.
Stating that 3PAR “outsources support” is just plain silly, especially coming from a company that keeps most of the world’s largest offshore outsourcing companies in business. Like EMC, 3PAR uses Third Party Maintenance suppliers (TPMs) for break-fix field activities. In some geographies, EMC and 3PAR even use the “same” TPM. But EMC also outsources most of their volume call center and Level-1 Technical Support to offshore suppliers. Not so at 3PAR. Everyone that touches a 3PAR support case is a 3PAR-badged employee. I challenge EMC to identify a single outsourcing company that handles 3PAR technical support. EMC’s outsourced technical support sub-contractors could be listed alphabetically, by geography, or by technology category … but you’d need a couple of sheets of paper to do it.
The twitterverse is busy again today with discussions surrounding EMC's use of spambots to generate views of videos they are trying to make viral. If you are interested in seeing what is being said, check out these people's tweets and you'll be off on a trip down a dark hole.
Here are a couple of cartoons I made about it last week from my new cartoon, Ineption:
NetApp's Val Bercovici suggested that this viral spamming marks the end of innocence in social media, but innocence exited the social media stage long ago.
I'm much more concerned about how large companies like EMC can use social media to suggest product and customer relationships, leaving readers of suggestive blog posts from respected corporate voices with impressions that stretch the truth well beyond the facts. As "unofficial company statements" that are more influential than press releases, social media pieces can distort things in a way that more accountable corporate marketing materials are not allowed to.
Last week, Chad Sakac and Chuck Hollis published blog posts that pointed to an EMC white paper about details of a VMAX implementation at Terremark, an excellent 3PAR customer. Readers of these posts would probably think that VMAX was being used as the storage behind Terremark's multi-tenant, Enterprise Cloud service offering. That would be stretching things more than just a little bit. I commented on both blogs and the responses to my comments were interesting. I guess I feel a little kinder towards Chad as a result.
It is possible that somewhere in the world, a VMAX is being used by Terremark. One would expect Terremark to be looking at various storage platforms as a matter of course; it only makes sense for them. After all, VMware made a significant investment in Terremark last year and we all know who owns VMware. There are certain favors that EMC can ask that vendors such as 3PAR can't. But Terremark also has to operate Enterprise Cloud in their major US data centers every day and the storage they use for that is not in a test lab - it's production - and it is 3PAR storage.
And it's not for lack of trying on EMC's part. Last November when VCE was announced, Terremark was discussed as a featured customer in both Chad's and Chuck's blogs. That was OK, I understand the excitement that surrounds a big announcement. But nine months later, to suggest that this announcement had given birth to a major production environment for a service that it is not supporting sort of stuck in my craw.
I wrote about the fact that we already had zero detect technology in our product, which is useful for the new Full Copy command because it allows customers to remove zeroed data from clones when they are created and return them to array free space.
The discussion became a bit confused when Chad interpreted what I was saying as pertaining to Block Zeroing.
Block Zeroing and Full Copy are different aspects of the VAAI API. The intent of block zeroing is to reduce the amount of CPU effort and storage traffic required to write zeroes across an entire EagerZeroThick (EZT) VMDK when it is created. The intent of Full Copy is to make clones of VMs quickly without consuming I/O bandwidth. Things get interesting when you start thinking about making a full copy of an EZT VMDK that was created using VAAI with block zeroing - but I'll discuss that later.
I also want to clarify what zero detection technology is. 3PAR T and F class arrays have zero detection technology, enabled by Thin Persistence software, that recognizes zeroed blocks as they are received by the array and returns them to the array's free pool. Any read requests made to these block addresses will return a zero value. In essence, it is dedupe for zeroes.
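The idea fits in a toy model: a thin volume that discards incoming all-zero blocks (releasing anything previously stored at that address) and answers reads of unallocated addresses with zeroes. The class name and the 16 KiB block size are hypothetical, and this is a conceptual sketch, not how Thin Persistence is implemented inside the array.

```python
BLOCK_SIZE = 16 * 1024  # hypothetical allocation unit in bytes


class ZeroDetectThinVolume:
    """Toy thin volume that discards all-zero blocks as they arrive.

    Only blocks containing non-zero data consume space. A zero-filled
    write releases whatever was stored at that address, and reads of
    unallocated addresses are answered with zeroes.
    """

    def __init__(self) -> None:
        self.blocks: dict[int, bytes] = {}  # block address -> stored data

    def write(self, lba: int, data: bytes) -> None:
        if len(data) != BLOCK_SIZE:
            raise ValueError("this sketch only accepts whole-block writes")
        if not any(data):
            # All zeroes: return the block to the free pool, store nothing.
            self.blocks.pop(lba, None)
        else:
            self.blocks[lba] = data

    def read(self, lba: int) -> bytes:
        # Unallocated addresses read back as zeroes.
        return self.blocks.get(lba, bytes(BLOCK_SIZE))

    def allocated_bytes(self) -> int:
        return len(self.blocks) * BLOCK_SIZE


vol = ZeroDetectThinVolume()
vol.write(0, b"\x01" * BLOCK_SIZE)  # real data: allocated
vol.write(1, bytes(BLOCK_SIZE))     # zeroes: never allocated
print(vol.allocated_bytes(), vol.read(1) == bytes(BLOCK_SIZE))
```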
However, zero detection is not needed when an EZT VMDK is created using the VAAI plug-in because the array will recognize the intent of the command and not write the zeroes. In other words, the VMDK will only contain a very small amount of reserved space when it is created. Again, any attempts to read blocks in those ranges will return zero values. Zero detection is effectively bypassed during the creation of the EZT VMDK.
The exception to this behavior is when the EZT VMDK being created is written to a thick volume - in that case the array will write zeroes across the entire VMDK.
The remaining cases for the creation of EZT VMDKs on 3PAR arrays occur when VAAI is not used. For a thick volume, the entire VMDK has zeroes written to it. Thin volumes not using zero detect also have zeroes written over the entire VMDK. Thin volumes with zero detect will not have zeroes written to them and will contain only a small amount of reserved space.
FWIW, the reserved space is used as instantly-available capacity that can be allocated on-demand when writes start coming into the volume. 3PAR arrays always "read ahead" free space to improve the performance of thin provisioning.
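The creation cases described above, including the thick-volume exception, reduce to a small decision table. The function below is a hypothetical summary of the behavior as this post describes it for 3PAR T and F class arrays; it is not an API of the array or of vSphere.

```python
from enum import Enum


class Allocation(Enum):
    FULL = "zeroes written across the entire VMDK"
    RESERVED_ONLY = "only a small amount of reserved space"


def ezt_vmdk_allocation(thin_volume: bool, vaai: bool,
                        zero_detect: bool) -> Allocation:
    """Physical allocation when an EZT VMDK is created (cases from the post)."""
    if not thin_volume:
        # Thick volumes get zeroes written everywhere, with or without VAAI.
        return Allocation.FULL
    if vaai:
        # Block zeroing: the array recognizes the intent and skips the writes.
        return Allocation.RESERVED_ONLY
    if zero_detect:
        # No VAAI, but incoming zeroes are stripped on ingestion.
        return Allocation.RESERVED_ONLY
    # Thin volume, no VAAI, no zero detect: the zeroes land on disk.
    return Allocation.FULL


for thin in (False, True):
    for vaai in (False, True):
        for zd in (False, True):
            print(thin, vaai, zd, ezt_vmdk_allocation(thin, vaai, zd).value)
```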
The next bit here could be a bit thorny, so clear your head. The matter of making a Full Copy of an EZT VMDK to a thinly provisioned volume was something Chad said was not allowed. My assumption here is that the type of thin provisioning used makes a big difference.
For instance, if you are using TP from VMware, I could see where they would not allow a full copy to be made. The problem is that the full copy will return all the zero values for the source VMDK, whether or not those zeroes were ever actually written - and write them to the target TP volume. In other words, the target could be much larger than the source. In the VMware TP scheme, this could make for problems in a hurry if you were making a bunch of clones this way.
In contrast, if you were using a 3PAR array with zero detection, the Full Copy of the source VMDK would return zeroes for the entire VMDK, but the zero detection would strip them out again as the target was being written. You could make as many clones as you wanted this way, knowing that the physical capacity they consume would be a multiple of the physical capacity consumed by the source VMDK. In other words, you wouldn't have to worry about virtual zero bloat making a mess of your VMFS volume.
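Counting physical capacity makes the difference concrete. The function below assumes, as described here, that Full Copy returns zeroes for every block of the source VMDK and that the target either stores everything (no zero detection) or strips the zeroes on ingestion. The function name and VMDK sizes are hypothetical, and reserved space and metadata are ignored.

```python
def clone_physical_gb(vmdk_size_gb: float, written_gb: float,
                      clones: int, zero_detect: bool) -> float:
    """Physical capacity consumed by `clones` Full Copy clones of one VMDK.

    Full Copy returns every block of the source, zeroes included, so each
    clone arrives as vmdk_size_gb of data. Without zero detection the target
    stores all of it; with zero detection only the non-zero data survives.
    """
    per_clone = written_gb if zero_detect else vmdk_size_gb
    return clones * per_clone


# Hypothetical 40 GB EZT VMDK holding 8 GB of real data, cloned 10 times.
print(clone_physical_gb(40, 8, 10, zero_detect=False))  # 400
print(clone_physical_gb(40, 8, 10, zero_detect=True))   # 80
```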
One of the big differences between 3PAR's zero detection technology and other vendors' zero-reclaim technology is that 3PAR's process is real-time-on-ingestion as data comes into the array, whereas zero-reclaim works in a post-processing fashion after the zeroes have already consumed disk space. This could be a significant difference in many cases because the post-processing method has the potential to create unexpected capacity-full conditions before the zero-reclamation process even has a chance to start.
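A toy simulation shows how the capacity-full risk arises. It models a fixed-size pool receiving a stream of blocks, some all-zero, and compares discarding zeroes on ingestion with storing them and freeing them in a periodic batch job. Unit-sized blocks, the pool size and the reclaim interval are arbitrary assumptions for illustration; real reclaim schedulers are more sophisticated.

```python
from typing import List, Optional


def first_failed_write(writes: List[bool], capacity: int, inline: bool,
                       reclaim_every: int) -> Optional[int]:
    """Index of the first write that finds the pool full, or None.

    writes[i] is True when block i is all zeroes; each stored block uses one
    unit of capacity. With inline detection, zero blocks are never stored.
    Otherwise they are stored and only freed when a post-process reclaim
    job runs after every `reclaim_every` writes.
    """
    used = 0
    unreclaimed_zeroes = 0
    for i, is_zero in enumerate(writes):
        if inline and is_zero:
            continue  # stripped on ingestion, consumes nothing
        if used == capacity:
            return i  # capacity-full before reclaim could help
        used += 1
        if is_zero:
            unreclaimed_zeroes += 1
        if not inline and (i + 1) % reclaim_every == 0:
            used -= unreclaimed_zeroes  # batch reclaim frees stored zeroes
            unreclaimed_zeroes = 0
    return None


# 150 zero blocks (e.g. a VMDK being zeroed) followed by 20 data blocks,
# landing on a pool with room for 100 blocks.
writes = [True] * 150 + [False] * 20
print(first_failed_write(writes, 100, inline=True, reclaim_every=1000))   # None
print(first_failed_write(writes, 100, inline=False, reclaim_every=1000))  # 100
```

Shrinking `reclaim_every` (say, to 50) lets post-processing survive this particular stream, which is the point: the outcome depends on whether reclaim runs before the pool fills, a race that inline detection never enters.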
Steve Taylor, one of our SEs, created an animation that shows the multiple layers of virtualization that create the natively wide-striped data layout on a 3PAR storage server. It's the coolest thing I've seen since joining the company, and it quickly summarizes how those layers fit together in a 3PAR array.
All the functions shown are automatically done for the customer with minimal administrative effort. 3PAR customers do not spend time planning the layout of special disk pools or preparing their disk drive configurations for certain functions. All they do is select the drive class and the RAID level for the volume they are creating and the rest of the data layout work is done for them.
The demo shows how a RAID 5 3+1 virtual volume is created; what it does not show is how other volumes would be created using different RAID levels over the same set of resources. It would be a replay of this, but with a different RAID level applied - everything else would be the same.
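For readers who want the gist without the animation, here is a much-simplified model of wide striping: each RAID 5 3+1 set occupies four slices on four different disks, and successive sets rotate through every disk so no spindle sits idle. The function name, slice granularity and round-robin placement rule are assumptions for illustration; the array's actual placement logic is not described here.

```python
from collections import Counter
from typing import List


def stripe_raid_sets(num_disks: int, num_sets: int,
                     set_width: int = 4) -> List[List[int]]:
    """Place each RAID set (4 slices for RAID 5 3+1) on distinct disks.

    Successive sets start where the previous one ended, wrapping around,
    so the volume's sets rotate through every disk in the system.
    """
    if set_width > num_disks:
        raise ValueError("need at least as many disks as the RAID set width")
    layout = []
    for s in range(num_sets):
        start = (s * set_width) % num_disks
        layout.append([(start + k) % num_disks for k in range(set_width)])
    return layout


layout = stripe_raid_sets(num_disks=8, num_sets=6)
slices_per_disk = Counter(d for raid_set in layout for d in raid_set)
print(layout)
print(sorted(slices_per_disk.items()))  # every disk carries 3 slices
```

Modeling a different RAID level over the same disks only changes `set_width` (2 for a mirrored pair, 8 for RAID 6 6+2); the rotation across all spindles stays the same.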
Not only does this design provide massive throughput, it also responds very quickly when customers need to add volumes. It's like driving a freight train that can corner. Try doing that with your v-Max on anything but a test track.
OK, Monday's post lacked the punch that people have come to expect from me where EMC announcements are concerned. Thanks to my readers who were disappointed and told me so. This post is for you.
VPLEX (announced Monday with all the hype that EMC could muster) is the result of EMC trying to become something more than a storage company. They put a systems professional in charge of developing storage products and produced a science project that looks very attractive as a technology but relegates storage to a secondary role and calls into question EMC's commitment to solving storage problems. In fact, VPLEX probably creates more storage problems than it solves. More on that later.
The concept of teleportation for virtual machines resonates with childhood space fantasies. I remember summer afternoons from my childhood going out to a nearby field with my brother and his model rocket club. My brother built some stunningly cool looking model rockets, and the amazing thing was that they all went where he intended them to - up. Just as amazing was the fact that the engine always ejected the nose cone and parachutes to bring his creations back to earth gently. It was a riot running across that field trying to catch them in our bare hands as they descended. A few years later those rocket designs turned into a set of World Book Encyclopedias that he won at a big science fair in Michigan. I was prouder of him than any kid brother could be, even though I was just the rocket runner.
However, his rocket club also had a couple of less-talented rocketeers who also launched their contraptions with great fanfare and, quite often, spectacular disappointments. Engines became separated from the rockets holding them, shooting around at random on the ground - and then there was the memorable starship design that took a 90-degree turn after exiting the launch wire and proceeded to fly parallel to the ground at a height of 4 to 5 feet, sending parents, siblings and friends scrambling for cover behind a shrub, pet or each other.
It's good to believe in some things and to be wary of others.
So what is VPLEX anyway?
For starters, VPLEX is a virtualization appliance - a bump-in-the-wire product. EMC says it will take the technology at some point in the future and build it into their arrays, but like a lot of things, it's much easier to talk about it than to actually do it.
The irony of the situation is that VPLEX fills a storage virtualization role that is similar to IBM's SVC - a product that EMC bloggers eagerly lampoon. EMC is downplaying the similarity to SVC and is spinning VPLEX as a teleportation device for your virtual machines. Of course, the image of teleportation they want you to think of comes from Star Trek, but sci-fi fans are familiar with the risks of teleportation too - just for fun, recall the problems Jeff Goldblum had with it in The Fly.
The point is, VPLEX is an appliance and therefore it is another thing to manage and pay for. In fact it is at least 4 more things to manage (two here and two there) and can be as many as 16 additional things to manage. Just in case that point was not clear, VPLEX is new stuff to manage in between your servers and your storage. Like the Wizard of Oz, EMC would like customers to "Pay no attention to those things in front of your storage - they are virtual and therefore invisible." In fact, it's so invisible that EMC claims you can use it with any vendor's storage, which means there is minimal integration with storage functions. EMC is calling VPLEX Storage Federation, and yet it doesn't integrate with storage on any meaningful level. That's not federation, it's virtualization.
Suddenly, the virtualization of all your storage through an appliance is a trivial matter. But of course it adds considerable complexity to an environment that is already complex enough. To put a finer point on it, EMC customers are familiar with the down and dirty disappointing reality of working with Virtual Provisioning, SRDF and change management. They should expect to pay for additional professional services to figure out how to do these things with VPLEX inserted.
And then there is the matter of managing mixed workloads in virtual environments, something that is a problem for a lot of customers. EMC's technique of isolating workloads on certain disk and cache resources doesn't work for highly leveraged VMFS volumes where there are many VMs running concurrently. When the workloads are all mixed together, there is no way to isolate them. VPLEX appears to be an answer to that problem by allowing customers to move VMs elsewhere - presumably where there are more resources available. In reality, those VMs are going to be moved to another environment with the same storage limitations as where they came from. Do you really think putting a distributed write cache between servers and storage is going to solve workload balancing problems? If so, the fix will only be temporary before the same problems are repeated. And that is the problem with VPLEX: it doesn't solve anything by itself; it only allows a problem to be moved elsewhere - presumably someplace where there is more headroom that is not being as heavily utilized.
Storage systems that were designed for utility computing with mixed workloads in mind, such as 3PAR's InServ storage systems, are much more effective at maintaining performance levels for virtual machines and allow customers to operate at much higher system and storage consolidation ratios.
EMC wants you to imagine all the things you might be able to do with VM teleportation. I'd say, imagine how much fun it's going to be doing the capacity planning for a VPLEX installation. How much storage do you need in location A and location B to support two-way VPLEX functionality for a collection of 80 VMs? As it is with most things, it depends on a number of variables including whether or not VPLEX is being used to facilitate disaster recovery. It would not surprise me if VPLEX ends up effectively capping storage utilization at 50% for customers that employ it. If the maximum utilization is 50%, what will your actual utilization levels be? Something far less than that.
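The 50% figure falls out of simple arithmetic if every distributed volume is mirrored at both locations. The sketch below assumes exactly that, plus a hypothetical average of 0.5 TB per VM for the 80-VM example. It ignores RAID overhead (set the hypothetical `raid_overhead` parameter to include it), snapshots and free-space headroom, all of which push real utilization lower.

```python
def mirrored_raw_tb(vm_count: int, avg_vm_tb: float, sites: int = 2,
                    raid_overhead: float = 1.0) -> float:
    """Raw capacity needed if every VM's volume is mirrored at each site."""
    return vm_count * avg_vm_tb * sites * raid_overhead


usable_tb = 80 * 0.5  # 80 VMs at a hypothetical 0.5 TB each
raw_tb = mirrored_raw_tb(80, 0.5)
print(f"usable {usable_tb} TB, raw {raw_tb} TB, "
      f"utilization ceiling {usable_tb / raw_tb:.0%}")
```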
Reducing consolidation ratios
In a nutshell, all systems that are fronted by VPLEX will have the same write cache data so that a read from those systems will be serviced from the write cache, if the data has been recently written. However, if a read request is made from a VM for data that is not in VPLEX's write cache, the data will be read from either a local or remote storage array. That's why there are distance limitations to VPLEX today - the time it takes to cross-ship data from a remote location can't be too obnoxious.
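Rough numbers show why distance matters. Light in optical fiber covers roughly 200 km per millisecond, about 5 microseconds per kilometer each way, so every read cross-shipped from the remote array pays at least that propagation delay out and back. The sketch counts only that floor; the function name is hypothetical, and real latency also includes switching, protocol exchanges and the remote array's own service time.

```python
# Roughly 5 microseconds per km one way: light in fiber covers about
# 200 km per millisecond.
FIBER_US_PER_KM = 5.0


def remote_read_penalty_ms(distance_km: float, round_trips: int = 1) -> float:
    """Minimum added delay (ms) for reads served by the remote array.

    Counts only propagation in the fiber, out and back, per round trip.
    Switching, protocol exchanges and the remote array's own service
    time all come on top of this floor.
    """
    return distance_km * FIBER_US_PER_KM * 2 * round_trips / 1000.0


for km in (10, 100, 1000):
    print(f"{km:>5} km: at least +{remote_read_penalty_ms(km):.1f} ms per round trip")
```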
It's also why EMC's Virtual Geek spent time talking about using applications that have a single writer - such as vMotion. A VM with an application that attempts to write to the same block from two different locations is apparently problematic. My guess is that applications with long sequential reads might not be so great either.
That implies a couple of things: You should expect to find out there are limitations as to what applications can be running with VPLEX. Maybe identifying VPLEX-friendly applications will be easy, but perhaps this task will be harder than assumed. If so, the best answer may be to engage EMC's pricey professional services organization. The details, the details.
The other implication is that VPLEX will limit the number of VMs that can be running on a single vSphere server. After all, you wouldn't want to have any congestion problems forwarding write cache data through the VPLEX network. The higher your server consolidation ratio is, the more likely it is that you could run into bottlenecks of varying types. Expect to see best practices for VPLEX that scale back your server consolidation ratios.
Just because you can do something, doesn't mean it's a smart thing to do.
A couple of weeks ago, one of the major storage vendors had two major problems to resolve after one of their arrays suffered a failure, caused by a firmware bug, at one of their cloud (email) service provider customers. They had to:
Help the customer get back to normal service levels after they had become unacceptable.
Confront a public relations problem after it was exposed by a leading storage publisher.
Meanwhile, their service provider customer had four major problems to resolve:
Get service levels back to acceptable levels.
Communicate to their customers what the problem was and how it was being addressed.
Re-engineer a solution to avoid the same happening again.
Credit customers for not delivering against SLA terms.
A vendor employee tried to address their public relations problem this way in his blog:
"OK, I'll take the blame for this -- sort of. We pride ourselves in putting a lot of thought into our customer designs. I'd argue that we're really, really good at it as well.
But not everyone is 100% sure of how their application will grow over time -- unfortunately, we're not psychics. And, let's be honest, not everyone necessarily wants to pay for redundancy we like to put into our designs.
We don't always get to directly engage all the time, either -- with products such as the (blanked out), most of this stuff moves through the channel. Somebody calls up one of our partners, says that they want to buy one of our products, and one gets sold -- and a lot of product gets sold that way."
I understand the desire to explain how messes become messy, but I'm not sure why he felt the need to speculate that his company's business partners or the customer's budget were key elements of the problem. That is tantamount to saying, "All of our (blanked out) customers could have the same thing happen to them too." Anybody who has ever been close to one of these meltdowns knows there are many variables involved - including vendors underbidding each other and shaving elements from their bids in order to win the business.
From a distance, it looks like the vendor's response to the customer was good, although there apparently were some issues with failure notification from the array when the event occurred. I wouldn't call these sorts of things "Perfect Storms", but there are unfortunate scenarios where multiple things go awry. All vendors have these sorts of bad days, which serve as painful learning experiences. Unfortunately for customers, it's one of the ways vendors improve their customer support processes.
The customer also wrote in his blog, explaining the situation to their customers:
"Our SAN vendor analyzed the system logs for the event and determined that the service processor failure occurred due to a unique bug in the specific version of firmware on the system. Our vendor performed an emergency upgrade. The newer version of firmware includes a fix for the bug. We are taking additional corrective actions to make certain that there is enough spare capacity on the SAN. This will assure it performs without performance degradation in the event of a single hardware failure."
The remedy sounds reasonable, but it's not what I would call best of breed either. I'll explain why in the remainder of this post.
The old trusted dual controller just can't keep up
The explanation the service provider gave to their customers was only half correct. Yes, the failure in one controller was due to a firmware bug - and yes, all vendors find out about some of them at customer sites - but the inability of the surviving controller to handle the workload was another matter altogether.
The major defect of all dual controller designs for service provider applications is the uselessness of write cache when operating in degraded mode on a single controller.
When a dual controller array has a controller failure, all traffic is failed over to the surviving controller. However, this controller can't afford to place writes in cache, because if it also fails, any un-flushed writes in cache would be lost - making the recovery process all the more painful. As a result, the throughput of the controller degrades significantly because writes now take orders of magnitude longer to process: each write must be completed at the physical disk level instead of in fast cache memory. When you consider the read/write ratios involved with an email application (heavy writes), it's not surprising to hear that it took 32 hours for the system to get caught up. I suspect that if the surviving controller had been able to use write cache, the customer might still have experienced some service level problems, but nowhere near as bad as the ones they suffered.
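A rough way to see why losing write cache hurts so much is Little's law: sustainable throughput is about the number of outstanding I/Os divided by the mean service time. The Python sketch below uses hypothetical latencies (0.5 ms for a cached write, 10 ms for a write that must reach disk, 8 ms for a read that misses cache) and a hypothetical write-heavy mix; none of these figures come from the incident.

```python
import math


def sustainable_iops(read_fraction, read_ms, write_ms, outstanding_ios):
    """Approximate IOPS ceiling via Little's law: throughput = concurrency / latency."""
    mean_ms = read_fraction * read_ms + (1.0 - read_fraction) * write_ms
    return outstanding_ios * 1000.0 / mean_ms


def hours_to_drain(backlog_ios, capacity_iops, offered_iops):
    """Hours to clear a backlog using only the spare capacity above new incoming load."""
    spare = capacity_iops - offered_iops
    if spare <= 0:
        return math.inf  # the backlog never shrinks
    return backlog_ios / spare / 3600.0


# Hypothetical write-heavy email mix: 40% reads, 60% writes, 32 outstanding I/Os.
write_back = sustainable_iops(0.4, read_ms=8.0, write_ms=0.5, outstanding_ios=32)
write_through = sustainable_iops(0.4, read_ms=8.0, write_ms=10.0, outstanding_ios=32)
print(f"with write cache:    {write_back:,.0f} IOPS")
print(f"without write cache: {write_through:,.0f} IOPS")
print(f"slowdown: {write_back / write_through:.1f}x")
```

The second function shows why a backlog lingers: it drains only at the rate of whatever capacity is left over after new traffic, so a controller running close to its degraded ceiling can take many hours to catch up.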
Write performance during array component failures is an important point that many customers give insufficient weight to when making their purchases. Public service providers certainly need to understand this. The exact same scenario - controller failure and subsequent drop in service levels - could certainly happen to a traditional data center customer, but the ramifications of this scenario are not as ugly as they are for a multi-tenant public service provider.
This case is a perfect example of how an older architecture is incapable of meeting the requirements of the new cloud service business model. If you are a cloud service provider reading this and wondering if you might have a similar exposure to a controller failure (including 3PAR customers with dual-controller arrays), my advice is to review what you have and start thinking about what you should expect if you have a controller failure and how you might want to deal with it on both a short-term and long-term basis. Best of breed cloud storage should not include dual controller arrays.
Their solution is to buy more and utilize it less
One of the identified corrective actions is having "enough spare capacity on the SAN", which in this case involves installing a second array. Without knowing the inside scoop, it looks like the idea is to split the workload across the two arrays so that if a controller failure occurs in either array, the performance drop won't be as noticeable. The array that doesn't suffer the failure will keep working as expected and the array that has the failure will only have half the load to deal with.
There are two primary problems with this "fix":
Performance will still suffer on the array with the controller failure
The I/O load will continue to increase over time
You are always going to have performance degradation of some sort when you can't use write caching, unless you are only reading data - which isn't the case here. It is flat out wrong to assume that a performance problem will not occur. Regardless, with the new two-array SAN, whichever system has the controller failure should be able to get caught up much faster than the 32 hours this customer had to wait. Of course, the customer's capacity and I/O load will almost certainly increase over time, and as that happens, the strategy of splitting the load between two arrays loses its effectiveness.
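The growth problem can be put in numbers. Assuming the load is split evenly, a controller failure leaves the surviving controller in that array carrying half of the total load without write cache. The hypothetical sketch below estimates how many years of compound growth it takes before that half exceeds what the surviving controller can sustain; the load, capacity and growth rate are illustrative inputs, not figures from the service provider.

```python
import math


def years_until_split_overloaded(total_load_iops, degraded_capacity_iops, annual_growth):
    """Years until half the total load exceeds one write-through controller's capacity.

    Assumes the load is split evenly across two arrays and grows by
    `annual_growth` (e.g. 0.25 = 25%) per year, compounded.
    """
    half = total_load_iops / 2.0
    if half >= degraded_capacity_iops:
        return 0.0  # already overloaded on day one
    return math.log(degraded_capacity_iops / half) / math.log(1.0 + annual_growth)


# Hypothetical: 10,000 IOPS today, 8,000 IOPS degraded ceiling, 25% yearly growth.
print(f"{years_until_split_overloaded(10_000, 8_000, 0.25):.1f} years of headroom")
```

Under these made-up inputs the two-array split buys only about two years before a single controller failure once again leaves the surviving controller overloaded.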
Along with adding the controllers, they are almost certainly adding disk drives, and some notion of what "reasonable" utilization limits should be for them. The problem with limiting utilization as a best practice is that it puts the stamp of approval on inefficiency - not only for capacity utilization but also for the power and cooling required to support all those underutilized drives. Most legacy arrays have built-in inefficiencies in the way data is laid out on disks, making it virtually impossible to achieve uniform utilization across all disk resources. The result is uneven consumption of disk capacity, as well as uneven I/O service levels among different disk groups, which is another variable in how much performance degrades following a controller failure in a dual controller array.
Finally, the customer now has two arrays to manage, including multipath connections, SAN zones, and all other aspects of the configuration, all of which add to change-management complexity down the road. The result is a net drag on administrator effort and an increased TCO.
How many do you need?
A true best of breed solution would address the root-cause deficiency in the array's design, without creating additional management and cost burdens to the customer. Obviously, more than two controllers are needed. But how many controllers does a cloud service provider need in an array? The answer is at least three. Why? Because when a single controller fails, there can still be two surviving controllers working together, mirroring their cache contents, and performing fast writes to cache memory. That said, controllers are usually packaged in pairs for redundancy purposes, which means that the most likely configurations will have four controllers.
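The controller-count argument reduces to a few lines. This hypothetical sketch assumes the workload spreads evenly over the surviving controllers and that mirrored write caching needs at least two of them.

```python
def after_one_failure(num_controllers, total_load_iops):
    """Return (survivors, load per survivor, can_still_mirror_write_cache)."""
    survivors = num_controllers - 1
    if survivors < 1:
        raise ValueError("need at least two controllers to survive a failure")
    can_mirror = survivors >= 2  # a cached write needs a second copy on another controller
    return survivors, total_load_iops / survivors, can_mirror


for n in (2, 3, 4):
    survivors, per_controller, mirror = after_one_failure(n, total_load_iops=12_000)
    print(f"{n} controllers -> {survivors} left, {per_controller:,.0f} IOPS each, "
          f"write caching {'kept' if mirror else 'lost'}")
```

The dual controller case is the only one where the survivor both takes the entire load and loses write caching.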
If you compare a single quad controller array with two dual controller arrays there are some key advantages that immediately jump out:
No or limited loss of performance after a controller failure
All drives and cache can be used to service all workloads
Managing a single array significantly reduces cost and complexity
A better recipe for maintaining performance levels
The next question is: "Is there a suitable quad controller array that the customer could have used instead of the two dual controller arrays they have?" Yes, 3PAR's F400 and T400 arrays are both quad controller arrays. The disk drives in these arrays can be either SATA or FC, or a mix of both types if the customer wanted to implement tiering. Product information for the F400 can be found here, and for the T400 here.
However, simply putting four controllers in an array does not necessarily guarantee that they will be able to sustain write caching if one of them fails. The array must have the ability to remap and re-mirror the write cache contents of all four controllers to the surviving controllers following the loss of a controller. It's an interesting geometric sort of problem: There are four controllers, each with its own cache and cache that is mirrored from the other controllers in the array. All cache contents, including mirrors, need to be distributed evenly across all controllers to avoid congestion and load imbalances. All cache content, including mirrors, needs to be accounted for within the array so that if a controller fails, the other controllers will be able to identify all the surviving original and mirrored copies of data. For cache data that has lost either a primary or mirrored copy, a second (new) copy needs to be made. Finally, the amount of data in cache may need to be re-leveled (decreased) to fit into the degraded cache capacity (3 controllers instead of 4).
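A toy version of this bookkeeping can be written in Python. It is not 3PAR's Persistent Cache algorithm, which is not described in detail here; it is a hypothetical sketch that gives every cache page a primary and a mirror on different controllers, spreads those pairs evenly, and after a failure promotes the surviving copy and places a new mirror on the least-loaded survivor. The re-leveling step that shrinks cache contents to fit the smaller degraded cache is left out.

```python
from collections import Counter
from itertools import permutations


def place_pages(num_pages, controllers):
    """Give each cache page a (primary, mirror) pair on two different controllers.

    Cycling through every ordered pair spreads copies evenly across controllers.
    """
    pairs = list(permutations(controllers, 2))
    return {page: pairs[page % len(pairs)] for page in range(num_pages)}


def copies_per_controller(placement):
    """Count how many page copies (primary or mirror) each controller holds."""
    load = Counter()
    for primary, mirror in placement.values():
        load[primary] += 1
        load[mirror] += 1
    return load


def remirror_after_failure(placement, failed, controllers):
    """Restore two copies of every page using only the surviving controllers."""
    survivors = [c for c in controllers if c != failed]
    load = copies_per_controller(placement)
    new_placement = {}
    for page, (primary, mirror) in placement.items():
        if failed not in (primary, mirror):
            new_placement[page] = (primary, mirror)  # both copies survived
            continue
        keeper = mirror if primary == failed else primary  # surviving copy becomes primary
        # New mirror goes to the least-loaded survivor that does not already hold the page.
        target = min((c for c in survivors if c != keeper), key=lambda c: (load[c], c))
        load[target] += 1
        new_placement[page] = (keeper, target)
    return new_placement


controllers = [0, 1, 2, 3]
before = place_pages(1200, controllers)
after = remirror_after_failure(before, failed=3, controllers=controllers)
print("before:", dict(copies_per_controller(before)))
print("after: ", dict(copies_per_controller(after)))
```

Every page ends up with two copies on two different survivors, and the copies stay evenly spread, which is the balance property described above.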
The software for doing this in a 3PAR array is Persistent Cache. Product information on Persistent Cache is here (PDF).
Last year I made a 9-minute video describing how Persistent Cache works. Here it is again. Thanks for watching.
(A quote from Dieter Rams - former Chief of Design at Braun)
It's hard to think of a company that has had more success with its product designs than Apple. When you look into how Apple did it, you find out about Jonathan Ive - Apple's lead industrial designer - and how his designs have followed the philosophy outlined by Dieter Rams, who was the lead designer for many years at Braun. When you compare photos of their designs, it is obvious that Ive has a strong appreciation for Rams' work.
What Ive and others have found compelling in Rams' work is nicely summarized in the design principles Rams used at Braun for many years.
Good design is innovative
Good design makes a product useful
Good design is aesthetic
Good design makes a product understandable
Good design is unobtrusive
Good design is honest
Good design is long-lasting
Good design is thorough down to the last detail
Good design is environmentally friendly
Good design is as little design as possible
The design goals for consumer products differ considerably from those for industrial products. For industrial products, aesthetics and innovation tend to be less important than reliability and ROI - two characteristics that didn't even make it onto Rams' list of design principles. But some principles certainly belong to both, such as making a product useful and unobtrusive. So, what should the 10 design principles be for information infrastructure products? Here's my list:
Good design makes a product useful (it solves customer problems)
Good design has recognized limitations
Good design is unobtrusive (needs minimal management)
Good design has an attainable ROI
Good design makes efficient use of resources
Good design is scalable (capacity, performance & management)
Good design is resilient (sustains performance through sub-optimal conditions)
Good design makes a product understandable (facilitates planning & changes)
Good design is long-lasting (and accommodates future innovations)
Good design is environmentally friendly
Producing this list was much more interesting than I thought it would be. For starters, it took me some time to get settled in a customer's perspective - as opposed to my usual vendor employee perspective (I have this wonderful hammer you need). Also, to clarify a point, the idea of management scalability involves the number of people who can effectively manage and control a system simultaneously. That might not be a concern for smaller IT systems, but it certainly is for large-scale systems.
What would you change? Would you reduce or expand this list?
Here's a ComputerWorld-produced video from last year: an interview with Hala Al-Adwan, VP of Data, and Richard Buckingham, VP of Technical Operations at MySpace. In it they talk about the scaling challenges they face and how 3PAR helps them run their massive site every day.
I wrote a post yesterday that showed IOPS calculations for a few different native wide striping configurations and I thought I'd add storage tiering to the mix today. Native wide striping places data from all volumes across all drives in the array (or of a certain drive class if you have mixed drives in your array) and randomizes workloads across all resources. The biggest advantages of native wide striping over traditional array designs that rely on multiple pools and workload isolation are:
Avoiding poor storage utilization associated with workload isolation
Avoiding IOPS and capacity limits associated with workload isolation
Avoiding hotspots associated with legacy RAID striping
Sustaining dependable, high performance for mixed workloads
Improving both VM and storage densities
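The hotspot argument can be illustrated with a toy model in Python. The volume sizes, IOPS figures and chunk layout below are hypothetical, and the simple rotation stands in for whatever layout a real array uses; the point is only to compare the busiest drive when each volume has its own pool versus when every volume is striped across every drive.

```python
def per_drive_iops(volumes, num_drives, placement):
    """Spread each volume's IOPS evenly over its chunks and sum the load per drive.

    volumes: list of (iops, num_chunks); placement(volume, chunk) -> drive index.
    """
    load = [0.0] * num_drives
    for v, (iops, chunks) in enumerate(volumes):
        for c in range(chunks):
            load[placement(v, c)] += iops / chunks
    return load


def isolated_pools(drives_per_pool):
    """Each volume lives only on its own dedicated pool of drives."""
    return lambda v, c: v * drives_per_pool + c % drives_per_pool


def wide_striped(num_drives):
    """Every volume's chunks rotate across all drives, each volume starting at a different drive."""
    return lambda v, c: (v * 7 + c) % num_drives


# One busy volume and three quiet ones on 40 drives (hypothetical figures).
volumes = [(4000, 400), (500, 400), (500, 400), (500, 400)]
pools = per_drive_iops(volumes, 40, isolated_pools(10))
striped = per_drive_iops(volumes, 40, wide_striped(40))
print(f"isolated pools: busiest drive {max(pools):.1f} IOPS, idlest {min(pools):.1f}")
print(f"wide striping:  busiest drive {max(striped):.1f} IOPS, idlest {min(striped):.1f}")
```

Both layouts carry the same total I/O; only the peak per-drive load changes, and that peak is what decides when a drive becomes a hotspot.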
Although native wide striping can handle complex, mixed workloads of transaction and sequential data access, applications that are either latency sensitive or single threaded can significantly increase their storage performance through the use of SSDs and storage tiering.
3PAR's software for storage tiering is called Adaptive Optimization, or AO. Based on administrator policies and algorithms keyed off QoS gradients, a 3PAR InServ array autonomically copies data from lower-IOPS disk drives onto high-IOPS SSDs.
The 3PAR tiering solution uses STEC MACHIOPS SSDs with a sustainable I/O rate of 10,000 IOPS. These devices have 50GB capacities and are installed in sets of eight SSDs across all mesh-active controllers in a 3PAR InServ array, balancing the high-IOPS workload over all controllers as well as all drives.
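AO's selection algorithm is not spelled out here, so the following Python sketch is only a generic illustration of policy-driven tiering, not 3PAR's method: it ranks hypothetical regions by IOPS per GB and promotes the densest ones until the SSD tier is full. The region names, sizes and rates are made up; the 400 GB capacity echoes one set of eight 50GB SSDs before RAID overhead.

```python
def pick_regions_for_ssd(regions, ssd_capacity_gb):
    """Greedy tiering policy: promote the regions with the most IOPS per GB first.

    regions: list of (name, size_gb, iops). Returns the names chosen for SSD.
    """
    chosen, used_gb = [], 0.0
    for name, size_gb, iops in sorted(regions, key=lambda r: r[2] / r[1], reverse=True):
        if used_gb + size_gb <= ssd_capacity_gb:
            chosen.append(name)
            used_gb += size_gb
    return chosen


# Hypothetical regions: (name, size in GB, observed IOPS).
regions = [
    ("mail-index", 100, 5000),
    ("mail-store", 200, 1000),
    ("db-log", 50, 3000),
    ("archive", 300, 600),
]
print(pick_regions_for_ssd(regions, ssd_capacity_gb=400))
```

A real policy would also weigh how long data stays hot and the cost of copying it onto and off the SSDs, which is part of why tiering performance models are still evolving.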
IOPS calculations
Below are a few calculations for maximum sustainable IOPS from InServ arrays that use both SATA drives as well as SSDs with AO. I used 5,000 IOPS as the metric for calculating SSD performance, which is a conservative estimate for the STEC MACHIOPS performance, but actual performance from an AO-enabled array could be lower due to a number of variables including the I/O activity levels that can be sustained by both applications and servers, policy settings made by storage administrators, the accuracy of algorithms to select data for tiering and copy operations that populate and de-populate the SSDs.
Storage tiering is still in its early stages, and the industry is going to learn a great deal about this technology over the next several years. Performance models will certainly evolve as key variables are identified, which will almost certainly include server and application components.
Array 1: 160 SATA disk drives; 80% reads, no SSDs
Total IOPS of all drives in the array: 12,800
IOPS delivered to all servers w/RAID 5: 8,000
IOPS delivered to all servers w/RAID 10: 10,667
IOPS delivered to all servers w/RAID 6: 5,998
Array 2: 160 SATA disk drives; 80% reads - 8 SSDs; 50% reads
Total IOPS of all drives in the array: 52,800
IOPS delivered to all servers w/RAID6 (SATA) & RAID5 (FC): 21,998
Array 3: 160 SATA disk drives; 80% reads - 32 SSDs; 50% reads
Total IOPS of all drives in the array: 172,800
IOPS delivered to all servers w/RAID6 (SATA) & RAID5 (FC): 69,998
Array 4: 480 SATA disk drives; 80% reads - 32 SSDs; 50% reads
Total IOPS of all drives in the array: 198,400
IOPS delivered to all servers w/RAID6 (SATA) & RAID5 (FC): 81,994
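The figures above are consistent with the common write-penalty rule of thumb: each host write costs 2 back-end I/Os with RAID 10, 4 with RAID 5 and 6 with RAID 6, so host IOPS = raw IOPS / (read fraction + write fraction × penalty). The Python sketch below assumes 80 IOPS per SATA drive (12,800 / 160) and 5,000 IOPS per SSD, as stated earlier. It reproduces the RAID 5 and RAID 10 numbers and the SSD contribution of each tiered array. With a penalty of 6, RAID 6 comes out at 6,400 IOPS rather than the 5,998 listed, so the calculator used for these figures appears to assume a somewhat heavier RAID 6 penalty; the array totals match when 5,998 is used per 160 SATA drives.

```python
# Back-end I/Os generated by one host write (common rule-of-thumb values).
WRITE_PENALTY = {"RAID10": 2, "RAID5": 4, "RAID6": 6}

SATA_IOPS = 80    # 12,800 IOPS / 160 drives
SSD_IOPS = 5_000  # conservative per-SSD figure used above


def host_iops(raw_iops, read_fraction, raid):
    """IOPS delivered to servers once RAID write overhead is paid."""
    write_fraction = 1.0 - read_fraction
    return raw_iops / (read_fraction + write_fraction * WRITE_PENALTY[raid])


sata_raw = 160 * SATA_IOPS
for raid in ("RAID5", "RAID10", "RAID6"):
    print(f"Array 1, {raid}: {host_iops(sata_raw, 0.8, raid):,.0f} IOPS")

for ssds in (8, 32):
    ssd_part = host_iops(ssds * SSD_IOPS, 0.5, "RAID5")
    print(f"{ssds} SSDs (RAID 5, 50% reads): {ssd_part:,.0f} IOPS to servers")
```

Adding the SSD contribution (16,000 or 64,000 IOPS) to the RAID 6 SATA figure gives the Array 2, 3 and 4 totals.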
Conclusions
Even a relatively small amount of SSD storage can boost performance approximately four times, as shown by Arrays #1 and #2 above, where eight SSDs totaling 400GB were added to an all-SATA configuration. It's also interesting to note the performance differences between Arrays #3 and #4 above. Although the number of SATA drives tripled, total IOPS increased only about 15% (about 17% for IOPS delivered to servers).