To a lot of people, especially those who are unfamiliar with the storage industry, one of the obvious questions is "Who are these people and where did they come from?"
The answer is that the company was formed by a group of server-cluster engineers from Sun and has been around for over a decade developing and selling large scale storage products designed for something that used to be called "utility computing" seven years ago, but today is just called "the cloud".
We've been very successful with our cloud strategy and have 7 of the top 10 IaaS (infrastructure-as-a-service) customers as clients. 3PAR products work very hard in the background for a lot of household-name customers. Most people don't know or care.
However, cloud industry vendors know 3PAR because they are also very heavily involved with those same customers, competing with their own products. They see our storage systems in those large data centers and our customers tell them that they need to make sure they work with us. There's nothing unusual about that sort of thing, but we definitely are a player.
Here's what we do very well:
Handle mixed workloads in multi-tenant environments. That means we can service a wide range of applications concurrently from multiple servers (or virtual servers). If you are a cloud service provider and you need to have the confidence that your infrastructure will deliver consistently - even if your business changes radically - 3PAR is a very safe choice.
Operate efficiently at high utilizations. Most storage in the world today is underutilized, which gets under people's skin because there is a high cost associated with that. 3PAR has led the industry with thin storage technologies that started with our implementation of thin provisioning many years ago and have recently been enhanced through thin reclamation, thin conversion and thin persistence software features. Again, cloud service providers need to keep the cost of their operations under tight control.
Give administrators the highest levels of automation. Administrative work on storage has historically been difficult and time consuming, which has been problematic for cloud service providers who have to be able to increase and reconfigure resources to meet fast-changing business demands. 3PAR storage lets system administrators make adjustments more quickly and confidently.
The thing that we didn't completely understand at 3PAR was how quickly the onset of the virtualized data center was going to tilt the storage world in our direction. 3PAR storage systems are based on a highly advanced, granular storage architecture. It's not always the easiest thing for people to understand because it is so different than any other vendor's architecture. However, people familiar with virtualized server features have a much easier time understanding how our technology works. There is nothing like a terrific, relevant analogy for explaining how your different widget works.
3PAR is a relatively small company, competing with much larger companies who use the benefits of their size, global reach and service organizations against us every day in sales opportunities. It hasn't been easy, but we've continued to grow our business in a very hotly contested arena where our competitors like to position us as the "small, new company". Storage purchases in this market are high stakes and careers can be made or lost on the right decision. We certainly don't win all the deals we are in, but we very seldom lose on technical merit. Usually it's because we are lesser known or because we can't match the service offerings of our larger competitors.
It appears that some of those variables will be changing for us relatively soon.
A couple weeks ago, one of the major storage vendors had two major problems to resolve after one of their arrays suffered a firmware bug-induced failure at one of their cloud (email) service provider customers. They had to:
Help the customer get back to normal service levels after they had become unacceptable.
Confront a public relations problem after it was exposed by a leading storage publisher.
Meanwhile, their service provider customer had four major problems to resolve:
Get service levels back to acceptable levels.
Communicate to their customers what the problem was and how it was being addressed.
Re-engineer a solution to avoid the same happening again.
Credit customers for not delivering against SLA terms.
A vendor employee tried to address their public relations problem this way in his blog:
"OK, I'll take the blame for this -- sort of. We pride ourselves in putting a lot of thought into our customer designs. I'd argue that we're really, really good at it as well.
But not everyone is 100% sure of how their application will grow over time -- unfortunately, we're not psychics. And, let's be honest, not everyone necessarily wants to pay for redundancy we like to put into our designs.
We don't always get to directly engage all the time, either -- with products such as the (blanked out), most of this stuff moves through the channel. Somebody calls up one of our partners, says that they want to buy one of our products, and one gets sold -- and a lot of product gets sold that way."
I understand the desire to explain how messes become messy, but I'm not sure why he felt the need to speculate that his company's business partners or their customer's budget were key elements of the problem. That is tantamount to saying, "All of our (blanked out) customers could have the same thing happen to them too." Anybody who has ever been close to one of these melt-downs knows there are many variables involved - including vendors underbidding each other and shaving elements from their bid in order to win the business.
From a distance, it looks like the vendor's response to the customer was good, although there apparently were some issues with failure notification from the array when the event occurred. I wouldn't call these sorts of things "Perfect Storms", but there are unfortunate scenarios where multiple things go awry. All vendors have these sorts of bad days, which serve as painful learning experiences. Unfortunately for customers, it's one of the ways vendors improve their customer support processes.
The customer also wrote in his blog, explaining the situation to their customers:
"Our SAN vendor analyzed the system logs for the event and determined that the service processor failure occurred due to a unique bug in the specific version of firmware on the system. Our vendor performed an emergency upgrade. The newer version of firmware includes a fix for the bug. We are taking additional corrective actions to make certain that there is enough spare capacity on the SAN. This will assure it performs without performance degradation in the event of a single hardware failure."
The reparation sounds reasonable, but it's not what I would call best of breed either. I'll explain why in the remainder of this post.
The old trusted dual controller just can't keep up
The explanation the service provider gave to their customers was only half correct. Yes, the failure in one controller was due to a firmware bug - and yes, all vendors find out about some of them at customer sites - but the inability of the surviving controller to handle the workload was another matter altogether.
The major defect of all dual controller designs for service provider applications is the uselessness of write cache when operating in degraded mode on a single controller.
When a dual controller array has a controller failure, all traffic is failed over to the surviving controller. However, this controller can't afford to place writes in cache because if this controller also fails, any un-flushed writes in cache would be lost - making the recovery process all the more painful. As a result, the throughput of the controller degrades significantly because writes now take several orders of magnitude longer to process as each write must be completed at the physical disk level, instead of in fast cache memory. When you consider the sort of read/write ratios involved with an email application (heavy writes), it's not surprising to hear that it took 32 hours for the system to get caught up. I suspect that if the surviving controller had been able to use write cache, the customer might have experienced some amount of service level problems, but not nearly as bad as they suffered.
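To put rough numbers on that, here is a back-of-the-envelope sketch in Python. The latency figures are assumptions I picked for illustration, not measurements from any vendor's array, and the model ignores the fact that the surviving controller is also carrying the failed controller's share of the traffic.

```python
# Illustrative latencies only; these are assumptions, not measurements
# from any vendor's array.
CACHE_WRITE_MS = 0.05   # write acknowledged from mirrored cache memory
DISK_WRITE_MS = 8.0     # write acknowledged only after it reaches disk
READ_MS = 5.0           # reads are treated the same in both modes


def mean_io_latency_ms(write_fraction, write_ms, read_ms=READ_MS):
    """Average latency for a workload with the given share of writes."""
    return write_fraction * write_ms + (1.0 - write_fraction) * read_ms


def slowdown(write_fraction):
    """How much slower the average I/O gets once write cache is disabled."""
    normal = mean_io_latency_ms(write_fraction, CACHE_WRITE_MS)
    degraded = mean_io_latency_ms(write_fraction, DISK_WRITE_MS)
    return degraded / normal


if __name__ == "__main__":
    for wf in (0.2, 0.5, 0.8):
        print(f"{wf:.0%} writes: average I/O is {slowdown(wf):.1f}x slower")
```

The heavier the write mix, the larger the penalty, which is why a write-heavy email workload is about the worst case for this failure mode.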
Write performance during array component failures is an important point that many customers give insufficient weight to when making their purchases. Public service providers certainly need to understand this. The exact same scenario - controller failure and subsequent drop in service levels - could certainly happen to a traditional data center customer, but the ramifications of this scenario are not as ugly as they are for a multi-tenant public service provider.
This case is a perfect example of how an older architecture is incapable of meeting the requirements of the new cloud service business model. If you are a cloud service provider reading this and wondering if you might have a similar exposure to a controller failure (including 3PAR customers with dual-controller arrays), my advice is to review what you have and start thinking about what you should expect if you have a controller failure and how you might want to deal with it on both a short-term and long-term basis. Best of breed cloud storage should not include dual controller arrays.
Their solution is to buy more and utilize it less
One of the identified corrective actions is having "enough spare capacity on the SAN", which in this case involves installing a second array. Without knowing the inside scoop, it looks like the idea is to split the workload across the two arrays so that if a controller failure occurs in either array, the performance drop won't be as noticeable. The array that doesn't suffer the failure will keep working as expected and the array that has the failure will only have half the load to deal with.
There are two primary problems with this "fix":
Performance will still suffer on the array with the controller failure
The I/O load will continue to increase over time
You are always going to have performance degradation of some sort when you can't use write caching, unless you are only reading data - which isn't the case here. It is flat out wrong to assume that a performance problem will not occur. Regardless, with the new two-array SAN, whichever system has the controller failure should be able to get caught up much faster than the 32 hours this customer had to wait. Of course, the customer's capacity and I/O load will almost certainly increase over time, and as that happens, the strategy of splitting the load between two arrays loses its effectiveness.
Along with adding the controllers, they are also certainly adding disk drives, and some notion of what "reasonable" utilization limits should be for them. The problem with limiting utilization as a best practice is that it puts the stamp of approval on inefficiency - not only for capacity utilization but also for the power and cooling required to support all those underutilized drives. Most legacy arrays have built-in inefficiencies in the way data is laid out on disks, making it virtually impossible to achieve uniform utilization across all disk resources. The result is uneven consumption of disk capacity, as well as uneven I/O service levels among different disk groups, which is another variable in how much performance degrades following a controller failure in a dual controller array.
Finally, the customer now has two arrays to manage, including multipath connections, SAN zones, and all other aspects of the configuration, which all contribute down the road to change management complexities. The result is a net drag on administrator effort and an increased TCO.
How many do you need?
A true best of breed solution would address the root-cause deficiency in the array's design, without creating additional management and cost burdens to the customer. Obviously, more than two controllers are needed. But how many controllers does a cloud service provider need in an array? The answer is at least three. Why? Because when a single controller fails, there can still be two surviving controllers working together, mirroring their cache contents, and performing fast writes to cache memory. That said, controllers are usually packaged in pairs for redundancy purposes, which means that the most likely configurations will have four controllers.
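The counting argument above is simple enough to write down. This is only a sketch of the reasoning in Python; the function names are made up for illustration.

```python
def can_keep_write_cache(total_controllers, failed=1):
    """Write caching stays safe only if cached writes can still be mirrored,
    which takes at least two surviving controllers."""
    return total_controllers - failed >= 2


def smallest_paired_config(min_survivors=2, tolerated_failures=1):
    """Smallest controller count that meets the rule above when controllers
    are packaged in pairs (an even number of controllers)."""
    needed = min_survivors + tolerated_failures
    return needed if needed % 2 == 0 else needed + 1


if __name__ == "__main__":
    for n in (2, 3, 4):
        print(n, "controllers, one fails, write cache usable:",
              can_keep_write_cache(n))
    print("smallest paired configuration:", smallest_paired_config())
```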
If you compare a single quad controller array with two dual controller arrays there are some key advantages that immediately jump out:
No or limited loss of performance after a controller failure
All drives and cache can be used to service all workloads
Managing a single array significantly reduces cost and complexity
A better recipe for maintaining performance levels
The next question is: "Is there a suitable quad controller array that the customer could have used instead of the two dual controller arrays they have?" Yes, 3PAR's F400 or T400 arrays are both quad controller arrays. The disk drives in these arrays can be either SATA or FC, or a mix of both types if the customer wanted to implement tiering. Product information of the F400 can be found here, and the T400 here.
However, simply putting four controllers in an array does not necessarily guarantee that they will be able to sustain write caching if one of them fails. The array must have the ability to remap and re-mirror the write cache contents of all four controllers to the surviving controllers following the loss of a controller. It's an interesting geometric sort of problem: There are four controllers, each with their own cache and cache that is mirrored from the other controllers in the array. All cache contents, including mirrors, need to be distributed evenly across all controllers to avoid congestion and load imbalances. All cache content, including mirrors, needs to be accounted for within the array so that if a controller fails, the other controllers will be able to identify all the surviving original and mirrored copies of data. For cache data that has lost either a primary or mirrored copy, a second (new) copy needs to be made. Finally, the amount of data in cache may need to be re-leveled (decreased) to fit into the degraded cache capacity (3 controllers instead of 4).
The software for doing this in a 3PAR array is Persistent Cache. Product information on Persistent Cache is here (PDF)
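For readers who think better in code, here is a toy Python model of the re-mirroring bookkeeping described above. It is not how Persistent Cache is actually implemented; the `CachePage` structure, the least-loaded placement rule and the assumption that every page starts with its two copies on different controllers are all mine, and the final re-leveling step is left out.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class CachePage:
    page_id: int
    primary: int   # controller holding the primary copy
    mirror: int    # controller holding the mirrored copy


def remirror_after_failure(pages, controllers, failed):
    """Return page placements after controller `failed` drops out.

    A page that lost a copy keeps its surviving copy and gets a new second
    copy on the least-loaded other survivor, so it is mirrored again.
    """
    survivors = [c for c in controllers if c != failed]
    if len(survivors) < 2:
        raise ValueError("need two survivors to keep write cache mirrored")

    # Count the cache copies each surviving controller already holds.
    load = Counter({c: 0 for c in survivors})
    for p in pages:
        for c in (p.primary, p.mirror):
            if c != failed:
                load[c] += 1

    result = []
    for p in pages:
        if failed not in (p.primary, p.mirror):
            result.append(p)  # both copies survived, nothing to do
            continue
        keeper = p.mirror if p.primary == failed else p.primary
        target = min((c for c in survivors if c != keeper),
                     key=lambda c: load[c])
        load[target] += 1
        # The surviving copy becomes primary; the new copy is its mirror.
        result.append(CachePage(p.page_id, keeper, target))
    return result


if __name__ == "__main__":
    controllers = [0, 1, 2, 3]
    pages = [CachePage(i, i % 4, (i + 1) % 4) for i in range(8)]
    for page in remirror_after_failure(pages, controllers, failed=2):
        print(page)
```

Picking the least-loaded survivor for each new copy is one simple way to keep the mirrors spread evenly; a real array has to do this while I/O is still arriving.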
I made a 9-minute video last year describing how Persistent Cache works. Here it is again. Thanks for watching.
Videos are smaller, formatted for 320x240 small screens
Videos do not stream, they download completely and then play - so it takes longer to start playing
The content does not mirror that on StorageRap
Content on iTunes lags that on Podbean by a day or so
The video images in the podcast player are different than those on 3PARTV and this blog
If there is something on this blog that you want in a podcast version, contact me on Twitter (3ParFarley) or through a blog comment and I'll make a podcast for you.
Searching on the word storage, I've copied and pasted every reference I found to storage in it.
Brian Gladden (part of his presentation): Storage revenue was down 8% sequentially and 19% year-over-year. EqualLogic revenue was up 31% with continued strength from a margin standpoint. Our overall storage margin rates are up over 30% versus where they were last year.
Michael Dell (from his comments): We are focused on expanding our recurring revenue and profit streams with a differentiated view of how to win in the enterprise. Storage and server revenues were up 5% sequentially as we continue to bring our customers an expanding portfolio of best value enterprise solutions. EqualLogic grew 31% year-over-year with a strong pipeline heading into Q4 including demand for our recently launched PS4000 virtualized storage array. We have now added over 10,000 new customers to the EqualLogic platform bringing almost 15,000 customers and we continue to invest in the R&D, solutions and sales resources to expand this platform.
From the Q&A section:
David Bailey - Goldman Sachs
The non-EqualLogic storage sales were weak this quarter. What are you doing to reverse this trend and how long do you think it will take?
Michael Dell
There is a shift going on there where we are moving away from non-Dell branded storage offerings. You can think of this as dual margin pass through revenue. So our margin on storage continues to increase and with the additional, better mix of sales also comes service annuities which obviously are valuable over time to us. So we are moving more and more to the portfolio of Dell IP, Dell branded and in some cases co-branded products. That is the direction. That is why you see the shift in the mix and the increased margins.
Jayson Noland - Robert W. Baird & Co., Inc.
A question on the timing of various hardware categories. It seems like Dell storage and server has seen fairly positive trends over the past couple of quarters. If you could talk about how your customers are thinking about investment in data center hardware versus investment in the client?
Michael Dell
Well as I said earlier I think the mission critical data center activity has been high and the [inaudible] processor brought out the virtualization wave in a big way because it accelerated the number of servers you could virtualize in one platform. We have been particularly strong with our blade offering, with our data center custom solutions, with our storage offerings. We think our 11G combined server and really management solution that goes with that is a fantastic offering that is really resonating with customers. It is a place where there is a bias for investment inside these accounts.
Dell, on the other hand, accounted for 10.4 percent of EMC's total revenue, and slightly under 30 percent of Clariion revenue, Goulden said.
Dell has traditionally been EMC's largest channel partner, accounting for about one-third of EMC's revenue. However, Dell acquired EqualLogic early this year, giving it a product line and an indirect channel which competes with EMC.
When asked by analysts during the Q&A session following EMC's prepared statements about the 26-percent year-to-year and 10-percent quarter-to-quarter drop in Dell business, Joe Tucci, EMC's chairman, president, and CEO, said that the strong growth of the Clariion business came from two factors: a robust channel build-out and an increase in EMC's commercial and SMB business.
"On the other hand, we are very quickly and very actively working with Dell, and -- if you talk to (Dell Chairman and CEO) Michael Dell, he'll have the same statement
-- we believe there's a lot more we can and should be doing together,"
Tucci said. "We probably got a little bit off track, and we're working
hard to include the Dell channel on top of everything else we're
doing."
The StorageRap take on the whole mess:
Dell has figured out that EqualLogic's product is very strong and they like the higher margins. But it doesn't look like they have figured out that it is targeted at SMB customers and isn't really a great fit for large data center and cloud customers. Dell needs to figure out there is a disconnect here or they might find themselves with a lot of explaining to do at some point in the future.
VCE. EMC is in love with a new server systems friend named Cisco. The Cisco relationship has been getting a lot of attention from EMC this year and Dell feels rejected. But what can they say after buying EqualLogic? Nothing about EMC, apparently.
EMC has lost its star partner for Clariion. It's not clear how it's going to make that business back any time soon - especially with their big focus on VCE. It's a big gamble to be sure. Not that EMC has stopped developing Clariion, because they haven't, but the SMB options continue to improve - from Dell-EqualLogic, from Compellent and - at the high end of this space - from 3PAR with our F Class systems. I don't see how Cisco is going to replace Dell any time soon as a friend for Clariion.
I'd like to invite everybody with aging Clariion systems to check out 3PAR's F Class cluster storage arrays - especially the Thin Conversion feature that can get you much higher utilization and consolidation rates than you think are possible from mid-range and enterprise storage.
Everybody has an opinion these days about Cloud Computing and Cloud Storage. 3P tells "Store Heads" in his new video below that they ought to start investigating how to do it. The equipment used by cloud service providers matters a great deal to their success. Technologies such as reservationless thin provisioning, full array wide striping and dynamic, load-balancing, active/active controllers make the cloud experience much more satisfying. 3PAR partners with Cloud Service providers through a program called Cloud Agile to bring 3PAR array technology to cloud computing and storage customers.
Earlier this week 3PAR rolled out the Cloud Agile Program with our charter partners, Attenda, DataPipe, Terremark and Verizon Business. The mission of Cloud Agile is to bring 3PAR's efficient and scalable enterprise storage capabilities to a wide range of customers through cloud (utility) service partners.
One of my favorite aspects of Cloud Agile is the concept of a virtual private array, which is the same idea as a virtual private server, but applied to storage. Customers of our Cloud Agile partners will be able to securely manage their own thin slices of our arrays, taking full advantage of massive wide striping and thin provisioning.
The Steering Wheel Camera Society of America moved swiftly to give Kostadis Roussos of Netapp the undesirable Worst First Award for his recent storage smackup video. Up to this point the SWCSA has not involved itself in best or worst dressed minutiae, but it could become necessary as a way to prevent further slumming of the commons.
Catastrophe theory attempts to explain how seemingly small, incremental changes to systems result in catastrophic results - such as bridge collapses. Sometimes it involves oscillating or resonating behaviors that get out of control.
When HDS released their version of dynamic provisioning, they created a system that can grow in ways that can cause problems for customers. The allocation unit for dynamic provisioning is large compared to other thin technologies - and when lots of host systems start writing to unallocated block storage, the amount of available, physical storage needed to supply all those allocations can spike quickly. Can it become a storage catastrophe? I don't know, but it certainly is the poster child for thin provisioning out-of-disk-space concerns.
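The effect of allocation unit size is easy to see with a little arithmetic. The Python sketch below uses hypothetical unit sizes and a hypothetical write pattern; they are not the actual figures for any vendor's product.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def physical_allocated(write_offsets, write_size, alloc_unit):
    """Bytes of physical storage consumed when every allocation unit touched
    by a write has to be allocated in full."""
    touched = set()
    for offset in write_offsets:
        first = offset // alloc_unit
        last = (offset + write_size - 1) // alloc_unit
        touched.update(range(first, last + 1))
    return len(touched) * alloc_unit


if __name__ == "__main__":
    # Hypothetical pattern: 1,000 small writes scattered 1 GiB apart.
    offsets = [i * GIB for i in range(1000)]
    for unit in (16 * KIB, 32 * MIB):  # hypothetical allocation unit sizes
        used = physical_allocated(offsets, 8 * KIB, unit)
        print(f"unit {unit // KIB} KiB -> {used / MIB:.1f} MiB allocated")
```

The same set of small, scattered writes consumes physical capacity in proportion to the allocation unit, which is how a burst of first-time writes from many hosts turns into a spike.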
In order to fix the over-allocation of storage by dynamic provisioning, they (sort of) released their zero page reclaim feature back in February to get back some of the allocation overkill. It's not exactly two wrongs making a right, but zero page reclaim is simply a bug fix for sloppy, bloated dynamic provisioning. It's interesting that several months after making the functionality available they announced it this week as if it's something new. I guess you can do that if you keep it swept under the rug well enough.
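The general idea behind zero page reclaim is to find allocated pages that contain nothing but zeros and hand them back to the free pool. Here is a minimal Python sketch of that idea; it is a generic illustration, not HDS's implementation, and the page map is a made-up stand-in for an array's allocation table.

```python
def reclaim_zero_pages(pages):
    """Split a {page_id: bytes} map into the pages worth keeping and the
    number of all-zero pages that can be returned to the free pool."""
    kept = {pid: data for pid, data in pages.items() if any(data)}
    return kept, len(pages) - len(kept)


if __name__ == "__main__":
    page_size = 4096  # hypothetical page size for the example
    pages = {
        0: bytes(page_size),                         # never really used
        1: bytes(page_size - 1) + b"\x01",           # holds real data
        2: bytes(page_size),                         # zeroed by the host
    }
    kept, freed = reclaim_zero_pages(pages)
    print("pages kept:", sorted(kept), "pages reclaimed:", freed)
```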
Reclaim technology is going to be a very big deal in the storage industry, but it will work much better when it is integrated with efficient thin technologies, such as those developed by 3PAR, to create a consistent, well-behaved, predictable environment - as opposed to one that requires customers to manage unpredictable allocation storms.
3P got all excited this week when Joe Tucci from EMC made his public plea to Data Domain's employees in the San Jose Mercury News because he wanted to tell them something too. He didn't have the budget to spend on a full page in the paper, but he has other ways to get his message across - like this Rap Blog.
The Board of Directors of Data Domain (NASDAQ:DDUP) today commented on
the unsolicited offer it has received from EMC Corporation (NYSE:EMC) to
acquire all of the outstanding shares of Data Domain common stock for
$30.00 per share in cash. Consistent with its fiduciary duties and in
consultation with its financial and legal advisors, Data Domain’s Board
is reviewing EMC’s offer. At this time, the Board is not making a
recommendation with respect to the EMC offer. Data Domain requests that
its stockholders defer making a determination whether to accept or
reject EMC’s offer until Data Domain has communicated to stockholders
its position regarding the tender offer from EMC. In accordance with
Rule 14d-9 of the Securities Exchange Act of 1934, on or before June 16,
2009, Data Domain will communicate to stockholders its position
regarding the tender offer from EMC. At this time, the Board is
reaffirming the recommendation in favor of Data Domain’s merger with
NetApp, Inc. (NASDAQ:NTAP) that is described in the Registration
Statement on Form S-4 that NetApp has filed with the Securities Exchange
Commission.
In this short video Cartoon Curtis Preston makes his debut with the League of Suspicious Avatars (LOSA):
FWIW, I don't think this is a case where EqualLogic systems are displacing Clariion sales - although some of that is probably going on - but is instead a case where EqualLogic storage is successful in SMB accounts just as it always has been.
3PAR and EqualLogic arrays are similar in the level of automation they provide, which saves storage administrators lots of time and effort and is why both are popular now. But other than that, the products are very different and developed for distinctly different types of customers, with some overlap in the medium sized business segment.
The storage industry screws just keep turning. A few minutes ago I heard that Netapp was acquiring Data Domain. Congratulations to the stockholders of Data Domain, but I hate to see another growing, independent storage company get swallowed. Once Data Domain joins Netapp, 3PAR and Compellent are going to be the only publicly traded array vendors with consistent sales growth over the last couple years.
This deal surprises me because Netapp's VTL products compete with Data Domain's products. Last October Chris Mellor from The Register wrote on how Netapp finally had an answer to Data Domain, Quantum and EMC. Apparently that answer needed to be amended six months later with cash and stock worth $1.5B. Netapp will have to spin this carefully to their shareholders and VTL customers. It certainly is possible to have overlapping products in the product line, but I'm not sure how well that will work for the de-dupe market.
One part of the story could be a SAN-only storage array from Data Domain - meaning it wouldn't come with a NAS head. This would give Netapp access to the SAN market in a way they have never been able to do with their Filer products.
I can't help but wonder if part of this deal is about generating interest in Netapp. Lately EMC, 3PAR, Compellent and Sun have been generating most of the attention in the storage industry and Netapp has been pretty quiet. That will change for a few months now (or at least until the next big acquisition occurs).
Otherwise, Data Domain lacks a corporate rapper, but if you ask me, they never had that far to go to catch up with Netapp. Still 3PAR's 3P might have to step it up a notch now that the Double Team can tag team with the Netapp rap crue.