A screencast discussing the bid EMC is making for Isilon. Covers the topics EMC highlighted in their announcement, including "big data", Atmos integration and the EMC effect. Contrasts the differences between EMC's divergent NAS strategies and HP's Converged Infrastructure NAS strategy with the X9000 platform.
When the XIV product was subsequently introduced we were concerned at 3PAR because it had the potential to be a serious competitor to what we were doing. Moshe Yanai, XIV's founder, was known to be a talented engineer and the rumor was that he had developed a 3PAR killer. With all the sales resources at IBM's disposal, a technology that was even remotely competitive could be very dangerous to us.
Turns out it wasn't much of a killer and we rarely run into it in our accounts any more. IBM claims it is succeeding, but where we sell our wares, it's more or less invisible. Frankly speaking, our sales people like it on the rare occasion that XIV is the competition. It doesn't scale or perform nearly as well as our systems do and it's far less efficient in terms of operating cost. With 3PAR, less does more. With IBM XIV, more does less.
After being named as an IBM fellow in 2008, it appears that Mr. Yanai has left IBM, leaving one to wonder what the future of XIV is going to be. Certainly the technology can continue to be developed without him, but there is an awful lot of work left to make it more competitive with 3PAR storage.
The barriers to entry in this business are very high, as IBM has found out with XIV. The question I have is how much longer they are going to continue to try.
There was a lot written last week surrounding VMware's release of vSphere 4.1. Netapp appeared to have a lot to say, but it was confusing to figure out what they were really talking about. I think I've got it now.
It's unusual for a company to be invited as a centerpiece of high-visibility festivities and then mysteriously decide not to follow through. It would be like getting complimentary tickets and backstage passes from Lady Gaga herself, telling all your friends about it and then not going. It does make one wonder. Why wouldn't you do whatever it takes to be included in VMware's big summer announcement party? Well, if you're Netapp, the answer appears to be - "Being there is over-rated. Just make sure everyone thinks you were." Call it Photoshop for PR or call it keeping your poker face, it's a mashup of a blown opportunity and opportunistic courage.
The excitement for VMware's storage partners was concentrated in two areas: VAAI (vStorage APIs for Array Integration) and SIOC (Storage I/O Control). The initial release of VAAI includes new SCSI block storage commands that allow arrays to take redundant, resource-consuming tasks off the host systems. SIOC is a method for managing I/O queues to create more fairness in accessing storage resources. Netapp issued a press release last week in conjunction with the vSphere 4.1 release, but it was for their Virtual Storage Console, not for the support of the storage enhancements in vSphere 4.1. There was a flag waving mention of VAAI:
"Additionally, NetApp is supporting the new VMware vStorage APIs for
Array Integration (VAAI) capabilities that offload data management tasks
from the host server to the storage system. This can free up host CPU
cycles for better performance and increased virtual machine density."
That's not exactly saying anything, but it's more than they had to say about SIOC, which was zilch.
The bottom of the release directs readers to Vaughn Stewart's blog for more info. Apparently, Netapp's PR department left the rest of the innuendo up to Vaughn - a diligent and loyal Netapp employee who understands that sometimes a vendor blogger doubles as a PR bagman. It looks like I need to add a new chapter to Vendor Blogging with Dummies.
You have to dig into the comments to get some of the details, but Vaughn's blog does a decent job explaining that Netapp is working on delivering VAAI functionality in Q4 2010. Now, that's not all that late considering it's only 6 months or so away, but as a privileged insider to VAAI development, it's not a great showing either. In fact, it wouldn't surprise me if some of the companies who were not in the program, such as Compellent, HP, IBM and Xiotech, come out with VAAI plug-ins before Netapp does. As for 3PAR, we will have our VAAI plug-in available in September as part of a maintenance release. We didn't have a lot of time to develop VAAI functionality after gaining access to the APIs in early 2010, but we fast-tracked the development of it in order to make the announcement.
As much as I admire Vaughn's chutzpah for stepping in to carry the load that others at Netapp should have, there were a few problems with what he said. First was the absurd statement that "SAN is attempting to be more NAS-like". There is so much wrong with that statement that it's difficult to find a place to start. Who or what is SAN? Is VMware SAN? Is the T10 SCSI standards committee SAN? Is SAN, the being, an embodiment of SAN, the block protocol? Is there a virtual reality thing going on here? And what is NAS-like anyway? Does it have anything to do with the size of one's beak or the way particular vowels resonate in the sinus cavities? Or is it like racing the back roads in a used Chevy? Whatever Vaughn meant, I tend to dislike the imprecision of technology anthropomorphism.
The second thing Vaughn said was "As for the first release of VAAI... These features ALREADY EXIST in NFS." Really, block zeroing? That is a function developed for EagerZeroThick volumes, which are only supported on VMFS datastores, not NFS datastores. Perhaps we will see that change in the future, but for now it's SAN only.
Hardware assisted locking is a way to allow finer-grained locking for VMFS and addresses an issue with VMDK-level operations in a shared datastore. Because NFS stores VMDKs as separate files, which are locked independently, hardware assisted locking is unnecessary for NFS. In other words, it's a SAN only function because the current NFS datastore architecture doesn't need it.
The other API in VAAI is Full Copy. This VAAI API appears to be functionally equivalent to a Netapp utility called RCU (Rapid Cloning Utility) that was included as a function in their Virtual Storage Console. It is not, however, something that exists in NFS, unless Netapp wants to give that feature to all its NAS competitors. As a vSphere function, Full Copy will be available to all vendors that implement the VAAI APIs. It will be interesting to see what differences there are as far as programmatic control using the VAAI plug-ins, vendor-specific consoles and PowerShell.
Nate at Techopsguys has put together a comparison of SPC-1 benchmarks with six different bar charts showing the various characteristics of the configurations, performance and cost.
A couple weeks ago, one of the major storage vendors had two major problems to resolve after one of their arrays suffered a firmware bug-induced failure at one of their cloud (email) service provider customers. They had to:
Help the customer get back to normal service levels after they had become unacceptable.
Confront a public relations problem after it was exposed by a leading storage publisher.
Meanwhile, their service provider customer had four major problems to resolve:
Get service levels back to acceptable levels.
Communicate to their customers what the problem was and how it was being addressed.
Re-engineer a solution to avoid the same happening again.
Credit customers for the failure to deliver against SLA terms.
A vendor employee tried to address their public relations problem this way in his blog:
"OK, I'll take the blame for this -- sort of. We pride ourselves in putting a lot of thought into our customer designs. I'd argue that we're really, really good at it as well.
But not everyone is 100% sure of how their application will grow over time -- unfortunately, we're not psychics. And, let's be honest, not everyone necessarily wants to pay for redundancy we like to put into our designs.
We don't always get to directly engage all the time, either -- with products such as the (blanked out), most of this stuff moves through the channel. Somebody calls up one of our partners, says that they want to buy one of our products, and one gets sold -- and a lot of product gets sold that way."
I understand the desire to explain how messes become messy, but I'm not sure why he felt the need to speculate that his company's business partners or their customer's budget were key elements of the problem. That is tantamount to saying, "All of our (blanked out) customers could have the same thing happen to them too." Anybody who has ever been close to one of these melt-downs knows there are many variables involved - including vendors underbidding each other and shaving elements from their bid in order to win the business.
From a distance, it looks like the vendor's response to the customer was good, although there apparently were some issues with failure notification from the array when the event occurred. I wouldn't call these sorts of things "Perfect Storms", but there are unfortunate scenarios where multiple things go awry. All vendors have these sorts of bad days, which serve as painful learning experiences. Unfortunately for customers, it's one of the ways vendors improve their customer support processes.
The customer also wrote in his blog, explaining the situation to their customers:
"Our SAN vendor analyzed the system logs for the event and determined that the service processor failure occurred due to a unique bug in the specific version of firmware on the system. Our vendor performed an emergency upgrade. The newer version of firmware includes a fix for the bug. We are taking additional corrective actions to make certain that there is enough spare capacity on the SAN. This will assure it performs without performance degradation in the event of a single hardware failure."
The reparation sounds reasonable, but it's not what I would call best of breed either. I'll explain why in the remainder of this post.
The old trusted dual controller just can't keep up
The explanation the service provider gave to their customers was only half correct. Yes, the failure in one controller was due to a firmware bug - and yes, all vendors find out about some of them at customer sites - but the inability of the surviving controller to handle the workload was another matter altogether.
The major defect of all dual controller designs for service provider applications is the uselessness of write cache when operating in degraded mode on a single controller.
When a dual controller array has a controller failure, all traffic is failed over to the surviving controller. However, this controller can't afford to place writes in cache because if this controller also fails, any un-flushed writes in cache would be lost - making the recovery process all the more painful. As a result, the throughput of the controller degrades significantly because writes now take several orders of magnitude longer to process, as each write must be completed at the physical disk level instead of in fast cache memory. When you consider the sort of read/write ratios involved with an email application (heavy writes), it's not surprising to hear that it took 32 hours for the system to get caught up. I suspect that if the surviving controller had been able to use write cache, the customer might have experienced some amount of service level problems, but not nearly as bad as they suffered.
Write performance during array component failures is an important point that many customers give insufficient weight to when making their purchases. Public service providers certainly need to understand this. The exact same scenario - controller failure and subsequent drop in service levels - could certainly happen to a traditional data center customer, but the ramifications of this scenario are not as ugly as they are for a multi-tenant public service provider.
This case is a perfect example of how an older architecture is incapable of meeting the requirements of the new cloud service business model. If you are a cloud service provider reading this and wondering if you might have a similar exposure to a controller failure (including 3PAR customers with dual-controller arrays), my advice is to review what you have and start thinking about what you should expect if you have a controller failure and how you might want to deal with it on both a short-term and long-term basis. Best of breed cloud storage should not include dual controller arrays.
Their solution is to buy more and utilize it less
One of the identified corrective actions is having "enough spare capacity on the SAN", which in this case involves installing a second array. Without knowing the inside scoop, it looks like the idea is to split the workload across the two arrays so that if a controller failure occurs in either array, the performance drop won't be as noticeable. The array that doesn't suffer the failure will keep working as expected and the array that has the failure will only have half the load to deal with.
There are two primary problems with this "fix":
Performance will still suffer on the array with the controller failure
The I/O load will continue to increase over time
You are always going to have performance degradation of some sort when you can't use write caching, unless you are only reading data - which isn't the case here. It is flat out wrong to assume that a performance problem will not occur. Regardless, with the new two-array SAN, whichever system has the controller failure should be able to get caught up much faster than the 32 hours this customer had to wait. Of course, the customer's capacity and I/O load will almost certainly increase over time, and as that happens, the strategy of splitting the load between two arrays loses its effectiveness.
Along with adding the controllers, they are also certainly adding disk drives, and some notion of what "reasonable" utilization limits should be for them. The problem with limiting utilization as a best practice is that it puts the stamp of approval on inefficiency - not only for capacity utilization but also for the power and cooling required to support all those underutilized drives. Most legacy arrays have built-in inefficiencies in the way data is laid out on disks, making it virtually impossible to achieve uniform utilization across all disk resources. The result is uneven consumption of disk capacity, as well as uneven I/O service levels among different disk groups, which is another variable in how much performance degrades following a controller failure in a dual controller array.
Finally, the customer now has two arrays to manage, including multipath connections, SAN zones, and all other aspects of the configuration, which all contribute down the road to change management complexities. The result is a net drag on administrator effort and an increased TCO.
How many do you need?
A true best of breed solution would address the root-cause deficiency in the array's design, without creating additional management and cost burdens to the customer. Obviously, more than two controllers are needed. But how many controllers does a cloud service provider need in an array? The answer is at least three. Why? Because when a single controller fails, there can still be two surviving controllers working together, mirroring their cache contents, and performing fast writes to cache memory. That said, controllers are usually packaged in pairs for redundancy purposes, which means that the most likely configurations will have four controllers.
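The reasoning about controller counts can be sketched in a few lines. This is only an illustration of the argument above, not any vendor's firmware logic: the function name and the mode labels are made up for the example, and it assumes the array is actually able to re-pair the surviving controllers for cache mirroring.

```python
def cache_mode(total_controllers, failed_controllers):
    """Return the write caching mode an array could run in (illustrative only).

    Assumption: write-back caching is safe only when at least two healthy
    controllers remain, so every cached write can be mirrored.
    """
    healthy = total_controllers - failed_controllers
    if healthy <= 0:
        return "offline"
    if healthy == 1:
        # No partner left to mirror to: every write must go to disk first.
        return "write-through"
    return "write-back (mirrored)"


for controllers in (2, 3, 4):
    print(controllers, "controllers, one failed:", cache_mode(controllers, 1))
```

With one failure, only the dual controller case drops to write-through, which is the scenario the service provider ran into.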
If you compare a single quad controller array with two dual controller arrays there are some key advantages that immediately jump out:
No or limited loss of performance after a controller failure
All drives and cache can be used to service all workloads
Managing a single array significantly reduces cost and complexity
A better recipe for maintaining performance levels
The next question is: "Is there a suitable quad controller array that the customer could have used instead of the two dual controller arrays they have?" Yes, 3PAR's F400 or T400 arrays are both quad controller arrays. The disk drives in these arrays can be either SATA or FC, or a mix of both types if the customer wanted to implement tiering. Product information of the F400 can be found here, and the T400 here.
However, simply putting four controllers in an array does not necessarily guarantee that they will be able to sustain write caching if one of them fails. The array must have the ability to remap and re-mirror the write cache contents of all four controllers to the surviving controllers following the loss of a controller. It's an interesting geometric sort of problem: There are four controllers, each with their own cache and cache that is mirrored from the other controllers in the array. All cache contents, including mirrors, need to be distributed evenly across all controllers to avoid congestion and load imbalances. All cache content, including mirrors, needs to be accounted for within the array so that if a controller fails, the other controllers will be able to identify all the surviving original and mirrored copies of data. For cache data that has lost either a primary or mirrored copy, a second (new) copy needs to be made. Finally, the amount of data in cache may need to be re-leveled (decreased) to fit into the degraded cache capacity (3 controllers instead of 4).
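To make the re-mirroring problem a little more concrete, here is a minimal sketch of the bookkeeping involved. It is not how Persistent Cache is implemented; the CachePage class, the function name and the "least-loaded survivor" rule are all assumptions made for illustration, and the final re-leveling step is left out.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class CachePage:
    page_id: int
    primary: int  # controller holding the original copy
    mirror: int   # controller holding the mirrored copy (differs from primary)


def remirror_after_failure(pages, controllers, failed):
    """Give every cached page two copies again on the surviving controllers."""
    survivors = [c for c in controllers if c != failed]
    if len(survivors) < 2:
        raise ValueError("need at least two survivors to keep mirroring")

    # Count the copies each survivor already holds so new copies spread evenly.
    load = Counter({c: 0 for c in survivors})
    for page in pages:
        for owner in (page.primary, page.mirror):
            if owner != failed:
                load[owner] += 1

    for page in pages:
        if page.primary == failed:
            # The mirror is the only copy left, so promote it to primary.
            page.primary, page.mirror = page.mirror, failed
        if page.mirror == failed:
            # Make a new copy on the least-loaded survivor without the primary.
            candidates = [c for c in survivors if c != page.primary]
            new_mirror = min(candidates, key=lambda c: load[c])
            page.mirror = new_mirror
            load[new_mirror] += 1
    return pages


# Hypothetical starting layout: eight pages spread round-robin over four controllers.
pages = [CachePage(i, primary=i % 4, mirror=(i + 1) % 4) for i in range(8)]
for page in remirror_after_failure(pages, controllers=[0, 1, 2, 3], failed=2):
    print(page)
```

The design choice worth noticing is that pages which lost their primary and pages which lost their mirror end up on the same path: once the surviving copy is treated as the primary, both cases reduce to "pick a new mirror".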
The software for doing this in a 3PAR array is Persistent Cache. Product information on Persistent Cache is here (PDF).
I made a 9 minute video last year describing how Persistent Cache works. Here it is again. Thanks for watching.
Technology integration makes computing products much easier to use and significantly drives down the cost and effort of owning them. For instance, technologies such as WiFi that were recently beyond the grasp of most people are now inexpensively integrated into PCs and usable by almost anyone.
The trick with integration is understanding what variables should be exposed - or as my friend Rick Vanover likes to say - how many knobs there are to turn. End user and infrastructure provider requirements differ considerably when it comes to knobs. For instance, Apple computers are great end user machines because they lack knobs, but are not always loved by technology professionals for the same reason. Data center operators need products with knobs in order to accommodate all the cross-purposed requirements that stretch beyond a one-size-fits-all design.
So knobs are generally good - but like so many things - their usefulness depends on how effective they are and their station in FARLEY'S HIERARCHY OF KNOBS, which includes the following levels:
Suicide Knobs: knobs that delete data and make things blow up. A good example of a Suicide Knob is something that formats storage.
Prison Knobs: knobs that make changes that are very difficult or impossible to reverse. Many storage provisioning knobs fall into this category. Once you provision and reserve storage with most storage arrays today you are stuck with that decision until the array's EOL.
Faux Knobs: knobs that never seem to do anything, no matter how far you turn them. For features past and future, but not now.
Random Knobs: knobs that produce unanticipated results that can go unnoticed for years. These are the knobs that fuel the technical publishing industry.
Slippery Slope Knobs: knobs that start you down a path to ruin through a chain of system dependencies. These are the knobs you spend a lot of money to learn about in vendor classes.
Dumb Ass Knobs: knobs that do things, but not anything useful. Granted there is a LOT of subjectivity in making a call on a dumb ass knob - but we all agree they exist.
Honest Knobs: knobs that actually do something you need them to without having to plan for weeks on how to use them. Most knobs should fall into this category, but alas!
Magic Knobs: knobs that do things so useful it makes you wonder how anybody thought of a knob like that. Most of these knobs are actually Honest Knobs, but we are so accustomed to seeing Suicide, Prison, Faux, Random, Slippery Slope and Dumb Ass knobs that we are blown away by a truly great Honest Knob.
Which brings me indirectly to the discussion of STACK WARS, which has been centered around the announcement of VCE (VMware/Cisco/EMC) and their vBlock concept last November.
I'd like to say I was surprised yesterday when graphically-challenged Hitachi announced their intention to sell their own Unified Cloud Graphic (complete with Hitachi compute servers!). But it wasn't a big shock considering their marketing strategy of "just copy it".
I really don't know how they expect their graphic to compete with vBlock's graphic, with all the color, multiple font sizes and graphics within graphics.
What's missing from both stack graphics are the knobs that administrators use to get real work done. Yes, knobs tend to be part of the underlying details, but to anybody that actually uses a product, they are very important details. The detail that C-level executives need to understand is that the stack does not have nearly the automation that is being promised today and that administrators will be doing a lot of work, turning the knobs that the stack provides. Again, it's not the number of knobs that matters, as much as it is the quality of those knobs.
Some people have speculated that the vBlock was a knob-less invention that originated in the board rooms of the VCE companies. Some have even suggested that it was the fallout after a failed acquisition bid by Cisco to acquire EMC. I don't know if THAT's true, but there is some evidence that the engineering groups in the companies involved have been scrambling to put meat on the bone.
Maybe someday stacks will be the next big thing, but I don't see it playing out that way unless an awful lot changes in the underlying products that make up the stack. Here's my take on STACK WARS:
Bloggers that write about stacks have a chance of getting jobs with stack vendors. If you are out of a job, start a stack blog today and twitter your back-stack off!
Stacks are all about packaging. Stacks will be assembled and shipped together (presumably), which could make things easier if your goal is to streamline receiving.
Stack products are actually more services than products. However, if you ever want to make configuration changes in your stack it might not be economically feasible. (Think gigantic FRUs.) For example, there is not a lot of flexibility in vBlock's configurations.
Due to the limited configuration options, stack resources are not likely to be used very efficiently and the economic return on the investment will lag. However, EMC customers are already accustomed to low storage utilization levels - so poor utilization might not be THAT big a deal. Definitely a weird way to win a point, but I'll concede it grudgingly.
The business advantage of integration should be much lower costs. However, the VCE companies all need to maintain their margins if they want to satisfy investors. It's not clear how they will be able to leverage the integration effort to reduce the cost of vBlock, but then again if STACK WARS turn into PRICING WARS for STACKS, things could get very interesting. IBM must be STACKING up something - after all Hitachi already beat them to the punch.
The C level view of stacks is that they smooth out purchasing and operations expenses by providing a smaller number of Purchasing Knobs (that would be a Faux Knob). John Nash posted in his blog last week, "The Case for the vBlock":
What is interesting is that, usually, the higher up in an organization
you are communicating the better the Vblock conversation goes. Remove
the detailed technical questions and the value of the Vblock idea
really shines. You get a known “product” from trusted sources. You get
known costs today as well as known costs for future expansion. It
greatly removes the risk from the organization with unknown
infrastructure expenses.
There you have it, vBlocks will be sold from the top down by Cisco and EMC - companies that are good at selling from the top down, which will make it somewhat easier for the VCE companies to justify their price tag. But that won't make the price any easier to swallow.
As Nash wrote, "remove the detailed technical questions and the value of the Vblock idea really shines." That's like saying chapulines (fried grasshoppers) might appeal if Anthony Bourdain is talking about them on TV, but your own personal experience chewing and swallowing them might be different. I'm not talking about price here, I'm referring to the experience of running the vBlock. There is going to be a lot more involved than the knob-less graphics portray.
The weakest link in the vBlock chain today is EMC's contribution. There are far too many Prison (provisioning) and Slippery Slope Knobs in EMC storage. They aren't the only vendor with this problem, but they are the E in VCE. Provisioning storage with a V-Max is about the same as it was with a DMX - despite what EMC employees would have you believe.
Prison Knob provisioning creates a lot of problems for customers as storage ages and as demands shift. Once storage has been reserved for usage in an EMC system, it is pretty much bound to that purpose.
My advice is to buy the products with the most Magic Knobs and avoid those with the most Prison provisioning Knobs. If you have ever felt trapped by a storage configuration that you couldn't live with or afford, you know what I'm talking about. Magic Knobs are those that reduce the effort to manage and change storage, increase the efficiency of storage and provide the most versatility for all applications, workloads and multi-tenancy.
There has been some discussion recently about wide striping in the storage and virtualization communities and it struck me that a number of people were unsure what it was and how to think about it, so I thought I'd explain the basics here.
What is Wide Striping?
Wide striping is a method of spreading data over a large number of disk drives in a storage array in order to achieve a desired performance goal. The term wide striping has been used sometimes to refer to sets of disks with 16 to 32 members, or more. It seems strange that somebody would classify a 32 member disk set as having wide striping, when you compare that to a disk set with 320 members, but that's how definitions in the storage industry tend to be broadly interpreted - especially by vendors :).
The amount of work a disk drive does is measured in IOPS
One of the ways to measure storage performance is using a metric called IOPS - or IOs per second. An I/O is either an operation where the disk drive reads data from media or writes it to media. Reads process faster than writes, so it is important to understand this mix in your applications. The amount of work a disk drive can do is proportional to the rotational speed of the drive.
There are two basic classes of disk drives used in enterprise storage arrays today: lower-cost, lower performance SATA disk drives and higher-cost, higher performance disk drives with Fibre Channel (FC) or SAS interfaces.
A 7,200 RPM SATA drive can sustain ~ 80 IOPS
A 15,000 RPM FC/SAS drive can sustain ~ 200 IOPS
In other words, a 15,000 RPM SAS/FC drive can do approximately 2.5 times the IOPS of a 7,200 RPM SATA drive. You need approximately 5 SATA drives to sustain the same workload as a pair of FC/SAS drives running at maximum sustained levels.
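The drive arithmetic is easy to check for yourself. This is just a small sketch using the approximate per-drive figures quoted above; the constant and function names are made up for the example.

```python
import math

SATA_IOPS = 80      # ~ sustained IOPS of a 7,200 RPM SATA drive (figure from the text)
FC_SAS_IOPS = 200   # ~ sustained IOPS of a 15,000 RPM FC/SAS drive (figure from the text)


def drives_needed(target_iops, iops_per_drive):
    """Smallest whole number of drives that can sustain target_iops."""
    return math.ceil(target_iops / iops_per_drive)


ratio = FC_SAS_IOPS / SATA_IOPS
sata_for_fc_pair = drives_needed(2 * FC_SAS_IOPS, SATA_IOPS)
print(f"FC/SAS to SATA IOPS ratio: {ratio}")
print(f"SATA drives to match a pair of FC/SAS drives: {sata_for_fc_pair}")
```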
Storage workloads are determined by the applications being used and there is a great deal of variety in the performance requirements they have. Some applications need very little storage performance and some applications need a high level of storage performance.
Workload isolation
Storage arrays that do not have wide striping tend to have high-priority workloads isolated on certain designated drives in order to meet the performance requirements of those applications. For instance, an array with 100 FC/SAS disk drives could have ten of those drives used exclusively by one high priority application to make sure that it has access to all of the 2,000 IOPS those drives can generate.
Isolating workloads this way creates storage capacity utilization problems because most high performance applications need more disk drives to meet their IOPS requirements than they need to hold their data. In other words, an application that has reserved the 2,000 IOPS of ten 450GB FC/SAS drives might only need 900GB of capacity - a utilization of 20%.
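Here is that utilization example worked through as a rough sketch. The function name is hypothetical, and the calculation uses raw drive capacity and ignores RAID overhead and spares, so real numbers would look somewhat different.

```python
import math


def isolated_workload(iops_required, capacity_required_gb,
                      iops_per_drive=200, drive_capacity_gb=450):
    """Return (drives reserved, capacity utilization) for one isolated workload."""
    drives_for_iops = math.ceil(iops_required / iops_per_drive)
    drives_for_capacity = math.ceil(capacity_required_gb / drive_capacity_gb)
    # The workload has to reserve enough drives to satisfy both requirements.
    drives = max(drives_for_iops, drives_for_capacity)
    utilization = capacity_required_gb / (drives * drive_capacity_gb)
    return drives, utilization


drives, utilization = isolated_workload(iops_required=2000, capacity_required_gb=900)
print(f"{drives} drives reserved, {utilization:.0%} of their capacity used")
```

Two drives would have been enough to hold the data, but ten are needed for the IOPS, so the other eight drives' worth of capacity is stranded.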
The IOPS requirements of applications change over time, which means an application that has its workload isolated on a set number of drives can exceed their aggregate IOPS capabilities, creating a performance bottleneck for the application. The structure that was established to ensure sufficient IOPS becomes a performance problem. The process of adjusting and rebalancing the disk drives that are used to support isolated workloads can become extremely complicated and time consuming depending on many variables and creates opportunities for administrator errors.
Workload randomization
Alternatively, storage arrays with wide striping tend to have multiple applications sharing the IOPS generating capabilities of a large number of drives - up to all the drives in the system. Using this approach, the data from each application is spread over all the drives being wide-striped to give all applications access to the aggregate IOPS potential of the array. For example, an array with 100 FC/SAS disk drives that wide stripes data over all the drives would have 20,000 IOPS available for any and all applications at all times.
In contrast to workload isolation, discussed above, this approach is known as workload randomization because it mixes all the data in the array across all of the disk drives. In order to ensure data is written across all the disk drives, wide striping storage arrays tend to write data in very small increments on each drive. Applications with heavy IOPS workloads are utilizing all the drives in the system in relatively similar amounts.
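As a rough illustration of what "small increments on each drive" means, the sketch below places a volume's chunks on drives in simple round-robin order. That placement rule is an assumption made for the example; real wide striping arrays use their own layouts, and the function names here are hypothetical.

```python
from collections import Counter


def chunk_location(logical_chunk, num_drives):
    """Map a volume's logical chunk number to (drive index, chunk slot on that drive)."""
    return logical_chunk % num_drives, logical_chunk // num_drives


def chunks_per_drive(num_chunks, num_drives):
    """Count how many of a volume's chunks land on each drive."""
    return Counter(chunk_location(c, num_drives)[0] for c in range(num_chunks))


def aggregate_iops(num_drives, iops_per_drive):
    """Raw IOPS potential of all the drives in the stripe set."""
    return num_drives * iops_per_drive


spread = chunks_per_drive(num_chunks=1000, num_drives=100)
print("drives touched:", len(spread))
print("chunks on the busiest and quietest drive:", max(spread.values()), min(spread.values()))
print("aggregate IOPS of 100 FC/SAS drives:", aggregate_iops(100, 200))
```

Because every volume touches every drive in roughly equal amounts, no single application ends up bottlenecked on its own small group of spindles.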
By nature, wide striping eliminates array bottlenecks and achieves better storage utilization, but utilization levels depend on storage provisioning methods that are designed to take advantage of wide striping such as Thin Provisioning, which is another topic for another blog post.
The lack of bottlenecks in a wide striping storage array also means that mixed workload performance is optimized because all applications have access to the full IOPS potential of the array. Virtualization environments benefit in two ways: 1) by being able to mix applications and VMs on physical servers without worrying about the creation of storage bottlenecks and 2) achieving higher VM densities.
Enterprise arrays with native, wide striping
There are two enterprise storage arrays that use native wide striping across all the disk drives in the system, without the need for administrators to organize RAID groups and create disk pools.
3PAR InServ
IBM XIV
There are many important differences between these two products, including the flexibility of RAID implementations and disk drive options, but the method of wide striping is similar.
Wide striping IOPS to servers
The number of IOPS delivered by an array to servers is less than the aggregate IOPS of the drives in the array, depending on the mix of reads and writes and the type of RAID that is used by a particular volume. Here are a few examples of IOPS calculations for wide striping systems using different drive scenarios and 8k blocks. The arrays modeled here do not include mixing RAID types for different volumes, except the last model that mixes RAID 5 on FC drives with RAID 6 on SATA drives.
Array 1: 80 450GB FC disk drives: 80% reads, 20% writes
Total IOPS of all drives in the array: 16,000
IOPS delivered to servers w/RAID 5: 10,000
IOPS delivered to servers w/RAID 10: 13,333
Array 2: 200 1TB SATA disk drives: 80% reads, 20% writes
Total IOPS of all drives in the array: 16,000
IOPS delivered to servers w/RAID 5: 10,000
IOPS delivered to servers w/RAID 10: 13,333
IOPS delivered to servers w/RAID 6: 7,505
The next array has both FC and SATA drives and changes the read/write ratio. In this case there are two different wide stripe sets, one that spans FC drives and the other that spans the SATA drives.
Array 3: 160 450GB FC + 480 SATA disk drives: 70% reads, 30% writes
Total IOPS of all drives in the array: 70,400
IOPS delivered to servers w/RAID 5: 37,052
IOPS delivered to servers w/RAID 10: 54,153
IOPS delivered to servers w/RAID 6 (SATA) & RAID 5 (FC): 31,075
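The arithmetic behind these figures can be sketched as follows. This is a back-of-the-envelope model, not a vendor sizing tool: it assumes the usual rule-of-thumb write penalties of 2 for RAID 10 and 4 for RAID 5, a RAID 6 penalty of 6.66 that I have inferred from the numbers above, and per-drive rates of 200 IOPS (FC) and 80 IOPS (SATA) inferred from the stated totals. All names in the code are made up for the example.

```python
# Back-end disk I/Os generated per host write. RAID 10 and RAID 5 use the
# common rule-of-thumb penalties; the RAID 6 value is inferred from the
# figures in the text and is an assumption.
WRITE_PENALTY = {"RAID 10": 2.0, "RAID 5": 4.0, "RAID 6": 6.66}

# Per-drive IOPS inferred from the totals (16,000 / 80 and 16,000 / 200).
FC_IOPS_PER_DRIVE = 200
SATA_IOPS_PER_DRIVE = 80


def delivered_iops(raw_iops, read_pct, raid):
    """Host-visible IOPS for a given raw drive total, read %, and RAID type."""
    write_pct = 100 - read_pct
    # Each read costs one disk I/O; each write costs WRITE_PENALTY disk I/Os.
    return raw_iops * 100 / (read_pct + write_pct * WRITE_PENALTY[raid])


# Array 1 and Array 2: 80% reads
array1_raw = 80 * FC_IOPS_PER_DRIVE
array2_raw = 200 * SATA_IOPS_PER_DRIVE
array1_raid5 = delivered_iops(array1_raw, 80, "RAID 5")
array1_raid10 = delivered_iops(array1_raw, 80, "RAID 10")
array2_raid6 = delivered_iops(array2_raw, 80, "RAID 6")

# Array 3: two wide stripe sets, 70% reads
fc_raw = 160 * FC_IOPS_PER_DRIVE
sata_raw = 480 * SATA_IOPS_PER_DRIVE
array3_raw = fc_raw + sata_raw
array3_raid5 = delivered_iops(array3_raw, 70, "RAID 5")
array3_raid10 = delivered_iops(array3_raw, 70, "RAID 10")
array3_mixed = (delivered_iops(fc_raw, 70, "RAID 5")
                + delivered_iops(sata_raw, 70, "RAID 6"))
```

Under those assumptions the model reproduces the figures listed above to within rounding, which suggests the same simple formula was used throughout: raw drive IOPS divided by the weighted cost of a read and a write.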
About 10 years ago a small team that I was a part of looked at starting a company that would do something similar to what IBM's SVC does. The idea was to create a SAN front end controller with a lot of cache memory that would virtualize "downstream" storage and provide performance boosts through various techniques such as caching, striping, and multi-way mirroring. We gave up on the idea when it became apparent to us that the project was quite a bit larger than we initially thought and it was unclear when we would ever have sufficient resources to get a competitive product to market. I think we could have sold the idea to venture capital investors who were throwing money at storage startups, but we couldn't sell it to ourselves. For those of you that wonder why I tend to think SVC is an important product, that's why - I know some of the things IBM did to make it work and I admire their ability to bring it to market.
Anyway, one of the hurdles we couldn't get past was how to deal with mixed workloads originating from multiple SAN attached servers simultaneously streaming data from data warehouses and IOPS from transaction databases and all sorts of other bursty, unpredictable applications competing for memory and disk resources. As a front-end appliance, you can control your own cache resources, but you can't do much about the back end disks because downstream arrays mask them from the appliance. In fact, there was no way to predict the performance of the downstream arrays for any given workload. We consulted with experts from the industry and research universities and they were all discouraging about the ability to significantly improve the performance of mixed workloads in a shared SAN appliance.
There were two issues: the fact that downstream arrays already had their own caches, and the dynamics of sharing cache resources in an appliance. If these fundamental problems could be solved, the rest of the work was the not-so-simple matter of making it work as a cluster with mirrored cache for HA.
Uncoordinated multi-level caches have the problem of redundant data. For example, pre-fetching data in the appliance's cache will likely end up loading the same data in the downstream array's cache. With duplicated cache data, a cache hit on the appliance won't be that much faster than a cache hit in the downstream array - and a cache miss in both will be much slower. It's difficult to prove the value of THAT. It's certainly possible to tune both caches differently, but this turns out to be easier said than done and tends to flatten the value of an "easy to use" appliance.
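A toy model shows the duplication. The sketch below assumes two plain LRU caches of equal size, one in the appliance and one in the downstream array, with every appliance miss passed through to the array. The class, the cache size of 100 blocks and the sequential workload are all hypothetical; real arrays and appliances use far more elaborate caching policies.

```python
from collections import OrderedDict


class LRUCache:
    """A tiny LRU read cache that forwards misses to an optional backend."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, block, backend=None):
        if block in self.blocks:
            self.blocks.move_to_end(block)   # mark as most recently used
            self.hits += 1
            return
        self.misses += 1
        if backend is not None:
            backend.read(block)              # the miss travels downstream
        self.blocks[block] = True
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict least recently used


appliance = LRUCache(100)
array = LRUCache(100)

# First pass: stream 1,000 blocks through the appliance.
for block in range(1000):
    appliance.read(block, backend=array)

# Second pass: re-read the most recent 100 blocks.
for block in range(900, 1000):
    appliance.read(block, backend=array)

shared = set(appliance.blocks) & set(array.blocks)
overlap = len(shared) / len(appliance.blocks)
```

In this model both caches end up holding exactly the same blocks, and the re-reads are all served by the appliance, so the array's copy of that data never produces a single hit. The memory in the second cache is effectively wasted unless the two caches are tuned to behave differently.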
So, the way to overcome this is to make a HUGE cache in an appliance. Increasing a cache's size can do a lot for performance and Chuck said,
The exception, of course, is if you've got the bucks to create a
ginormous read cache, and pull almost all the significant data into
memory. Don't snicker -- there are a few use cases where this sort of
approach makes sense.
In other words, create a RAM disk in cache. This technique is normally reserved for a single, high profile application and as Chuck wrote, there are use cases where this makes sense. But it doesn't address the requirements of mixed workloads where there are a large number of applications that do not merit dedicated memory, but still need good performance. It might be possible to micro-manage cache for some number of applications by dedicating cache for each of them, but that requires a great deal of work that is likely going to be a temporary solution lasting a couple of months at most. It's probably a great way to drive storage admins crazy.
A more palatable approach is to use a global cache that shares cache resources among multiple servers and their applications. In some cases, it's possible to predict workload demands (end of month processing for example), but in many cases, the instantaneous performance requirements cannot be predicted because they are driven by spontaneous events. As many people know all too well, spikes in Internet activity and the corresponding bottlenecks in back end storage clearly illustrate the challenges of mixed workloads. Global caches that can accommodate large Internet traffic spikes are expensive and do not provide noticeable performance advantages most of the time - they are overkill.
The proliferation of virtualized servers has significantly increased the breadth of the
mixture that a storage array has to deal with. In general, there is more overall
I/O activity that is, for the most part, less predictable, and therefore
more difficult to deal with in cache.
The question is, is storage tiering with SSDs any better? Possibly. If it is going to be more effective than caching, it needs to provide more control points and intelligence than cache typically does. For instance, the ability to prioritize and schedule applications for movement into SSD tiers could be an important difference. 3PAR's QoS Gradient concept in Adaptive Optimization is an example of a simple prioritization scheme. The internal counterpart to QoS Gradients is internal monitoring of I/O levels, which helps determine which applications are promoted.
That's not to say caching can't have some of the same controls, but traditionally caching has been more reactive than proactive. To be clear, tiering is also reactive, but within the context of intelligent preparation and business-driven policies.
Still, considering the cost of SSDs, you have to ask the question: is tiering overkill too? It might be. If the array can keep up most of the time, how much SSD capacity should be purchased? These are the sorts of things that will be determined over the next couple of years.
The best technology to date for dealing with mixed workloads continues to be wide striping. If you throw out latency-sensitive applications from the mix - those are the same apps Chuck Hollis referred to when he talked about the applicability of "ginormous read caches" - then the array just has to provide adequate throughput at reasonable service times (latencies).
Wide striping does this by spreading the workload mix over as many disk drives as possible. Not the number of drives that fit in a shelf or can be added to a RAID group, but all the drives in an array at best, or all the drives of a particular class, say SATA or FC. Wide striping very thin layers of data across hundreds of drives means that hundreds of servers can be accessing data simultaneously, with a minimal amount of contention. The result is that all the drives are kept busy at the same rate and that none of them bear an unfair burden. The overall sustainable throughput is very high, scales by adding more drives and in general fits the profile of mixed workloads better than any affordable caching scheme does.
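A rough comparison of per-drive load makes the point. The sketch below assumes four applications with made-up IOPS demands, 80 drives, and a hypothetical limit of 200 IOPS per drive; it compares giving each application its own equal-sized group of drives against striping everything across all the drives. It ignores RAID overhead and caching entirely.

```python
def per_drive_load_isolated(app_iops, drives_per_group):
    """Each application gets its own drive group; load stays in that group."""
    return [demand / drives_per_group for demand in app_iops]


def per_drive_load_wide(app_iops, total_drives):
    """All applications share all drives; load is spread evenly."""
    return sum(app_iops) / total_drives


APP_IOPS = [6000, 500, 500, 1000]   # hypothetical demands, one busy app
TOTAL_DRIVES = 80                   # hypothetical array size
DRIVE_LIMIT = 200                   # hypothetical IOPS a single drive sustains

isolated = per_drive_load_isolated(APP_IOPS, TOTAL_DRIVES // len(APP_IOPS))
wide = per_drive_load_wide(APP_IOPS, TOTAL_DRIVES)

# With isolation the busy application's drives are asked for more than they
# can deliver while the other groups sit nearly idle.
isolated_overloaded = max(isolated) > DRIVE_LIMIT
wide_overloaded = wide > DRIVE_LIMIT
```

With these numbers the busy application needs 300 IOPS from each of its 20 isolated drives, which is more than the drives can sustain, while the same total workload asks only 100 IOPS of each drive when it is striped across all 80.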
We've become accustomed to thinking that more memory is the answer to all storage performance problems, but that doesn't exhaust the potential of massed disk drives.
In a strange turn of events, the 3D cartoon instantiation of storage anarchist was apprehended recently while sneaking around in 3PARvaTAR's chunklet matrix. Special cameo appearances are made by the Storage Architect, iKnerd and Stephen Foskett direct from their karaoke concert last Thursday night @ #HPbladesday
In the last few weeks 3PAR released a major software upgrade for their
arrays which enables a boatload of new features, including enhanced thin provisioning, autonomic provisioning, and RAID-MP.
Today I performed the upgrade to Inform OS 2.3.1, and it went without a
hitch and was quite easy. Unlike high-end arrays from Hitachi, EMC, and
IBM, the upgrade process is not complex and took no more than an hour
of time.
The whole blog post is here: Thanks for the kind words, Derek!
Click on the image below to find out about our capacity guarantee program. The program EMC doesn't want you to check out.
As Mike Riley points out on his Netapptips blog today, I was not exactly a fan of Netapp's
guarantee program when it first came out, but now I am an unabashed
supporter. Sometimes other vendors come up with excellent ideas.
Yesterday HDS announced their capacity guarantee program and although it depends heavily on the capacity differences between RAID 1 and RAID 5 (which is a little cheesy), they offer a contract and appear to be ready to back the program with more than a hand wave. That leaves HP, IBM and EMC (oh wait - and Oracle Sun) as the major storage players who aren't offering a capacity guarantee for customers making a technology refresh on storage.
The question is - are these programs just marketing ploys? Sure they are, just as any customer satisfaction guarantee for any product is a marketing ploy designed to hook customers - whether it's soap, kitchen knives, bass lures, vacuum cleaners, etc, but these ploys and the products behind them are targeted directly at data center operators that are tired of over-spending on storage. All are serious products from some of the biggest names in the storage industry and proven technology leaders. 3PAR's thin technologies (provisioning, conversion, persistence, reclamation and zero-detection) continue to lead our industry.
Depending on the applications and requirements of your data center some of these products and programs will be a better fit than others. All of them should save you money on capacity purchases, but there are other things to consider, such as required software and services. The details of each vendor's program are different. That said, if you are offered a contract to reduce your storage capacity costs, that's pretty strong negotiating leverage with any other vendor - whether or not they offer a guarantee. And even if you are not ready to make a purchase now, you might want to know how these programs work so you can be better prepared when the time comes.
To find out more about 3PAR's capacity program, click the image below.
A longtime favorite of mine, Chris Evans, aka the Storage Architect, has a new post on thin storage. After he maps out the way disks are virtualized by thin provisioning systems, he writes about storage utilization before moving to open a brief discussion on storage reclamation, something he has been leading the charge on in the storage blogosphere over the last year or so.
He starts the discussion on reclamation by showing how file systems consume storage space and the typical inefficient utilization that follows. He ends the discussion by stating:
In a thin provisioned environment, storage would have been requested
only for the blocks with valid data and in this way, a LUN can be less than 100% allocated.
Some readers might be tempted to think that LUN allocation percentage has an inverse relationship to disk utilization - in other words, the lower the LUN allocation is, the higher the disk utilization is likely to be. While that may appear to be the case, there are too many variables involved to say there is a mathematical (linear or otherwise) or causal relationship between them. Filing software that allocates capacity from storage, such as file systems, databases and virtualization platforms, tends to consume more storage over time as users and applications create new, incremental data and files. It follows that the percentage of a thinly provisioned LUN that is allocated is likely going to continue to increase over time. Think of it as storage entropy, which might be some sort of new digital law (the amount of consumed storage in the universe is always increasing) enforced by the extremely high probability that we will continue to create lots more data all the time.
And this is where 3PAR's thin reclamation, persistence and conversion technologies come into play. These technologies are designed to deal with the ongoing life cycle problems of storage entropy, such as deleted files and, in a virtual systems world, deleted systems. But more on that in posts to come.
Returning to Chris' post, it's fitting that he chose a Sesame Street analogy, considering that Sesame Street celebrated its 40th anniversary last week. He makes the point that storage can share capacity (cookies) among filing systems through reclamation and suggests four kinds of storage cookie monsters: greedy, selfish, nice and saintly.
As much as I love the Sesame Street gang (especially Grover), I'd suggest that the real Cookie Monsters are the filing systems (file systems, databases and virtual system platforms), and that the cookies are storage capacity. Filing systems have been historically bad at sharing the storage cookies they have been given and Symantec's Storage Foundation is truly a breakthrough product designed to share its capacity cookies with any other filing system in the neighborhood. Saintly behavior, indeed!
But the filing systems are not completely to blame here because storage hasn't provided the means for sharing allocated capacity. As in any system, the hardware and software have to move in something resembling unison. The way storage cookies are shared is by taking capacity that was previously allocated to a filing system and returning it to some form of unallocated storage plate. Using the Sesame Street theme, capacity reclamation is the process of putting storage cookies back on the array's plate where they can be allocated all over again by the monsters in your digital neighborhood.
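For the code-minded, the plate and its cookies can be sketched in a few lines. This is a toy model of a shared thin pool, assuming fixed-size chunks that are allocated on first write and handed back on reclaim; the class, the method names and the volume names are all made up, and it says nothing about how 3PAR or any other vendor actually implements reclamation.

```python
class ThinPool:
    """A shared plate of capacity chunks handed out on first write."""

    def __init__(self, total_chunks):
        self.free_chunks = total_chunks
        self.allocated = {}   # volume name -> set of chunk indexes written

    def write(self, volume, chunk):
        chunks = self.allocated.setdefault(volume, set())
        if chunk in chunks:
            return                        # already backed by real capacity
        if self.free_chunks == 0:
            raise RuntimeError("pool exhausted")
        self.free_chunks -= 1             # take a cookie off the plate
        chunks.add(chunk)

    def reclaim(self, volume, chunk):
        chunks = self.allocated.get(volume, set())
        if chunk in chunks:
            chunks.remove(chunk)
            self.free_chunks += 1         # put the cookie back on the plate


pool = ThinPool(total_chunks=10)

# Two filing systems consume the whole plate between them.
for chunk in range(6):
    pool.write("fs_a", chunk)
for chunk in range(4):
    pool.write("db_b", chunk)

# fs_a deletes some files and the space is reclaimed ...
for chunk in (1, 2, 3):
    pool.reclaim("fs_a", chunk)

# ... so db_b can grow without any new capacity being purchased.
pool.write("db_b", 4)
```

Without the reclaim step, the last write would fail because the pool was already exhausted. With it, capacity that one filing system no longer needs becomes available to its neighbors.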
Did I forget to mention that I LOVE data storage! (munch munch munch munch....)
In today’s Register,
Chris Mellor wrote an intriguing piece about the trend in cloud computing and the wave of industry consolidation that is occurring. He posits that the two are linked by a broader consolidation wherein IT equipment purchases will be made by a much smaller number of service-provider customers that sell services to enterprise customers, as opposed to those enterprise customers running their data centers and making their own purchases today. Mr. Mellor suggests that this shift from enterprise to cloud computing is the driver
for industry consolidation and writes that service providers will “want to buy integrated and very
efficient data centre kit.” In other words, service providers will be inclined to buy vertically integrated solutions from a small set of vendors.
But that leaves the question as to how service providers will differentiate their services. A major component of a service provider’s business value is
the selection, integration, and organization of best-of-class
infrastructure, which allows them to create unique services, features and cost advantages. Given this, why would they want to limit themselves to single-vendor
solutions that are inhibited by their vendors' business models and weaknesses? If, as the cloud
computing trend suggests, service providers gain increased purchasing clout,
they are more likely to demand that IT vendors provide greater
interoperability and standards in order to allow them the greatest choice in
mixing best-in-class elements of the IT stack (storage, servers, hypervisors,
OS’s, applications, etc.).
Vendor consolidation may very well be motivated by the desire of large vendors to vertically integrate their businesses to take advantage of future cloud-driven customer consolidation. Whether or not this strategy eventually claims an advantage will only be decided years from now.
Geoff Hough, Senior Director of Business Strategy 3PAR
I couldn't help but wonder a bit this morning when I saw the WSJ story that Brocade was considering selling itself to the highest bidder. The article mentions that HP and Oracle would be potential acquirers, but the storage world is much more interesting and underhanded than THAT. If this story is true, there will likely be a great fight for Brocade as tech giants try to figure out whether it will impact their ability to make a buck 5 years from now.
Some of the giants appear to be going stack raving mad (Cisco, EMC, HP, Oracle), while others are content to watch this from the sidelines (IBM and Dell). The stack play is not aligned with the idea of open systems. Companies want to control customers by offering stacks of related products. IMHO, the stack play is good for vendors and bad for customers.
But it's the world we live in. The question this go-round is whether or not anybody will be able to sit out and watch Brocade get swallowed up. Here's my knee jerk breakdown of the pros and cons of the usual suspects making a move for Brocade:
HP: Brocade's business could pull up their existing SAN business, but it's just as likely that their existing SAN business would be a millstone to a Brocade in HP expansion. The Procurve/Foundry mashup would claim severe casualties.
Oracle: I think this is the most ridiculous fantasy, but if it were to happen it would definitely answer the question about Oracle being in the HW business. Now imagine meetings with your Oracle sales rep.
EMC: Buy Brocade and kill it, make boyfriend Cisco ecstatically happy and piss off customers. Why not - nobody could stop it.
Cisco: No possible path to regulatory approval = see EMC above.
IBM: If IBM can't get this done, they really are just a services company.
Dell: It would finally get the networking company it needs to compete with Cisco, but imagine them trying to become an OEM vendor! Do you think people would mess with 'em? Really?
Microsoft: No overlap, great shock value and they don't want Oracle to have it.