It's all over the news today. STEC's stock is getting hammered because their largest customer, EMC, is delaying orders. I feel for the folks at STEC; it's difficult being a supplier to industry behemoths like EMC. When reality doesn't meet expectations, things can crater in a hurry.
There is nothing wrong with flash SSD technology; market demand simply hasn't ramped yet. Some of the EMC bloggers have chided me for my position on SSDs, but today's news pretty clearly vindicates what I've been saying all along:
- Flash SSDs are still considered expensive and are difficult to justify
- Customers can't use them intelligently or flexibly yet
- The market for customers needing extremely low latency is not ramping that quickly
The situation is changing gradually. Prices for SSDs are coming down, but storage system vendors still have a lot of work to do before customers can use them flexibly, and this is going to continue to take time. EMC has said they will release their first version of FAST this year, but almost everybody is looking at future versions of FAST to get the sort of functionality they can actually use. Compellent's Data Progression software has the basic ability to move data on and off SSDs, but the number of SSDs that can be used per system is small and their sales are far too small to make up for what EMC has failed to deliver on. SSDs have a bright future; it's just going to take longer for them to flourish.
So where is 3PAR on SSDs? We think they will be an important technology in the future and we are working on integrating them into our products in a way that will allow customers to take advantage of their capabilities. In other words, we're going to have them, but we're not going to sell something that costs a lot of money if our customers can't leverage it.
In the meantime we continue to sell the most efficient storage systems that also happen to have the best implementation of wide striping on the planet, which delivers optimal mixed workload performance for our customers.
Nothing wrong? I guess it depends on who you talk to. I had an interesting conversation with Xiotech, who of course have a tight relationship with Seagate. They raved about their high I/O and so on, and I asked them about SSD. They went on about SSD having hidden problems, with performance dropping like a rock once you get over a certain amount of capacity utilization. I recall seeing some bugs on Intel SSDs along these lines, and it seems like a more generic issue across multiple SSD systems, though of course it may not impact everyone. Their angle, at least, was that SSD wasn't as reliable as their disk systems (given that they can self-repair), at least not yet. I'm sure it'll get better over time.
It is interesting that there is a general perception (which I share) that just because something has no moving parts doesn't mean it will be more reliable, or even faster. You don't have to look too far: take almost any SD or CF card; even with an ATA or SATA adapter they can be pretty slow, and unreliable as well. I'm sure the "EFDs" aren't nearly as bad, but the perception is there, and I think it will take some time to shake out. Intel recently announced new firmware for their SSDs only to yank it hours (?) later after massive problems were discovered.
My view is still that, short of very niche applications, EFD just doesn't make sense yet in an enterprise storage array, for the simple reason that enterprise storage arrays can't scale to the number of IOPS that EFDs can do. It would be pitiful to see a million dollars of controllers driving a couple of dozen 2.5" disks. Until we get an order of magnitude improvement in enterprise storage controller performance, EFD will, I believe, be more dream than reality. The VMware test earlier this year with EMC CX arrays was a great example; I forget offhand which specific arrays they used, but I think they had only 10 EFDs per array.
The place where EFD does seem to make sense, though, is as a transparent acceleration tier in between the storage and the clients. Several companies have come out, or are coming out, with such products. No need to go buy a new storage array or new software or whatever; just get one of these and it will handle it automatically: no configuration, no tuning, any storage vendor, any array. On paper at least it sounds really nice, though I wonder how it will work out in practice. In theory at least, these appliances can do hundreds of thousands of IOPS at a fraction of the cost of traditional enterprise storage systems.
The most interesting thing I learned from the Xiotech talks, though, was their claim about going through the various vendors' (3PAR included) documentation and best practices and finding what each vendor suggested as maximum usable capacity (after RAID). Their claim was that their system could go to 100%. They claimed the next best was 3PAR at 83%, and it went downhill from there pretty fast, with NetApp among the bottom at under 50%. But that wasn't the most interesting part; the most interesting part was how low Compellent rated, also slightly under 50%, which shocked me, given how interesting their technology sounds on paper anyway.
They have some interesting technology, but they lost me when they said they can use Excel to tie into their storage arrays. You don't go around promoting MS Excel to a Linux guy; that's just a death sentence :)
I love technology..
Posted by: nate | November 04, 2009 at 09:20 PM
Yeah, I love it too!
So I'm not involved with our qual process on SSDs here at 3PAR, but I do know that there are considerable quality differences with different devices and manufacturers. I think of it as being somewhat similar to tape drives for storage. 4mm and 8mm data drives were designed to use lower cost consumer decks and there were many, many problems. It might be unfair to characterize EFDs (enterprise versions of SSDs) as having the same quality problems as consumer flash cards or devices.
I'm not aware of the quality problems you are describing, but it would surprise me if there weren't problems of one sort or another. There are still quality problems with disk drives, and that is after decades of refining the technology. As users, we tend to take this sort of thing for granted, but there will always be jobs for people doing device qualifications for system and storage vendors. And I tend to think of disk drives as modern miracles in terms of failure rates per stored capacity.
Large capacity EFDs are still relatively new and the requirements differ a huge amount from consumer level flash products. The I/O rates expected are extremely high, something that there is not even a concept for in consumer flash. I do a lot of video editing (as you probably know) and I usually have to move my videos off the camera flash to work on them, because the performance of my editor is too slow reading directly off the device.
The mid tier acceleration appliance is an interesting concept, but I think it is mostly without merit for a number of reasons. The first thing you have to think through with any storage system is its failure modes - and I'm not bringing this up as FUD - working through failure mode conditions is a huge percentage of the engineering effort for any storage subsystem. You can always think of interesting applications, but it's extremely difficult to make them bulletproof (something that never happens, as you know) - yet that is the goal. To be specific, the failure modes that are most difficult involve the interaction between the appliance and the array(s) it sits in front of. Transparency is problematic, removing or bypassing the appliance, if necessary, is problematic and best of all - keeping up with the support matrix of host/storage I/O requirements is problematic. Believe me, it's completely non-trivial and there is a reason why array vendors are not going to willingly support untested environments like this - and it has nothing whatsoever to do with market share.
IBM's SVC is a good example of an intermediary appliance that seems to work reasonably well, and that's because IBM poured a ton of resources and pre-existing technology into it. Even so, it has a relatively small list of supported arrays and IBM can't really afford to test a much larger sampling. But I should really let somebody from IBM comment on that here - considering that 3PAR competes with SVC implementations from time to time.
Then there are the actual performance capability claims. Do these appliances use some sort of caching technique that is unavailable to the array manufacturer? That is very unlikely. Caching is something that has been developed for decades and been looked at every which way. An appliance could have an application orientation that could help its performance, but then that makes it an application-specific device with added cost and complexity - and then you have to scratch your head and wonder if it's really worth it.
There have been several attempts over the years to create caching or security appliances for SANs - as well as NAS aggregation appliances - and most have had very short life spans. That's not to say it couldn't be done, but I think they have to be targeted at very specific purposes, such as data migrations.
Ah yes, vendor utilization claims. You can't trust what any vendor says - especially about another vendor. I think the best place to get this sort of information is from the SPC benchmarks - which only tell part of the story because all benchmarks are tuning exercises - but they do go into details about the utilization levels used to achieve the performance profile of the benchmark. I've looked at Xiotech's SPC-1 benchmark and, calculating it the exact same way we do (that's where the 83% number came from that Xiotech referenced), theirs was 78% - which is not so bad by the way, but certainly not 100% as they claim - and certainly not the 95% utilization level benchmarked with our F400. FWIW, the utilization calculation I use comes from doubling the ASU capacity (because it is mirrored) and dividing by the total physical capacity. There has been some debate over whether or not mirrored capacity should be counted as utilized, but it IS contributing to I/O performance, so saying it is not utilized is wrong. It's one of those things - you don't utilize the capacity of both disks, but you definitely utilize the performance (Definitions! sigh...). RAID 5, on the other hand, has parity overhead that never contributes to I/O production.
Regardless, NetApp's utilization numbers aren't all that bad as I recall, and Compellent hasn't even done an SPC-1 benchmark. Of course EMC doesn't do them either because the SPC-1 is designed to avoid cache and their disk layout tends to be constrained - despite what Barry Burke says. All wide striping is not equal.
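As a rough sketch of the utilization calculation described above (double the ASU capacity because it is mirrored, then divide by total physical capacity), something like the following Python would do. The function name and the capacity figures are made up for illustration and are not taken from any SPC-1 report.

```python
def mirrored_utilization(asu_capacity_gb: float, physical_capacity_gb: float) -> float:
    """Utilization as a fraction: mirrored ASU capacity over total physical capacity."""
    if physical_capacity_gb <= 0:
        raise ValueError("physical capacity must be positive")
    # The ASU capacity is counted twice because every block is mirrored.
    return (2 * asu_capacity_gb) / physical_capacity_gb


# Hypothetical figures, chosen only to show the arithmetic.
utilization = mirrored_utilization(asu_capacity_gb=1000.0, physical_capacity_gb=2500.0)
print(f"{utilization:.1%}")  # 80.0%
```

Counting the mirror copy is the debatable part: drop the factor of 2 and the same hypothetical system would show 40%.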
Anyway, I think Xiotech is trying to spin something that is not even close to being an apples to apples comparison.
Posted by: marc farley | November 05, 2009 at 01:28 AM
Marc,
I agree with you, EFDs are a "non implementation" of SSDs. EMC customers are not satisfied by the complexity and restrictions of implementing SSDs on their DMXs: relayouts of DBs and filesystems are big pains for everyone... especially for the EMC customers with "famous-as-difficult-to-use-management-tools".
Now customers know that FAST V1 will not solve the problems and V2 will arrive some time in 2010... why buy today something that you can buy tomorrow?
For example, IMHO, the implementations (SSDs as a cache) from NetApp (PAM) and IBM (SVC) are better than EMC's as a temporary solution while waiting for next-generation architecture designs with true SSD capabilities.
But, you know, the best solution for SSDs now is the one that comes from Compellent; I'm seeing outstanding results with a few SSDs and Data Progression, ;-)
ciao,
Enrico
PS: BTW, storage utilization from Compellent is much higher than 50%.
Posted by: Enrico Signoretti | November 05, 2009 at 02:40 AM
Marc,
Nice post, with some interesting points. It's the second time in as many days that I've heard of people reporting performance problems with the flash translation layers (FTL) in solid state disks as they near their maximum capacity. I suspect that this is probably due more to poor implementations in some disks' FTLs than to an inherent problem with SSDs.
From the performance characterisation tests I've seen with various SSDs, there are some big differences between them under different workloads, so broad statements that paint all SSDs with the same brush should probably be taken with a grain of salt.
Flash will get used at a number of different layers in the storage infrastructure, but while it's so expensive (at least in relative terms), technologies that allow it to be leveraged by a large number of workloads should be the most successful; this is something EMC's approach manifestly failed to do.
Flash as read cache will probably become more and more prevalent as prices come down. I wouldn't be surprised to see TB-scale flash devices used as swap space/pagefile drives in host servers (again shared amongst multiple workloads via the magic of server virtualisation). When this happens, I/O workload profiles to shared storage will become increasingly dominated by random writes, so architectures that optimise writes will have a natural advantage going forward.
On a final note, I'd like to make a small correction: PAM-II cards don't use SSDs (unless you define SSD = solid state devices); they use raw flash on a PCIe card with our own flash translation layer. It's optimised for its purpose as an intelligent read cache. This is proving to be successful exactly because, unlike the static allocation techniques favoured by EMC, it can be easily leveraged across multiple workloads.
On a second note, the "maximum usable capacity (after RAID) ... NetApp being among the bottom at under 50%" quote is WAY off. I'm attempting to cover this, and the thorny issue of fairly defining and comparing capacity efficiency, on the NetApp storage efficiency blog in a series of posts entitled "How to measure storage efficiency".
I'm late for my latest update on this, but I'll try to do my next post this weekend, I'd appreciate any thoughts/challenges etc you might have
Best Regards
John Martin
Consulting Systems Engineer
NetApp - ANZ
Posted by: John Martin | November 05, 2009 at 11:39 PM
Nate,
Seagate has struggled to release SSDs. I think that's why Xiotech is saying what they're saying.
Enrico,
It's very obvious that you are in Compellent's camp. You may want to tone it back a little. Your bias is killing your credibility.
As for Data Progression being useful with SSDs, I don't see how. Doesn't it take something like 3 days (default) to detect hot data and then move it to SSD? SSD is usually targeted at low-latency, high-IOPS applications. Those are usually database applications. So if, for example, a month-end process kicks off and the DB needs really high I/O from some data that has been stale for the past month, then the database will suffer until that data is moved to SSD. Since a "month-end process" typically runs in hours, not days, that data will never move to SSD. If it does, it's unlikely the database will be able to take advantage of the SSD performance. Compellent needs to get DP down to 2-3 seconds, not 2-3 days. Since DP is a scheduled process that runs every 24 hours, it can't detect hot data in under a day.
Or has something changed?
Posted by: Bob | November 06, 2009 at 06:14 AM
Bob [regarding your comment to Enrico] -
Wow. I am thrilled to finally see some sanity amid the ridiculous hyperbole around the data progression story. Thank you.
There is one way to do exactly as you state - 2-3 seconds: Cache.
The reason CML *needs* data progression is because they have exactly 0.5GB of write cache, and 3.5GB of volatile cache (which is shared, so it is not all cache). This ridiculous amount means any performance from them had better be coming from the disk, and we all know the disparity between RAM and disk.
And by the way - we all have cache (forgetting the puny amount CML has), so one can hardly feature Cache...
as they say, if you can't fix it feature it.
Mike
Posted by: Mike Workman | November 07, 2009 at 08:15 AM
Bob,
I'm a Reseller of Compellent in Italy.
Data progression works very well with SSDs; we have already seen it in action (the SSD default for DP is 1 day, not 3).
SSDs are working well with data progression, and the default scheduling works pretty well for our customers. Furthermore, you are ignoring the storage's ability to run DP more often than the default setting.
BTW, with 5 active 146GB SSDs you will have a net writable capacity of 400-500GB, and if you move more than 500GB a day across all your databases you can buy more disks!
For DBs in particular, 500GB is a lot, because you will have heavily accessed transaction/redo logs (small files with sync I/Os) and big datafiles with different types of I/O... it's very easy to tune the Compellent to manage this well.
With 5 active disks (i.e., far fewer than needed in a Pillar array) you can have slightly more IOPS than 150 HDDs and surely less latency; cache is not a problem with SSDs!
As regards 3.5GB of cache there are some things you need to know:
1) Compellent's cache has dynamic blocks (from 2KB to 256KB), so when you need a 4K block you will use 4K of cache (no partitioning or other strange work on the cache). The cache is used more efficiently than in most other systems!
If you compare the 512MB from Compellent to traditional caches, it is more than enough!
Normally cache is configured with one standard block size (64K), so every time you get 8K you use 64K... not very efficient, :-)
It means that if you have 8GB of write cache you can use about 128,000 blocks (8192MB/64KB = 128K); on the other hand, Compellent will work with 8K blocks, so you will get about 64,000 blocks (512MB/8KB = 64K).
2) optional features don't use cache to work!
Yes, because in traditional arrays from some other vendors, cache is often used to store bitmaps or journaling information for snapshots and replicas... several times I have seen big caches cut down to just a few hundred MB.
3) As you know, cache is important for sequential reads/writes only. If you have 100% random I/O (i.e., DBs), cache is useless!
Besides, we have already seen very high throughput from Compellent with seq. reads/writes thanks to wide striping.
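The block-count arithmetic in point 1 can be checked with a short Python sketch. It assumes binary units (1 KB = 1024 bytes) and a cache made of equal-sized slots; the helper names are illustrative and say nothing about how any vendor's cache is actually implemented.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def cache_slots(cache_bytes: int, slot_bytes: int) -> int:
    """Number of whole slots of the given size that fit in the cache."""
    return cache_bytes // slot_bytes


def wasted_fraction(io_bytes: int, slot_bytes: int) -> float:
    """Fraction of a fixed-size slot left unused by a smaller I/O."""
    return 1 - io_bytes / slot_bytes


fixed = cache_slots(8 * GIB, 64 * KIB)     # 8GB cache, fixed 64K blocks
dynamic = cache_slots(512 * MIB, 8 * KIB)  # 512MB cache holding 8K blocks
print(fixed, dynamic)                      # 131072 65536
print(wasted_fraction(8 * KIB, 64 * KIB))  # 0.875
```

Under these assumptions the 8GB fixed-block cache still holds twice as many entries as the 512MB one, but each 8K I/O leaves 87.5% of its 64K slot unused.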
ciao,
Enrico
PS for Mike Workman (Pillar):
Mike,
hey, take your time to read once again my blog and try to answer all the questions:
Posted by: Enrico Signoretti | November 12, 2009 at 06:43 AM
Hi Marc,
First, thanks for blogging – always interesting reading. I thoroughly enjoy reading your stuff. I do want to clear up some misconceptions/misperceptions about us, though, most of which are contained in the commentary by Nate to your original post on November 4th.
First, about our claim of “100% utilization”. It’s true we do claim 100%, but not on the metric opined upon in the blog. What we do claim and is indeed true is this:
• Over the given usable surface – i.e. the actual surface available after sparing, after protecting - we can fill that surface to 100% full without reserved areas (a.k.a. holdbacks) and without degrading performance.
Many systems have specific reserved areas for snapshot and/or thin provisioning. Many systems also perform nicely when over-configured/short-stroked, as you know. However, when filled to 100% of their actual usable capacity, performance tends to degrade as long-stroking occurs as a natural consequence.
With RAGS, the patented Xiotech allocation and layout method, this degradation does not occur. Try it for yourself. Take an ISE and run SPC-1 with the ASUs allocated over the entire usable surface, but actual data only consuming 5% of the usable surface. Then, run it with ASU data consuming 100% of the usable surface. The IOPS will be the same in both cases.
In short, we don’t claim to be 100% efficient on your particular utilization metric – i.e. (ASU capacity *2)/total physical capacity. No one is 100% efficient on that mark. Your figure of 78% for Xiotech is in the ballpark - the actual figure is 2147.4/2920.2 = 73.5, but hey, if you want to give me credit for 78%, go right ahead :-)
I certainly agree with you that protection capacity should be counted; running production apps or SPC-1 without protecting data is an oxymoron, like RAID-0. As for metrics, consider this. The metrics we track, apart from the figure above (% of usable space effective w/o degradation or reserve), are the following two:
a) ASU effectivity, which is (ASU capacity / Addressable capacity)
b) Overall utilization, which is (Configured capacity / Physical capacity)
For the record, Xiotech (Hurricane 15K) is 100% on ASU effectivity and 96.69% on overall utilization. From what I read on the T800 6-node, the figures are 94.37% (77824.0/82463.7) and 94.04% (187260.57/187924.14), respectively.
Please correct me if I’m wrong, but I believe I have the correct figures from your SPC-1 finding. I am curious why you hold back about 4.5 TB from the ASU capacity versus addressable – a bit odd. Having said that, yours is better than Sun (several arrays in the 87-92% range) and a heckuva lot better than the HDS USP-V (a staggeringly low 79%, the lowest of any vendor reporting). Then, there are the guys who just plain short-stroke because they can, like Pillar (overall utilization metric of 59%) and Fujitsu (worst of all, 51%)
Finally, the rubber meets the road on IOPS/disk. Our figure is 348.1 (@ $3.05/IOP) and yours is 175.8 (224989.65/1280) @ $9.30/IOP. We are using the same disk, the Seagate Hurricane 146GB/15K drive. So, this is indeed apples-to-apples; at the end of the day, the data ends up on the same technology. How it gets there is the question.
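Several of the ratios quoted in this comment are plain divisions, so a small Python sketch can reproduce them. The input numbers are the ones given in the comment; the helper names are illustrative only and are not part of any SPC tooling.

```python
def ratio_pct(numerator: float, denominator: float) -> float:
    """Return numerator/denominator expressed as a percentage."""
    return 100.0 * numerator / denominator


def iops_per_disk(total_iops: float, disk_count: int) -> float:
    """Average SPC-1 IOPS delivered per physical disk."""
    return total_iops / disk_count


xiotech_utilization = ratio_pct(2147.4, 2920.2)      # mirrored ASU / physical
t800_asu_effectivity = ratio_pct(77824.0, 82463.7)   # ASU / addressable
t800_iops_per_disk = iops_per_disk(224989.65, 1280)

print(f"{xiotech_utilization:.1f}%")   # 73.5%
print(f"{t800_asu_effectivity:.2f}%")  # 94.37%
print(f"{t800_iops_per_disk:.1f}")     # 175.8
```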
This is not a knock on what you guys do – we like wide striping (started doing it back in 1998, BTW) and we definitely agree that not all wide striping is equal. We actually agree on a ton of things, like publishing SPC-1. My hat is off to you guys for doing so. I wish all vendors would do so, quite frankly – those who do not publish SPC figures are open to getting ‘called out’, in my opinion.
Posted by: Rob Peglar | November 23, 2009 at 12:51 PM