November 04, 2009

Comments

nate

Nothing wrong? I guess it depends on who you talk to. I had an interesting conversation with Xiotech, who of course have a tight relationship with Seagate. They raved about their high I/O and such, and I asked them about SSD. They went on about SSDs having hidden performance problems, with performance dropping like a rock once you get past a certain level of capacity utilization. I recall seeing some bugs on Intel SSDs related to this, and it seems like a more generic issue across multiple SSD systems, though of course it may not impact everyone. Their angle, at least, was that SSD isn't as reliable as their disk systems (given that those can self-repair), at least not yet. I'm sure it'll get better over time.

It's interesting that there's a general perception (one I've shared myself) that just because something has no moving parts it will have higher reliability, or even performance, when that isn't necessarily the case. You don't have to look too far: take most any SD or CF card; even with an ATA or SATA adapter they can be pretty slow, and they can be unreliable as well. I'm sure the "EFDs" aren't nearly as bad, but the perception is there, and I think it will take some time to shake out. Intel recently announced new firmware for their SSDs only to yank it hours(?) later after massive problems were discovered.

My view is still that, short of very niche applications, EFD just doesn't make sense yet in an enterprise storage array, for the simple reason that enterprise storage arrays can't scale to the number of IOPS that EFDs can deliver. It would be pitiful to see a million dollars of controllers driving a couple of dozen 2.5" disks. Until we get an order-of-magnitude improvement in enterprise storage controller performance, I believe EFD will remain more a dream than a reality. The VMware test earlier this year with EMC CX arrays was a great example; I forget offhand which specific arrays they used, but I think they had only 10 EFDs per array.

The place where EFD does seem to make sense, though, is as a transparent acceleration tier between the storage and the clients. Several companies have come out, or seem to be coming out, with such products. No need to buy a new storage array or new software or whatever; just drop one of these in and it handles everything automatically: no configuration, no tuning, any storage vendor, any array. On paper at least it sounds really nice; I wonder how it will work out in practice. In theory these appliances can do hundreds of thousands of IOPS at a fraction of the cost of traditional enterprise storage systems.

The most interesting thing I learned from the Xiotech talks, though, was their claim about going through the various vendors' documentation and best practices (3PAR included) and finding what each vendor suggests as maximum usable capacity (after RAID). Their claim was that their system could go to 100%. They claimed the next best was 3PAR at 83%, and it went downhill from there pretty fast, with NetApp among the bottom at under 50%. But that wasn't the most interesting part; the most interesting part was how low Compellent rated, also slightly under 50%, which shocked me given how interesting their technology sounds, on paper anyway.

They have some interesting technology, but they lost me when they said they can use Excel to tie into their storage arrays. You don't go around promoting MS Excel to a Linux guy; that's just a death sentence :)

I love technology..

marc farley

Yeah, I love it too!

So I'm not involved in our qual process for SSDs here at 3PAR, but I do know that there are considerable quality differences between devices and manufacturers. I think of it as somewhat similar to tape drives for storage: 4mm and 8mm data drives were designed to use lower-cost consumer decks, and there were many, many problems. It might be unfair to characterize EFDs (enterprise versions of SSDs) as having the same quality problems as consumer flash cards or devices.

I'm not aware of the quality problems you are describing, but it would surprise me if there weren't problems of one sort or another. There are still quality problems with disk drives, and that is after decades of refining the technology. As users, we tend to take this sort of thing for granted, but there will always be jobs for people doing device qualifications for system and storage vendors. I tend to think of disk drives as modern miracles in terms of failure rates per stored capacity.

Large-capacity EFDs are still relatively new, and their requirements differ hugely from consumer-level flash products. The expected I/O rates are extremely high, something there isn't even a concept for in consumer flash. I do a lot of video editing (as you probably know), and I usually have to move my videos off the camera flash to work on them, because my editor is too slow reading directly off the device.

The mid-tier acceleration appliance is an interesting concept, but I think it is mostly without merit, for a number of reasons. The first thing you have to think through with any storage system is the failure modes, and I'm not bringing this up as FUD: working through failure-mode conditions is a huge percentage of the engineering effort for any storage subsystem. You can always think of interesting applications, but it's extremely difficult to make them bulletproof (something that never happens, as you know), yet that is the goal. To be specific, the failure modes that are most difficult involve the interaction between the appliance and the array(s) it sits in front of. Transparency is problematic; removing or bypassing the appliance, if necessary, is problematic; and best of all, keeping up with the support matrix of host/storage I/O requirements is problematic. Believe me, it's completely non-trivial, and there is a reason why array vendors are not going to willingly support untested environments like this, and it has nothing whatsoever to do with market share.

IBM's SVC is a good example of an intermediary appliance that seems to work reasonably well, and that's because IBM poured a ton of resources and pre-existing technology into it. Even so, it has a relatively small list of supported arrays, and IBM can't really afford to test a much larger sampling. But I should really let somebody from IBM comment on that here, considering that 3PAR competes with SVC implementations from time to time.

Then there are the actual performance-capability claims. Do these appliances use some sort of caching technique that is unavailable to the array manufacturer? That is very unlikely. Caching has been developed for decades and looked at every which way. An appliance could have an application orientation that could help its performance, but then that makes it an application-specific device with added cost and complexity, and then you have to scratch your head and wonder if it's really worth it.

There have been several attempts over the years to create caching or security appliances for SANs, as well as NAS aggregation appliances, and most have had very short life spans. That's not to say it couldn't be done, but I think they have to be targeted at very specific purposes, such as data migrations.

Ah yes, vendor utilization claims. You can't trust what any vendor says, especially about another vendor. I think the best place to get this sort of information is from the SPC benchmarks, which only tell part of the story because all benchmarks are tuning exercises, but they do go into detail about the utilization levels used to achieve the benchmark's performance profile. I've looked at Xiotech's SPC-1 benchmark, and calculating it the exact same way we do (that's where the 83% number Xiotech referenced came from), theirs was 78%, which is not so bad by the way, but certainly not the 100% they claim, and certainly not the 95% utilization level benchmarked with our F400. FWIW, the utilization calculation I use comes from doubling the ASU capacity (because it is mirrored) and dividing by the total physical capacity. There has been some debate over whether or not mirrored capacity should be counted as utilized, but it IS contributing to I/O performance, so saying it is not utilized is wrong. It's one of those things: you don't utilize the capacity of both disks, but you definitely utilize the performance. (Definitions! Sigh...) RAID 5, on the other hand, has parity overhead that never contributes to I/O production.
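As a minimal sketch of that calculation (Python, illustrative only): it applies the mirrored-capacity formula described above to the Xiotech figures Rob Peglar quotes further down this thread, where 2147.4 GB is the already-doubled ASU capacity and 2920.2 GB is total physical capacity; the 1073.7 GB single-copy ASU value used here is simply half of that, not a number taken from the SPC-1 report itself.

```python
def mirrored_utilization(asu_capacity_gb: float, physical_capacity_gb: float) -> float:
    """Double the ASU capacity (because it is mirrored), divide by total physical capacity."""
    return 2 * asu_capacity_gb / physical_capacity_gb

# Figures from Rob Peglar's comment below: 2 x 1073.7 = 2147.4 GB mirrored ASU, 2920.2 GB physical.
print(f"{mirrored_utilization(1073.7, 2920.2):.1%}")  # ~73.5%
```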

Regardless, NetApp's utilization numbers aren't all that bad as I recall, and Compellent hasn't even done an SPC-1 benchmark. Of course, EMC doesn't do them either, because the SPC-1 is designed to avoid cache and their disk layout tends to be constrained, despite what Barry Burke says. All wide striping is not equal.

Anyway, I think Xiotech is trying to spin something that is not even close to being an apples to apples comparison.

Enrico Signoretti

Marc,
I agree with you; EFDs are a "non-implementation" of SSDs. EMC customers are not satisfied with the complexity and restrictions of implementing SSDs on their DMXs: re-layouts of DBs and filesystems are a big pain for everyone, especially for the EMC customers with "famous-as-difficult-to-use" management tools.

Now customers know that FAST V1 will not solve the problems and that V2 will arrive sometime in 2010... why buy today something that you can buy tomorrow?

For example, IMHO, the SSD-as-cache implementations from NetApp (PAM) and IBM (SVC) are better than EMC's as a temporary solution while waiting for next-generation architectures designed with true SSD capabilities.

But, you know, the best solution for SSDs right now is the one from Compellent; I'm seeing outstanding results with just a few SSDs and Data Progression. ;-)

ciao,
Enrico

PS: BTW, storage utilization from Compellent is much higher than 50%.

John Martin

Marc,
Nice post, with some interesting points. It's the second time in as many days that I've heard of people reporting performance problems with the flash translation layers (FTLs) in solid state disks as they near their maximum capacity. I suspect this is probably due more to poor implementations of some disks' FTLs than to an inherent problem with SSDs.

From the performance characterisation tests I've seen with various SSDs, there are some big differences between SSDs under different workloads, so broad statements that paint all SSDs with the same brush should probably be taken with a grain of salt.

Flash will get used at a number of different layers in the storage infrastructure, but while it's so expensive (at least in relative terms), technologies that allow it to be leveraged by a large number of workloads should be the most successful; this is something EMC's approach has manifestly failed to do.

Flash as read cache will probably become more and more prevalent as prices come down. I wouldn't be surprised to see TB-scale flash devices used as swap-space/pagefile drives in host servers (again shared among multiple workloads via the magic of server virtualisation). When that happens, I/O workload profiles on shared storage will become increasingly dominated by random writes, so architectures that optimise writes will have a natural advantage going forward.

On a final note, I'd like to make a small correction: PAM-II cards don't use SSDs (unless you define SSD as "solid state devices"); they use raw flash on a PCIe card with our own flash translation layer, optimised for its purpose as an intelligent read cache. This is proving successful precisely because, unlike the static allocation techniques favoured by EMC, it can easily be leveraged across multiple workloads.

On a second note, the "maximum usable capacity (after RAID)... NetApp being among the bottom at under 50%" quote is WAY off. I'm attempting to cover this, and the thorny issue of fairly defining and comparing capacity efficiency, on the NetApp storage efficiency blog in a series of posts entitled "How to measure storage efficiency".

I'm late with my latest update on this, but I'll try to do my next post this weekend. I'd appreciate any thoughts, challenges, etc. you might have.

Best Regards
John Martin
Consulting Systems Engineer
NetApp - ANZ

Bob

Nate,
Seagate has struggled to release SSDs. I think that's why Xiotech is saying what they're saying.

Enrico,
It's very obvious that you are in Compellent's camp. You may want to tone it back a little. Your bias is killing your credibility.

As for Data Progression being useful with SSDs, I don't see how. Doesn't it take something like 3 days (by default) to detect hot data and move it to SSD? SSD is usually targeted at low-latency, high-IOPS applications, which are usually databases. So if, for example, a month-end process kicks off and the DB needs really high I/O against data that has been stale for the past month, the database will suffer until that data is moved to SSD. Since a month-end process typically runs in hours, not days, that data will never move to SSD, and if it does, it's unlikely the database will be able to take advantage of the SSD performance by then. Compellent needs to get DP down to 2-3 seconds, not 2-3 days. Since DP is a scheduled process that runs every 24 hours, it can't detect hot data in under a day.

Or has something changed?

Mike Workman

Bob [regarding your comment to Enrico] -

Wow. I am thrilled to finally see some sanity amid the ridiculous hyperbole of the Data Progression story. Thank you.

There is one way to do exactly what you describe in 2-3 seconds: cache.

The reason CML *needs* Data Progression is that they have exactly 0.5GB of write cache and 3.5GB of volatile cache (which is shared, so it is not all cache). With that ridiculously small amount, any performance from them had better be coming from the disks, and we all know the disparity between RAM and disk.

And by the way, we all have cache (forgetting the puny amount CML has), so one can hardly feature cache...

As they say, if you can't fix it, feature it.


Mike

Enrico Signoretti

Bob,
I'm a reseller of Compellent in Italy.
Data Progression works very well with SSDs; we have already seen it in action (the DP default for SSDs is 1 day, not 3).
SSDs work well with Data Progression, and the default scheduling works pretty well for our customers. Furthermore, you're ignoring the array's capability to force DP to run more often than the default setting.

BTW, with 5 active 146GB SSDs you have 400-500GB of net writable data, and if you have more than 500GB of data moving per day across all your databases, you can buy more disks!
In particular for DBs, 500GB is a lot, because you will have transaction/redo logs that are heavily accessed (small files with synchronous I/O) and big datafiles with different types of I/O... it's very easy to tune the Compellent to manage this well.

With 5 active disks (i.e. far fewer than needed in a Pillar array) you can get slightly more IOPS than 150 HDs, and surely lower latency; cache is not a problem with SSDs!

As regards the 3.5GB of cache, there are some things you need to know:
1) Compellent's cache uses dynamic blocks (from 2KB to 256KB), so when you need a 4K block you use 4K of cache (no partitioning or other strange tricks). The cache is used more efficiently than in most other systems!

If you compare Compellent's 512MB to traditional caches, it is more than enough!
Normally a cache is configured with a single standard block size (64K), so every time you fetch 8K you use 64K... not very efficient, :-)
That means that with 8GB of write cache you get roughly 128,000 blocks (8192MB / 64KB), while Compellent, working with 8K blocks, gets roughly 64,000 blocks (512MB / 8KB). (There is a short worked example of this arithmetic after point 3 below.)

2) Optional features don't consume cache!
Yes, because in traditional arrays from some other vendors the cache is often used to store bitmaps or journaling information for snapshots and replicas... several times I have seen big caches cut down to just a few hundred MBs.

3) As you know, cache is important for sequential reads/writes only; if your workload is 100% random (i.e. DBs), cache is useless!
Besides, we have already seen very high throughput from Compellent on sequential reads/writes thanks to wide striping.
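To make the block-count comparison in point 1 concrete, here is a minimal sketch (Python, purely illustrative, using the cache and block sizes Enrico quotes; the exact counts come out slightly higher than his rounded figures):

```python
def cache_blocks(cache_mb: int, block_kb: int) -> int:
    """How many cache blocks fit in a cache of the given size at a fixed block size."""
    return cache_mb * 1024 // block_kb

# A traditional 8GB write cache carved into 64KB blocks, versus
# Compellent's 512MB cache using 8KB blocks (Enrico's comparison).
print(cache_blocks(8192, 64))  # 131072, roughly the 128,000 cited above
print(cache_blocks(512, 8))    # 65536, roughly the 64,000 cited above
```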

ciao,
Enrico

PS for Mike Workman (Pillar):
Mike,
hey, take the time to read my blog once again and try to answer all the questions:

Rob Peglar

Hi Marc,
First, thanks for blogging – always interesting reading. I thoroughly enjoy reading your stuff. I do want to clear up some misconceptions/misperceptions about us, though, most of which are contained in the commentary by Nate to your original post on November 4th.

First, about our claim of "100% utilization". It's true we do claim 100%, but not on the metric opined upon in the blog. What we do claim, and what is indeed true, is this:
• Over the given usable surface, i.e. the actual surface available after sparing and after protection, we can fill that surface to 100% without reserved areas (a.k.a. holdbacks) and without degrading performance.

Many systems have specific reserved areas for snapshot and/or thin provisioning. Many systems also perform nicely when over-configured/short-stroked, as you know. However, when filled to 100% of their actual usable capacity, performance tends to degrade as long-stroking occurs as a natural consequence.

With RAGS, the patented Xiotech allocation and layout method, this degradation does not occur. Try it for yourself: take an ISE and run SPC-1 with the ASUs allocated over the entire usable surface but with actual data consuming only 5% of it, then run it again with ASU data consuming 100% of the usable surface. The IOPS will be the same in both cases.

In short, we don't claim to be 100% efficient on your particular utilization metric, i.e. (ASU capacity * 2) / total physical capacity. No one is 100% efficient on that mark. Your figure of 78% for Xiotech is in the ballpark; the actual figure is 2147.4/2920.2 = 73.5%, but hey, if you want to give me credit for 78%, go right ahead :-)

I certainly agree with you that protection capacity should be counted; running production apps or SPC-1 without protecting data is an oxymoron, like RAID-0. As for metrics, consider this. The metrics we track, apart from the figure above (% of usable space effective w/o degradation or reserve), are the following two:
a) ASU effectivity, which is (ASU capacity / Addressable capacity)
b) Overall utilization, which is (Configured capacity / Physical capacity)
For the record, Xiotech (Hurricane 15K) is 100% on ASU effectivity and 96.69% on overall utilization. From what I read on the T800 6-node, the figures are 94.37% (77824.0/82463.7) and 94.04% (187260.57/187924.14), respectively.
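For concreteness, a minimal sketch (Python, illustrative only) of the two metrics defined above, checked against the T800 ASU figures quoted; the function names are mine, not SPC terminology:

```python
def asu_effectivity(asu_gb: float, addressable_gb: float) -> float:
    """Metric (a): ASU capacity divided by addressable capacity."""
    return asu_gb / addressable_gb

def overall_utilization(configured_gb: float, physical_gb: float) -> float:
    """Metric (b): configured capacity divided by physical capacity."""
    return configured_gb / physical_gb

# T800 6-node figures quoted above for metric (a).
print(f"{asu_effectivity(77824.0, 82463.7):.2%}")  # ~94.37%
```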

Please correct me if I'm wrong, but I believe I have the correct figures from your SPC-1 finding. I am curious why you hold back about 4.5 TB of ASU capacity versus the addressable capacity; a bit odd. Having said that, yours is better than Sun (several arrays in the 87-92% range) and a heckuva lot better than the HDS USP-V (a staggeringly low 79%, the lowest of any vendor reporting). Then there are the guys who just plain short-stroke because they can, like Pillar (overall utilization of 59%) and Fujitsu (worst of all, 51%).

Finally, the rubber meets the road on IOPS per disk. Our figure is 348.1 (at $3.05/IOP) and yours is 175.8 (224989.65/1280) at $9.30/IOP. We are using the same disk, the Seagate Hurricane 146GB/15K drive. So this is indeed apples-to-apples; at the end of the day, the data ends up on the same technology. How it gets there is the question.
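And the T800 IOPS-per-disk arithmetic quoted above, spelled out (the Xiotech 348.1 figure is stated only as a result, so it is not recomputed here):

```python
# SPC-1 IOPS divided by the number of drives, using the T800 numbers above.
print(round(224989.65 / 1280, 1))  # 175.8 IOPS per disk
```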

This is not a knock on what you guys do; we like wide striping (we started doing it back in 1998, BTW) and we definitely agree that not all wide striping is equal. We actually agree on a ton of things, like publishing SPC-1 results. My hat is off to you guys for doing so. I wish all vendors would, quite frankly; those who do not publish SPC figures are open to getting 'called out', in my opinion.
