I got all excited a few days ago when IBM announced their latest SPC-1 benchmark using thin provisioning, what IBM calls Space Efficient Virtual Volumes. In his blog, Barry Whyte from IBM chides 3PAR for not having done a TP benchmark, referring to 3PAR as the "grandfather of thin provisioning". I didn't mind the ribbing too much because I thought it was cool that somebody had figured out how to do the benchmark and that the numbers looked so good.
Then the letdown - the benchmark didn't actually use thin provisioning functionality. TP was active, but it didn't have any work to perform, which means it didn't impact the benchmark. Arrghhhh!! No great story here, just more weird science with an IBM benchmark. A realistic TP benchmark should show really great disk utilization, but IBM's only had 54% disk utilization (mirrored). What's THAT?? Misleading.
Our last benchmark ran at 83% disk utilization and did not use TP.
Could you show us the math on the 3PAR SPC-1 results that gets you to 83% utilization? The best I can calculate from the posted results is just a bit more than 44.5% utilization of total capacity, since HALF of the total raw capacity is consumed by the mirrors.
I just can't seem to make your math work...or see how the 3PAR utilization is significantly different than the DS5300's (both are mirrored).
Posted by: the storage anarchist | October 20, 2008 at 04:50 AM
Hi Anarchist,
Alex McDonald and I spent a fair amount of time going back and forth on this one a few weeks ago on his blog, here:
http://blogs.netapp.com/shadeofblue/2008/09/3par-and-bistro.html
But I'll give you the short version. 83% (rounded down, I think) was employed and actively being read from and written to during our benchmark. Yes, it was mirrored, but that doesn't mean the disk wasn't being used for data I/O work. Finally we agreed that the term "utilization" was a problem - so I've tried to be careful ever since to say "disk utilization". A word like yield would be much better suited as a way to express the ratio of usable space to raw. The imprecision of the storage lexicon has always made things more difficult than they need to be.
In IBM's most recent SVC benchmark, the disks were 54% utilized, which means of course that 46% of the usable space was not used by the benchmark. Using the term yield, theirs was 27%.
Posted by: marc farley | October 20, 2008 at 07:43 AM
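The "disk utilization" and "yield" figures in the comment above differ only in what they are divided by. Here is a minimal Python sketch of that arithmetic, assuming plain two-way mirroring and no other capacity overhead (spares, metadata and so on). The function name and the `copies` parameter are made up for illustration and do not come from any SPC report.

```python
def yield_from_disk_utilization(disk_utilization: float, copies: int = 2) -> float:
    """Return the fraction of raw capacity that holds unique data.

    disk_utilization: fraction of disk space actively used by the benchmark,
                      counting every mirror copy.
    copies:           number of mirrored copies of each block (assumption:
                      2 for simple mirroring, nothing else subtracted).
    """
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return disk_utilization / copies


# The two disk utilization figures quoted in the thread.
for name, utilization in (("IBM SVC", 0.54), ("3PAR", 0.83)):
    y = yield_from_disk_utilization(utilization)
    print(f"{name}: disk utilization {utilization:.0%}, yield {y:.1%}")
```

Under these assumptions the 54% figure works out to the 27% yield quoted above, and 83% works out to 41.5%. Figures calculated from the full published results can differ, since those account for more than mirroring alone.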
Silly me.
I'm in the "83% utilization means 83TB of unique data stored in 100TB of Raw" camp.
And I'm also of the opinion that you and your 3PAR marketing team are intentionally trying to obfuscate the meaning of the term "disk utilization."
As a vendor who has been known as the last bastion of "mirror everything," I'm quite familiar with the notion you are trying to promote - Symm has in fact been "utilizing" multiple mirrors to accelerate reads for years (up to FOUR mirrors for each LUN can be configured).
In fact, we might have a patent on that somewhere around here...I'll have to check :*)
Posted by: the storage anarchist | October 20, 2008 at 01:18 PM
Anarchist, you're one of my favorite storage bloggers, so I won't concur (even tongue in cheek) with your "silly me".
The transformation of capacity into IOPs through mirroring is very useful and very real, but I don't need to tell you this because you've known it for a long time already.
We aren't trying to obfuscate anything. Any discussion about working capacity where IOPs are concerned should include all the capacity that is being used to generate those IOPs - not half of it.
The problem is that common vernacular doesn't always work very well - and often ages poorly. One of the biggest problems in communicating about storage is trying to find terms that are not ambiguous. "Utilization" is unfortunately one of those misbegotten words that is intuitively misleading. We should be able to do better than that.
Posted by: marc farley | October 20, 2008 at 03:11 PM
OK, so by comparison our previous SPC-1 (non SEV) was using a similar "utilisation" in your terms.
However, I still stand by my comment, come and show everyone what 3Par in Thin Provisioned form can do, when compared with your "83%" non TP benchmark.
Then we can re-open this debate.
Posted by: Barry Whyte | October 21, 2008 at 03:43 PM
I have to disagree. NetApp do benchmarks with thin provisioning -- and snapshots -- both on. There's no difference with a NetApp FlexVol.
http://blogs.netapp.com/shadeofblue/2008/10/benchmarking-th.html
Posted by: Alex McDonald | October 21, 2008 at 03:56 PM
NetApp is a bit different, in that by its very nature it's an LSA, so there is no overhead when doing a TP LSA (you could say the overhead is already built into the base).
Posted by: Barry Whyte | October 22, 2008 at 02:21 AM
@Barry
You were right first time, no overhead as the SPC benchmarks show clearly.
It's everyone else that has overheads when doing thin provisioning. Or snapshotting, which seems to be the big killer for most solutions. And, even more impressive (if you'll allow me a little hyperbole here) is that both are benchmarked with dual parity RAID. I'd love to see an equivalent benchmark (TP+SS+RAID6) from anyone else.
Posted by: Alex McDonald | October 22, 2008 at 08:05 AM
Thanks for your comments, Barry and Alex.
I don't want to speak for the SPC, but it's my understanding that they are considering adding a TP benchmark, considering that so many vendors have TP. The problem is defining what that benchmark would be and what a valid setup scenario is.
Posted by: marc farley | October 22, 2008 at 01:01 PM
And then the SPC get to solve the problem of how to describe "disk utilisation". Vote for the ELF! http://blogs.netapp.com/shadeofblue/2008/09/elf-wealth-and.html
Posted by: Alex McDonald | October 23, 2008 at 01:58 AM