
March 11, 2009


Listed below are links to weblogs that reference De-Dupe soup tasting a little sour:

Comments


nate

Before we really understood the technology behind dedupe and what it can and can't do, we tested some Data Domain gear in the hope that it could reduce our primary storage needs for a recent storage purchase. We fed it uncompressed feeds of the sample data we wanted to de-dupe, about 300GB in all. It didn't work out the way we had hoped: simply compressing the data gave us better results, so we didn't go the de-dupe route (we were originally looking at Data Domain's high-end, ~$100k SAN-attached boxes).

They confirmed fairly late in the eval that our data wasn't a good candidate for de-dupe.
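A rough way to check this before an eval is to measure both ratios on a representative sample. The sketch below is a hypothetical illustration, not Data Domain's method: it assumes fixed-size 4 KB chunks identified by SHA-256 digests (commercial appliances typically use variable-size segmentation, which catches more duplicates when data shifts) and uses zlib as a stand-in compressor. The helper names `dedupe_ratio` and `compression_ratio` are made up for this example.

```python
import hashlib
import os
import zlib


def dedupe_ratio(data: bytes, chunk_size: int = 4096) -> float:
    """Hypothetical helper: logical size divided by the size of unique chunks.

    Uses fixed-size chunks identified by SHA-256 digests; real de-dupe
    appliances usually use variable-size segments instead.
    """
    seen = set()
    unique_bytes = 0
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).digest()
        if digest not in seen:  # only count the first copy of each chunk
            seen.add(digest)
            unique_bytes += len(chunk)
    return len(data) / unique_bytes if unique_bytes else 1.0


def compression_ratio(data: bytes, level: int = 6) -> float:
    """Hypothetical helper: logical size divided by zlib-compressed size."""
    if not data:
        return 1.0
    return len(data) / len(zlib.compress(data, level))


if __name__ == "__main__":
    # Synthetic sample: 1 KB records with a random 64-byte prefix, so no whole
    # 4 KB chunk ever repeats (poor de-dupe) but the padding compresses well.
    # Replace this with a real sample of your own data.
    sample = b"".join(os.urandom(64) + b"x" * 960 for _ in range(256))
    print(f"dedupe ratio:      {dedupe_ratio(sample):.2f}")
    print(f"compression ratio: {compression_ratio(sample):.2f}")
```

If compression alone beats de-dupe on a representative sample, de-dupe is probably not worth the premium for that data.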

Because the bulk of our data is fairly static (we store it for X days and delete the oldest day each day; the data is used on the first day and after that is only really revisited for backup or testing purposes), we opted for an entirely SATA-based storage system. That gave us the raw capacity to house all of it, and because much of the data (currently 50%) is rarely accessed after the first day, we can live with much higher utilization rates on the SATA disks even with everything else sharing them. We probably could not run full SATA if the majority of the data on the system had really high utilization rates, at least not without a lot more spindles.

We thought about getting tier 3 or similar storage to put this data on, but in the end the math just didn't work out. If we had done that, we wouldn't have needed as many spindles on the main array, so we would have had to go to FC (for performance) instead of SATA, which drove up the cost per TB by about 2x (based on list pricing for each solution).

Going with SATA from the start gave us enough spindles for a good baseline level of performance, and because performance scales linearly as we add more disks, we get good benefits there too. We probably have more raw space than we really need, but that just means we can keep things like snapshots around for longer.

With an FC array, I think we wouldn't be able to drive enough I/O to justify the price beyond a certain point. Of course you can mix and match SATA and FC, but then you have to balance which workload goes where, and coming from another storage array that had at least four different pools/tiers of storage, that wasn't an easy task. Determining the workloads of various tasks that use shared resources (e.g. large NFS data stores) can be difficult as well, and splitting everything out has its own complications as far as managing space, etc.

It wasn't easy to determine what the configuration of the new system should be; it took a lot of work! I'm happy with the results myself.

Looking forward to the next rev of InForm OS myself; I want that thin built in turned on!

W. Curtis Preston

Hey, Mark! It's Curtis!

I still haven't gotten around to writing the post I referred to above, but let me give you a hint: I am trying to form a company that would do such testing. Think Consumer Reports for IT.

We're not talking benchmarks, like the ones that the industry fights over. We're talking about tests in real-world environments with real-world data and people.

Anyway, I'm crawling back to my hole. BTW, I fixed the typo in the original comment. Can you fix it in your quote of said comment? It's "...how MUCH each of you are exaggerating..." That'll be great. (Office Space manager voice there.)

