Chuck Hollis had an excellent post last week discussing caching.
About 10 years ago, a small team I was part of looked at starting a company that would do something similar to what IBM's SVC does. The idea was to create a SAN front-end controller with a lot of cache memory that would virtualize "downstream" storage and boost performance through techniques such as caching, striping, and multi-way mirroring. We gave up on the idea when it became apparent that the project was quite a bit larger than we initially thought and it was unclear when we would ever have sufficient resources to get a competitive product to market. I think we could have sold the idea to venture capital investors, who were throwing money at storage startups, but we couldn't sell it to ourselves. For those of you who wonder why I think SVC is an important product, that's why: I know some of the things IBM did to make it work, and I admire their ability to bring it to market.
Anyway, one of the hurdles we couldn't get past was how to handle mixed workloads coming from many SAN-attached servers at once: data warehouses streaming large transfers, transaction databases generating high IOPS, and all sorts of other bursty, unpredictable applications, all competing for memory and disk resources. A front-end appliance can control its own cache resources, but it can't do much about the back-end disks, because the downstream arrays hide them from the appliance. In fact, there was no way to predict the performance of the downstream arrays for any given workload. We consulted experts from industry and research universities, and they were all discouraging about the prospects of significantly improving the performance of mixed workloads in a shared SAN appliance.
There were two fundamental issues: the downstream arrays already had their own caches, and the appliance's cache had to be shared dynamically among competing workloads. If these problems could be solved, the rest of the work was the not-so-simple matter of making the appliance work as a cluster with mirrored cache for HA.
Uncoordinated multi-level caches suffer from redundant data. For example, data prefetched into the appliance's cache will likely be loaded into the downstream array's cache as well. With duplicated cache data, a cache hit on the appliance won't be much faster than a cache hit in the downstream array, and a miss in both will be much slower. It's difficult to prove the value of THAT. It's certainly possible to tune the two caches differently, but this turns out to be easier said than done and erodes the value of an "easy to use" appliance.
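To make the redundancy problem concrete, here is a toy simulation, not anything we actually built: two plain LRU caches in series, an "appliance" cache in front of an "array" cache, with no coordination between them. The class and function names are made up for illustration, and real array caches use far more elaborate policies than plain LRU. A sequential read stream, like a prefetch, ends up loading every block into both caches.

```python
from collections import OrderedDict

class LRUCache:
    """Fixed-capacity LRU cache of block numbers (illustrative only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()

    def access(self, block):
        """Return True on a hit; on a miss, insert the block, evicting the LRU one if full."""
        if block in self.blocks:
            self.blocks.move_to_end(block)  # mark as most recently used
            return True
        self.blocks[block] = None
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict least recently used
        return False

def simulate(trace, appliance_size, array_size):
    """Run a block trace through an appliance cache stacked on an array cache."""
    appliance = LRUCache(appliance_size)
    array = LRUCache(array_size)
    stats = {"appliance_hits": 0, "array_hits": 0, "disk_reads": 0}
    for block in trace:
        if appliance.access(block):      # a miss here also loads the block into the appliance
            stats["appliance_hits"] += 1
        elif array.access(block):        # a miss here also loads it into the array cache
            stats["array_hits"] += 1
        else:
            stats["disk_reads"] += 1
    # Blocks held by both caches are wasted capacity.
    stats["duplicated_blocks"] = len(set(appliance.blocks) & set(array.blocks))
    return stats

if __name__ == "__main__":
    # A 100-block sequential scan through a 10-block appliance and 20-block array cache.
    print(simulate(list(range(100)), 10, 20))
    # {'appliance_hits': 0, 'array_hits': 0, 'disk_reads': 100, 'duplicated_blocks': 10}
```

For this trace, the two caches together hold only 20 distinct blocks rather than 30: every block in the appliance cache is also sitting in the array cache.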
So, the obvious way to overcome this is to put a HUGE cache in the appliance. Increasing a cache's size can do a lot for performance, and as Chuck said:
The exception, of course, is if you've got the bucks to create a ginormous read cache, and pull almost all the significant data into memory. Don't snicker -- there are a few use cases where this sort of approach makes sense.
In other words, create a RAM disk in cache. This technique is normally reserved for a single, high-profile application, and as Chuck wrote, there are use cases where it makes sense. But it doesn't address mixed workloads, where a large number of applications don't merit dedicated memory but still need good performance. It might be possible to micro-manage cache for some number of applications by dedicating cache to each of them, but that takes a great deal of work for what is likely to be a temporary solution, lasting a couple of months at most. It's probably a great way to drive storage admins crazy.
A more palatable approach is a global cache that shares cache resources among multiple servers and their applications. In some cases it's possible to predict workload demands (end-of-month processing, for example), but in many cases the instantaneous performance requirements cannot be predicted because they are driven by spontaneous events. As many people know all too well, spikes in Internet activity and the resulting bottlenecks in back-end storage clearly illustrate the challenges of mixed workloads. Global caches that can absorb large Internet traffic spikes are expensive and provide no noticeable performance advantage most of the time: they are overkill.
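A contrived example shows why a shared cache copes with shifting demand better than carving the same memory into fixed per-application partitions. In this sketch, two hypothetical applications, "A" and "B", take turns bursting over a 15-block working set, and the total cache is 20 blocks either way. The trace, the sizes, and the function names are assumptions picked to make the effect obvious, and plain LRU stands in for whatever replacement policy a real array uses.

```python
from collections import OrderedDict

def lru_hits(trace, capacity):
    """Count hits for a single LRU cache of `capacity` blocks over `trace`."""
    cache = OrderedDict()
    hits = 0
    for block in trace:
        if block in cache:
            cache.move_to_end(block)       # mark as most recently used
            hits += 1
        else:
            cache[block] = None
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict least recently used
    return hits

def partitioned_hits(trace, partitions):
    """Each application gets a private LRU partition (maps app -> capacity)."""
    return sum(
        lru_hits([blk for blk in trace if blk[0] == app], capacity)
        for app, capacity in partitions.items()
    )

def shared_hits(trace, capacity):
    """All applications share one global LRU cache."""
    return lru_hits(trace, capacity)

# Contrived shifting demand: app A bursts over 15 blocks, then app B does.
trace = [("A", i) for i in range(15)] * 4 + [("B", i) for i in range(15)] * 4

print("partitioned 10+10:", partitioned_hits(trace, {"A": 10, "B": 10}))  # 0
print("shared 20:        ", shared_hits(trace, 20))                       # 90
```

With 10-block private partitions neither burst fits, so LRU misses on every access, while the shared 20-block cache lets whichever application is busy use most of the memory. Real mixes are rarely this tidy, but the point stands: when demand can't be predicted, fixed partitions can't be sized correctly in advance.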
The proliferation of virtualized servers has significantly broadened the mix of workloads a storage array has to deal with. In general, there is more overall I/O activity, and it is less predictable and therefore more difficult to handle in cache.
The question is: is storage tiering with SSDs any better? Possibly. If it is going to be more effective than caching, it needs to provide more control points and intelligence than cache typically does. For instance, the ability to prioritize and schedule applications for movement into SSD tiers could be an important difference. 3PAR's QoS Gradient concept in Adaptive Optimization is an example of a simple prioritization scheme. The internal counterpart to QoS Gradients is the array's own monitoring of I/O levels, which helps determine which applications are promoted.
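Here is a minimal sketch of what priority-weighted promotion could look like. To be clear, this is not 3PAR's Adaptive Optimization algorithm, whose internals aren't described here; it's a hypothetical scheme in which each region's measured I/O count is multiplied by a per-application priority weight set by an administrator, and the highest-scoring regions get the available SSD slots. The `Region` type, the weights, and the slot count are all invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class Region:
    """A hypothetical chunk of a volume that can be moved between tiers."""
    name: str
    app: str
    io_count: int  # I/Os observed during the last monitoring window

def choose_promotions(regions, app_priority, ssd_slots):
    """Return the names of the regions to place on SSD.

    Each region's I/O count is multiplied by its application's priority
    weight (default 1.0), and the top `ssd_slots` scores win. This is an
    illustrative scheme, not any vendor's actual algorithm.
    """
    def score(region):
        return region.io_count * app_priority.get(region.app, 1.0)

    ranked = sorted(regions, key=score, reverse=True)
    return [region.name for region in ranked[:ssd_slots]]

# Made-up monitoring data and administrator-assigned priorities.
regions = [
    Region("r1", "oltp", 500),
    Region("r2", "batch", 900),
    Region("r3", "oltp", 300),
    Region("r4", "web", 400),
]
priorities = {"oltp": 3.0, "batch": 0.5, "web": 1.0}

print(choose_promotions(regions, priorities, ssd_slots=2))  # ['r1', 'r3']
print(choose_promotions(regions, {}, ssd_slots=2))          # ['r2', 'r1']
```

With these weights the two OLTP regions win the SSD slots even though the batch region has the highest raw I/O count; with all weights equal, the batch region would be promoted first. That kind of business-driven control point is what plain caching usually lacks.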
That's not to say caching can't have some of the same controls, but traditionally caching has been more reactive than proactive. To be clear, tiering is also reactive, but within the context of intelligent preparation and business-driven policies.
Still, considering the cost of SSDs, you have to ask: is tiering overkill too? It might be. If the array can keep up most of the time, how much SSD capacity should be purchased? These are the sorts of questions that will be answered over the next couple of years.
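One way to frame the sizing question is to ask how much of the I/O lands on the hottest slice of capacity. The function below is a hypothetical sketch: given per-extent I/O counts from some monitoring window, it returns how many of the hottest extents would be needed to absorb a target percentage of the I/O. The counts are invented, and the approach assumes past activity predicts future activity, which is precisely what's uncertain with mixed workloads.

```python
def extents_for_target(io_counts, target_percent):
    """Return how many of the hottest extents are needed to absorb at least
    target_percent (0-100) of all observed I/O."""
    total = sum(io_counts)
    if total == 0:
        return 0
    covered = 0
    for n, count in enumerate(sorted(io_counts, reverse=True), start=1):
        covered += count
        # Integer comparison avoids floating-point rounding at the boundary.
        if covered * 100 >= target_percent * total:
            return n
    return len(io_counts)

# Hypothetical I/O counts for nine equal-sized extents (they sum to 100).
counts = [5, 50, 2, 10, 4, 20, 1, 5, 3]

for target in (50, 80, 90, 100):
    n = extents_for_target(counts, target)
    print(f"{target:3d}% of I/O -> {n} of {len(counts)} extents")
```

With this made-up distribution, 3 of 9 extents (a third of the capacity) absorb 80% of the I/O, while reaching 90% takes 5. Flatter distributions push the answer up quickly, which is why the right amount of SSD is hard to know in advance.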
The best technology to date for dealing with mixed workloads continues to be wide striping. If you throw latency-sensitive applications out of the mix (the same apps Chuck Hollis referred to when he talked about the applicability of "ginormous read caches"), then the array just has to provide adequate throughput at reasonable service times (latencies).
Wide striping does this by spreading the workload mix over as many disk drives as possible. Not the number of drives that fit in a shelf or can be added to a RAID group, but at best all the drives in an array, or all the drives of a particular class, say SATA or FC. Striping very thin layers of data across hundreds of drives means that hundreds of servers can access data simultaneously with minimal contention. The result is that all the drives are kept busy at roughly the same rate and none of them bears an unfair burden. The overall sustainable throughput is very high, scales as more drives are added, and in general fits the profile of mixed workloads better than any affordable caching scheme does.
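A small simulation illustrates the load-spreading argument. It compares two hypothetical layouts on a 64-drive array: a narrow one where each of 8 servers' volumes lives on its own 8-drive RAID group, and a wide one where every volume is striped in thin chunks across all 64 drives. The drive counts, stripe size, and skew (one hot server issuing half of all I/O) are assumptions chosen for illustration, and the model only counts requests per drive rather than modeling service times.

```python
import random
from collections import Counter

DRIVES = 64          # drives in the hypothetical array
VOLUMES = 8          # one volume per server
GROUP = 8            # drives per RAID group in the narrow layout
STRIPE = 16          # blocks per stripe unit (illustrative)
VOL_BLOCKS = 1_000_000

def narrow_layout(vol, lba):
    """Each volume is striped only across its own 8-drive RAID group."""
    return vol * GROUP + (lba // STRIPE) % GROUP

def wide_layout(vol, lba):
    """Every volume is striped thinly across all drives in the array."""
    return (vol + lba // STRIPE) % DRIVES

def make_requests(n, seed=1):
    """Skewed mixed workload: volume 0 is hot and issues half of all I/O."""
    rng = random.Random(seed)
    reqs = []
    for _ in range(n):
        vol = 0 if rng.random() < 0.5 else rng.randrange(1, VOLUMES)
        reqs.append((vol, rng.randrange(VOL_BLOCKS)))
    return reqs

def per_drive_load(requests, layout):
    """Count how many requests each drive receives under a given layout."""
    counts = Counter(layout(vol, lba) for vol, lba in requests)
    return [counts[d] for d in range(DRIVES)]

requests = make_requests(10_000)
for name, layout in [("narrow", narrow_layout), ("wide", wide_layout)]:
    load = per_drive_load(requests, layout)
    print(f"{name:6s} busiest drive: {max(load):5d} I/Os, idlest drive: {min(load):5d} I/Os")
```

In the narrow layout, the hot server's 8 drives take several times their share while the others sit comparatively idle; in the wide layout, every drive ends up close to the average.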
We've become accustomed to thinking that more memory is the answer to all storage performance problems, but that doesn't exhaust the potential of massed disk drives.