Tuesday, April 10, 2012

EMC (VNX FAST Cache) + VMware (VMmark) = A killer result, a killer VROOM post.

Original Post: http://virtualgeek.typepad.com/virtual_geek/2012/04/emc-vnx-fast-cache-vmware-vmmark-a-killer-result-a-killer-vroom-post.html
Unless you're under a rock (or of the less geeky type :-) you know that Flash is transforming the IO business. It's orders of magnitude faster (latency, IOps), smaller (physically/power/cooling)… It's already cheaper when measured in $/IOps – by orders of magnitude.

Likewise, you know everyone in the industry is working hard to cover the use cases, adapt and innovate, and build value on top of the technology disruption.

One thing that's fun to watch is to see everyone claim everyone else sucks, or that Flash should ONLY be leveraged as a cache, or a tier, or on the server, or the array. Readers know that this is one of the things that I get a kick out of, because it's always such a silly position to try to hold. I wonder how people deal with the cognitive dissonance.

I get about 1 email a week from someone that says "vendor X" says FAST VP and FAST Cache "don't work". Sheesh. Obviously not students of the EMC Presales Manifesto<http://virtualgeek.typepad.com/virtual_geek/2012/01/this-i-believe-emc-presales-manifesto.html> ("Principle #6: be a positive force = never going negative on the other guy")

Interestingly, sometimes it DOESN'T help (IO skew is a large factor – we have tools that can help figure out fit). In the last 2 weeks alone, I've had two examples where FAST Cache WAS NOT a fit.

Read on for details on those two examples, and the most recent public testing result about the effect of just a little bit of Flash into a mixed workload….

* In the last week of Q1 – I talked to a customer who didn't see much benefit on their CX4 with their workload. Their workload was very cache hostile, so FAST VP was a better fit. They were also a large financial, and needed tiering at very high scale, with very high availability, at very granular levels, and with a policy that involves large amounts of moves over very small time periods – and so were looking at EMC VMAX and its competitors. While that was a competitive situation – I'd like to thank them for picking EMC (again).
* VMware is working on some Hadoop on vSphere testing that we're supporting right now (interesting results to come soon). Hadoop is a very bandwidth-gated workload, and while they have FAST Cache and EFDs for FAST VP – that kind of workload doesn't care. Frankly, with that workload, the more array "brains" you put between the host and the data, the more they can detract from overall performance. So far, we're showing that our performance is as good as plain-old DAS, while maintaining the shared, pooled (and mobile) values you get from shared storage with VMware. May sound funny – but for these bandwidth-gated workloads – local DAS controllers and spindles work well (see the "cluster architecture" Big Idea post here<http://virtualgeek.typepad.com/virtual_geek/2012/04/big-ideas-cluster-architectures.html>). For Hadoop and other workloads (SAP HANA as another example), IMO the magic of flash as a technology disruptor will apply as part of the host memory tier rather than in the IO stack.

So – wait a second… I can imagine you asking: "Are you saying EMC FAST Cache IS useful, or that it ISN'T?"

I'm saying that for most customers, with the exception of specific workloads (the Hadoop example) – a little bit of flash goes a long, long way. This applies well using both FAST Cache and FAST VP. That statement is almost ALWAYS true when you're talking about what most customers do with arrays – not run one workload, but rather a mix of workloads that vary over time.
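A quick way to see why a little flash goes a long way is a toy cache model. This is my own illustration (not EMC's sizing math), and the device latencies and the Zipf-shaped popularity curve are assumptions – but skewed access patterns like this are typical of the mixed workloads described above, and they are exactly what "cache hostile" workloads lack:

```python
# Toy model (illustrative only, not EMC's sizing math): a small flash
# cache absorbs a big share of the IO stream when access skew is high.
# Assumes a Zipf-like popularity curve over the working set.

def hit_rate(cache_fraction, num_blocks=100_000, skew=1.0):
    """Fraction of IOs served from a cache holding the hottest
    `cache_fraction` of blocks, under Zipf(skew) popularity.
    skew=0 means uniform ("cache hostile") access."""
    weights = [1.0 / (rank ** skew) for rank in range(1, num_blocks + 1)]
    hot = int(num_blocks * cache_fraction)
    return sum(weights[:hot]) / sum(weights)

def avg_latency_us(hit, flash_us=200.0, disk_us=8000.0):
    """Blended latency for a given hit rate (made-up device numbers)."""
    return hit * flash_us + (1.0 - hit) * disk_us

# With heavy skew, caching ~2% of the blocks serves roughly two thirds
# of the IOs; with zero skew it serves only ~2% of them.
skewed = hit_rate(0.02, skew=1.0)    # high IO skew -> FAST Cache helps
uniform = hit_rate(0.02, skew=0.0)   # uniform access -> it doesn't
```

Under these assumptions, the same 2% of flash cuts blended latency by well over half for the skewed workload and does almost nothing for the uniform one – which is the "IO skew is a large factor" point in a nutshell.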

Interestingly (at least to me), the above is the opposite of the two examples I noted (both are very specific workloads), and the opposite of how you see most benchmark tools operate – including the sorts of tests that customers typically run themselves. In those cases, it's typically one narrow workload that ramps, hits a steady state, runs, then stops.

Are there any benchmarks that have this "mixed workload" characteristic so common in the "real world"?…

… Look at this recent (April 5th) VMware VROOM<http://blogs.vmware.com/performance/> post (the always-awesome blog of VMware's performance engineering team): http://blogs.vmware.com/performance/2012/04/exploring-fast-cache-performance-using-vmmark-211.html

VMmark is an interesting cat when it comes to benchmarks. It uses a tile-based model where each tile is composed of different VMs running different workloads. It's not a great IO benchmark (it's not designed to be), as it doesn't really stress the storage subsystem too hard relative to CPU/memory, but it does scale up until there is a "fail". It's also not the easiest benchmark to set up and run (certainly not "quick and dirty"), but it does have that "mixed set of more real-world-ish workloads" characteristic.

So what's the effect of EMC VNX FAST Cache with VMmark?

Take a look:

[Chart: VMmark tile scaling with and without FAST Cache – http://blogs.vmware.com/.a/6a00d8341c328153ef016764b085a9970b-popup]

That's the effect of a tiny amount of FAST Cache (a mirrored pair of 2 x 100GB SSDs). Without FAST Cache, the config maxes out with 20 HDDs at 2 tiles. Going down to 11 HDDs and adding the tiny amount of FAST Cache increased performance and enabled scaling to 4 tiles. YEAH!
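The tile-scaling effect can be pictured with a toy capacity model. To be clear, this is just a sketch of the scaling mechanic – the IOps budgets and per-tile demand below are hypothetical numbers I've picked so the shape matches the 2-tile vs. 4-tile outcome, not figures from the published run:

```python
# Illustrative sketch only: VMmark adds identical mixed-workload tiles
# until a QoS check fails. Modeling storage as a single IOps budget
# (hypothetical numbers, NOT from the published test) shows how raising
# effective IOps with a small flash cache raises the tile count.

def max_tiles(storage_iops_budget, iops_per_tile):
    """Largest tile count whose total IO demand still fits the budget."""
    tiles = 0
    while (tiles + 1) * iops_per_tile <= storage_iops_budget:
        tiles += 1
    return tiles

# Hypothetical budgets: many HDDs alone vs. fewer HDDs + small cache.
hdd_only = max_tiles(storage_iops_budget=3500, iops_per_tile=1500)    # 2 tiles
with_cache = max_tiles(storage_iops_budget=6500, iops_per_tile=1500)  # 4 tiles
```

The point of the sketch: tile count is gated by whichever resource "fails" first, so lifting the storage ceiling (here, with a small FAST Cache) is what lets the same hosts scale from 2 tiles to 4.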

Like all technology – your mileage will vary – and in IO land, it's mostly based on your workload. But if anyone ever tells you that EMC FAST Cache "doesn't work", well – point them to this testing, and also to the Manifesto :-)
