A Superior Storage Fabric Has Arrived
The first new block storage networking protocol in years will soon change the way we build datacenters.
I’ve had the pleasure of meeting Rob Ober, and when you meet him you quickly realize that he gets it: he understands where datacenter architectures are headed. Rob has moved on now, but last year he was at SanDisk and wrote a great blog post. I’d encourage everyone to read through it, but my favorite part is this:
In some ways, 2016 is the calm before the storm, in anticipation of 2017 and the biggest architectural changes in 20 or 30 years. That’s when we’ll see Rack Scale Architectures, disaggregation, in-rack fabrics, pooled storage, and huge NV main memory deployments all lumped together.
That’s a lot to consider, so let’s focus on two closely related concepts: Rack Scale Architectures and compute/storage disaggregation. Why? Because in the year and more since Rob wrote that, the market has come to realize that disaggregation is exactly what is needed for datacenters to keep scaling at the current rate.
Disaggregation. It may be a new word for some, but it simply means that you stop putting storage (HDDs, SSDs) in the server chassis and instead put it all into a separate, shareable pool. Sound familiar? That’s because the datacenter industry did just that back in the ’90s, when we invented the Storage Area Network, created Fibre Channel, and put storage in separate, shareable pools. It was a good idea then, and it’s an even better idea now.
Why? Because of several factors, but here are the big two: 1) Storage is shifting from HDDs to solid state, and we need to better utilize this new class of storage, as it’s more expensive than rotating media. 2) Storage is exploding. I won’t justify that here, as it’s been justified hundreds of times elsewhere… but the bottom line is that in order to continue to scale, datacenters need to find ways to do so more efficiently.
Flash back to IDF14, when Intel did a demo of a disaggregated configuration, showing that a remote SSD (i.e., one that sits across a fabric rather than in the server chassis) can be accessed with performance nearly identical to that of a local SSD. This was the nucleation point for what would become a new standard, NVMe over Fabrics (NVMe-oF). The NVM Express organization drove the development of the first new block storage network standard in 30 years, and published that specification in June of this year. The standard is intended to enable disaggregation, and one of its remarkable aspects is that no competing standard is attempting to do the same thing: there is virtually unanimous agreement that this is the way to accomplish it.
So it seems that it’s now just a matter of when, not if, disaggregation will happen. Why? It’s not because there’s a shiny new object to build and deploy; it’s because there is an obvious “killer app” for disaggregation enabled by the NVMe-oF standard, and it’s this: storage utilization. Numerous scale-out vendors have already concluded that storage utilization in today’s “shared nothing” model ranges from as low as 5% to as high as about 50%, depending upon configuration and workload. That’s horrendous. Buy a bunch of expensive SSDs and then use less than half their value?
Thus, the answer to “why should I rearchitect my datacenter?” is the same answer that drives any successful disruptive new technology: money. Imagine not having to buy nearly half of the storage you had budgeted for that new building-scale datacenter you have planned. Appealing? Yes. And that’s why every “3rd platform” vendor we’ve talked with is looking at disaggregation and pulling a game plan together. They realize that when storage utilization goes up by 40%, they can buy roughly 40% fewer SSDs. That’s it in a nutshell.
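To put rough numbers on the utilization argument, here is a minimal back-of-the-envelope sketch in Python. The function names are hypothetical, the 1,000 TB dataset and the 90% "after" utilization are assumptions chosen purely for illustration, and only the 50% "before" figure comes from the utilization range quoted above. The model is deliberately simple: the raw capacity you must buy is the data you actually store divided by the fraction of purchased capacity that ends up in use.

```python
def raw_capacity_needed(stored_tb: float, utilization: float) -> float:
    """Raw capacity (TB) to purchase so that `stored_tb` of data fits
    when only `utilization` (0 < u <= 1) of the purchased capacity is used."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in the range (0, 1]")
    return stored_tb / utilization


def purchase_reduction(util_before: float, util_after: float) -> float:
    """Fraction of raw capacity you no longer have to buy when utilization
    improves from `util_before` to `util_after` for the same stored data."""
    if not 0 < util_before <= util_after <= 1:
        raise ValueError("expected 0 < util_before <= util_after <= 1")
    return 1 - util_before / util_after


# Illustrative assumptions: 1,000 TB of data, 50% utilization today,
# 90% utilization with a shared, disaggregated pool.
before = raw_capacity_needed(1000, 0.50)
after = raw_capacity_needed(1000, 0.90)
print(f"raw capacity before: {before:.0f} TB")        # 2000 TB
print(f"raw capacity after:  {after:.0f} TB")         # 1111 TB
print(f"SSDs not purchased:  {purchase_reduction(0.50, 0.90):.0%}")  # 44%
```

Under these assumptions, a 40-point gain in utilization trims the purchase by a bit more than 40%, and the saving grows quickly when the starting utilization is lower. Real deployments also need headroom for growth, redundancy, and over-provisioning, none of which this sketch models.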
NVMe-oF demos are happening now, and production systems will be available early next year. It’s not a matter of “if”, just a matter of “when”, and Rob Ober called it back in 2015: The “when” is just around the corner.