Requirement #1 for guaranteed Quality of Service (QoS): An All-SSD Architecture
Anyone deploying a large public or private cloud infrastructure faces the same issue: how to deal with inconsistent and unpredictable application performance. As we discussed earlier, overcoming this problem requires an architecture built from the ground up to guarantee Quality of Service (QoS) for many simultaneous applications.
The first requirement for achieving this level of performance is moving from spinning media to an all-SSD architecture. Only an all-SSD architecture allows you to deliver consistent latency for every IO.
At first, this idea might seem like overkill. If you don’t actually need the performance of SSD storage, why can’t you guarantee performance using spinning disk? Or even a hybrid disk and SSD approach?
Fundamentally, it comes down to simple physics. A spinning disk can serve only a single IO at a time, and any seek between IOs adds significant latency. In cloud environments where multiple applications or virtual machines share disks, the unpredictable queue of IO waiting on a single head can easily produce an order-of-magnitude variance in latency, from 5 ms with no contention to 50 ms or more on a busy disk.
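To make the physics concrete, here is a minimal Python sketch of a single disk head draining a queue. It assumes, purely for illustration, that every random IO costs a fixed 5 ms of seek and rotation (the uncontended figure above) and that IOs are served strictly in arrival order; real drives reorder their queues and vary per IO. The function name `disk_latency_ms` is hypothetical.

```python
# Assumed fixed service time for one random IO on a spinning disk
# (seek + rotational delay); 5 ms matches the uncontended figure above.
DISK_SERVICE_MS = 5.0


def disk_latency_ms(queue_position: int, service_ms: float = DISK_SERVICE_MS) -> float:
    """Latency seen by an IO that has `queue_position` IOs ahead of it.

    A single head serves IOs one at a time, so the IO waits for every
    IO ahead of it and then for its own service time.
    """
    if queue_position < 0:
        raise ValueError("queue_position must be non-negative")
    return (queue_position + 1) * service_ms


if __name__ == "__main__":
    for ahead in (0, 1, 4, 9):
        print(f"{ahead} IOs ahead -> {disk_latency_ms(ahead):.0f} ms")
```

Under these assumptions, an IO with nine others ahead of it waits 50 ms, which is how contention alone turns a 5 ms device into a 50 ms one.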
The solutions are part of the problem
Modern storage systems attempt to overcome this fundamental physical bottleneck in a number of ways, including caching (in DRAM and flash), tiering, and wide striping.
Caching is the easiest way to reduce contention for a spinning disk. The hottest data is kept in large DRAM or flash-based caches, which can offload a significant amount of IO from the disks. Indeed, this is why large DRAM caches are standard on every modern disk-based storage system. But while caching can certainly increase the overall throughput of a spinning-disk system, it also makes latency highly variable.
Data in a DRAM or flash cache can be served in under 1 ms, while cache misses served from disk will take 10-100 ms. That's a difference of up to three orders of magnitude for an individual IO. Clearly the overall performance of an individual application is going to be strongly influenced by how cache-friendly it is, how large the cache is, and how many other applications are sharing it. In a dynamic cloud environment, that last criterion is changing constantly. All told, it's impossible to predict, much less guarantee, the performance of any individual application in a system based on caching.
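The arithmetic behind this is a weighted average of hit and miss latency. The sketch below assumes illustrative values of 0.1 ms per hit and 20 ms per miss, plus a deliberately simple, hypothetical hit-rate model: the cache is split evenly among tenants and each tenant reads its working set uniformly at random, so its hit rate is roughly its cache share divided by its working set. Nothing here describes how any particular storage system allocates cache; the function names are hypothetical.

```python
# Assumed latencies (illustrative, not measured), within the ranges above.
HIT_MS = 0.1    # served from DRAM/flash cache
MISS_MS = 20.0  # served from spinning disk


def expected_latency_ms(hit_rate: float, hit_ms: float = HIT_MS,
                        miss_ms: float = MISS_MS) -> float:
    """Mean per-IO latency for a cache hit rate between 0.0 and 1.0."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * hit_ms + (1.0 - hit_rate) * miss_ms


def hit_rate_from_share(cache_gb: float, tenants: int, working_set_gb: float) -> float:
    """Hypothetical model: the cache is split evenly among tenants, and each
    tenant reads its working set uniformly, so hit rate ~= share / working set."""
    share_gb = cache_gb / tenants
    return min(1.0, share_gb / working_set_gb)


if __name__ == "__main__":
    # One application with a 100 GB working set on a 200 GB shared cache.
    for tenants in (1, 4, 10):
        rate = hit_rate_from_share(200.0, tenants, 100.0)
        print(f"{tenants:>2} tenants: hit rate {rate:.0%}, "
              f"mean latency {expected_latency_ms(rate):.2f} ms")
```

With nothing changing in the application itself, its modeled mean latency goes from 0.10 ms alone to about 10 ms with four tenants and about 16 ms with ten. Even the mean hides the real problem: every individual IO takes either 0.1 ms or 20 ms, depending on whether it happens to hit.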
Tiering is another approach to overcoming the physical limits of spinning disk. Tiered systems move "hot" data to faster media and "cold" data to slower media in an attempt to give popular applications more performance. But as we've discussed before, tiering suffers from the same unpredictability problems as caching: an application's performance depends on whether its data happens to sit on the fast tier at any given moment.
Wide striping a volume's data across many spinning disks doesn't solve the problem either. While this approach can help balance IO load across the system, each individual disk is now shared by many more applications. A backlog at any one disk slows every volume striped across it, so a single noisy neighbor can ruin the party for everyone.
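A toy model shows why sharing every disk spreads the pain. This Python sketch assumes eight disks, four hypothetical volumes, a fixed 5 ms per IO, and a backlog of 20 IOs on one disk (disk 3). It compares a narrow layout, where each volume owns two disks, with a wide-striped layout, where every volume spans all eight. It deliberately ignores the load-balancing benefit of wide striping and only shows which volumes are exposed to a single hot disk.

```python
from typing import Dict, List

DISK_SERVICE_MS = 5.0  # assumed fixed cost per IO, for illustration only


def volume_worst_latency_ms(queue_depths: List[int], stripe: List[int],
                            service_ms: float = DISK_SERVICE_MS) -> float:
    """Worst-case latency for a volume whose data is striped over `stripe`.

    Each disk serves one IO at a time, so an IO landing on disk d waits
    behind queue_depths[d] IOs; the volume's worst case comes from its
    most backlogged disk.
    """
    return max((queue_depths[d] + 1) * service_ms for d in stripe)


def affected_volumes(layout: Dict[str, List[int]], busy_disk: int) -> List[str]:
    """Names of the volumes that have data on `busy_disk`."""
    return sorted(name for name, disks in layout.items() if busy_disk in disks)


# Hypothetical layouts: each volume on its own pair of disks vs. all eight.
NARROW = {"vol-a": [0, 1], "vol-b": [2, 3], "vol-c": [4, 5], "vol-d": [6, 7]}
WIDE = {name: list(range(8)) for name in NARROW}

# One IO queued on every disk, plus a noisy neighbor's backlog on disk 3.
QUEUES = [1] * 8
QUEUES[3] = 20

if __name__ == "__main__":
    for label, layout in (("narrow", NARROW), ("wide", WIDE)):
        print(label, "affected by disk 3:", affected_volumes(layout, 3))
        for name, stripe in layout.items():
            print(f"  {name}: worst case {volume_worst_latency_ms(QUEUES, stripe):.0f} ms")
```

In the narrow layout only vol-b sees the 105 ms worst case while the others stay at 10 ms; once every volume is wide-striped, all four inherit it.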
All-SSD is the only way to go
All-SSD architectures have significant advantages when it comes to guaranteeing QoS. The lack of a moving head means latency stays consistent no matter how many applications are issuing IOs, and regardless of whether those IOs are sequential or random. Compared to the single-IO bottleneck of disk, SSDs have 8 to 16 channels to serve IOs in parallel, and each IO is completed quickly. So even at a high queue depth, the variance in latency for an individual IO is low.
All-SSD architectures often do away with DRAM caching altogether. Modern host operating systems and databases already do extensive DRAM caching, and the low latency of flash means that hitting the SSD is often nearly as fast as serving from a storage-system DRAM cache anyway. The net result in a well-designed system is consistent latency for every IO, a key requirement for delivering guaranteed performance.
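A rough way to see the effect of parallel channels is to count how many rounds a burst of queued IOs needs to drain. The sketch below is a hypothetical model, not a description of any real SSD controller: it assumes 8 channels, a fixed 0.1 ms per flash IO, and perfectly even distribution of IOs across channels, and compares that with a single disk head at 5 ms per IO. The function name `burst_latency_ms` is hypothetical.

```python
import math


def burst_latency_ms(queue_depth: int, channels: int, service_ms: float) -> float:
    """Latency of the last IO in a burst of `queue_depth` IOs when the device
    completes up to `channels` IOs in parallel per service interval.
    (Hypothetical model: IOs are spread perfectly evenly across channels.)"""
    if queue_depth < 1 or channels < 1:
        raise ValueError("queue_depth and channels must be at least 1")
    rounds = math.ceil(queue_depth / channels)
    return rounds * service_ms


# Assumed device parameters, for illustration only.
DISK = {"channels": 1, "service_ms": 5.0}
SSD = {"channels": 8, "service_ms": 0.1}

if __name__ == "__main__":
    for qd in (1, 8, 32):
        disk = burst_latency_ms(qd, **DISK)
        ssd = burst_latency_ms(qd, **SSD)
        print(f"queue depth {qd:>2}: disk {disk:6.1f} ms, SSD {ssd:4.2f} ms")
```

At a queue depth of 32, the modeled disk's worst-case latency grows from 5 ms to 160 ms, while the modeled SSD moves only from 0.1 ms to 0.4 ms, so the spread an individual IO can see stays a fraction of a millisecond.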
An all-SSD architecture is just the starting point for guaranteed QoS, however. Even a fast flash storage system can have noisy neighbors, degraded performance from failures, or unbalanced performance. Stay tuned to this blog as we discuss the five other critical architectural requirements for guaranteed QoS, and join us on our upcoming webinar with WHIR to learn more.