Requirement #4 for guaranteed Quality of Service (QoS): balanced load distribution
Guaranteeing performance to thousands of applications at the same time is a daunting challenge, but it’s essential for anyone wanting to host performance-sensitive applications in a cloud environment. Delivering true Quality of Service (QoS), however, requires an architecture specifically designed for the task. As we’ve shown, true QoS starts with an all-SSD platform, a scale-out architecture, and RAID-less data protection. The fourth architectural requirement for guaranteed QoS is balanced load distribution across all the disks in the system.
Most block storage architectures use very basic algorithms to lay out provisioned space. Data is striped across a set of disks in a RAID set, or possibly across multiple RAID sets in a storage pool. For systems that support thin provisioning, the placement may be done via smaller chunks or extents rather than the entire volume at once. Typically, however, at least several hundred megabytes of data will be striped together.
Once data is placed on a disk, it is seldom moved (except possibly in tiering systems to move to a new tier). Even when a drive fails, all its data is simply restored onto a spare. When new drive shelves are added they are typically used for new data only – not to rebalance the load from existing volumes. Wide striping is one attempt to deal with this imbalance, by simply spreading a single volume across many disks. But as we’ve discussed before, when combined with spinning disk, wide striping just increases the number of applications that are affected when a hotspot or failure does occur.
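To make the hotspot problem concrete, here is a minimal Python sketch of static, location-based placement. It is only an illustration under stated assumptions: the 256 MiB extent size, the round-robin mapping of extents to RAID sets, and the function names are all hypothetical and do not describe any particular vendor's layout.

```python
from collections import Counter

BLOCK_SIZE = 4096                              # bytes per block (assumed)
EXTENT_SIZE = 256 * 1024 * 1024                # 256 MiB per extent (assumed)
BLOCKS_PER_EXTENT = EXTENT_SIZE // BLOCK_SIZE  # 65,536 blocks per extent


def raid_set_for_lba(lba: int, num_raid_sets: int) -> int:
    """Static placement: the extent containing the LBA picks the RAID set."""
    extent = lba // BLOCKS_PER_EXTENT
    return extent % num_raid_sets  # hypothetical round-robin of extents


def io_per_raid_set(lbas, num_raid_sets: int) -> Counter:
    """Count how many IOs each RAID set receives for a list of LBAs."""
    counts = Counter({i: 0 for i in range(num_raid_sets)})
    for lba in lbas:
        counts[raid_set_for_lba(lba, num_raid_sets)] += 1
    return counts


# An application hammering a small range: 10,000 reads over the first 1,000 LBAs.
hot_lbas = [i % 1000 for i in range(10_000)]
static_load = io_per_raid_set(hot_lbas, num_raid_sets=4)
print(static_load)
```

Because the entire hot range fits inside a single extent, all 10,000 reads land on RAID set 0 while the other three sets sit idle. Placement is a function of location only, so nothing in this model ever moves that load elsewhere.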
Unbalanced loads cause unbalanced performance
The result of this static data placement is uneven load distribution between storage pools, RAID sets, and individual disks. When the storage pools have different capacities or different types of drives (e.g., SATA, SAS, or SSD), the difference can be even more acute. Some drives and RAID sets will get maxed out while others are relatively idle. Managing data placement to balance IO load as well as capacity is left to the storage administrator, who often relies on Microsoft Excel spreadsheets to figure out the best location for any particular volume.
Not only does this manual management model fail to scale to cloud environments, it just isn’t viable when storage administrators have little or no visibility into the underlying application, or when application owners cannot see the underlying infrastructure. The unbalanced distribution of load also makes it impossible for the storage system itself to make any guarantees about performance. If the system can’t even balance the IO load it has, how can it guarantee QoS to an individual application as that load changes over time?
SolidFire restores the balance
SolidFire’s unique approach to data placement distributes individual 4K blocks of data throughout the storage cluster to evenly balance both capacity and performance. Data is distributed based on content rather than location, which avoids hotspots caused by problematic application behavior such as heavy access to a small range of LBAs. Furthermore, as capacity is added to (or removed from) the system, data is automatically redistributed in the background across all the storage capacity. Rather than ending up with a system that has traffic jams in older neighborhoods while the suburbs are mostly empty, SolidFire keeps the load balanced as the system scales.
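The idea of content-based placement can be sketched in a few lines of Python. This is not SolidFire's actual algorithm, which isn't described here. As assumptions for illustration only, the sketch derives a block ID from a SHA-256 hash of the block's content and picks a node with rendezvous (highest-random-weight) hashing; the node names and helper functions are hypothetical.

```python
import hashlib
from collections import Counter
from typing import Sequence

BLOCK_SIZE = 4096  # 4K blocks, as in the text


def block_id(data: bytes) -> bytes:
    """Derive a block ID from content, not from the LBA (assumed: SHA-256)."""
    return hashlib.sha256(data).digest()


def place(bid: bytes, nodes: Sequence[str]) -> str:
    """Rendezvous hashing: the node with the highest score for this ID wins."""
    return max(nodes, key=lambda n: hashlib.sha256(bid + n.encode()).digest())


# A "hot" range of 1,000 adjacent LBAs, each holding different content.
blocks = [i.to_bytes(8, "big") * (BLOCK_SIZE // 8) for i in range(1000)]
ids = [block_id(b) for b in blocks]

nodes = ["node-0", "node-1", "node-2", "node-3"]  # hypothetical node names
before = {bid: place(bid, nodes) for bid in ids}
load = Counter(before.values())
print("blocks per node:", dict(load))

# Add capacity: only blocks that now score highest on the new node move.
after = {bid: place(bid, nodes + ["node-4"]) for bid in ids}
moved = [bid for bid in ids if before[bid] != after[bid]]
print("blocks moved to the new node:", len(moved))
```

Even though the 1,000 blocks sit next to each other in the volume's address space, their hashes scatter them across all four nodes in roughly equal shares. When a fifth node joins, about a fifth of the blocks migrate to it and the rest stay where they are.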
This even distribution of data and IO load across the system allows SolidFire to deliver predictable performance regardless of the IO behavior of an individual application. As load on the system increases, performance changes predictably and consistently. And as new capacity and performance are added, the SolidFire system gives a predictable amount of additional performance. The load distribution stays balanced over time, an essential aspect of delivering consistent performance day after day. You just can’t guarantee QoS without it.