Achieving storage scalability in the cloud
When people hear the terms “scalability” and “cloud” together, many associate them with the ability to scale compute and, to a lesser degree, network resources. For some reason, most do not associate those same terms with storage in the cloud.
In cloud computing, resources can be added or removed non-disruptively and are managed by a single layer of control software. That software manages all of the resources centrally as a pool, scheduling users and handling the insertion or removal of capacity as the administrator requires.
On the network side, the cloud is not quite there yet, but it is well along the path to a dynamic, scalable model. Software-defined networking (SDN), automated cable management, leaf-spine architectures, and the emergence of commodity switching hardware are pushing networking in the same direction as compute: toward a fully dynamic, centrally managed software model.
Storage, on the other hand, continues to hold on to legacy ideas such as separate storage networks, separate pools of storage for different performance levels, and separate storage for critical applications. Yet there is more than one type of cloud storage. Within the cloud storage ecosystem, the two most common approaches to the problem differ fundamentally because they serve different needs. While it’s common to use both in the same infrastructure, each has its own architectural and use-case advantages.
Let’s take a closer look.
Method 1: Object storage
Object storage (Swift, S3, and so on) seems to have solved the scale-out problem by embracing immutable objects, open RESTful APIs, eventual consistency, and shared-nothing architectures. Because of its lower performance and the nature of eventual consistency, this subclass is best suited to workloads such as streaming media, file sharing, and backup/archive repositories.
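To make immutable objects and eventual consistency concrete, here is a minimal in-memory sketch. It is a toy model only: the class ToyObjectStore, its methods, and the replica layout are hypothetical and do not reflect how Swift or S3 are implemented.

```python
class ToyObjectStore:
    """Hypothetical in-memory model: whole-object writes, lazily replicated."""

    def __init__(self, replica_count: int = 3):
        self.replicas = [{} for _ in range(replica_count)]
        self.pending = []  # (replica index, key, data) not yet applied

    def put(self, key: str, data: bytes) -> None:
        # Objects are written whole and never edited in place.
        self.replicas[0][key] = bytes(data)
        # Other replicas are updated later, not as part of the write.
        for index in range(1, len(self.replicas)):
            self.pending.append((index, key, bytes(data)))

    def get(self, key: str, replica: int = 0):
        # A read served by a lagging replica may miss a recent write.
        return self.replicas[replica].get(key)

    def sync(self) -> None:
        # Stands in for background replication catching up.
        for index, key, data in self.pending:
            self.replicas[index][key] = data
        self.pending.clear()


store = ToyObjectStore()
store.put("video.mp4", b"v1")
stale = store.get("video.mp4", replica=2)  # replica 2 has not caught up yet
store.sync()
fresh = store.get("video.mp4", replica=2)  # all replicas now agree
```

A reader that can tolerate the stale answer, such as a media player or a backup job, fits this model well. A database expecting every read to reflect the last write does not.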
Method 2: Block storage
When it comes to block storage, the use-case expectations are quite different, and the scale story has been very slow to change. In block storage, access rights, permissions, appending, and amending add considerable complexity in terms of performance, required I/O, and the ability to scale. In addition, the protocols and expectations of block storage are not conducive to the eventual-consistency approach to high availability.
In the classic disk array architecture, one typically has to purchase all the resources up front to implement any and every service one might want, at least for the foreseeable future. That’s not scalability! Nor is using a software manager to maintain multiple disk arrays the same as universal scale-out management! Scaling should be incremental and done on an as-needed basis without disruption. This allows you to buy only what you need today and take advantage of technology cost reductions as they occur. How does one change this design philosophy?
Overcoming challenges: Start by going back to the right storage design
Going back to the fundamentals of storage design and asking good questions leads to new solutions focused not on arrays, but on cluster methodology. There are three basic elements of storage design: capacity, performance, and data services.
In today’s software-defined, NAND-flash world, performance and data services are tied together and driven by the compute resources (CPUs and memory) supporting the storage. One can therefore calculate an optimal ratio between CPUs, capacity, and performance. We must then ask how and when that ratio changes. In static environments it generally doesn’t. But the ratio of CPU to capacity does change as the underlying storage devices (disks or SSDs) get bigger or faster.
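As a rough illustration of how that ratio might be calculated, the sketch below sizes the CPU for a group of SSDs. Every number in it is an assumption chosen for easy arithmetic, not a measurement, and the function names are hypothetical.

```python
import math


def cores_needed(ssd_count: int, iops_per_ssd: int, iops_per_core: int) -> int:
    """Cores required so that the SSDs, not the CPUs, bound performance."""
    return math.ceil(ssd_count * iops_per_ssd / iops_per_core)


def tb_per_core(ssd_count: int, tb_per_ssd: float, cores: int) -> float:
    """Capacity served by each core (the CPU-to-capacity ratio)."""
    return ssd_count * tb_per_ssd / cores


# Illustrative assumption: one core can drive 25,000 IOPS of storage work.
IOPS_PER_CORE = 25_000

# Assumed current generation: ten 1 TB SSDs at 50,000 IOPS each.
today = cores_needed(ssd_count=10, iops_per_ssd=50_000, iops_per_core=IOPS_PER_CORE)

# Assumed next generation: ten 4 TB SSDs at 100,000 IOPS each.
faster = cores_needed(ssd_count=10, iops_per_ssd=100_000, iops_per_core=IOPS_PER_CORE)

print(today, tb_per_core(10, 1.0, today))
print(faster, tb_per_core(10, 4.0, faster))
```

Under these assumptions, doubling device speed doubles the cores required, while quadrupling device size doubles the capacity each core must serve. The ratio is stable only as long as the devices are.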
The transition to scale-out
I believe that a well-defined and tested set of storage elements (SSDs), paired with the proper amount of CPU and memory, provides a logical building block for the scale-out storage necessary for next-generation data centers. With CPUs dedicated to the SSDs within each building block, each block delivers well-defined and bounded storage performance.
Using carefully characterized commodity building blocks allows the high-level cluster software to account for the various blocks and distribute load in proportion to their capabilities. Load must be distributed proportionally among nodes in every aspect of the cluster, including scheduling, redundancy, replication, and management, so that no single resource becomes the bottleneck. Once a proper distribution mechanism is in place, it leads to even scaling and a dynamic response to changes such as node failure or addition. With proper load balancing, scheduling, redundancy, and management, a storage cluster has the characteristics one expects from the terms “scalability” and “cloud.”
Clearly, the scale-up array approach must give way to a clustered system architecture to meet the needs of block storage at scale.
A well-designed clustered architecture provides the proper foundation to scale incrementally without disruption. Once proper scalability is in place, user experience should be addressed to enable further consolidation and the benefits it brings.
We’ll explore “user experience” in a highly consolidated storage infrastructure in a future post, along with other key elements of building a successful next-generation data center.
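One simple way to distribute load in proportion to each building block is to place every new volume on the node that would end up least utilized. The sketch below assumes each node has a characterized IOPS rating. The Node class, the place_volume function, and the figures are hypothetical, and a real cluster would weigh capacity, redundancy, and failure domains as well.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    iops_capacity: int      # characterized rating of this block (assumed)
    assigned_iops: int = 0  # load placed on the node so far

    @property
    def utilization(self) -> float:
        return self.assigned_iops / self.iops_capacity


def place_volume(nodes: list[Node], volume_iops: int) -> Node:
    """Place a volume on the node whose utilization would be lowest afterwards."""
    target = min(
        nodes,
        key=lambda n: (n.assigned_iops + volume_iops) / n.iops_capacity,
    )
    target.assigned_iops += volume_iops
    return target


# Two blocks of different size: node-b is rated at twice node-a.
nodes = [Node("node-a", 50_000), Node("node-b", 100_000)]

# Thirty identical volumes, each provisioned for 1,000 IOPS.
for _ in range(30):
    place_volume(nodes, 1_000)

for node in nodes:
    print(node.name, node.assigned_iops, node.utilization)
```

Because placement looks at utilization rather than raw load, the larger node absorbs twice as much work and both end up equally busy. Appending a new Node to the list is enough for later placements to favor it until it catches up.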