Check out the rest of the “Stop that, start this” series, where we expand on concepts that allow you to move away from the pains of storage past. 1. The History | 2. 2005 Wants Its Storage Provisioning Challenges Back | 3. More On Storage Provisioning and Performance | 4. No Begging for Storage Automation | 5. No More Stranding Storage

I have always been fascinated by strong historical characters, especially authors who spent the time and effort to craft a message and put pen to paper to share it. For those not familiar with Dale Carnegie, I suggest a quick glance at his Wikipedia entry. Odds are you have been exposed to several of his writings in the course of your life and may not have known it. How to Stop Worrying and Start Living is one that has fascinated me, especially as it applies to the IT world.

As someone who spent 15 years on the end-user side of IT, I spent a lot of time worrying about the various technologies I was implementing and how they would hold up to the challenges that seemed to pop up out of the blue. With that said, we’re starting a new series here on the blog, the goal of which is to expand on some storage concepts that can allow you, the customer, to worry less and focus on the future without having to fret about the past.

Throughout the series, we’ll cover things you should stop doing in storage. I’ll discuss a few of them broadly in this, Part I.

 

Long, long ago, in 1987 …

In 1987, researchers at the University of California, Berkeley (Patterson, Gibson, and Katz) released a paper titled A Case for Redundant Arrays of Inexpensive Disks (RAID), known within the storage industry as “the Berkeley RAID paper.” In it, the researchers argued that a redundant array of inexpensive disks was preferable to a single large expensive disk not only in cost, but also in data protection and performance. Taking this theory and putting it into practice, the modern storage industry was born and has relied upon RAID ever since.


RAID’s wide-scale adoption can be said to be responsible for most of the storage array platforms that exist today. And while you probably are not reading this blog post to get a history lesson on the fundamental concepts that birthed an industry, there is one item from the paper I’d like to dig deeper on:

Increasing performance of CPUs and memories will be squandered if not matched by a similar performance increase in I/O.

That’s the first line of the abstract of the Berkeley RAID paper, and in my view it holds just as true today as it did back then. The ever-increasing performance gains that Moore’s Law provides will continue to outpace the ability of storage platforms to service the I/O those gains demand.

Out of the three pillars of the data center (compute, network, and storage), the storage side of things has always seemed to lag in terms of performance. As such, it becomes a limiting factor in many data centers. At SolidFire we have worked from the beginning phases of our design process to address this mismatch of resource allocation, or to at least bring it closer together and meet the challenges laid out by our storage predecessors.

 

Yep, that’s your father’s Oldsmobile


When we look at the storage industry in its current state, we see a large number of designs that continue to leverage concepts created decades ago to address the storage challenges of those times. Don’t get me wrong; that these designs are still being utilized in newly created storage platforms is a testament to their validity. That said, those designs will not hold up next to solutions crafted from day one to tackle the mismatch in resource utilization and the design flexibility the modern data center requires.

Outside of simply faster and larger disks, array design has leveraged a “scale up” approach that has required storage administrators and organizations to plan in terms of capacity and performance for what was available from the component level during the period the storage platform was designed and delivered. Those systems will always be behind the market when it comes to having the most recent compute and memory resources at their disposal.

Commonly, these systems use a dual-controller configuration, set up as active/passive or active/active, to remove a single point of failure for the array itself. The challenge here tends to be that storage administrators have to plan their environments and connectivity around these limits. Upgrades are complex and often require a “forklift” approach, and in some instances swing gear (a loaner unit) is brought in to serve as a temporary middleman while an upgrade is performed.

As design in the data center itself moves to a distributed model, the ability to pair appropriate resources for all three pillars needs to be taken into account. Scaling compute, network, and storage resources in tandem, or as needed without impacting existing workloads, should be a mandatory design consideration.

To illustrate the point further, in a “scale up” design (shown here compared to “scale out”), a head unit (controller) sits as the gatekeeper for storage resources:

Scale-out vs. scale-up storage infrastructures

Storage administrators can only scale capacity in this design. The downside here is that it becomes more likely that the existing processing and network connectivity of the controller will be overwhelmed as the array is drawn upon to serve more and more resources to more and more end points.
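The arithmetic behind that bottleneck is easy to sketch. The numbers below are made up purely for illustration (no real controller is assumed): a fixed head unit has a hard performance ceiling, so every host added to the array shrinks the share available to each one.

```python
# Illustrative only: a hypothetical dual-controller head with a fixed
# performance ceiling, shared by a growing number of attached hosts.
CONTROLLER_IOPS = 200_000  # assumed aggregate ceiling of the controller pair

for hosts in (10, 50, 100, 200):
    per_host = CONTROLLER_IOPS // hosts
    print(f"{hosts:>3} hosts -> {per_host:>6} IOPS each")
```

Capacity keeps growing as shelves are added, but the controller’s compute and connectivity do not, so the per-host share only goes one direction.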

In order to address increasing performance concerns, controller-based designs tend to use various workarounds:

  • Cache (memory) is added to accelerate the rate at which data is fetched and served to an external workload.
  • Many vendors leverage tiers of disk (faster spinning media or solid-state drives) as staging areas where hot data can reside.
  • Others attempt to guess which data is going to be hot and pin it into these cache/higher-speed disk areas, trying to compensate for the limited resources of the controller unit and move the needed data at the needed rate.

Scale-out storage architecture

These additional tiers of storage can address some performance requirements, but often end up not being able to keep up with the dynamic demands that disparate workloads place upon the storage system. At some point cache is overrun and defeated, the hot tier cannot predict data movement fast enough, or simply the resources that are finite in the common dual controller design cannot keep up with the performance demands of today’s workloads.
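That cache-overrun failure mode can be sketched with a toy LRU simulation. All sizes and the uniform-random access pattern here are invented for illustration, not taken from any real array: once the working set outgrows the cache, the hit rate collapses and most reads fall through to the slow tier.

```python
from collections import OrderedDict
import random

def hit_rate(cache_size, working_set, accesses=50_000, seed=0):
    """Simulate an LRU read cache under uniform random block reads."""
    rng = random.Random(seed)
    cache, hits = OrderedDict(), 0
    for _ in range(accesses):
        block = rng.randrange(working_set)
        if block in cache:
            hits += 1
            cache.move_to_end(block)      # mark as most recently used
        else:
            cache[block] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict least recently used
    return hits / accesses

# Working set fits in cache: nearly every read is a hit.
print(f"fits in cache: {hit_rate(1000, 800):.0%}")
# Working set 10x the cache: most reads miss and hit the slow tier.
print(f"cache overrun: {hit_rate(1000, 10_000):.0%}")
```

Real workloads are rarely uniform, which is exactly why tiering engines try to predict hot data, but the same cliff appears whenever the active data outruns the fast tier.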

To further complicate matters, storage administrators have to spend a significant amount of time and energy fine-tuning the arrays themselves and micro-managing RAID groups, LUN placement, and path policies.

There has to be a better way.

 

Scale-out storage benefits

In a scale-out environment, capacity, compute, and network connectivity scale equally, so performance will remain linear as more units are added. Unlike the scale-up model, the ability to fine-tune and utilize resources is not impeded by the limitations of the controllers that were purchased with the storage system.
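The contrast with the scale-up sketch earlier is that each added node brings its own controller, cache, and network ports. The per-node figures below are assumptions for illustration only, not specifications of any particular platform:

```python
# Illustrative only: in a scale-out cluster, performance and capacity
# are added together, node by node, so aggregate throughput grows
# linearly instead of being capped by a fixed head unit.
NODE_IOPS = 50_000  # assumed per-node performance
NODE_TB = 20        # assumed per-node usable capacity

for nodes in (4, 8, 16):
    print(f"{nodes:>2} nodes -> {nodes * NODE_IOPS:,} IOPS, {nodes * NODE_TB} TB")
```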

Of course, not all scale-out solutions are built the same. In the case of the SolidFire platform, the ability to mix and match different system models (mixed node clusters) is present, offering a far greater level of flexibility and agility. Downtime or disruptive data migration events are eradicated since upgrades to compute, memory, and connectivity are simply absorbed into the original system and become part of the greater pool of resources available.

To put it into simple list form, at SolidFire we have looked to design a storage platform that reduces the complexities of the past, and focuses on the workloads of the future through the following concepts:

  • Simplify capacity planning with a building-block approach
  • Eliminate forklift upgrades and controller limitations
  • Defer capital expenditures with incremental expansion
  • Eradicate the need for point solutions
  • Mitigate painful migration events

Mixed-node clusters in storage infrastructure

Back to the thrust of this discussion: storage capacity planning tends to be one of the more difficult challenges IT faces, primarily because of its unpredictable nature. A scale-out approach can help alleviate many of the challenges that other storage platforms are incapable of addressing without significant complexity. Storage should be able to keep up with the gains of the compute and network pillars in the data center.

I hope the above content is helpful. As always, the goal in writing these posts is to provide my take on a topic. If you have any comments or questions, feel free to hit me up online: @Bacon_Is_King.

If you’re ready to flash into the future, check out SolidFire’s white paper all about designing the next generation data center.

Gabriel Chapman