a·dap·tive ra·di·a·tion
1. the diversification of a group of organisms into forms filling different ecological niches

Much like the biological sphere, the database technology realm has undergone an adaptive radiation over the past decade, with big data as the triggering event. This technology bloom has resulted in a data landscape rich with expanded options for using the right tool for the job. As the size of managed datasets has continued to grow exponentially, the strategies and systems employed to organize and access this ever-expanding data have experienced a growth and adaptation phase surpassing anything we’ve seen to date. One result, explored below: offloading many traditionally server-side operations to the storage layer can reduce operation times by orders of magnitude.

DB-Engines.com, the principal and most widely recognized site that scores database adoption and popularity, currently lists nearly 300 distinct database technologies, including traditional relational databases, key-value stores, document databases, columnar databases, search-optimized engines, graph databases, and so forth. Many of the top-ranked systems are the perennial giants of the database ecosystem, such as Oracle and Microsoft SQL Server, but they are joined by ubiquitous open source contenders, such as MySQL and PostgreSQL.

The rise of MongoDB

Perhaps the most notable emergence in the data management landscape in the past five years has been the rise of the popular and versatile MongoDB database. A document database in the NoSQL category, MongoDB has quickly rocketed up the charts to take the #4 place in the DBMS pantheon, displacing PostgreSQL and sitting just behind the enterprise stalwart, Microsoft SQL Server. With only MySQL and Oracle ahead of the Microsoft mainstay, MongoDB has claimed the dominant position in the NoSQL world and continues to advance its market adoption across a broad array of use cases, verticals, and market segments.

With the release of its 3.2 offering, MongoDB transitioned to a multi-engine-capable backend, promoting the efficient WiredTiger engine to the default over the previous memory-mapped (MMAPv1) system. MongoDB offers one of the most flexible architectures available today, both at the DBMS level and at the physical layer, allowing implementation in a variety of cloud, hybrid, and bare-metal scenarios.

New tech, repeating patterns

However, just like the traditional relational systems before it, MongoDB runs up against the limitations of physics when it comes to dealing with large amounts of data. All DBMS designs must balance the same quadrangle of physical resources: memory, CPU, network, and storage. While any of these four components can become a bottleneck to operational and maintenance activities, the most common constraints on large data copies and transfers are network and storage throughput.

As datasets grow into the multi-terabyte range, the problem of backup and recovery times rears its ugly head with any DBMS, as it has since the inception of database technology. Equally frustrating to DBAs and DevOps practitioners is the problem of database copy time. Traditional methods of backup and restore involve copying the entire dataset off the server, over the network, and onto the target server. As dataset sizes have increased, machine-to-machine throughput has not followed the same growth curve.
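A back-of-the-envelope calculation makes the scale of the problem concrete. This sketch assumes decimal terabytes and an illustrative link-utilization factor; the specific speeds and efficiency are assumptions, not measurements:

```python
def transfer_hours(dataset_tb, link_gbps, efficiency=0.7):
    """Rough wall-clock hours to copy a dataset over a network link.

    Assumes decimal units (1 TB = 10**12 bytes) and a sustained
    utilization factor, since real links rarely run at line rate.
    """
    bits = dataset_tb * 1e12 * 8
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600.0

# A 10 TB dataset over 10 GbE at ~70% utilization: roughly 3 hours.
print(round(transfer_hours(10, 10), 1))
# The same copy over 1 GbE stretches past 30 hours.
print(round(transfer_hours(10, 1), 1))
```

And that is for a single copy; a restore, re-sync, or clone pays the same network toll again.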

New solutions to old problems

Traditional backup and restore operations on multi-terabyte datasets, MongoDB's included, can take many hours, if not days, to complete. Re-syncing or making complete copies of these datasets takes similarly ungainly amounts of time. In some cases, the transfer windows are simply too long to be feasible at all, creating serious problems in production environments.

The solution to this age-old problem lies in rethinking how we approach the backup and copy process for multi-terabyte datasets. Bypassing the network entirely is the key to resolving this decades-old operational difficulty. MongoDB documentation and best practices recommend filesystem snapshots as the primary method for backup, restore, and clone operations. We can take this one leap further by orchestrating the snapshot from the storage system level, using instant metadata snapshots and clones.
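As an illustration of the ordering involved, here is a minimal Python sketch of that orchestration. The storage-side snapshot call is a hypothetical stand-in for whatever API a given array exposes; the lock/unlock steps correspond to MongoDB's documented `db.fsyncLock()` and `db.fsyncUnlock()` commands:

```python
def snapshot_backup(fsync_lock, take_snapshot, fsync_unlock):
    """Orchestrate a storage-level snapshot backup of a MongoDB node.

    The three callables abstract the MongoDB admin commands and the
    storage array's snapshot API (hypothetical here), keeping the
    required ordering explicit: quiesce writes, snapshot, resume.
    """
    fsync_lock()                       # flush data files, pause writes
    try:
        snapshot_id = take_snapshot()  # metadata-only: seconds, not hours
    finally:
        fsync_unlock()                 # always resume writes, even on failure
    return snapshot_id
```

With pymongo, for example, `fsync_lock` might wrap `client.admin.command("fsync", lock=True)` and `fsync_unlock` might wrap `client.admin.command("fsyncUnlock")`, while `take_snapshot` would invoke the storage vendor's own API. The write pause is brief because the snapshot itself is a metadata operation, not a data copy.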

On modern storage systems, this method reduces operation times by orders of magnitude (over 100x), leveraging the inherent capabilities of the storage layer itself and offloading the work, data transfer, and presentation between servers. Data has gravity: the terabytes themselves are the hardest thing to move, so not moving them wins. Modern storage system technologies add a new level of agility to the platform that can truly transform how we approach database administration, DevOps, and platform orchestration for MongoDB.

If you would like to delve further into the specifics of how you can achieve instant backup and restore for MongoDB and reduce replica set re-sync times to seconds, take a minute to review the following short-form demo videos:


Chris Merz

A seasoned Internet services database veteran, Chris Merz is the Chief Database Systems Engineer at SolidFire. Chris develops benchmarking methodologies and database system certifications for SolidFire’s next-gen storage products.