Making the most of Microservices, Containers, Infrastructure as Code, and Continuous Integration/Deployment
All of these technology patterns have become mainstays of the modern application landscape. As distributed, self-healing, and automated architectures have come to dominate greenfield deployment models, a new breed of database architectures has risen to complement these highly scalable systems. Unfortunately, NoSQL isn’t always an option for many enterprise IT departments, and it rarely displaces traditional SQL systems outright. Each tool has its well-deserved place in the data-persistence landscape.
More and more organizations are embracing the reality of polyglot persistence, which is simply a fancy way of saying that a DevOps group manages more than a few different flavors of database systems as part of the overall technology stack. Oftentimes, you will see primary customer record applications backed by traditional SQL systems, such as MySQL, MS SQL Server, Oracle, etc., while content and analytics systems are built on top of MongoDB, Cassandra, Hadoop, and the like. In many cases, you’ll see a best-of-breed approach where several of these will be used to back discrete services in the overall architecture. This poses several problems for DevOps practitioners, among which are:
- Expertise diffusion
- Manual intervention exceptions
- Automation challenges
Traditionally, the complexity, specialized knowledge, and critical nature of databases have necessitated the role of dedicated DBA teams to handle all things database, outside of the main sysadmin group. With the rise of DevOps patterns and automated, massively scaled systems, the lines between roles have blurred, and many development groups have taken a greater hand in performing traditional DBA duties. In fact, many NoSQL systems were designed by developers, for developers, to be simpler to administer, with a more programmatic approach to database infrastructure deployment and scaling. Node-based architectures, such as Hadoop and Cassandra, automagically balance data behind the scenes to enable self-healing systems that can tolerate node failure, as well as new node addition, decreasing the need for manual intervention.
With the increase in diversity and adoption of novel data persistence layers comes a corresponding diffusion of expertise in what was once considered a high-value technical specialization. DevOps teams are specifically tasked with one goal: doing more with less. This means leveraging newer database platforms that have automation built in, as well as creating automation patterns for traditional database systems to allow them to fit into the scalable model that is the basis of modern platforms.
Backup and recovery, query optimization, dev/test copy creation, as well as HA and DR have now fallen onto the plate (or platter) of multi-hat-wearing DevOps practitioners, in addition to the demands of CI/CD and traditional sysadmin requirements. This is where automation and programmable infrastructure are critical, lest the talent-consolidation experiment fail. Without the ability to fit the square peg of database administration into the round hole of DevOps patterning, manual intervention exceptions can become a huge portion of unplanned work in the iteration cycle.
To this end, Database as a Service (DBaaS) platforms, such as Trove in OpenStack, are being leaned upon to automate many traditional tasks that formerly required specialized knowledge. With DBaaS systems, many of the DBMS-specific tasks are baked in, behind a common API, creating a programmable wrapper around a set of complex technologies.
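To make the "programmable wrapper" idea concrete, here is a minimal sketch of what provisioning a database through a Trove-style REST API looks like. The payload shape mirrors OpenStack Trove's instance-create call, but treat the endpoint, field names, and values as illustrative assumptions rather than an authoritative client:

```python
import json

# Hypothetical DBaaS endpoint -- substitute your own deployment's URL and tenant.
TROVE_URL = "https://dbaas.example.com/v1.0/{tenant}/instances"

def build_instance_request(name, flavor_ref, volume_gb, datastore="mysql"):
    """Assemble a Trove-style 'create database instance' payload.

    The point of DBaaS: the caller declares *what* it wants (a MySQL
    instance of a given size), and the platform handles the
    DBMS-specific provisioning behind the common API.
    """
    return {
        "instance": {
            "name": name,
            "flavorRef": flavor_ref,
            "volume": {"size": volume_gb},
            "datastore": {"type": datastore},
        }
    }

payload = build_instance_request("orders-db", "m1.medium", 20)
print(json.dumps(payload, indent=2))

# A real deployment would POST this to TROVE_URL with an auth token, e.g.:
# requests.post(TROVE_URL.format(tenant=tenant_id), json=payload, headers=auth)
```

The same declarative call works whether the datastore behind it is MySQL, MongoDB, or Cassandra, which is exactly the expertise-diffusion relief described above.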
There are many technologists actively innovating and helping to solve these problems, shaving the square peg’s sharp corners for seamless integration into software-defined systems. Joyent recently posted an “autopilot pattern” for MySQL, a traditional SQL database, for use with containers. These scripts gracefully automate the complex manual task of setting up and maintaining replication relationships within a MySQL cluster, allowing for hands-off fault handling and scaling of a database array.
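The heart of an autopilot-style container is a small piece of bootstrap logic that decides, on startup, whether this instance should become the primary or configure itself as a replica of an existing one. The sketch below is a simplified stand-in for that decision, assuming a service-discovery registry (such as Consul) that lists peers in registration order; the function and identifiers are hypothetical, not Joyent's actual scripts:

```python
def choose_role(my_id, registered_peers):
    """Decide whether this container bootstraps as the MySQL primary
    or joins as a replica.

    Simplified election rule: the first instance registered in service
    discovery becomes primary; later arrivals replicate from it.
    Returns (role, primary_id_to_replicate_from).
    """
    if not registered_peers or registered_peers[0] == my_id:
        return ("primary", None)
    return ("replica", registered_peers[0])

# First container to register bootstraps the cluster...
print(choose_role("db-1", ["db-1"]))                   # ('primary', None)
# ...while later containers configure replication against it.
print(choose_role("db-3", ["db-1", "db-2", "db-3"]))   # ('replica', 'db-1')
```

In a real autopilot container, the "replica" branch would run the MySQL replication setup against the discovered primary, and a health-check loop would re-run the election on primary failure, giving the hands-off fault handling described above.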
Other tools in the belt, such as automated storage infrastructure, can vastly increase the flexibility and agility of database operations practices by leveraging advanced capabilities. Direct-attached or in-chassis storage necessitates that database copy operations or replica instantiations be accomplished in the slowest way possible, by way of file transfer over the network. For databases of any real size, this becomes an arduous and prohibitively long process, commonly requiring manual babysitting. A useful example of how these tasks can be accelerated for MongoDB (by 100x) can be found in this video guide.
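Some back-of-the-envelope arithmetic shows why array-side snapshots change the game: a network file copy scales linearly with data size, while a snapshot clone is essentially a metadata operation. The throughput and overhead figures below are illustrative assumptions, not benchmarks:

```python
def network_copy_seconds(size_gb, throughput_mb_s=100):
    """Time to copy a database over the wire at a given sustained
    throughput (100 MB/s assumed here, roughly gigabit-class)."""
    return (size_gb * 1024) / throughput_mb_s

def snapshot_clone_seconds(size_gb, fixed_overhead_s=30):
    """Array-side snapshot clones are metadata operations: roughly
    constant time regardless of dataset size (illustrative figure)."""
    return fixed_overhead_s

ten_tb = 10 * 1024  # a 10 TB dataset, in GB
print(f"file copy:      {network_copy_seconds(ten_tb) / 3600:.1f} hours")
print(f"snapshot clone: {snapshot_clone_seconds(ten_tb)} seconds")
```

At these assumed rates, the 10 TB copy takes over a day of wall-clock time (plus the babysitting), while the clone completes in seconds, which is where order-of-magnitude speedups like the MongoDB example come from.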
Moving from traditional file-based backups to active snapshot data protection patterns can also help transform how continuous integration and deployment interacts with databases. Rather than having to roll forward and roll back complex schema migrations, database snapshots can be regularly taken to create a “library” of ready-to-use database images, each corresponding to release candidate schema versions. The ability to treat point-in-time data as a programmable resource alleviates much of the heavy lifting that’s required to keep code and fully populated data in sync with each other. This has the added value of eliminating the problem of “the query was fast on the test database!”, which occurs when a data-light database is used to develop against, versus a fully loaded multi-TB instance that mirrors production conditions.
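The snapshot "library" described above boils down to a registry that maps each release-candidate schema version to a storage snapshot identifier, so a CI job can restore state instead of replaying migrations. This is a toy sketch with an in-memory dict and made-up identifiers; a real system would call a storage or cloud API:

```python
class SnapshotLibrary:
    """Toy registry of ready-to-use database images, keyed by
    release-candidate schema version."""

    def __init__(self):
        self._images = {}

    def register(self, schema_version, snapshot_id):
        """Record a point-in-time snapshot taken at a given schema version."""
        self._images[schema_version] = snapshot_id

    def clone_for(self, schema_version):
        """Return the snapshot a CI job should clone for this schema
        version, instead of rolling migrations forward or back."""
        if schema_version not in self._images:
            raise LookupError(f"no image for schema {schema_version}")
        return self._images[schema_version]

lib = SnapshotLibrary()
lib.register("schema-v42", "snap-rc1")   # hypothetical IDs
lib.register("schema-v43", "snap-rc2")
print(lib.clone_for("schema-v42"))        # snap-rc1
```

Because each clone is fully populated production-scale data, the test database exercises the same query plans as production, closing the "fast on the test database" gap.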
All in all, DevOps professionals face a mammoth task of integrating data platforms and automating all aspects of the traditional data center, many of which were covered entirely by specialists in separate departments and groups. As developer and operations groups continue to merge, new patterns will continually be required to integrate traditional systems and tasks into the automation fold, and tooling around the outlier of databases will play a pivotal role in this technology confluence.
What patterns has your DBA team embraced to move closer to the goal of automated operations? How are you using the automation capabilities of your infrastructure to leverage your investment and do more with less?