Don’t be surprised if you see the NetApp IT storage team busy doing other tasks during ONTAP® upgrades these days. Thanks to the First Application System Test (FAST) program, which supports early adoption of ONTAP, the Customer-1 program is upgrading to the latest version of ONTAP with absolutely no disruption. In fact, the team performs multiple upgrades every week. This post explores how we integrate ONTAP upgrades into a production environment without sacrificing IT stability.
Good Old Days?
Remember years back when application data was deployed on a filer? We rarely saw downtime unless there was a hardware failure or power outage. Configuration changes, such as export rules, network interfaces, or routes, were sometimes made on the fly in local memory only, and we would forget that those in-memory changes existed on the filer.
When a hardware failure or power outage occurred, restoring the affected storage resource could quickly turn into a fire drill. Some of the non-persistent changes were not documented, resulting in a mad scramble to rediscover the missing configuration. No wonder application owners resisted storage upgrades; to them, an upgrade meant downtime. We often delayed ONTAP upgrades to protect stable operations. The irony of this situation was not lost on our storage team: we expected NetApp customers to run the latest version of ONTAP, but we weren’t always running it ourselves.
Customer-1 Adopts FAST
The Customer-1 program is the first adopter of NetApp products and services in our IT production environment. It is also responsible for the operation of our global data centers. Recognizing that we were missing out on the many features of new ONTAP releases, Customer-1 joined NetApp Engineering’s FAST Program several years ago.
Under FAST, we agreed to deploy release candidate versions of ONTAP storage management software in exchange for providing feedback on bugs and performance issues prior to general release. We would exercise the code and, in return, gain early access to ONTAP’s latest features. Our goal was to improve our ONTAP lifecycle management so that we would no longer fear storage upgrades.
Now Customer-1 installs pre-release ONTAP code in our lab and backup environments once Customer-0 (the Engineering IT group that also runs release candidate versions in its production environment) confirms that the code is stable. Once we are comfortable with the stability of the code running in our lab (a non-customer-facing, low-risk environment), we deploy ONTAP into sub-production and then into production.
We have some instances serving more than 100 applications. At first, even one ONTAP upgrade per week was a challenge. With so much data to process, it was easy to miss potential risks. FAST helped us whittle our upgrade preparation process down to four hours using manual checklists and cross-checks.
To further improve efficiency, we added Python scripts that compile a summary report with a pass/fail matrix flagging areas of concern. Now the Command Center can complete the precheck list in two hours and focus its attention on the flagged areas.
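The scripts themselves are not published here, so the following is only a minimal sketch of how such a pass/fail matrix might be compiled. It assumes each precheck has already produced a per-cluster pass/fail result; the `CheckResult` record, the check names, and the cluster names are hypothetical illustrations, not NetApp IT's actual code or ONTAP output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckResult:
    """One precheck outcome for one cluster (hypothetical record format)."""
    cluster: str      # hypothetical cluster name, e.g. "cluster-a"
    check: str        # hypothetical precheck name, e.g. "aggregate_space"
    passed: bool
    detail: str = ""  # free-text note shown for flagged items


def build_matrix(results):
    """Return (clusters, checks, matrix) with matrix[cluster][check] in PASS/FAIL/N/A."""
    clusters = sorted({r.cluster for r in results})
    checks = sorted({r.check for r in results})
    # A check that was never run for a cluster is reported as N/A.
    matrix = {c: {k: "N/A" for k in checks} for c in clusters}
    for r in results:
        if not r.passed:
            matrix[r.cluster][r.check] = "FAIL"
        elif matrix[r.cluster][r.check] != "FAIL":
            # A later PASS never hides an earlier FAIL for the same cell.
            matrix[r.cluster][r.check] = "PASS"
    return clusters, checks, matrix


def flagged(results):
    """Failed checks that the reviewers should look at first."""
    return [r for r in results if not r.passed]


def render_report(results):
    """Render the matrix as a plain-text table followed by the flagged items."""
    clusters, checks, matrix = build_matrix(results)
    name_w = max([len("check")] + [len(k) for k in checks])
    col_w = max([4] + [len(c) for c in clusters])
    header = "check".ljust(name_w) + " | " + " | ".join(c.ljust(col_w) for c in clusters)
    lines = [header, "-" * len(header)]
    for k in checks:
        cells = " | ".join(matrix[c][k].ljust(col_w) for c in clusters)
        lines.append(k.ljust(name_w) + " | " + cells)
    flags = flagged(results)
    lines.append("")
    lines.append(f"Flagged for review: {len(flags)}")
    for r in flags:
        lines.append(f"  {r.cluster}/{r.check}: {r.detail}")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = [
        CheckResult("cluster-a", "cluster_health", True),
        CheckResult("cluster-a", "aggregate_space", False, "aggr1 at 92% used"),
        CheckResult("cluster-b", "cluster_health", True),
    ]
    print(render_report(sample))
```

Keeping the "N/A" state separate from "PASS" matters for this kind of report: a check that silently failed to run should not read as a clean result, so reviewers can treat both FAIL and N/A cells as items to investigate.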
Although painful at first, the process has been liberating in many ways, especially with ONTAP’s non-disruptive upgrade capability. We can upgrade one to two ONTAP clusters per week, in addition to rolling out major releases twice a year and patches in between. Our lifecycle management process follows a regular cadence with absolutely no impact on the stability of business applications. Along the way, we have identified 30 software bugs for Product Engineering to fix.
Our ability to repeatedly deliver ONTAP upgrades without any disruption to IT operations has also built the confidence of our customers, the business application owners. We regularly meet with them to proactively review the release schedule to avoid conflicts with application releases and ensure there are no surprises.
Over time, we have experienced numerous benefits. Our software lifecycle has shrunk; we are now running the latest ONTAP version in our production environment in 45 days or less. We have expanded the process to include NetApp OnCommand® Insight, AltaVault®, StorageGrid®, E-Series, and CI switch upgrades.
We have also improved our efficiency by taking advantage of ONTAP features well in advance of their general availability. For example, we were able to use the ONTAP 8.3 cluster image update wizard, which updates an entire cluster rather than one node at a time. We are currently running ONTAP 9.2, whose cross-volume (aggregate-level) deduplication has helped improve our flash storage efficiency.
Thanks to the rigor of FAST, we have a constant flow of upgrades, but we no longer have to fear downtime or search frantically for configuration scripts. Instead, ONTAP upgrades are just another task in our daily routine. And that leaves us more time to work on the fun stuff in our jobs.
The NetApp-on-NetApp blog series features advice from subject matter experts from NetApp IT who share their real-world experiences using NetApp’s industry-leading storage solutions to support business goals. Want to learn more about the program? Visit www.NetAppIT.com.