A few weekends ago, NetApp IT storage service managers received a call: storage was too slow. The issue was quickly identified and addressed with a NetApp® clustered Data ONTAP® upgrade. But had we really fixed the issue? We decided to use NetApp OnCommand® Insight storage resource-management software to track workload performance over the weekend to ensure the issue didn’t reappear.

Typically we use the NetApp Perfstat utility to look at workload performance because it provides the deep dive required to measure it. But Perfstat captures data at a point in time, not over time, and it is time-consuming to use: logs have to be uploaded and interpreted before any further action can be taken.

First Step, Validate

We wanted to validate the benefits of using OnCommand Insight to track workload performance. We set it to bookmark and check the specific objects we were tracking using just three key performance indicators (KPIs) for this workload: node utilization, aggregate space, and aggregate utilization. Any performance issues that were workload related would manifest in one of those three indicators. The beauty of OnCommand Insight is that it generates a graph depicting these KPIs over time.

Then, Test

Service managers checked OnCommand Insight regularly to ensure it displayed the simple line graph pattern we expected. Different shift leads took screenshots and noted gaps. We were checking to see if a different pattern emerged, such as a flat line at 100. Such a pattern would tell us that the storage system was unable to handle the current workload, so that we could take appropriate steps. Ultimately the storage system handled the workload without any issues.
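For readers who want to automate the same visual check, here is a minimal sketch of how a sustained flat line at 100 could be detected in a series of utilization samples. It does not use any OnCommand Insight API. The function name find_saturated_runs, the min_length parameter, and the sample readings are all hypothetical and exist only for illustration.

```python
from typing import List, Sequence, Tuple


def find_saturated_runs(
    samples: Sequence[float],
    threshold: float = 100.0,
    min_length: int = 3,
) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs, end inclusive, for every run of at
    least `min_length` consecutive samples at or above `threshold`."""
    runs = []
    start = None  # index where the current saturated run began, if any
    for i, value in enumerate(samples):
        if value >= threshold:
            if start is None:
                start = i
        else:
            # The run (if any) ended at i - 1; keep it only if long enough.
            if start is not None and i - start >= min_length:
                runs.append((start, i - 1))
            start = None
    # Handle a run that continues through the last sample.
    if start is not None and len(samples) - start >= min_length:
        runs.append((start, len(samples) - 1))
    return runs


# Hypothetical node utilization readings, one per polling interval.
healthy = [42.0, 55.5, 61.0, 48.2, 70.3, 52.1]
saturated = [60.0, 95.0, 100.0, 100.0, 100.0, 100.0, 80.0]

print(find_saturated_runs(healthy))
print(find_saturated_runs(saturated))
```

Requiring several consecutive samples at the threshold, rather than a single one, is a design choice that avoids flagging a brief spike as a flat line. The right run length would depend on the polling interval in use.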

Finally, the Results

OnCommand Insight confirmed the issue had been resolved and it delivered additional benefits that we hadn’t anticipated. It eliminated the extra work of capturing and interpreting the performance data. Thanks to OnCommand Insight’s node utilization report, we could use a single metric to draw a uniform conclusion about performance and eliminate log misinterpretations across different skill sets.

Other major advantages of OnCommand Insight:

  • False positives. Most of the time, performance issues are perceived to be workload related. The nature of storage is to give a lot of false positives. Because OnCommand Insight gives us performance data over time, we have the ability to quickly identify issues not related to storage and pass them on for resolution.
  • Historical context. OnCommand Insight removes the guesswork by providing performance data for the previous 60 days. Because this is hard evidence of historical performance, we don’t need to rely on application owner or storage admin anecdotes. The bottom line is that we can resolve issues faster.
  • Self-service. Another big time saver is the ability to share OnCommand Insight performance data in a self-service portal. The performance data is normalized, so it doesn’t matter whether the underlying platform runs on clustered Data ONTAP, E-Series, or 7-Mode. Application owners simply use a URL to obtain real-time updates on demand. This gives our business customers peace of mind and frees up time that our service managers need.

 

[Figure: final OCI troubleshooting screenshot]

Storage performance management is a daunting task in any company. It is both complex and time-consuming. In one weekend we discovered that NetApp OnCommand Insight is an essential tool when it comes to resolving workload performance issues quickly and efficiently. The result? Our storage service team can resolve issues much more quickly and devote their time to other critical tasks.


The NetApp-on-NetApp blog series features advice from subject matter experts from NetApp IT who share their real-world experiences using NetApp’s industry-leading storage solutions to support business goals. Want to learn more about the program? Visit www.NetAppIT.com.

stetson