In the past, I've created numerous reports in the SAN/Storage space to facilitate various activities such as refresh, billing/chargeback and operational support. Most of the work has been done in Perl with some Python and shell scripting thrown in along the way.
Creating these reports requires collecting a significant amount of data from arrays, SAN fabrics and other related infrastructure components such as server and VM environments. And once you start collecting this data, it needs to be processed, normalized and stored so these reports can be generated in consistent and reliable ways.
Typically I've stored the data in CSV files. In the past I avoided using a database system for this storage for many reasons, but the primary ones were:
1) Easier management: I don't want to spend a lot of time doing DBA work.
2) Modest data volumes: I'm not dealing with millions of records, generally tens of thousands at a time, so processing from files is not a big performance hit.
This past year, I've decided to take a much more holistic approach to storage reporting, leveraging industry standards and open source tools to build reporting capabilities. So why do this myself and not utilize a commercial tool? It boils down to a few key points for me. I want to invest in myself and not a tool. If I can fill a need, I feel it's much more advantageous for the need to be satisfied internally. Tools can be expensive and seldom get used beyond 20% of their capabilities (at least in my experience). You eventually spend more time administering the tool and less time doing the more important tasks. You also get locked into doing things the 'tool' way instead of working the way your organization needs. And needed capabilities are always in the next release...
So enough on that. There's not a wrong or right answer on this. It boils down to preference, internal skills and what works best for your organization.
For now I'll talk about the high-level architecture I've built, and in later posts we'll dig a little deeper into specific areas. Our environment is primarily EMC, so I have to collect from VPLEX/VMAX/VNX/UNITY/XTREMIO/RecoverPoint and [Brocade] SAN infrastructures. Each of these has varying capabilities for collection: some require CLI access, others offer RESTful services. Regardless of the vendor, the process is similar. It's just a matter of building some collection routines.
For VMAX/VNX/UNITY I decided to use distributed collection via CLI at the local data centers. The volume of data and WAN bandwidth/reliability constraints required local collection. A limited amount of processing is done there. Once per day I reach out to each site and pull this data centrally.
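The shape of such a distributed collector can be sketched as below. This is a generic illustration, not the actual scripts: the CLI command passed in is a stand-in for whatever array-specific tool applies (the real invocations and output formats differ per platform), and the dated-CSV layout is one assumed convention for the daily central pull.

```python
import csv
import subprocess
from datetime import date
from pathlib import Path


def run_cli(cmd):
    """Run a storage CLI command and return its stdout lines.

    `cmd` is a list like ["some_array_cli", "list", "volumes"]; the
    actual command varies by array type and is hypothetical here.
    """
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.splitlines()


def collect_volumes(cmd, out_dir):
    """Parse whitespace-delimited CLI output into rows and write a
    dated CSV, so the central site can pull one file per day."""
    rows = [line.split() for line in run_cli(cmd) if line.strip()]
    out_file = Path(out_dir) / f"volumes-{date.today().isoformat()}.csv"
    with open(out_file, "w", newline="") as fh:
        csv.writer(fh).writerows(rows)
    return out_file
```

The local site only parses and files the output; normalization happens centrally after the daily pull.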
For XTREMIO/VPLEX/RecoverPoint I use central collection processing mainly due to RESTful capabilities. VPLEX offers some additional challenges, but that's another story. Collection here is also daily.
For VMware, I leverage vCenter collections via RVTools and vRealize solutions already in place. No need to reinvent the wheel.
A high-level diagram of the solution.
With all this data now pulled centrally, I process all the array/VPLEX volumes and RPA consistency groups and store this information in a Redis database. A RESTful web service then serves the Redis data. I leverage a Perl framework for creating the RESTful service. It's super fast and light.
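The storage layout behind that service can be sketched as follows. The actual implementation is Perl against Redis; this is a minimal Python illustration of one plausible key scheme (a hash per volume keyed by WWID), with a plain dict standing in for the Redis connection so the sketch is self-contained. The key names are assumptions, not the author's actual schema.

```python
class VolumeStore:
    """One Redis hash per volume, keyed by WWID.

    A dict stands in for the Redis connection here; in production
    `self.db` would be a redis.Redis() client and the methods would
    use HSET / HGETALL / SCAN against the same (hypothetical) keys.
    """

    PREFIX = "storage:array:volume:"

    def __init__(self, db=None):
        self.db = db if db is not None else {}

    def store_volume(self, wwid, attrs):
        # Redis equivalent: HSET storage:array:volume:<wwid> field value ...
        self.db[self.PREFIX + wwid] = dict(attrs)

    def get_volume(self, wwid):
        # Redis equivalent: HGETALL storage:array:volume:<wwid>
        return self.db.get(self.PREFIX + wwid)

    def all_wwids(self):
        # Redis equivalent: SCAN with MATCH storage:array:volume:*
        return [k[len(self.PREFIX):] for k in self.db
                if k.startswith(self.PREFIX)]
```

The REST layer then only needs to map GET routes onto `get_volume` and `all_wwids`, which keeps the service thin.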
As an example, you can do a GET to https://servicename:port/storage/array/volume to get a JSON list of all array volumes and associated attributes. Currently I have over three dozen attributes for a given volume. You can GET for a specific volume by wwid/naa with https://servicename:port/storage/array/volume/wwid.
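From a consumer's point of view, working with the feed might look like the sketch below. The base URL echoes the placeholder from the post, the sample JSON record is invented for illustration (the real feed carries three dozen or more attributes per volume), and in practice you would fetch `volume_url(...)` with an HTTP client rather than parse a canned string.

```python
import json

# Placeholder endpoint from the post; servicename/port are not real.
BASE = "https://servicename:port/storage/array"


def volume_url(wwid=None):
    """Build the GET URL for all volumes, or one volume by WWID/NAA."""
    return f"{BASE}/volume/{wwid}" if wwid else f"{BASE}/volume"


# A hypothetical response body for a single-volume GET; field names
# and values are invented for this example.
sample = json.loads("""
{
  "wwid": "60000970000297900123533030314142",
  "array": "VMAX-1234",
  "capacity_gb": 100,
  "pool": "FC_Pool1"
}
""")


def volume_summary(vol):
    """Reduce a full volume record to the few fields a report
    (e.g. chargeback) typically needs."""
    return {k: vol[k] for k in ("wwid", "array", "capacity_gb")}
```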
By providing a RESTful service for storage information, other internal applications (e.g., a CMDB) can pull global storage data from one location. There's no need to deal with all the complexities of connecting to arrays via SMI-S or other methods; that complexity is managed behind the scenes. If we add a new array vendor or model, it is simply represented in the REST feed the same way as the others. Yes, I have to integrate the new array, but that can be done in a matter of days if not hours.
More about specific areas in later posts.