Recently, I made a mistake and hobbled some monitoring for an SSL application. My client pointed out this gap in the data. I need an effective way to cross-check the completeness of the data collected by DCRUM. I know there are a number of alerts that can report explicit monitoring problems, but I feel there is also a need to check the resulting data for completeness. I see a Vantage Monitoring Checks folder in the reports view and I did try those reports, but they are somewhat lengthy and I am hoping for something more concise. Ideally, there would be a kind of dashboard with red/yellow/green signals for measures of health covering all aspects of the collected data, which could be compared against recent history with benchmark reports.
I think some checks would be:
We are currently running version 11.7.1 and aiming to go to 12.2.1 soon. If there are built-in features for this, I would appreciate a pointer. If you have reports or procedures that make it easy to confirm the completeness of DCRUM data collection, I would appreciate that too.
There are lots of ways to accomplish this; some are built in, and it gets much better in 12.2. Most of it revolves around routine and experience, though.
But a general approach that works for any release is to create a specific report for an SS (software service) and then monitor it before and after the change. This assumes you actually make changes directly in production, without a separate acceptance environment where you can try out tweaks and alterations before committing them to production.
A simple report showing operation counts by hour and software service would have spotted my problem. I don't know if there is a way to formulate this sort of check so that an alert can be raised reliably, but that would make it easier. I have also had cases where network changes affected the traffic reaching my AMD, sometimes due to other problems. Since this can happen in production, a completeness check has to run against production data.
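To make the idea concrete, here is a minimal sketch of that kind of check done outside DCRUM. It assumes you can get hourly operation counts per software service out of the product (e.g. exported from a DMI report); the data shape, the `completeness_status` helper, and the red/yellow/green thresholds are all my own illustrative assumptions, not anything built in.

```python
# Sketch of a completeness check on hourly operation counts per
# software service. Assumes counts have been exported as
# service -> (history for the same hour on previous days, latest count).
# Thresholds are illustrative, not DCRUM defaults.

from statistics import mean

def completeness_status(history, latest, yellow=0.75, red=0.5):
    """Classify the latest hourly count against the historical mean."""
    baseline = mean(history)
    if baseline == 0:
        return "green" if latest == 0 else "yellow"
    ratio = latest / baseline
    if ratio < red:
        return "red"      # counts dropped by more than half -> likely gap
    if ratio < yellow:
        return "yellow"   # noticeable drop, worth a look
    return "green"

def check_services(rows):
    """rows maps service name -> (history list, latest count)."""
    return {svc: completeness_status(hist, latest)
            for svc, (hist, latest) in rows.items()}

# Example: the SSL service's traffic has collapsed, the other looks normal.
sample = {
    "SSL-frontend": ([1200, 1150, 1300], 90),   # ~7% of baseline
    "HTTP-portal":  ([800, 820, 790], 810),     # close to baseline
}
print(check_services(sample))
# -> {'SSL-frontend': 'red', 'HTTP-portal': 'green'}
```

Comparing against the same hour on previous days, rather than the previous hour, keeps normal daily traffic patterns from triggering the check.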
I did look at the blue ribbon reports too but did not see this kind of validation there. I will try your suggestion and see if I can make it as compact as possible. Thanks for the thought.
It's a really good idea. My point about the single SS, though, was that some installations I've seen have 100 or more SSs defined, and that makes for a very crowded graph or table, obscuring the ability to spot a deviation clearly.
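One way around the clutter with 100+ SSs is to report only the services that deviate, worst first, instead of graphing all of them. A small sketch of that filtering step, under the same assumed data shape (hourly counts per service) and an assumed threshold:

```python
# Sketch: surface only the deviating software services rather than
# plotting all of them. Data shape and threshold are assumptions.

from statistics import mean

def deviations_only(rows, threshold=0.75):
    """Return (service, ratio) pairs whose latest hourly count fell
    below `threshold` times their historical mean, worst first."""
    flagged = []
    for svc, (history, latest) in rows.items():
        baseline = mean(history)
        if baseline > 0 and latest / baseline < threshold:
            flagged.append((svc, latest / baseline))
    return sorted(flagged, key=lambda item: item[1])

services = {
    "SSL-frontend": ([1200, 1150, 1300], 90),   # collapsed
    "HTTP-portal":  ([800, 820, 790], 810),     # normal
    "SOAP-api":     ([400, 410, 395], 250),     # partial drop
}
for svc, ratio in deviations_only(services):
    print(f"{svc}: {ratio:.0%} of baseline")
```

With a hundred healthy services and two broken ones, this prints two lines instead of a hundred-series chart, which keeps the report compact however many SSs are defined.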
If you come up with something you find usable, why not share it here?