I'm asking the community if they have found a solution to the problem I'm describing below.
We have maintenance windows and sometimes processes do not come back after patching or a reboot.
My first question is this:
1) If a process is "UP" before a maintenance window and I set Dynatrace to "Disable problem detection during maintenance window", and the process is "DOWN" after the maintenance window, will Dynatrace alert that it is down, given that it never detected the problem during the maintenance window?
2) If that does not work, does anyone know of an easier workaround than the one below?
As an option, I can set the maintenance window to "Detect but don't alert".
Then, at the end of the maintenance window, I can query the API for OPEN problems that match specific tags I have set up, parse the JSON for the open problems of concern, and send an API request to close those problems.
The idea is this:
If the problem is closed and the condition occurs again, Dynatrace will detect a "new" or "fresh" problem for the same condition.
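The workflow described above could be sketched roughly as follows. This is only an illustration, assuming the Problems API v2 endpoints (`GET /api/v2/problems` with a `problemSelector`/`entitySelector`, and `POST /api/v2/problems/{problemId}/close`) and `Api-Token` authentication. The environment URL, token, and tag name are hypothetical placeholders, and pagination is ignored. Whether Dynatrace raises a fresh problem after the close is exactly the open question here; the sketch makes no claim about that.

```python
import json
import urllib.parse
import urllib.request


def open_problems_url(base_url, entity_selector):
    """Build a Problems API v2 query for open problems on the selected entities."""
    query = urllib.parse.urlencode({
        "problemSelector": 'status("open")',
        "entitySelector": entity_selector,
    })
    return f"{base_url}/api/v2/problems?{query}"


def _call(url, token, method="GET", body=None):
    """Send one authenticated JSON request and return the decoded response."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(url, data=data, method=method, headers={
        "Authorization": f"Api-Token {token}",
        "Content-Type": "application/json",
    })
    with urllib.request.urlopen(req) as resp:
        raw = resp.read()
        return json.loads(raw) if raw else {}


def close_open_problems(base_url, token, entity_selector, call=_call):
    """Close every open problem matching the selector; return the closed IDs."""
    payload = call(open_problems_url(base_url, entity_selector), token)
    closed = []
    for problem in payload.get("problems", []):  # first page only
        close_url = f"{base_url}/api/v2/problems/{problem['problemId']}/close"
        call(close_url, token, method="POST",
             body={"message": "Closed by post-maintenance script"})
        closed.append(problem["problemId"])
    return closed


if __name__ == "__main__":
    # Hypothetical values; replace with your own environment, token, and tag.
    print(close_open_problems(
        "https://example.live.dynatrace.com",
        "dt0c01.SAMPLE",
        'type("PROCESS_GROUP_INSTANCE"),tag("patch-watch")',
    ))
```

The `call` parameter exists only so the logic can be exercised without a live environment.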
Is there anything in this idea that will not work?
Please advise and thank you DT community!
About the first point: no, Dynatrace will not alert on such a thing. This is because the change happened during the maintenance window. Dynatrace does not alert on something that occurred in the environment during the window.
In general, you can always query the Dynatrace API with a script and count the processes that are monitored before and after. That way you will know whether everything is monitored properly. Dynatrace does not raise problems for such a case.
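A before/after comparison of that kind might look like the sketch below. It assumes the Monitored entities API v2 (`GET /api/v2/entities` with an `entitySelector` and a `from` timeframe) and a response containing an `entities` list with `entityId` and `displayName` fields. The entity type, the short `now-10m` window, and the function names are illustrative choices, not something confirmed in this thread.

```python
import urllib.parse


def entities_url(base_url, entity_type="PROCESS_GROUP_INSTANCE", since="now-10m"):
    """Build a query for entities of one type seen in a recent, short timeframe."""
    query = urllib.parse.urlencode({
        "entitySelector": f'type("{entity_type}")',
        "from": since,
        "pageSize": 500,
    })
    return f"{base_url}/api/v2/entities?{query}"


def entity_names(payload):
    """Map entityId -> displayName for one API response page."""
    return {e["entityId"]: e.get("displayName", "") for e in payload.get("entities", [])}


def missing_after(before_payload, after_payload):
    """Return (entityId, displayName) pairs present before but absent after."""
    before = entity_names(before_payload)
    after = entity_names(after_payload)
    return sorted((eid, name) for eid, name in before.items() if eid not in after)


if __name__ == "__main__":
    # Snapshots as they might be saved before and after the maintenance window.
    before = {"entities": [
        {"entityId": "PROCESS_GROUP_INSTANCE-A", "displayName": "nginx"},
        {"entityId": "PROCESS_GROUP_INSTANCE-B", "displayName": "tomcat"},
    ]}
    after = {"entities": [
        {"entityId": "PROCESS_GROUP_INSTANCE-A", "displayName": "nginx"},
    ]}
    print(missing_after(before, after))
```

One caveat with this approach: a process restarted during patching may reappear under a different entity ID, so comparing by display name or process group may be more robust in practice.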
So I tried #2, using the API to close a problem opened during a maintenance window as a workaround, and it failed spectacularly.
Let me tell you what I tried to do.
We know that Dynatrace has the option of “detect problems but don’t alert” during a maintenance window.
So I tried the following:
1-Create a maintenance window
2-“Force a process unavailable problem to occur” during the maintenance window
3-Leave the problem still open after the maintenance window expires
This is where I get surprised...
4- Go to the API, find the exact problem ID, and send an API close after the maintenance window has expired
5- What I expected to happen: since the maintenance window was over, DT would detect that the process is down (since the old problem was closed) and generate a new problem.
6- THIS DID NOT HAPPEN. DT carried on as if the process was green, and it appears not to be checking the state.
What is going on here?
Why would closing a problem not result in a check of the state? (I guess that depends on how it checks; I don’t know how that is handled.)
I waited 15 minutes and did not see a change. Does anyone know whether the down state would eventually have been detected?
You see what I am trying to do here. What options are available?
How did you force the process unavailability problem? Did you configure it to 'alert on any process becomes unavailable' or 'alert on a minimum threshold'?
In general, we avoid immediately opening a fresh problem right after the maintenance window if the condition is still the same. I will check whether we can improve this by automatically force-closing the suppressed problems and opening fresh ones after the MW.
We are configured for 'alert on any process becomes unavailable'.
However, the only thing I have been able to find that may help us currently is the "Detect but don't alert" setting for a maintenance window.
Then, after the maintenance window is over, I make an API call, through cron or some other scheduler, for any problems that are "OPEN".
The implication is that if they are still open after the maintenance window, they are still unresolved.
I take that JSON output and do an API push to an external third-party notification system.
Not elegant by any description, but functional.
My hope is that Dynatrace will include functionality such as I describe here.
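For anyone wanting to try the same thing, the scheduled step could be sketched like this. It assumes the JSON comes from the Problems API v2 (a `problems` list whose items carry `problemId`, `title`, `status`, and `affectedEntities`) and that the notification system accepts a JSON POST. The webhook URL and the payload shape are hypothetical, not the format of any particular product.

```python
import json
import urllib.request


def unresolved(problems_payload):
    """Keep only the problems that are still open."""
    return [p for p in problems_payload.get("problems", [])
            if p.get("status") == "OPEN"]


def to_notification(problem):
    """Convert one problem into a (hypothetical) notification payload."""
    entities = [e.get("name", "unknown") for e in problem.get("affectedEntities", [])]
    return {
        "source": "dynatrace",
        "problemId": problem["problemId"],
        "summary": f"{problem.get('title', 'Problem')} still open after maintenance window",
        "entities": entities,
    }


def post_json(url, payload):
    """POST a JSON document and return the HTTP status code."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


def forward_unresolved(problems_payload, webhook_url, send=post_json):
    """Send one notification per open problem; return the forwarded IDs."""
    sent = []
    for problem in unresolved(problems_payload):
        send(webhook_url, to_notification(problem))
        sent.append(problem["problemId"])
    return sent
```

Run from cron a few minutes after the window ends, `forward_unresolved(payload, "https://notify.example.com/hook")` would forward whatever is still open; the `send` parameter lets the logic be tested without a real endpoint.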