20 Jul 2022 01:58 PM - last edited on 21 Jul 2022 07:30 AM by MaciejNeumann
I have recently started using the Davis health self-monitoring Dashboard. I have found that this pointed out a serious flaw in Dynatrace that needs to be addresses immediately.
I suggest that you make a copy of the dashboard and then modify it to give additional details. Specifically create a dashboard element (TABLE) to show the following Data Explorer Query:
This will display Problem Notifications that FAIL and the associated HTTP return code.
If you do this you will see any failed Problem Notifications in your environment. It will give the details of how many Problem Notifications failed to reach the designated endpoint. Be aware that Dynatrace ONLY does retries IF and ONLY IF there is no acknowledgement. IF it receives any HTTP return code, this includes error codes (429, 5##) it will not retry the Problem Notification to your configured endpoint.
This means that you might not be alerting on ALL problems and therefore your users / support groups are not made aware of issues found by Dynatrace.
I have asked Dynatrace to fix this fundamental flaw in their Problem Notification process. To not have a retry capability when sending to external integrations is a serious flaw.
I have created a RFE for this idea Problem Notifications (Integration Configuration)
Please vote the RFE up and speak to your CSM about prioritizing this RFE. This needs to be modified and corrected to ensure a zero failure state. I have asked that the solution is configurable to allow various Http return codes to be configured for an appropriate retry interval (in seconds) and retry attempts.
It has been designated as planned but everyday without a fix will potentially cause some portion of Problem Notifications to possibly fail in your environment.
I hope to bring your attention to this matter so that you are aware of this flaw.