Solved: Delay in showing an alert in problem console after detecting a problem

erh_inetum · ‎04 Oct 2023

Hi everyone,

Yesterday we found several alerts that appeared in the problem console several minutes after the problem was detected. One of them did not show up in the console until an hour later.

The documentation at https://www.dynatrace.com/support/help/platform/davis-ai/problem-and-root-cause/problem-lifecycle says: "As the event start analysis timestamp represents the earliest point in time when the violating state was observed, the event end analysis timestamp represents the point in time after all necessary violation samples are collected and the Davis problem is raised. Because each event involved in the problem uses a sliding window, ..." I think the delay in showing the alert in the problem console is due to this sliding window and the analysis timestamp. Is it possible to configure this sliding window, as we can for metric events? Can someone explain why this happens and how to fix this behaviour?
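To illustrate why a sliding window alone introduces some delay, here is a minimal sketch in Python. It is not Dynatrace's implementation: the function name `first_alert_index` and the parameters `window` and `violating` are hypothetical, and the model simply assumes an alert is raised once enough of the most recent samples exceed a threshold.

```python
from collections import deque


def first_alert_index(samples, threshold, window=5, violating=3):
    """Return the index of the sample at which an alert would be raised, or None.

    Hypothetical model: an alert fires once at least `violating` of the
    last `window` samples are above `threshold`.
    """
    recent = deque(maxlen=window)  # keeps only the newest `window` results
    for i, value in enumerate(samples):
        recent.append(value > threshold)
        if sum(recent) >= violating:
            return i
    return None


# One sample per minute; the first violating sample is at index 2.
samples = [10, 12, 95, 97, 20, 99, 98, 96]
alert_at = first_alert_index(samples, threshold=90, window=5, violating=3)
first_violation = next(i for i, v in enumerate(samples) if v > 90)
print("first violation at minute", first_violation)
print("alert raised at minute", alert_at)
```

In this model the gap between the first violating sample and the alert is bounded by the window length, so a window of a few minutes would not by itself explain a delay of an hour.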

For us this behaviour is very problematic because end users are detecting the problems before we do, and we can't respond quickly enough to fix the issue.

Thanks in advance.

Regards,

Elena.

ebourlas · ‎05 Oct 2023

Hi @erh_inetum,

What types of problems are you referring to? For example, anomaly detection for resources has dual thresholds (e.g. memory usage plus page faults), so a problem won't be raised until both thresholds are met.

In case of problems like "failure rate increased", you can make the thresholds more sensitive either globally or for specific services.

Regards,

E.

erh_inetum · ‎10 Oct 2023

Hi,

Thanks for your answer.

This behaviour happens for all problems, especially for problems whose thresholds are taken from the automatic baseline.

We are going to modify the anomaly detection thresholds for the different applications and create metric events so that alerts are triggered immediately.

We will test it and report the results.

Thanks.

Regards,

Elena

erh_inetum · ‎03 Nov 2023

Hi,

With the help of a Dynatrace specialist, we discovered that these delays were due to Frequent Issues.

For the problems that were raised with a long delay, the affected services turned out to be subject to Frequent Issues.

This caused Davis to generate events, but because they were categorized as Frequent Issues they did not create a problem. When these events became more severe, or Davis determined that they were related to other subsequent events, it generated a problem in the console and set the problem's start time to the start of the event previously categorized as a Frequent Issue (the first event detected in relation to the problem).
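The effect can be sketched with a small Python model. This is only an illustration of the behaviour described above, not how Davis works internally: the `Event` class, its `frequent_issue` flag, and `problem_timing` are hypothetical names, and the model assumes all events in the list belong to the same problem.

```python
from dataclasses import dataclass


@dataclass
class Event:
    start_minute: int
    # Hypothetical flag: True means the event alone does not raise a problem.
    frequent_issue: bool


def problem_timing(events):
    """Return (reported_start, raised_at) in minutes, or None if no problem is raised.

    Assumed model: the problem is raised when the first event that is not a
    Frequent Issue starts, but its start time is set to the earliest related event.
    """
    ordered = sorted(events, key=lambda e: e.start_minute)
    for event in ordered:
        if not event.frequent_issue:
            return ordered[0].start_minute, event.start_minute
    return None


events = [
    Event(start_minute=0, frequent_issue=True),    # suppressed, no problem yet
    Event(start_minute=20, frequent_issue=True),   # still suppressed
    Event(start_minute=55, frequent_issue=False),  # severe enough to raise a problem
]
reported_start, raised_at = problem_timing(events)
print("problem start shown in console: minute", reported_start)
print("problem actually raised at: minute", raised_at)
print("apparent delay in minutes:", raised_at - reported_start)
```

In this model the console shows a start time 55 minutes before the problem became visible, which matches the kind of delay we observed.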

We hope this comment will be helpful in case this happens to any of you.

Regards,

Elena.