Dynatrace veterans are probably used to this type of question, and though we've been using Dynatrace since the AppMon 3.0 days, I still have difficulty understanding it.
We see a lot of problems where sporadic failures trigger failure rate alerts. The failure rate itself might be above 50% (relative) for 10-20 seconds due to a single failed request. These generated high failure rate problems, so we enabled the anomaly detection setting to not alert us if the problem state lasted less than a minute.
After continuing to see problems detected, we increased it to 5 minutes. Then we opened a support chat and were told to try increasing it to 7 minutes because of the sliding 5-minute window.
We continue to see problems for failure rates lasting less than a minute. The problem might be open for 9 minutes... OK, fine, it's considered an "abnormal state" for longer than a minute, but why isn't there an anomaly detection setting based on the timespan of the actual failures?
What is the solution to prevent alerts if the failures lasted less than 1-2 minutes?
Hi @techean ,
The issue with this approach is that the "problem open time" does not match the span of the failure rate and seems a bit random. You could see a problem open for 8-15 minutes for 2 failed requests that actually spanned 20 seconds.
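To illustrate why the abnormal state can outlast the failures, here is a minimal sketch, assuming a plain sliding-window failure rate. It is not Dynatrace's actual detection algorithm: the function name `abnormal_interval`, the 20% absolute threshold, the 10-second evaluation step, and the traffic pattern are all made up for the example. Only the 5-minute window comes from the support answer.

```python
# Hypothetical model: failure rate = failed / total over a sliding window.
# Not Dynatrace's real algorithm; all names and numbers below are assumptions.

def abnormal_interval(requests, window_s=300, threshold=0.2, step_s=10):
    """Return (first, last) evaluation times, in seconds, at which the
    failure rate over the last `window_s` seconds exceeds `threshold`."""
    end = max(t for t, _ in requests)
    abnormal = []
    for now in range(0, end + 1, step_s):
        # Requests that fall inside the window (now - window_s, now]
        in_window = [failed for t, failed in requests
                     if now - window_s < t <= now]
        if not in_window:
            continue
        rate = sum(in_window) / len(in_window)
        if rate > threshold:
            abnormal.append(now)
    return (abnormal[0], abnormal[-1]) if abnormal else None


# Low-traffic service: one successful request per minute for 20 minutes...
successes = [(t, False) for t in range(0, 1201, 60)]
# ...plus 2 failed requests that span only 20 seconds.
failures = [(600, True), (620, True)]

print(abnormal_interval(successes + failures))  # (620, 890)
```

In this toy model the two failures span 20 seconds, but the windowed failure rate stays above the threshold for about 4.5 minutes, because each failure keeps counting until it slides out of the 5-minute window. The sketch does not model any further evaluation or closing delays the real product may apply, so it does not by itself account for an 8-15 minute problem duration.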