Alerting
Questions about alerting and problem detection in Dynatrace.

Problem not opened in case of 100% failure rate

tesp11331
Participant

Hi All,

I see on my tenant that for some services, in case of a high failure rate (even 100%), no problem is opened or the opening is delayed. See for instance:

[Screenshot: tesp11331_2-1770813661418.png]

For another service (with the same alert condition) the problem was immediately opened:

[Screenshot: tesp11331_3-1770813861427.png]

Why the different behavior?

Thanks

Regards

Pasquale

5 REPLIES

Julius_Loman
DynaMight Legend

Most likely, this is because of frequent issue detection. Can you check whether you have any such event on the service? Also check your anomaly detection settings on the service (or in the settings hierarchy).

Dynatrace Ambassador | Alanata a.s., Slovakia, Dynatrace Master Partner

tesp11331
Participant

I don't think this is related to a frequent issue; we usually do not have such failure rates on those services. It is also strange that, in the case of the first chart, the problem was eventually opened, but only later and not at the beginning when the failure rate was at 100%.

The alert condition is the same for all services (absolute threshold=3% and relative=60%).
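For illustration, here is a minimal sketch of how such a pair of thresholds could combine. This is an assumption about the semantics (both the absolute and the relative increase over the baseline must be exceeded), not Dynatrace's actual evaluation code:

```python
# Illustrative sketch (not Dynatrace's actual code): how a baseline-relative
# check with both an absolute and a relative increase threshold could combine.
# Assumption: both conditions must be violated before an event is considered.

def violates(observed_pct: float, baseline_pct: float,
             abs_increase_pct: float = 3.0, rel_increase_pct: float = 60.0) -> bool:
    """True if the observed failure rate exceeds the baseline by at least
    abs_increase_pct percentage points AND by rel_increase_pct percent."""
    absolute_hit = (observed_pct - baseline_pct) >= abs_increase_pct
    relative_hit = (baseline_pct > 0
                    and (observed_pct - baseline_pct) / baseline_pct * 100 >= rel_increase_pct)
    return absolute_hit and relative_hit

# A 100% failure rate over a near-zero baseline trips both thresholds,
# while a small bump can trip one threshold but not the other:
print(violates(observed_pct=100.0, baseline_pct=1.0))  # True
print(violates(observed_pct=4.0, baseline_pct=2.0))    # False: +100% relative, but only +2 points absolute
```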

Hey, by any chance did you find an answer to this, as I'm in the same boat?

Hi, unfortunately I didn't.

t_pawlak
Leader


Hi,

My guess is that this is mostly related to how Dynatrace evaluates the violation over time, not only to the visible percentage on the chart.

Even if the chart shows 100% failure rate for a short period, Dynatrace usually raises the event only after the anomaly detection logic collects enough violating samples within its sliding window. For anomaly detection, the default behavior is typically 3 violating one-minute samples out of 5 minutes before the event is raised. So a short spike can appear immediately on the graph, while the problem itself is opened a bit later.
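To make that concrete, here is a minimal sketch of such a sliding-window rule. It only illustrates the behavior described above (it is not Dynatrace's implementation), and the 3-of-5 values are simply the typical defaults mentioned:

```python
# Minimal sketch of a "3 violating 1-minute samples out of a 5-minute sliding
# window" rule, illustrating why a short spike does not open a problem.
from collections import deque

def first_alert_minute(per_minute_violations, window=5, required=3):
    """Return the (0-based) minute at which the rule would first raise an
    event, or None if it never does. Input is a list of booleans, one per minute."""
    recent = deque(maxlen=window)
    for minute, violated in enumerate(per_minute_violations):
        recent.append(violated)
        if sum(recent) >= required:
            return minute
    return None

# A single 1-minute spike to 100% never opens a problem under this rule,
# while a sustained violation opens one only at the 3rd violating minute:
print(first_alert_minute([True, False, False, False, False]))  # None
print(first_alert_minute([True, True, True, True]))            # 2
```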

A second factor can be the traffic volume / number of requests behind the percentage. Two services may have the same configured thresholds, but if one service has only a few failing calls and the other has sustained failing traffic, the evaluation can behave differently even when the displayed failure rate looks similar.
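As a rough illustration of the traffic-volume point (the 10-requests-per-minute minimum below is a hypothetical value standing in for an over-alerting protection limit, not a documented default):

```python
# Illustration: why two services with the same displayed failure rate can be
# treated differently once the number of requests behind the percentage matters.

def evaluated_failure_rate(failed: int, total: int, min_requests: int = 10):
    """Return the failure rate in percent, or None when there is too little
    traffic in the interval to evaluate at all."""
    if total < min_requests:
        return None
    return 100.0 * failed / total

print(evaluated_failure_rate(2, 2))      # None: 100% of only 2 requests is not evaluated
print(evaluated_failure_rate(300, 500))  # 60.0: sustained failing traffic is evaluated
```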

I would therefore check:

1. the service anomaly detection settings (and inherited settings),
2. whether the service had enough request volume during that time,
3. whether frequent issue detection/suppression played any role,
4. and whether there is any difference between problem opening time and notification sending delay from the alerting profile.

So in short: the different behavior is most likely caused by sample-based evaluation and sliding-window logic, possibly combined with different request volumes on those services.
For example, go to the service settings and check the anomaly detection configuration, especially the “Avoid over alerting” option.
[Screenshot: avoid1.jpg — service anomaly detection settings, "Avoid over alerting" option]
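If you prefer to compare the two services via the API rather than the UI, something like the following could work. This is only a sketch, assuming the Settings 2.0 endpoint and the builtin:anomaly-detection.services schema, plus an access token with settings read permission; the environment URL, token, and service IDs are placeholders to replace with your own values.

```python
# Hedged sketch: fetch the anomaly detection settings objects scoped to a
# service via the Dynatrace Settings 2.0 API, so two services can be compared.
import requests

ENV_URL = "https://{your-environment-id}.live.dynatrace.com"  # placeholder
API_TOKEN = "dt0c01.XXXX"                                      # placeholder token with settings read scope

def get_service_anomaly_settings(service_id: str) -> list:
    """Return the settings objects for builtin:anomaly-detection.services
    scoped to the given service entity ID."""
    resp = requests.get(
        f"{ENV_URL}/api/v2/settings/objects",
        headers={"Authorization": f"Api-Token {API_TOKEN}"},
        params={
            "schemaIds": "builtin:anomaly-detection.services",
            "scopes": service_id,  # e.g. "SERVICE-1234567890ABCDEF"
            "fields": "objectId,value,scope",
        },
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json().get("items", [])

# Compare the two services side by side to spot inherited differences:
# print(get_service_anomaly_settings("SERVICE-AAAA..."))
# print(get_service_anomaly_settings("SERVICE-BBBB..."))
```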
