Re: Anomaly Detection delay compared to metric events

carolosjk · ‎22 Apr 2026

Hi all,

Over the past couple of months we've been asked to replicate a lot of alerts from other monitoring tools into Dynatrace, which has been a bit of a challenge. The possibilities with DQL and anomaly detection are pretty much endless, and you can get an amazing outcome on the alerting condition, but one problem kept recurring.

The anomaly detection configurations have a big delay compared to metric events. This is a really big issue for time-sensitive alerts, where every minute of downtime matters and costs a lot of money. Essentially, every anomaly detection configuration creates an event and problem 3+ minutes after the violating sample is ingested into Grail. Whether the DQL query uses a log or a metric, the behavior is always the same. Here is an example:

You have a really simple alert for metric A: when it gets above the value 10, a problem gets created.
You configure an anomaly detector and a metric event with 1 violating sample and a 3-minute window.
Let's say a datapoint is ingested with the value 15. We then see the following behavior:
The metric event usually creates an event and opens a new problem after 30-60 seconds.
The anomaly detection configuration creates an event and opens a problem after 3+ minutes.
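
To make the alert condition concrete, here is a minimal sketch in Python of a "1 violating sample in a 3-minute window" check. It only illustrates the rule described above; it is not how Dynatrace evaluates metric events or anomaly detectors internally. The function name `violates`, its parameters, and the assumption of one sample per minute are all made up for the illustration.

```python
def violates(samples, threshold=10, window=3, required=1):
    """Return True if at least `required` of the last `window` samples exceed `threshold`.

    Assumes (for illustration only) one sample per minute, oldest first.
    """
    recent = list(samples)[-window:]
    violating = sum(1 for value in recent if value > threshold)
    return violating >= required


# Metric A stays below 10, then a datapoint with value 15 arrives.
print(violates([4, 7, 15]))  # the latest sample is above the threshold
print(violates([4, 7, 9]))   # no sample in the window is above the threshold
```

With this rule the condition is already satisfied as soon as the datapoint with value 15 is available, so any extra time before the problem opens comes from ingestion and evaluation, not from the alert condition.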

Here is a screenshot with the events and problems from the above scenario:

The datapoint with value 15 for the metric was ingested at about 11:39:30 AM.
At 11:40:05 the metric event configuration created the first event and opened the problem P-xxx529.
At 11:42:43 the anomaly detection configuration created the first event and opened the problem P-xxx531.
That is a difference of over two and a half minutes between the two. These results are pretty much the same in every test case we've had, whether real or simulated.
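
For reference, the delays can be worked out from the timestamps above. This is a small Python sketch of the arithmetic; the ingestion time of 11:39:30 is approximate, as noted, so the first two values are approximate as well.

```python
from datetime import datetime

FMT = "%H:%M:%S"
ingested = datetime.strptime("11:39:30", FMT)       # approximate ingestion time
metric_event = datetime.strptime("11:40:05", FMT)   # metric event, problem P-xxx529
anomaly_event = datetime.strptime("11:42:43", FMT)  # anomaly detection, problem P-xxx531

metric_delay = (metric_event - ingested).total_seconds()
anomaly_delay = (anomaly_event - ingested).total_seconds()
gap = (anomaly_event - metric_event).total_seconds()

print(f"metric event delay:      {metric_delay:.0f}s")   # 35s
print(f"anomaly detection delay: {anomaly_delay:.0f}s")  # 193s
print(f"difference:              {gap:.0f}s")            # 158s
```

So the metric event fired roughly 35 seconds after ingestion, the anomaly detector roughly 3 minutes 13 seconds after, and the gap between the two is 2 minutes 38 seconds.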

I understand that anomaly detection uses the Grail data warehouse while metric events use the classic metrics, and there is probably a time difference for the data to be ingested and processed in each path. But the difference in delay is a deal breaker for time-sensitive alerts. For some simple alerts we can use metric events, but when we want more complex logic with DQL, such as parsing or joining other data, anomaly detection is the only option.

We have tried using the event properties dt.davis.analysis_time_budget:0 and dt.davis.analysis_trigger_delay:0, with no difference in the delays for the creation of the event and problem. We have also tried setting both the violating samples and time window in anomaly detection to 1 minute, which again made no difference. The only working workaround we have found is a workflow that executes every minute and sends a notification via mail or another integration, but this is not scalable (or cost effective) in any way.

I would love to know if you have any suggestions on this topic and if you have found a way to get alerts and problems faster with anomaly detection. Has anyone else had the same issue in the past?

Thank you for your time!
Best regards,
Karolos

Julius_Loman · ‎23 Apr 2026

Very good question! I have the same experience and only found this in the changelog, but it seems to apply only to problem-opening events, not to Anomaly detectors.

@DavidBruendl can you advise?

Dynatrace Ambassador | Alanata a.s., Slovakia, Dynatrace Master Partner

dannemca · ‎23 Apr 2026

I am following this thread!!

Site Reliability Engineer @ Kyndryl

AntonioSousa · ‎23 Apr 2026

@carolosjk ,

Please check this out:
https://community.dynatrace.com/t5/Synthetic-Monitoring/dsfm-server-metrics-latencies/td-p/298175

Antonio Sousa

carolosjk · ‎24 Apr 2026

Hey Antonio,

This is really interesting. I wasn't aware of that metric and it explains some metric events that I had tested in the past and had a big delay in producing an event/problem. It seems like the same metric doesn't exist on Grail, right? I can't find anything similar there.