22 Apr 2026 11:37 AM
Hi all,
Over the past couple of months we've been asked to replicate a lot of alerts from other monitoring tools into Dynatrace, which has been a bit of a challenge. The possibilities with DQL and anomaly detection are pretty much endless, and you can build amazing alerting conditions, but one problem kept recurring.
The anomaly detection configurations have a big delay compared to metric events. This is a serious issue for time-sensitive alerts, where every minute of downtime matters and costs a lot of money. Essentially, every anomaly detection configuration creates its event and problem 3+ minutes after the violating sample is ingested into Grail. The behavior is the same whether the DQL query reads from a log or a metric. Here is an example:
You have a really simple alert for metric A: when it goes above the value 10, a problem is created.
You configure both an anomaly detector and a metric event, each with 1 violating sample and a 3-minute window.
Let's say a datapoint is ingested with the value 15. We then see the following behavior:
The metric event usually creates an event and opens a new problem after 30-60 seconds.
The anomaly detection configuration creates an event and opens a problem after 3+ minutes.
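For illustration, the "1 violating sample in a 3-minute window" condition can be sketched in Python. This is only a sketch of the sliding-window semantics as I understand them, not Dynatrace's actual implementation, and the names (`evaluate_window`, `THRESHOLD`, etc.) are hypothetical:

```python
from datetime import datetime, timedelta

THRESHOLD = 10          # alert when metric A goes above this value
WINDOW = timedelta(minutes=3)
VIOLATING_SAMPLES = 1   # samples above threshold needed within the window

def evaluate_window(samples, now):
    """Return True when enough samples inside the sliding window violate the threshold.

    `samples` is a list of (timestamp, value) tuples.
    """
    recent = [value for ts, value in samples if now - ts <= WINDOW]
    violations = sum(1 for value in recent if value > THRESHOLD)
    return violations >= VIOLATING_SAMPLES

now = datetime(2026, 4, 22, 11, 40, 0)
samples = [(datetime(2026, 4, 22, 11, 39, 30), 15)]  # the datapoint with value 15
print(evaluate_window(samples, now))  # -> True: one violating sample is enough
```

In other words, the condition itself is satisfied the moment the first violating datapoint lands in the window; the delay we see comes entirely from when the engine evaluates it.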
Here is a screenshot with the events and problems from the above scenario:
The datapoint with value 15 for the metric was ingested at about 11:39:30 AM.
At 11:40:05 the metric event configuration created the first event and opened problem P-xxx529.
At 11:42:43 the anomaly detection configuration created the first event and opened problem P-xxx531.
That is a gap of over two and a half minutes between the two. These results are pretty much the same in every test case we've run, whether real or simulated.
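The timeline above can be checked with a quick calculation (timestamps taken from the screenshot):

```python
from datetime import datetime

ingested      = datetime(2026, 4, 22, 11, 39, 30)  # datapoint with value 15 ingested
metric_event  = datetime(2026, 4, 22, 11, 40, 5)   # metric event opens P-xxx529
anomaly_event = datetime(2026, 4, 22, 11, 42, 43)  # anomaly detection opens P-xxx531

print(metric_event - ingested)       # 0:00:35 from ingestion to metric event
print(anomaly_event - ingested)      # 0:03:13 from ingestion to anomaly detection
print(anomaly_event - metric_event)  # 0:02:38 difference between the two paths
```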
I understand that anomaly detection queries Grail, while metric events use the classic metrics pipeline, and there is probably a difference in how long each path takes to ingest and process the data. But the difference in delay is a deal breaker for time-sensitive alerts. For some simple alerts we can use metric events, but when we want more complex logic with DQL (parsing, joining other data, etc.), anomaly detection is the only option.
We have tried setting the event properties dt.davis.analysis_time_budget:0 and dt.davis.analysis_trigger_delay:0, with no difference in the delay before the event and problem are created. We have also tried setting both the violating samples and the time window in anomaly detection to 1 minute, which again makes no difference. The only working workaround we have found is a workflow that executes every minute and sends a notification via mail or another integration, but that is not scalable (or cost-effective) in any way.
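For reference, one place those Davis hint properties can be attached is the `properties` map of a custom event ingested through the Events API v2. The helper below is just a sketch of the payload we tested, with placeholder names; the endpoint path and field names are to the best of my knowledge, so treat them as assumptions and check the API docs for your tenant:

```python
def build_event_payload(title: str) -> dict:
    """Build an Events API v2 ingest payload carrying the Davis hint properties.

    These are the properties we tried; in our tests they made no difference
    to the observed event/problem creation delay.
    """
    return {
        "eventType": "CUSTOM_ALERT",
        "title": title,
        "properties": {
            "dt.davis.analysis_time_budget": "0",
            "dt.davis.analysis_trigger_delay": "0",
        },
    }

payload = build_event_payload("Metric A above threshold")
# POST this as JSON to <tenant>/api/v2/events/ingest with an API token.
print(payload["properties"])
```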
I would love to hear any suggestions on this topic, and whether you have found a way to get events and problems faster with anomaly detection. Has anyone else had the same issue?
Thank you for your time!
Best regards,
Karolos
23 Apr 2026 06:30 PM
Very good question! I have the same experience. I only found this in the changelog, but it seems to apply only to problem-opening events, not to Anomaly detectors.
@DavidBruendl can you advise?
23 Apr 2026 10:08 PM
Please check this out:
https://community.dynatrace.com/t5/Synthetic-Monitoring/dsfm-server-metrics-latencies/td-p/298175
24 Apr 2026 09:03 AM
Hey Antonio,
This is really interesting. I wasn't aware of that metric, and it explains some metric events I tested in the past that had a big delay in producing an event/problem. It seems like the same metric doesn't exist on Grail, right? I can't find anything similar there.