16 Oct 2025 08:57 AM - last edited on 17 Oct 2025 01:00 PM by MaciejNeumann
We are implementing different strategies for custom problem detection and reporting/alerting.
First there are Davis anomaly detectors, where you can leverage pretty much anything DQL has to offer to create Davis Problem alerts. Then there are workflows and SRGs, where you can achieve much the same (at a higher cost, I'd assume) with more flexibility and, especially for SRGs, a history of your evaluations.
Now I've been looking into the discrepancy in the auto-adaptive threshold configuration of objectives in SRGs (which is a kind of anomaly detection as well). What I'm missing there is the advanced detection configuration that anomaly detectors offer (violating samples, etc.).
To me it would make sense for the same configuration options to also be available for SRG objectives.
17 Oct 2025 09:15 PM
@r_weber SRG is a small hidden gem here. In combination with simple workflows, it's actually free of any costs (except for DQL queries, of course). I use it especially for evaluating sparse data. One simple workflow triggers the SRG, then OpenPipeline extracts a Davis event from the SRG evaluation bizevent if the result is a failure.
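To sketch the pattern (the field names below, like event.provider and validation.status, are from memory and may differ in your environment, so double-check them in your tenant), the guardian bizevents can be inspected with a DQL query along these lines before wiring up the OpenPipeline event extraction:

  fetch bizevents, from: now() - 24h
  // SRG writes one bizevent per validation run
  | filter event.provider == "dynatrace.site.reliability.guardian"
  // keep only failed validations, which the pipeline then turns into a Davis event
  | filter validation.status == "fail"
  | fields timestamp, event.type, validation.status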
Still, SRG misses a few things, including the one you mention. I do miss in particular:
27 Oct 2025 12:27 PM
Hey @r_weber and @Julius_Loman ,
Thanks for sharing your thoughts. I fully agree that Davis anomaly detectors, SRG, and SLOs overlap in certain areas. Hence, I'd argue that their value shines most in combination and for different use cases.
Just some background on why there are different flavors of evaluating objectives:
While Davis anomaly detectors are designed to detect deviations in metrics in order to raise (alerting) events, the SRG was primarily created to enable quality gates, combining a heterogeneous set of objectives. SRGs are typically triggered on demand and cover a pre-selected period of time, e.g., the last 30 min, while Davis anomaly detectors are evaluated continuously and only raise events if a condition is violated. SRGs, in contrast, provide historical views, allowing one to see a trend, e.g., whether the performance stats of a service change over time, which is not necessarily considered a problem, depending on the context of the validations.
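To make that concrete with a sketch (the metric and grouping here are just an example, not taken from a real configuration): an SRG objective evaluates its DQL query once, over the window selected when the guardian is triggered, e.g.

  // evaluated once per guardian run, over a fixed window such as the last 30 min
  timeseries response_time = avg(dt.service.request.response_time),
    by: { dt.entity.service },
    from: now() - 30m

A Davis anomaly detector would run essentially the same query on a rolling basis and compare each new data point against its configured threshold or baseline.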
@r_weber in which situations would you want to use the SRG and in which the Davis anomaly detection, assuming SRG objectives allowed the same configuration as Davis anomaly detectors?
@Julius_Loman : I like the idea of having multiple threshold validations from one single objective result and the option of allowing code instead of DQL as an objective.
Since you are using the SRG to raise Davis events anyway, what is your main reason for not using Davis anomaly detectors right away?
27 Oct 2025 03:20 PM
@Gerhard-K basically because:
Likely, my use case could be rewritten as a Davis anomaly detector at much higher cost (probably about 60 times more in query costs). In some cases this can be mitigated by creating metrics and using the anomaly detector on those, but that is complicated to manage, especially since Grail metrics cannot be deleted.
As for code as input, I'd also like to see that as a source for the Davis anomaly detector. To me it makes perfect sense; there will still be data in other sources that needs to be evaluated.
28 Oct 2025 06:22 AM
@Julius_Loman got it, and it makes absolute sense to me, thanks for sharing.
I'd be interested in more details on your use case, i.e., what data are you typically looking at?
29 Oct 2025 07:33 PM
Hi @Gerhard-K ,
I understand the difference between the continuous evaluation of anomaly detectors and the lower-frequency or ad-hoc triggering of SRGs. Actually, my customer mostly uses SRGs due to the complexity of objectives checking business data.
One example: we ingest a business metric that delivers only one datapoint per day. The SRG objective can be set to an auto-adaptive threshold to learn what is normal, but the analyzer behind it cannot be parameterized the way it can in a normal notebook, where I can define a window and/or violating samples. To me it seems the auto-baseline for that slow metric in an SRG objective is not properly "calibrated".
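For illustration (the metric key here is made up, ours is customer-specific), the objective query boils down to something like:

  // one datapoint per day, so even a 7-day window yields only 7 samples for the baseline
  timeseries daily_value = sum(business.orders.processed),
    interval: 1d,
    from: now() - 7d

With data that sparse, being able to tune the analyzer (lookback window, number of violating samples) would make a real difference.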
@Julius_Loman
I like the simple workflow approach, triggering the SRG and then using the bizevent of the guardian result to create a Davis event/problem, which in turn could trigger another workflow (e.g. for notifications). So far most of my users do evaluate the result of an objective directly in the workflow that triggered the guardian.
There could be some cost-saving potential in chaining workflows like this 🙂
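If we wire it up that way, the chained notification workflow would be triggered by (or could sanity-check against) the Davis events created from the guardian bizevents, e.g. with a query like this (the event.kind value and the "guardian" naming are assumptions about how the extraction rule would be configured):

  fetch events, from: now() - 24h
  // Davis events raised by the OpenPipeline event extraction
  | filter event.kind == "DAVIS_EVENT"
  // assuming the extraction rule puts "guardian" in the event name
  | filter matchesPhrase(event.name, "guardian")
  | fields timestamp, event.name, event.status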