16 Oct 2025 08:57 AM - last edited on 17 Oct 2025 01:00 PM by MaciejNeumann
We are implementing different strategies for custom problem detection and reporting/alerting.
First there are Davis anomaly detectors, where you can leverage pretty much anything DQL has to offer to create Davis Problem alerts. Then there are workflows and SRGs, where you can achieve much the same (at a higher cost, I'd assume) with more flexibility and, especially for SRGs, a history of your evaluations.
Now I've been looking into the discrepancy in the auto-adaptive threshold configuration of objectives in SRGs (which is a kind of anomaly detection as well). What I'm missing there is the advanced detection configuration that anomaly detectors offer (violating samples, etc.).
To me it would make sense for the same configuration options to also be available for SRG objectives.
17 Oct 2025 09:15 PM
@r_weber SRG is a small hidden gem here. In combination with simple workflows, it's actually free of any costs (except for DQL queries, of course). I use it especially for evaluating sparse data. One simple workflow triggers the SRG, then OpenPipeline extracts a Davis event from the SRG evaluation bizevent if the result is a failure.
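To sketch the pattern (the field names below, like event.provider and validation.status, are from memory and may differ in your environment, so double-check them in your tenant), the guardian bizevents can be inspected with a DQL query along these lines before wiring up the OpenPipeline event extraction:

  fetch bizevents, from: now() - 24h
  // SRG writes one bizevent per validation run
  | filter event.provider == "dynatrace.site.reliability.guardian"
  // keep only failed validations, which the pipeline then turns into a Davis event
  | filter validation.status == "fail"
  | fields timestamp, event.type, validation.status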
Still, SRG misses a few things, including the one you mention. I do miss in particular:
27 Oct 2025 12:27 PM
Hey @r_weber and @Julius_Loman ,
Thanks for sharing your thoughts. I fully agree that Davis anomaly detectors, SRG, and SLOs overlap in certain areas. Hence, I'd argue that their value shines most in combination and for different use cases.
Just some background on why there are different flavors of evaluating objectives:
While Davis anomaly detectors are designed to detect deviations in metrics in order to raise (alerting) events, the SRG was primarily created to enable quality gates, combining a heterogeneous set of objectives. SRGs are typically triggered on demand and cover a pre-selected period of time, e.g., the last 30 min, while Davis anomaly detectors are evaluated continuously and only raise events if a condition is violated. SRGs, in contrast, provide historical views, allowing one to see a trend, e.g., whether the performance stats of a service change over time, which is not necessarily considered a problem, depending on the context of the validations.
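To make that concrete with a sketch (the metric and grouping here are just an example, not taken from a real configuration): an SRG objective evaluates its DQL query once, over the window selected when the guardian is triggered, e.g.

  // evaluated once per guardian run, over a fixed window such as the last 30 min
  timeseries response_time = avg(dt.service.request.response_time),
    by: { dt.entity.service },
    from: now() - 30m

A Davis anomaly detector would run essentially the same query on a rolling basis and compare each new data point against its configured threshold or baseline.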
@r_weber in which situations would you want to use the SRG and in which the Davis anomaly detection, assuming SRG objectives allowed the same configuration as Davis anomaly detectors?
@Julius_Loman : I like the idea of having multiple threshold validations from one single objective result and the option of allowing code instead of DQL as an objective.
Since you are using the SRG to raise Davis events anyway, what is your main reason for not using Davis anomaly detectors right away?
27 Oct 2025 03:20 PM
@Gerhard-K basically because:
Likely, my use case could be rewritten as a Davis anomaly detector at much higher cost (probably about 60 times more in query costs). In some cases this can be mitigated by creating metrics and using the anomaly detector on those, but that is complicated to manage, especially since Grail metrics cannot be deleted.
As for code as input, I'd also like to see that as a source for the Davis anomaly detector. To me it makes perfect sense; there will still be data in other sources that needs to be evaluated.
28 Oct 2025 06:22 AM
@Julius_Loman got it, and it makes absolute sense to me, thanks for sharing.
I'd be interested in more details on your use case, i.e., what data are you typically looking at?
29 Oct 2025 07:33 PM
Hi @Gerhard-K ,
I understand the difference between the continuous evaluation of anomaly detectors and the lower-frequency or ad-hoc triggering of SRGs. Actually, my customer mostly uses SRGs due to the complexity of objectives checking business data.
One example: we ingest a business metric that delivers only one datapoint per day. The SRG objective can be set to an auto-adaptive threshold to learn what is normal, but the analyzer behind it cannot be parameterized the way it can in a normal notebook, where I can define a window and/or violating samples. To me it seems the auto-baseline for that slow metric in an SRG objective is not properly "calibrated".
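For illustration (the metric key here is made up, ours is customer-specific), the objective query boils down to something like:

  // one datapoint per day, so even a 7-day window yields only 7 samples for the baseline
  timeseries daily_value = sum(business.orders.processed),
    interval: 1d,
    from: now() - 7d

With data that sparse, being able to tune the analyzer (lookback window, number of violating samples) would make a real difference.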
@Julius_Loman
I like the simple workflow approach, triggering the SRG and then using the bizevent of the guardian result to create a Davis event/problem, which in turn could trigger another workflow (e.g. for notifications). So far most of my users do evaluate the result of an objective directly in the workflow that triggered the guardian.
There could be some cost-saving potential in chaining workflows like this 🙂
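If we wire it up that way, the chained notification workflow would be triggered by (or could sanity-check against) the Davis events created from the guardian bizevents, e.g. with a query like this (the event.kind value and the "guardian" naming are assumptions about how the extraction rule would be configured):

  fetch events, from: now() - 24h
  // Davis events raised by the OpenPipeline event extraction
  | filter event.kind == "DAVIS_EVENT"
  // assuming the extraction rule puts "guardian" in the event name
  | filter matchesPhrase(event.name, "guardian")
  | fields timestamp, event.name, event.status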