21 May 2025 02:03 PM
Hey all,
I need to create an alert when a certain metric* is above a threshold for more than 2 hours. Since the maximum length of the Anomaly Detection sliding window is 60 minutes, I thought of this workaround:
create a DQL query that charts how many consecutive minutes the metric has been above the threshold, and raise an alert when that value exceeds 120. Is this possible?
Thanks!
*confluent_kafka_server_consumer_lag_offsets
21 May 2025 09:22 PM
Hi,
In your case, I recommend adjusting the alerting profile. You can configure it to generate a problem when the metric shows an anomaly. This can be based on a 5-minute sliding window, for example.
To reduce noise, you can set the alerting profile to send a notification only if the problem remains open for at least 2 hours.
You can use DQL with a count function only if the threshold is static and known in advance (so you can put it directly in the DQL query).
Let me know if you need help setting this up!
25 May 2025 07:48 AM - edited 26 May 2025 08:37 AM
Hi yanezza, thank you for your help!
The thing is that normal behavior in our case always includes periodic spikes. Our concern is when the duration of a spike exceeds some threshold, not necessarily an abnormality in the metric values.
Although your suggestion might help with sending notifications only when a real problem arises, it would still fill our environment with a huge number of false problems.
Is there a proper way to handle this with DQL? That is, instead of getting the resulting metric values over time, we would like to get the duration for which the metric stays above a certain value.
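For illustration only, here is the calculation being asked about, sketched in plain Python rather than DQL. It is not Dynatrace code and uses no Dynatrace API. The function names, the one-value-per-minute input, and the choice to let a missing data point break a streak are all assumptions made for the sketch.

```python
from typing import List, Optional, Sequence


def consecutive_minutes_above(
    values: Sequence[Optional[float]], threshold: float
) -> List[int]:
    """For each minute, count how many minutes in a row (ending at that
    minute) the metric has been strictly above the threshold.

    Assumes one value per minute; None (a gap) resets the streak.
    """
    streaks: List[int] = []
    run = 0
    for value in values:
        if value is not None and value > threshold:
            run += 1
        else:
            run = 0  # at or below threshold, or missing: streak ends
        streaks.append(run)
    return streaks


def should_alert(
    values: Sequence[Optional[float]], threshold: float, limit_minutes: int = 120
) -> bool:
    """True if any streak is longer than limit_minutes."""
    return any(
        streak > limit_minutes
        for streak in consecutive_minutes_above(values, threshold)
    )
```

With this logic, a spike that lasts exactly 120 minutes does not alert and one that lasts 121 minutes does, which matches "above a threshold for more than 2 hours". Short periodic spikes reset the counter each time the value drops back, so they never accumulate toward the limit.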