Alerting for service metrics: response time and failure count

VictorRuiz
Frequent Guest

Dear experts,

I need to create a Problem based on two different metrics for the same service: dt.service.request.response_time and dt.service.request.failure_count. Both metrics refer to the same endpoint.

I have created a DQL query that shows both metrics in the same graph, but I don't know whether it can be used to alert on the following conditions:

- response_time > 300 ms
- failure_count > 20
- For 60 minutes

The problem should be raised as a trouble ticket only if all three conditions are met.

 

DQL:

timeseries interval: 1m, { response_time = avg(dt.service.request.response_time),
count(dt.service.request.failure_count)}, from:-1h, to:now(),
by: { endpoint.name }
| filter endpoint.name == "test"

Thank you very much for your knowledge and time.

TomásSeroteRoos
Dynatrace Advisor

This is a fun question!

There are a couple of ways of doing this that come to mind, but I think the most suitable would be to use Davis Anomaly Detection.

It should be relatively straightforward from the instructions in the documentation. Just two things to keep in mind:

1. A Davis Anomaly Detector assumes it will receive one time series and works from that. Here you have two, so you'll have to combine them in some way to allow for alerting.

There's no single right answer, but you could try something like this:

timeseries { response_time = avg(dt.service.request.response_time), failure_count = avg(dt.service.request.failure_count) },
by: { endpoint.name }
| fieldsAdd response_time_alert = iCollectArray(if(response_time[] > 300, 1, else: 0))
| fieldsAdd failure_count_alert = iCollectArray(if(failure_count[] > 20, 1, else: 0))
| fieldsAdd alert = response_time_alert[] + failure_count_alert[]
| fields timeframe, interval, alert, endpoint.name

This creates a new timeseries, alert, which is 2 in every interval where both of your first two conditions are met, and 0 or 1 otherwise.

Then simply set your Davis Anomaly Detector to alert with a threshold of 2!
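
To sanity-check the combined series outside Dynatrace, here is a minimal Python sketch of the same element-wise logic. It is only an illustration, not how Davis evaluates the query: the function name `combine_alerts` and the sample values are made up, and the thresholds are the ones from the question. It also assumes the response time values are already in milliseconds; if the metric is reported in a different unit, the 300 threshold would need scaling.

```python
from typing import List, Optional


def combine_alerts(
    response_time: List[Optional[float]],
    failure_count: List[Optional[float]],
    rt_threshold: float = 300.0,
    fc_threshold: float = 20.0,
) -> List[int]:
    """Return, per interval, how many of the two conditions are breached (0, 1 or 2)."""
    if len(response_time) != len(failure_count):
        raise ValueError("both series must cover the same intervals")

    alert = []
    for rt, fc in zip(response_time, failure_count):
        # Assumption of this sketch: a missing sample counts as "not breached".
        rt_breach = 1 if rt is not None and rt > rt_threshold else 0
        fc_breach = 1 if fc is not None and fc > fc_threshold else 0
        alert.append(rt_breach + fc_breach)
    return alert


# Hypothetical one-minute samples for a single endpoint.
response_time = [250.0, 320.0, 410.0, 305.0]
failure_count = [25.0, 10.0, 30.0, 21.0]
print(combine_alerts(response_time, failure_count))  # [1, 1, 2, 2]
```

Only the last two intervals reach 2, because those are the only ones where both thresholds are exceeded at the same time.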

2. For your third condition, you will want to adjust the sliding window settings in the anomaly detector (Toggle Show advanced properties in the Customize parameters step):

[Screenshot: sliding window settings under the anomaly detector's advanced properties]

60/60/* would correspond to what you are requesting. From experience, though, I would recommend a smaller time window, as these slow-acting alerts can feel a bit too sluggish. But that's up to you, of course!
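
The sliding-window behaviour can be sketched in Python too. This is a simplified model, not Davis's actual implementation. It assumes one sample per minute, reads 60/60 as "60 violating samples within a 60-sample window", and counts a sample as violating when the combined score reaches 2. The function name `first_alert_index` and its parameter names are hypothetical.

```python
from collections import deque
from typing import Iterable, Optional


def first_alert_index(
    scores: Iterable[int],
    threshold: int = 2,
    violating_samples: int = 60,
    window: int = 60,
) -> Optional[int]:
    """Return the index of the first sample at which the alert would fire, or None."""
    recent = deque(maxlen=window)  # keeps only the last `window` violation flags
    for i, score in enumerate(scores):
        recent.append(score >= threshold)
        if sum(recent) >= violating_samples:
            return i
    return None


# Both conditions breached for 60 consecutive minutes: fires at the 60th sample.
print(first_alert_index([2] * 60))  # 59

# A single good minute in the middle means 90 samples are not enough.
interrupted = [2] * 30 + [1] + [2] * 59
print(first_alert_index(interrupted))  # None

# A smaller window (3 violations out of 5 samples) reacts within minutes.
print(first_alert_index(interrupted, violating_samples=3, window=5))  # 2
```

In this model, one interval that drops below the threshold delays a 60/60 alert until a full hour of uninterrupted violations has passed, which illustrates why a smaller window can feel more responsive.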

VictorRuiz
Frequent Guest

Thank you very much. That's exactly what we are looking for.

Victor
