
DQL for if any synthetic location fails 3 consecutive times

sivart_89
Mentor

I'm trying to essentially replicate this type of built-in alert as a DQL query with a Davis anomaly detector. What would the DQL look like for a synthetic that runs every 5 minutes from 5 different locations, where I want to alert if any single location fails 3 consecutive runs?

sivart_89_0-1739807473699.png

 

4 REPLIES

Eric_Yu
Dynatrace Mentor

You could try building the logic using the synthetic events table:

fetch dt.synthetic.events

 

There are a couple of fields there, like the status code, location, and synthetic ID, that can help you create another event or build some kind of timeseries for what you're looking for.

Eric_Yu_0-1739978205859.png

Eric Yu | LATAM ACE Consultant

I had tried something like the query below, but it ended up alerting when there were 2 consecutive failures in 1 location, when I want it to alert only when there are 3 consecutive failures (the synthetic runs every 5 minutes).

timeseries Synthetics = avg(dt.synthetic.browser.executions), by: {dt.entity.synthetic_test, dt.entity.synthetic_location, state, dt.source_entity}, interval: 1m
| fieldsAdd syntheticName = entityName(dt.entity.synthetic_test)
| fieldsAdd locationName = entityName(dt.entity.synthetic_location)
| filter contains(syntheticName, "<SYNTHETIC-NAME>")
| filter state == "FAIL"
| fieldsAdd arrayWithFailureConditions = arrayMovingMax(Synthetics, 5)
| fieldsRemove Synthetics
| summarize TotalAvailability = sum(arrayWithFailureConditions[]), by: {timeframe, interval, syntheticName, dt.entity.synthetic_test}
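
One possible reading of why the arrayMovingMax version cannot tell 2 consecutive failures from 3: a moving maximum only records whether at least one failure fell inside the window, not how many. The Python sketch below simulates that on 1-minute buckets. It is not DQL, and it rests on assumptions: a trailing window, empty buckets treated as 0, and each failed run contributing a value of 1 to its bucket. The helper names `moving_max` and `failure_series` are hypothetical.

```python
def moving_max(values, window):
    """Trailing moving maximum: each output covers the current bucket
    and the (window - 1) buckets before it (assumed window alignment)."""
    out = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        out.append(max(values[start:i + 1]))
    return out


def failure_series(fail_minutes, length):
    """1-minute buckets: 1 where a failed run landed, 0 elsewhere
    (assumption: empty buckets count as 0)."""
    return [1 if minute in fail_minutes else 0 for minute in range(length)]


# Two consecutive failures (minutes 0 and 5) for a 5-minute synthetic.
two_failures = moving_max(failure_series({0, 5}, 10), 5)
# Three consecutive failures (minutes 0, 5 and 10).
three_failures = moving_max(failure_series({0, 5, 10}, 15), 5)

print(two_failures)         # a flat run of 1s
print(set(three_failures))  # also only the value 1
```

Under these assumptions both cases produce the same flat line of 1s, so a threshold on the moving maximum has nothing to separate the second failure from the third.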

 

sivart_89_0-1739978823371.png

 

I think the way you did it is very clever; it should work. However, if I'm understanding your query correctly, I feel like you should change it to use arrayMovingSum instead of arrayMovingMax, use a window of 15, and also split the summarization by location. Can you maybe show your resulting data?

Maybe like this:

....
| fieldsAdd arrayWithFailureConditions = arrayMovingSum(Synthetics, 15)
| fieldsRemove Synthetics
| summarize TotalAvailability = sum(arrayWithFailureConditions[]), by: {timeframe, interval, syntheticName, dt.entity.synthetic_test, locationName}

 

If you can also show your threshold and alerting condition that'd be helpful too.

Eric Yu | LATAM ACE Consultant

Thank you for the input here. I tried out your query since it did seem to show what I wanted. I can see that the datapoint value rises to 1 upon the first failure, then 5 minutes later rises to 2 because there are 2 consecutive failures, and so on. Everything looks good from what I can see, but the alert never actually triggers even though the preview shows it should. I'm still looking into why this never triggers and creates a problem.
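
The stepwise rise described above (1, then 2, then 3) can be reproduced with a small Python simulation of a 15-bucket moving sum over 1-minute buckets. This is a sketch and not DQL; it assumes a trailing window, empty buckets treated as 0, and one failed run every 5 minutes contributing a value of 1. The helper name `moving_sum` is hypothetical.

```python
def moving_sum(values, window):
    """Trailing moving sum: each output adds the current bucket
    and the (window - 1) buckets before it (assumed window alignment)."""
    out = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        out.append(sum(values[start:i + 1]))
    return out


# Three consecutive failures at minutes 0, 5 and 10, then passing runs.
consecutive = [1 if minute in (0, 5, 10) else 0 for minute in range(30)]
counts = moving_sum(consecutive, 15)
for minute in (0, 5, 10, 15, 20, 25):
    print(minute, counts[minute])

# Failures at minutes 0, 5 and 15, with a passing run at minute 10 in between.
interrupted = [1 if minute in (0, 5, 15) else 0 for minute in range(30)]
print(max(moving_sum(interrupted, 15)))
```

In this model the value only reaches 3 when three runs in a row fail, because a 15-minute window holds at most three runs of a 5-minute synthetic. A single passing run in between keeps the maximum at 2, which is why a threshold of "above 2" lines up with the third consecutive failure.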

Full DQL here:

timeseries Synthetics = avg(dt.synthetic.browser.executions), by: {dt.entity.synthetic_test, dt.entity.synthetic_location, state, dt.source_entity}, interval: 1m
| fieldsAdd syntheticName = entityName(dt.entity.synthetic_test)
| fieldsAdd locationName = entityName(dt.entity.synthetic_location)
| filter syntheticName == "Test Google"
| filter state == "FAIL"
| fieldsAdd arrayWithFailureConditions = arrayMovingSum(Synthetics, 15)
| fieldsRemove Synthetics

sivart_89_0-1740486700453.png

This is what is defined in the detector. I kept the sliding window at 3 because, in my mind, it needs to be below 5 since the synthetic runs every 5 minutes; if that is not accurate, please let me know. I left violating samples at 1 because as soon as we have more than 2 failures I want to alert (i.e., on the 3rd consecutive failure).
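
To reason about these settings, here is a simplified Python model of a "violating samples within a sliding window" rule. It is only a sketch and not Dynatrace's implementation: it assumes one sample per minute, a strict "greater than" comparison against the threshold, and an input series shaped like the 15-minute moving sum for failures at minutes 0, 5 and 10. The function name `first_alert_minute` is hypothetical.

```python
def first_alert_minute(samples, threshold, violating_samples, sliding_window):
    """Return the index of the first sample at which at least
    `violating_samples` of the last `sliding_window` samples exceed
    `threshold`, or None if that never happens (simplified model)."""
    for i in range(len(samples)):
        start = max(0, i - sliding_window + 1)
        violations = sum(1 for value in samples[start:i + 1] if value > threshold)
        if violations >= violating_samples:
            return i
    return None


# Assumed moving-sum series: 1 for minutes 0-4, 2 for 5-9, 3 for 10-14,
# then decaying as the failures leave the 15-minute window.
series = [1] * 5 + [2] * 5 + [3] * 5 + [2] * 5 + [1] * 5 + [0] * 5

# Settings from the post: threshold 2, 1 violating sample, sliding window 3.
print(first_alert_minute(series, 2, 1, 3))
# Stricter variant: all 3 samples in the window must violate.
print(first_alert_minute(series, 2, 3, 3))
# Only two consecutive failures: the series never exceeds 2.
print(first_alert_minute([1] * 5 + [2] * 10 + [1] * 5, 2, 1, 3))
```

In this model the settings from the post fire on the first sample above 2, and the value stays at 3 for five samples, so a sliding window of 3 is not too short. If the real detector still raises no problem, the cause is probably outside this simple rule and cannot be determined from the thread.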

sivart_89_2-1740486953699.png
