17 Feb 2025 03:52 PM
I'm trying to replicate this type of built-in alert using DQL and a Davis anomaly detector. What would the DQL look like for a synthetic that runs every 5 minutes from 5 different locations, where I want to alert if any single location fails 3 consecutive runs?
19 Feb 2025 03:18 PM
You could try building the logic on the synthetic events table:
fetch dt.synthetic.events
There are a few fields there, such as the status code, location, and synthetic ID, that can help you create another event or build a timeseries for what you're looking for.
19 Feb 2025 03:27 PM
I tried something like the query below, but it alerted on 2 consecutive failures in 1 location. I want it to alert only on 3 consecutive failures (the synthetic runs every 5 minutes).
timeseries Synthetics = avg(dt.synthetic.browser.executions), by: {dt.entity.synthetic_test, dt.entity.synthetic_location, state, dt.source_entity}, interval: 1m
| fieldsAdd syntheticName = entityName(dt.entity.synthetic_test)
| fieldsAdd locationName = entityName(dt.entity.synthetic_location)
| filter contains(syntheticName, "<SYNTHETIC-NAME>")
| filter state == "FAIL"
| fieldsAdd arrayWithFailureConditions = arrayMovingMax(Synthetics, 5)
| fieldsRemove Synthetics
| summarize TotalAvailability = sum(arrayWithFailureConditions[]), by: {timeframe, interval, syntheticName, dt.entity.synthetic_test}
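To illustrate why a moving max over 5 buckets cannot tell 2 consecutive failures from 3, here is a rough Python simulation of the array logic (not DQL). It assumes that arrayMovingMax uses a trailing window and ignores empty buckets, which is an assumption rather than confirmed behavior. The helpers moving_max and failures_every_5_min are hypothetical names made up for this sketch.

```python
from typing import List, Optional

def moving_max(values: List[Optional[float]], window: int) -> List[Optional[float]]:
    """Trailing moving maximum; None (no data) buckets are ignored (assumed behavior)."""
    out: List[Optional[float]] = []
    for i in range(len(values)):
        seen = [v for v in values[max(0, i - window + 1): i + 1] if v is not None]
        out.append(max(seen) if seen else None)
    return out

def failures_every_5_min(count: int, minutes: int = 20) -> List[Optional[float]]:
    """One-minute buckets: 1.0 in the minute of each failed run, None elsewhere."""
    series: List[Optional[float]] = [None] * minutes
    for run in range(count):
        series[run * 5] = 1.0
    return series

# Two consecutive failures vs. three consecutive failures, 5 minutes apart.
two = moving_max(failures_every_5_min(2), 5)
three = moving_max(failures_every_5_min(3), 5)

print(two[:10])
print(three[:15])
```

Under these assumptions both series are a flat 1.0 for as long as the failures continue, so the value itself carries no information about how many runs failed in a row.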
20 Feb 2025 02:56 PM
I think the way you did it is very clever, and it should work. However, if I'm understanding your query correctly, I think you should switch from arrayMovingMax to arrayMovingSum, use a window of 15, and also split the summarization by location. Can you show your resulting data?
Maybe like this:
....
| fieldsAdd arrayWithFailureConditions = arrayMovingSum(Synthetics, 15)
| fieldsRemove Synthetics
| summarize TotalAvailability = sum(arrayWithFailureConditions[]), by: {timeframe, interval, syntheticName, dt.entity.synthetic_test, locationName}
If you can also show your threshold and alerting condition that'd be helpful too.
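As a sanity check on the window of 15, here is a small Python simulation (not DQL) of a moving sum over one-minute buckets with a run every 5 minutes. It assumes arrayMovingSum uses a trailing window and skips empty buckets, which I have not confirmed. The names moving_sum and failed_runs are hypothetical and exist only in this sketch.

```python
from typing import List, Optional

def moving_sum(values: List[Optional[float]], window: int) -> List[Optional[float]]:
    """Trailing moving sum; None (no data) buckets are skipped (assumed behavior)."""
    out: List[Optional[float]] = []
    for i in range(len(values)):
        seen = [v for v in values[max(0, i - window + 1): i + 1] if v is not None]
        out.append(sum(seen) if seen else None)
    return out

def failed_runs(count: int, minutes: int = 30) -> List[Optional[float]]:
    """One-minute buckets with 1.0 at minutes 0, 5, 10, ... for each failed run."""
    series: List[Optional[float]] = [None] * minutes
    for run in range(count):
        series[run * 5] = 1.0
    return series

three_sum = moving_sum(failed_runs(3), 15)
two_sum = moving_sum(failed_runs(2), 15)

# First minute at which the value exceeds a threshold of 2.
first_alert = next(i for i, v in enumerate(three_sum) if v is not None and v > 2)

print(three_sum[0], three_sum[5], three_sum[10])
print(first_alert)
print(max(v for v in two_sum if v is not None))
```

With a 15-minute window only three runs fit inside it, so the sum can reach 3 only when all three of the most recent runs failed. In this simulation two failures peak at 2 and never cross a threshold of "greater than 2".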
25 Feb 2025 12:38 PM
Thank you for the input. I tried your query, since it seemed to show what I wanted. I can see that the datapoint value rises to 1 on the first failure, then 5 minutes later rises to 2 because there are now 2 consecutive failures, and so on. Everything looks good from what I can see, but the alert never actually triggers even though the preview shows it should. I'm still looking into why it never triggers and creates a problem.
Full DQL here:
timeseries Synthetics = avg(dt.synthetic.browser.executions), by: {dt.entity.synthetic_test, dt.entity.synthetic_location, state, dt.source_entity}, interval: 1m
| fieldsAdd syntheticName = entityName(dt.entity.synthetic_test)
| fieldsAdd locationName = entityName(dt.entity.synthetic_location)
| filter syntheticName == "Test Google"
| filter state == "FAIL"
| fieldsAdd arrayWithFailureConditions = arrayMovingSum(Synthetics, 15)
| fieldsRemove Synthetics
This is defined in the detector. I kept the sliding window at 3 because, in my mind, it needs to be below 5 since the synthetic runs every 5 minutes; if that is not accurate, please let me know. I left violating samples at 1 because I want to alert as soon as we have more than 2 failures (that is, on the 3rd consecutive failure).
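For reference, this is how I picture those two settings interacting, written as a Python sketch rather than anything taken from the detector itself. It assumes an alert is raised once at least violating_samples of the last sliding_window one-minute samples exceed the threshold; that reading is my assumption. The function first_alert_minute and the sample series are hypothetical.

```python
from typing import List, Optional

def first_alert_minute(
    samples: List[Optional[float]],
    threshold: float,
    sliding_window: int,
    violating_samples: int,
) -> Optional[int]:
    """Return the first minute index at which enough samples in the trailing
    sliding window exceed the threshold, or None if that never happens."""
    for i in range(len(samples)):
        window = samples[max(0, i - sliding_window + 1): i + 1]
        violations = sum(1 for v in window if v is not None and v > threshold)
        if violations >= violating_samples:
            return i
    return None

# Moving-sum style series for three consecutive failures 5 minutes apart:
# 1 for five minutes, then 2, then 3, then back to 2.
three_failures = [1.0] * 5 + [2.0] * 5 + [3.0] * 5 + [2.0] * 5
# Same shape for only two consecutive failures: the value never exceeds 2.
two_failures = [1.0] * 5 + [2.0] * 10 + [1.0] * 5

print(first_alert_minute(three_failures, 2.0, sliding_window=3, violating_samples=1))
print(first_alert_minute(three_failures, 2.0, sliding_window=3, violating_samples=3))
print(first_alert_minute(two_failures, 2.0, sliding_window=3, violating_samples=1))
```

Under this reading, a sliding window of 3 with 1 violating sample would fire in the same minute the value first exceeds 2, so these settings alone would not explain a missing alert in the simulation.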