Issues with arrayMovingSum and Davis Anomaly Detection

wellpplava
Contributor

Hello everyone,

I’m encountering some issues while using arrayMovingSum together with Davis Anomaly Detection and would like to know whether anyone has faced something similar or has suggestions.

I’m using it as an alternative to the rollup transformation in metric events, even though I know arrayMovingSum is not an exact substitute. However, in practice it is not producing the results I expected.

My query:

timeseries `order-success` = sum(`log.order-success`), interval: 1m
| fieldsAdd `order-success` = arrayMovingSum(`order-success`, 30)
| join [ timeseries `total-order` = sum(`log.total-order`), interval: 1m
| fieldsAdd `total-order` = arrayMovingSum(`total-order`, 30) ], on: { interval }, fields: { `total-order` }
| fieldsAdd expression = `order-success`[] / `total-order`[]
| fieldsRemove `total-order`, `order-success`
| fieldsAdd expression = expression[] * 100
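To make the arithmetic of this query concrete, here is a small Python model of it. It is a sketch, not Dynatrace's implementation: it assumes arrayMovingSum is a trailing window that skips null elements and, for the first window - 1 positions, sums whatever points are available. The helpers `moving_sum` and `success_rate` are hypothetical names.

```python
from typing import List, Optional

# A timeseries array as returned by the query: one value per minute, None for null.
Series = List[Optional[float]]


def moving_sum(values: Series, window: int) -> Series:
    """Rough model of DQL arrayMovingSum (assumed semantics, see lead-in).

    Each output element sums the current element and up to window - 1
    preceding elements; None elements are skipped. A window containing
    only None elements yields None.
    """
    out: Series = []
    for i in range(len(values)):
        chunk = [v for v in values[max(0, i - window + 1): i + 1] if v is not None]
        out.append(sum(chunk) if chunk else None)
    return out


def success_rate(success: Series, total: Series, window: int) -> Series:
    """Element-wise 100 * movingSum(success) / movingSum(total), as in the query."""
    s = moving_sum(success, window)
    t = moving_sum(total, window)
    rates: Series = []
    for a, b in zip(s, t):
        # A missing numerator/denominator or a zero denominator gives null.
        rates.append(None if a is None or not b else 100 * a / b)
    return rates


# Window of 2 instead of 30 to keep the example small; the third minute has
# no success data (null) while total orders were still recorded.
print(success_rate([9, 10, None, 4], [10, 10, 10, 10], window=2))
# [90.0, 95.0, 50.0, 20.0]
```

Because the ratio is taken between two independently computed moving sums, a minute that is null in one array but not in the other still changes the result.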

In short, the query calculates the success rate (in percent) over a rolling 30-minute window, and the detector is configured to trigger an alert if any minute within a 30-minute window falls below 75%.

The problem is that, in the Data Explorer, the value has never dropped below 75%, yet problems are still being triggered.

From what I understand, the difference between rollup and arrayMovingSum is that rollup also considers data from before the start of the timeframe, while arrayMovingSum can accept null values. But if I’m analyzing a 30-minute window, shouldn’t the value be the same?
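A minimal Python sketch of where the two diverge, assuming (as described above) that rollup can look back before the timeframe start while arrayMovingSum only sees the elements of the array returned for the timeframe. Both helpers are hypothetical models, not Dynatrace APIs, and rollup is simplified to a sum over data points.

```python
from typing import List


def rollup_style(history: List[float], start: int, window: int) -> List[float]:
    """Trailing sums for each minute from `start` on, allowed to look back
    into minutes before `start` (models rollup with lookback)."""
    return [sum(history[max(0, i - window + 1): i + 1]) for i in range(start, len(history))]


def array_moving_sum_style(values: List[float], window: int) -> List[float]:
    """Trailing sums computed only from the array itself (models
    arrayMovingSum on the timeframe's array; nulls omitted for brevity)."""
    return [sum(values[max(0, j - window + 1): j + 1]) for j in range(len(values))]


history = list(range(1, 71))   # 70 hypothetical per-minute counts: 1, 2, ..., 70
start, window = 30, 30         # the queried timeframe covers the last 40 minutes

rollup = rollup_style(history, start, window)
moving = array_moving_sum_style(history[start:], window)

# The first window - 1 values differ: the moving sum covers fewer than 30 minutes.
print(rollup[0], moving[0])                         # 495 31
# From the 30th minute of the timeframe on, both cover a full 30-minute window.
print(rollup[window - 1:] == moving[window - 1:])   # True
```

Under these assumptions, a timeframe of exactly 30 minutes makes the two agree only at the final minute; every earlier arrayMovingSum value is a sum over fewer minutes.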

4 replies

eugene_chuang
Observer

We ran into the same issue after applying arrayMovingSum in the Davis Anomaly Detection app: false alerts were triggered. Dynatrace support responded as follows:
1. We suggest moving away from arrayMovingSum, as it is generally not recommended for alerting and can produce unexpected results such as this.

2. We also advise implementing a 3-5 minute query offset, as we suspect there could be delays in the metric.
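One way a query offset can help, sketched under a hypothetical assumption: if the newest minutes of the numerator (`log.order-success`) are still only partially ingested when the detector runs, the latest success rate is pulled down. Skipping the newest minutes, which is what an offset does, evaluates only complete data. The `latest_rate` helper, the 5-minute window, and the numbers are illustrative only.

```python
from typing import List


def latest_rate(success: List[float], total: List[float], window: int, offset: int = 0) -> float:
    """Success rate (%) over the `window` minutes that end `offset` minutes
    before the newest data point."""
    end = len(success) - offset
    begin = max(0, end - window)
    return 100 * sum(success[begin:end]) / sum(total[begin:end])


# Hypothetical: the true rate is 95% every minute, but for the two newest
# minutes only part of the success logs have arrived so far (60 and 20 of 95).
total = [100] * 10
success = [95] * 8 + [60, 20]

print(latest_rate(success, total, window=5))            # 73.0 -> false alert
print(latest_rate(success, total, window=5, offset=3))  # 95.0
```

This only helps if the delay is shorter than the offset and affects the two metrics unevenly; if both arrive late by the same proportion, the ratio is unaffected.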

I really like the idea of using arrayMovingSum. It is supposed to help us smooth out isolated spikes and reduce false alerts. I look forward to any recommendations.

Almost all of our alerts are built with rollup, precisely to aggregate over intervals of 5, 10, or 30 minutes. We have a lot of business alerts, and this capability is really missing.

DavidBruendl
Dynatrace Advisor

Hello @wellpplava, @eugene_chuang,

Unfortunately, the arrayMoving functions and rollup are not exactly equivalent, as described here: https://docs.dynatrace.com/docs/shortlink/metrics-selector-conversion#rollup-transformation

If you use an arrayMovingSum over 30 minutes but configured a sliding window of 5 minutes in the anomaly detector, the detector will only fetch 5 minutes of data, which is not enough to calculate the 30-minute sum. In that example, please increase the sliding window to 30 so that 30 data points are fetched and the sum is calculated correctly. I understand this is not intuitive, and we are working on a new anomaly detector that supports different intervals.
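A small Python model of this effect, assuming the detector only evaluates the data points inside its sliding window as described above: a "30-minute" success rate computed from 5 fetched minutes is really a 5-minute rate, so a short dip can cross the 75% threshold even though the full 30-minute rate stays well above it. The `last_rate` helper and the traffic numbers are hypothetical.

```python
from typing import List


def last_rate(success: List[float], total: List[float], fetched: int, window: int = 30) -> float:
    """Success rate (%) at the last element of arrayMovingSum(success, window) /
    arrayMovingSum(total, window) when only the last `fetched` minutes are queried.

    The last element can cover at most min(window, fetched) minutes.
    """
    s, t = success[-fetched:], total[-fetched:]
    n = min(window, len(s))
    return 100 * sum(s[-n:]) / sum(t[-n:])


# Hypothetical traffic: 25 minutes at 100% success, then a 5-minute dip to 60%.
total = [100] * 30
success = [100] * 25 + [60] * 5

print(last_rate(success, total, fetched=5))             # 60.0 -> below 75%, alert
print(round(last_rate(success, total, fetched=30), 1))  # 93.3 -> no alert
```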

Please let me know if you are still facing issues.

Thank you very much,

Best regards,

David

Hello!

I understand! My reasoning was that, to calculate the sum of the last 30 minutes, the sliding window needs to be larger than 30. However, that wasn’t enough to prevent false alerts.

In the notebook it works, and I can see the correct values, but in Davis Anomaly Detection it doesn’t. I must admit I find Davis Anomaly Detection a bit confusing, because the preview in the notebook doesn’t reflect the alerts that were actually triggered. I believe the notebook preview should accurately reflect the configured settings.
