25 Jul 2024 10:41 AM - last edited on 30 Jul 2024 01:34 PM by Michal_Gebacki
This trick is applied through a metric expression (Metrics API - Metric Selector).
[Batch/Job case]:
Take a batch/job that fails at random and therefore generates only one data point at a time: we will very quickly be inundated with alerts. The AI limitation (sliding window + dealerting samples) is that the evaluation cannot span more than 60 minutes.
The rollup transformation is the solution:
That said, with only a single rollup we still hit this limitation: as soon as data is missing, the problem closes...
*** WORKAROUND
To address this, we can use the rollup transformation. The illustration below shows how to extend the analysis window once the failed event has been detected at least once in the last 24 hours.
The advantage is that we avoid repeated alerts for a known event that still needs to be corrected.
This reduces the number of problems (in the Dynatrace sense) and avoids notification spam.
Given the 60-minute limit on a rollup, 24 was chosen so that we cover one time slot per day; this is very useful when as little as a single data point is generated (e.g. metric C above = red area).
Metric B = the number of times the batch fails (purple points).
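To make the window arithmetic concrete, here is a minimal Python sketch that only simulates the idea; it is not Dynatrace code and does not call any Dynatrace API. It assumes 1-minute slots, a trailing-window max as the rollup, and `None` for missing data. The names `rollup_max` and `chained_rollup` are hypothetical helpers made up for this illustration.

```python
from typing import List, Optional

Series = List[Optional[int]]  # one value per 1-minute slot, None = no data


def rollup_max(series: Series, window: int) -> Series:
    """Trailing-window max over `window` slots (assumed rollup behaviour)."""
    out: Series = []
    for i in range(len(series)):
        start = max(0, i - window + 1)
        values = [v for v in series[start:i + 1] if v is not None]
        out.append(max(values) if values else None)
    return out


def chained_rollup(series: Series, window: int, times: int) -> Series:
    """Apply the same trailing-window rollup `times` times in a row."""
    for _ in range(times):
        series = rollup_max(series, window)
    return series


def covered_slots(series: Series) -> int:
    """Number of slots that carry a value after the transformation."""
    return sum(v is not None for v in series)


# Hypothetical input: 1500 minutes with a single failure at minute 10.
minutes: Series = [None] * 1500
minutes[10] = 1

single = rollup_max(minutes, 60)
chained = chained_rollup(minutes, 60, 24)

print(covered_slots(single))   # 60
print(covered_slots(chained))  # 1417
```

Under these assumptions, one 60-slot rollup keeps a single data point "visible" for 60 minutes, while 24 chained rollups keep it visible for 24 × 59 + 1 = 1417 minutes, i.e. roughly one day. Real rollup semantics, resolution and throttling may differ, so treat the numbers as an illustration only.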
Result of detection:
Thanks
25 Jul 2024 11:46 AM
Thank you!
25 Jul 2024 07:15 PM
Your multiple rollup is a really interesting hack. It also gave me some ideas, but it does seem a bit like abuse 😂
Rollup rapidly gets out of control when used with most metrics. You might be affected by throttling quite quickly, as I documented in https://community.dynatrace.com/t5/Alerting/Metric-events-that-sum-values-in-a-certain-time-period/m... but I would really have to test how it goes with your idea ...