This product reached the end of support date on March 31, 2021.

Incident on average threshold violation count of a certain exception

Bert_VanderHeyd
Advisor

Situation:

We have a very specific exception which occurs often. We have created a measure for the occurrence of this exception and a "Threshold Violation Count" measure on top of it.

We have defined an incident with an evaluation period of 1 minute, which triggers when more than 40 (severe threshold) exceptions of this type occur within that minute.

So far, so good...

People are now investigating when this "event" occurs and trying to find the root cause...

But now we want a second "Incident" which should only be triggered when the first incident has been active for more than 5 minutes. This is a more severe situation than the first one, and other actions should be taken. So, basically, we want to distinguish between a peak and a problem situation.

We tried an incident based on the "average" of the "Threshold Violation Count" over 5 minutes. But when I put the average of this Threshold Violation Count on a chart, it is always "1", because this average is based on purepaths, and there is indeed always exactly one exception per purepath.

In a way I want to convert this transaction based measure into a time based measure.
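
To illustrate the difference, here is a minimal Python sketch with made-up sample data (the timestamps and function names are hypothetical and have nothing to do with any Dynatrace API). It shows why an average taken per purepath stays at 1, while bucketing the same data by minute gives the time-based count I am after:

```python
from collections import Counter

# Hypothetical sample: (seconds since start, exceptions in that purepath).
# Every purepath carries exactly one exception, as described above.
purepaths = [(3, 1), (10, 1), (42, 1), (65, 1), (70, 1), (130, 1)]


def average_per_purepath(samples):
    """Transaction-based view: average of the measure across purepaths."""
    return sum(value for _, value in samples) / len(samples)


def count_per_minute(samples):
    """Time-based view: total exceptions in each one-minute bucket."""
    buckets = Counter()
    for seconds, value in samples:
        buckets[seconds // 60] += value
    return dict(buckets)


print(average_per_purepath(purepaths))  # 1.0
print(count_per_minute(purepaths))      # {0: 3, 1: 2, 2: 1}
```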

I was thinking of modifying the "Dynatrace-Measure-Availability-Plugin" (https://github.com/Dynatrace/Dynatrace-Measure-Availability-Plugin) so it could read the latest measure of a certain dashboard every minute. But that's not so easy to achieve and requires a lot of maintenance afterwards.

So maybe there are better alternatives?

5 REPLIES

dave_mauney
Dynatrace Champion

Hi Bert,

Would counting the number of threshold violations over the 5 minutes perhaps be good enough? There is a Threshold Violation Count Measure you could use for that...

HTH,

dave

That is indeed what we are using. But instead of just the total count (or sum) over 5 minutes, I only want this incident triggered when the average during these 5 minutes is above a certain level.

So, let's say we have the threshold at 10 to count easily. And there is only one peak of 11 at a certain moment. Then we don't want an incident.
Even when there is one peak of more than 50, we don't want an incident.
Only when there are more than 10 every minute during a period of 5 minutes. So, avg/min during 5 minutes > 10.

But when I chart the average of the violation count, it is always "1". Makes sense, as this is avg/purepath and not avg/min.
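
The rule described above can be sketched in Python as follows. This is only an illustration over a plain list of per-minute counts; the function names and sample data are hypothetical, not anything Dynatrace provides. Note that "more than 10 every minute" and "avg/min > 10" are not quite the same condition: a single peak above 50 pushes the 5-minute average over 10, so the sketch checks every minute individually and shows the average only for comparison.

```python
def sustained_violation(per_minute_counts, threshold=10, window=5):
    """True only if each of the last `window` minutes exceeds `threshold`."""
    recent = per_minute_counts[-window:]
    return len(recent) == window and all(c > threshold for c in recent)


def average_violation(per_minute_counts, threshold=10, window=5):
    """True if the per-minute average over the last `window` minutes exceeds `threshold`."""
    recent = per_minute_counts[-window:]
    return len(recent) == window and sum(recent) / window > threshold


small_peak = [0, 0, 11, 0, 0]      # one peak of 11: no incident wanted
large_peak = [0, 0, 55, 0, 0]      # one peak above 50: no incident wanted
ongoing = [12, 11, 15, 13, 11]     # above 10 every minute: incident wanted

for name, counts in [("small_peak", small_peak),
                     ("large_peak", large_peak),
                     ("ongoing", ongoing)]:
    print(name, sustained_violation(counts), average_violation(counts))
```

With this sample data the average-based check also fires for `large_peak` (55 / 5 = 11), which is exactly the case that should be ignored, while the every-minute check fires only for `ongoing`.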

Sorry I did not read your initial post very well, since you clearly state you are already using a Threshold Violation Count Measure! For the 5 minute period, are you setting a threshold that is 5x the 1 minute period (200)? That would seem to catch "ongoing" issues at first glance, but then there could be 200 in one minute of course, so not really. Maybe you could suppress the 1 minute incident for 1 minute after each occurrence and then count the violations and look for > 5?

dave
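
The two options in this reply can be compared with a small Python sketch. It is only a model of the arithmetic with hypothetical names and sample data, and it assumes the 1-minute incident can contribute at most one violation per minute. Under that assumption a 5-minute window holds at most five violations, so the sketch tests for 5 or more rather than more than 5.

```python
def total_over_window(per_minute_counts, total_threshold=200, window=5):
    """Option 1: total count over the window against 5x the 1-minute threshold."""
    return sum(per_minute_counts[-window:]) > total_threshold


def violating_minutes(per_minute_counts, minute_threshold=40, window=5):
    """Option 2: number of minutes in the window that broke the 1-minute threshold."""
    return sum(1 for c in per_minute_counts[-window:] if c > minute_threshold)


burst = [0, 0, 250, 0, 0]        # everything in a single minute
ongoing = [45, 50, 41, 60, 44]   # above 40 in every minute

print(total_over_window(burst), violating_minutes(burst))      # True 1
print(total_over_window(ongoing), violating_minutes(ongoing))  # True 5

# Assumed trigger for the second incident: all five minutes violated.
print(violating_minutes(burst) >= 5)    # False
print(violating_minutes(ongoing) >= 5)  # True
```

The total-based check cannot tell the burst from the ongoing problem, while counting violating minutes can.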

Is that possible? Is there a measure for when an incident is fired? Or is there an action to create a measure at that moment? Because then your idea might work.

This is just an idea... I have never tried it... my thought is that the Threshold Violation Count measure would be created only once per minute this way.