
Will default incident get logged on restart of server in which the agent is installed

praveena_patha1
Organizer

Hi Team,

We have installed an agent on a web server and created an incident rule to detect a busy thread count >30. We then restarted the server on which the agent is installed so that the change would be reflected in the Dynatrace client. Even though the busy thread count was not >30, the incident rule logged an incident. Will an incident get logged when the server on which the agent is installed is restarted?

Thanks

Praveena

18 REPLIES 18

joshua_pavlica
Dynatrace Pro

Hello Praveena,

Can you please post screenshots of:

  1. The Incident configuration screen
  2. The Edit Measure screen

This will help us to understand your issue better.

Kind regards,

Joshua P.

On the Incident configuration screen, I believe you are using the wrong aggregation.

The way the Incident is set up right now, if there are more than 30 Busy Threads in total during a 30-minute evaluation window, the Incident Condition is ruled true and the Incident is Triggered.

From your post, I believe you want to know whether the average Busy Thread Count is above 30. In that case, I would change the aggregation value to average. I might also make the evaluation window smaller, because with the current window the average condition would check whether there had been 15 minutes or more of Busy Thread Counts above the value of 30. In a triage scenario, that means you could learn about an issue up to 15 minutes late.

Let me know if this helps or if I can explain it better.

Kind regards,

Joshua P.

Thanks Joshua. The first scenario is the one I want, i.e., "if there are more than 30 Busy Threads in total during a 30-minute evaluation window, the Incident Condition is ruled true and the Incident is Triggered". But at the time the incident was triggered, the thread count was not >30. Please find the attached screenshots for clarification. @Joshua P.

Attachments: 3.png, 4.png

AppMon calculates metrics every 10 seconds. By choosing the aggregation of "count" you are counting every instance of that metric being captured during the 30-minute evaluation window. If you have 10 busy threads active over a minute, the incident would fire after 30 seconds (10 + 10 + 10 >= 30 so Incident will trigger).
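The arithmetic in this reply can be sketched as a running total over 10-second samples. This only illustrates the behaviour as described above and is not AppMon's actual implementation; the function name and the sample data are hypothetical.

```python
from itertools import accumulate

SAMPLE_INTERVAL_S = 10  # the reply states that metrics are calculated every 10 seconds


def seconds_until_sum_reaches(samples, threshold):
    """Return the elapsed seconds at which the running total of the samples
    first reaches the threshold, or None if it never does."""
    for index, total in enumerate(accumulate(samples), start=1):
        if total >= threshold:
            return index * SAMPLE_INTERVAL_S
    return None


# One minute of samples with 10 busy threads each (hypothetical data).
one_minute = [10] * 6
print(seconds_until_sum_reaches(one_minute, 30))  # 30
```

With 10 busy threads per sample, the running total reaches 30 on the third sample, which matches the "10 + 10 + 10" example.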

Thanks Joshua. I understand what you have explained. But what should I do if I need an incident to be triggered only when the busy thread count is actually >30?

If we keep the aggregation as count, will it always behave the way you explained above? @Joshua P.

Can you explain the purpose of the incident for me? I might be able to better help if I know the reasoning behind it.

Kind regards,

Joshua P.

Our team simply wants to know when the busy thread count is greater than 30; that is the only purpose. But the incident rule we created logs an incident even when the thread count is only 10. We do not want that kind of incident to be triggered; we need one only when the actual busy thread count is >30. @Joshua P.

If you want to know if at any time the busy thread count is greater than 30, I would do an aggregation of maximum or average, and set the evaluation timeframe to something small like 10 seconds. In theory this could also be achieved with Count at the 10s aggregation window.
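As a rough sketch of why a maximum aggregation with a very short evaluation timeframe catches any single reading above the threshold: the helper below is hypothetical and models each evaluation window as a slice of 10-second samples, which is an assumption made for illustration rather than a description of AppMon internals.

```python
def windows_over_threshold(samples, threshold, window_size=1):
    """Split the samples into consecutive windows of window_size samples and
    return the indices of the windows whose maximum exceeds the threshold."""
    hits = []
    for start in range(0, len(samples), window_size):
        window = samples[start:start + window_size]
        if max(window) > threshold:
            hits.append(start // window_size)
    return hits


# Hypothetical busy thread counts, one value per 10-second sample.
samples = [10, 12, 31, 9, 35, 8]

# window_size=1 models a 10-second evaluation timeframe.
print(windows_over_threshold(samples, 30))  # [2, 4]
```

Only the windows that contain a reading above 30 are flagged, so low readings such as 10 never trigger on their own.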

Maybe @James K. could confirm when he wakes up (East Coast USA) or @Andreas G. if he's available.

What we do at my company is that we are only concerned if the busy thread counts are above 300 for an evaluation time period of 15 minutes.

So we have chosen average as the aggregation, 15 minutes as the evaluation timeframe, and set the threshold for the measure to Upper Severe 300.

Can you please explain your incident rule, which uses the average aggregation, in the same way you explained our incident rule with the count aggregation? @Joshua P.

To put it simply: In the 15 minute timeframe, if that timeframe has an average count greater than 300 Busy Threads, then the Incident fires.

Here is a general PowerPoint example that illustrates it. Please view it in Presentation Mode so the animations work. incident.pptx

Thanks Joshua. But shouldn't the count aggregation work in the same way as the average aggregation?

One more question. In our case we have selected an agent group containing two servers. Are the busy thread counts of both servers being added together, and is that why the incident is getting logged?

Count is different than Average.

Count adds up all of the instances where the measure records data.

Average adds up all of the instances where the measure records data, then divides by the number of times it measured.
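The difference between the two definitions can be illustrated with a small sketch. It follows the description given in this thread (count as a running sum, average as that sum divided by the number of measurements) and is not taken from AppMon itself; the function names and sample data are hypothetical.

```python
from statistics import mean


def count_as_described(samples):
    """'Count' as described in this thread: every recorded value added up."""
    return sum(samples)


def average(samples):
    """'Average': the same total divided by the number of measurements."""
    return mean(samples)


# 30 minutes of 10-second samples (180 values) with a steady 10 busy threads.
samples = [10] * 180
threshold = 30

print(count_as_described(samples) > threshold)  # True
print(average(samples) > threshold)             # False
```

The same steady load of 10 busy threads exceeds the threshold under the summed aggregation but stays well below it under the average.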

Yes, count adds up all instances, but you described a scenario like this:

"AppMon calculates metrics every 10 seconds. By choosing the aggregation of "count" you are counting every instance of that metric being captured during the 30-minute evaluation window. If you have 10 busy threads active over a minute, the incident would fire after 30 seconds (10 + 10 + 10 >= 30 so Incident will trigger)."

Will this not happen when the aggregation is average? @Joshua P.

Average won't have this problem because it divides by how often it takes the measurement. This leaves us with the value as shown in the PowerPoint I linked.


The PowerPoint explains the average aggregation very well if you view it in Presentation Mode.

You didn't answer my question about the agent group containing two servers: are the busy thread counts of both servers being added together before the incident is logged? @Joshua P.

If you have the checkbox in the Edit Measure > Details tab for "Create a Measure for Each Agent" enabled, then it will be separate and not summed together as one value.

My threading incident for my Apache Web Servers has this checkbox enabled and I get separate notifications/evaluations for each server I have.
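The effect of that checkbox can be sketched as follows. The sketch assumes, as the reply implies, that without per-agent measures the values of all agents are summed into one series; the agent names, sample values, and the `evaluate` helper are hypothetical.

```python
from statistics import mean


def evaluate(agent_samples, threshold, per_agent):
    """Return a mapping of measure name to whether its average exceeds the threshold.

    per_agent=True models "Create a Measure for Each Agent" being enabled;
    per_agent=False assumes the agents' values are summed into a single measure.
    """
    if per_agent:
        return {agent: mean(values) > threshold
                for agent, values in agent_samples.items()}
    combined = [sum(values) for values in zip(*agent_samples.values())]
    return {"all agents": mean(combined) > threshold}


# Hypothetical busy thread counts for two servers in one agent group.
agent_samples = {"web01": [20, 18, 22], "web02": [16, 19, 15]}

print(evaluate(agent_samples, 30, per_agent=True))
print(evaluate(agent_samples, 30, per_agent=False))
```

In this example neither server exceeds 30 on its own, but the combined series does, which is the kind of unexpected incident the question describes.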

Kind regards,

Joshua P.