We installed an agent on a web server and created an incident rule to detect a busy thread count > 30. We then restarted the server on which the agent is installed so that the change would be reflected in the Dynatrace client. Even though the busy thread count was not > 30, the incident rule logged an incident. Does an incident get logged when the server hosting the agent is restarted?
On the Incident configuration screen, I believe you are using the wrong aggregation.
The way the Incident is set up right now, if there are more than 30 Busy Threads in total during a 30-minute evaluation window, the Incident Condition is ruled true and the Incident is Triggered.
From your post, I believe you want to know whether the average Busy Thread Count is above 30. In that case, I would change the aggregation value to average. I might also make the evaluation window smaller: with a 30-minute window, the average condition effectively requires 15 minutes or more of Busy Thread Counts above 30 before it becomes true. In a triage scenario, that means you could be up to 15 minutes late in learning about an issue.
Let me know if this helps or if I can explain it better.
Thanks Joshua. I am interested in the first scenario only, i.e., "if there are more than 30 Busy Threads in total during a 30-minute evaluation window, the Incident Condition is ruled true and the Incident is Triggered". But at the time the incident was triggered, the thread count was not > 30. Please find the attachment for clarification. @Joshua P.
AppMon calculates metrics every 10 seconds. By choosing the aggregation of "count" you are counting every instance of that metric being captured during the 30-minute evaluation window. If you have 10 busy threads active over a minute, the incident would fire after 30 seconds (10 + 10 + 10 >= 30 so Incident will trigger).
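To make the arithmetic above concrete, here is a minimal sketch in Python. It is a simplified model of the behaviour described in this post, not AppMon's actual implementation; the function name `seconds_until_sum_reaches` and the sample list are made up for illustration.

```python
# Simplified model of the "count" behaviour as described in this thread.
# Assumption: one sample every 10 seconds, and sample values are added up.
SAMPLE_INTERVAL_S = 10
THRESHOLD = 30


def seconds_until_sum_reaches(samples, threshold=THRESHOLD, interval_s=SAMPLE_INTERVAL_S):
    """Return the elapsed seconds at which the running total first reaches
    the threshold, or None if it never does within the given samples."""
    total = 0
    for index, value in enumerate(samples, start=1):
        total += value
        if total >= threshold:
            return index * interval_s
    return None


# One minute of a steady 10 busy threads: six samples of 10.
one_minute_at_10_threads = [10] * 6
print(seconds_until_sum_reaches(one_minute_at_10_threads))
```

With a steady 10 busy threads, the running total reaches 30 on the third sample, even though no single sample is anywhere near 30.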
Actually, our team only wants to know when the busy thread count is greater than 30. That is the sole purpose. But the incident rule we created logs an incident even when the thread count is only 10. We do not want that kind of incident to be triggered; we need one only when the actual busy thread count is > 30. @Joshua P.
If you want to know if at any time the busy thread count is greater than 30, I would do an aggregation of maximum or average, and set the evaluation timeframe to something small like 10 seconds. In theory this could also be achieved with Count at the 10s aggregation window.
At my company, we are only concerned if the busy thread count is above 300 over an evaluation period of 15 minutes.
So we have chosen average as the aggregation, 15 minutes as the evaluation timeframe, and set the Upper Severe threshold for the measure to 300.
Thanks Joshua. But shouldn't the count aggregation work similarly to the average aggregation?
One more question. In our case we have selected an agent group containing two servers. Are the busy thread counts of both servers being added together, and is that why the incident is getting logged?
Count is different than Average.
Count adds up all of the instances where the measure records data.
Average adds up all of the instances where the measure records data, then divides by the number of times it measured.
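As a rough illustration of the difference, the sketch below applies each aggregation to the same six samples. It follows the description given in this thread rather than AppMon internals; `aggregate` and the sample values are hypothetical.

```python
from statistics import mean

THRESHOLD = 30


def aggregate(samples, how):
    """Aggregate samples the way this thread describes each option."""
    if how == "count":
        # As described above: add up every recorded value.
        return sum(samples)
    if how == "average":
        # Add up every recorded value, then divide by the number of samples.
        return mean(samples)
    if how == "maximum":
        return max(samples)
    raise ValueError(f"unknown aggregation: {how}")


# One minute of a steady 10 busy threads, sampled every 10 seconds.
samples = [10] * 6
for how in ("count", "average", "maximum"):
    value = aggregate(samples, how)
    print(how, value, "exceeds threshold" if value > THRESHOLD else "ok")
```

Under these assumptions only the count aggregation exceeds 30; average and maximum both stay at 10.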
Yes, count adds up all instances, but you described a scenario like this:
"AppMon calculates metrics every 10 seconds. By choosing the aggregation of "count" you are counting every instance of that metric being captured during the 30-minute evaluation window. If you have 10 busy threads active over a minute, the incident would fire after 30 seconds (10 + 10 + 10 >= 30 so Incident will trigger)."
Will this not happen when the aggregation is average? @Joshua P.
Average won't have this problem because it divides by the number of measurements taken. This leaves us with the value shown in the PowerPoint I linked.
The PowerPoint explains the average aggregation very well if you view it in Presentation Mode.
If you have the checkbox in the Edit Measure > Details tab for "Create a Measure for Each Agent" enabled, then it will be separate and not summed together as one value.
My threading incident for my Apache Web Servers has this checkbox enabled and I get separate notifications/evaluations for each server I have.
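The effect of evaluating each agent separately can be sketched as follows. This is only an illustration under assumptions: the server names and sample values are invented, and the combined case assumes the two agents' values are summed into one series, as the question above suggests.

```python
from statistics import mean

THRESHOLD = 30

# Hypothetical busy-thread samples for two servers in one agent group.
samples_by_agent = {
    "web-01": [18, 20, 19],
    "web-02": [16, 17, 15],
}


def evaluate_per_agent(samples, threshold):
    """One evaluation per agent, as with a separate measure for each agent."""
    return {agent: mean(values) > threshold for agent, values in samples.items()}


def evaluate_combined(samples, threshold):
    """Assumed combined behaviour: sum the agents' values at each sample time."""
    combined = [sum(values) for values in zip(*samples.values())]
    return mean(combined) > threshold


print(evaluate_per_agent(samples_by_agent, THRESHOLD))
print(evaluate_combined(samples_by_agent, THRESHOLD))
```

In this example neither server is above 30 on its own, but the summed series averages 35 and would cross the threshold.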