
This product reached the end of support date on March 31, 2021.

Incident evaluation strange behavior?

Hello,

I have created a measure on Functional health - agents availability. The lower severe threshold is set to 0. The measure is created per agent and calculated only for a selected agent group (it contains only 2 agents). I then created an incident on the severe level, expecting it to fire if one of the agents goes down. Unfortunately, nothing is triggered. Did I configure everything correctly? My aim is to fire an incident if any agent in a group goes down. Thanks


JamesKitson
Dynatrace Leader

I have set up one series of such alerts before, so here is what worked for me. I'm not sure whether each of these steps is required.

  • I used the Connected Agents (availability) measure instead of Agents Availability. At the time this seemed better suited to the behaviour we wanted for the incident.
  • In the measure details I unchecked "create a measure per agent" and checked only the box that creates the measure for a single agent group.
  • Now, with a single measure tracking the agent count in a single agent group, create the incident so that it is only relevant for that agent group.
  • Repeat for each agent group you have.

The above will leave you with one incident per agent group.
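To make the idea concrete, here is a rough Python sketch of the logic, not AppMon configuration or any AppMon API. The names `GroupMeasure`, `expected_agents`, and `groups_in_violation` are made up for illustration, and it assumes the lower threshold for each group is the number of agents that should be connected.

```python
from dataclasses import dataclass


@dataclass
class GroupMeasure:
    """One connected-agent-count measure for a single agent group (hypothetical model)."""
    group: str
    expected_agents: int  # assumption: lower threshold = number of agents that should be up


def groups_in_violation(measures, connected_counts):
    """Return the groups whose connected-agent count is below the expected count.

    Each measure is checked on its own, so each group can raise its own incident.
    A group missing from connected_counts is treated as having 0 connected agents.
    """
    return [
        m.group
        for m in measures
        if connected_counts.get(m.group, 0) < m.expected_agents
    ]


# Example: two groups, one agent of the "frontend" group is down.
measures = [GroupMeasure("frontend", 2), GroupMeasure("backend", 6)]
connected = {"frontend": 1, "backend": 6}
violations = groups_in_violation(measures, connected)
```

In this example only the "frontend" group would be flagged, because it has 1 connected agent where 2 are expected.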

Whenever you aren't getting the results you expect from an incident, it always helps to chart the measures the incident is based on so you can visualize what the incident is actually evaluating. Basically, each individual measure is evaluated separately, with the potential to create its own incident.

James

Hi! Thanks for the suggestion. That works, I can confirm it because I did it before. The thing is, if I uncheck "create a measure per agent", then the percent graph no longer shows each agent, just the group with numbers. I have 7 groups, some of them have 6 agents or more, and each agent is important. I do not know why this does not work. The current solution would be, as you suggested, to create another bunch of measures per group, but then we capture more data 😞

I would not worry about the approach of creating new measures. If the new measures are populated from data that is already collected by the agent, there is no overhead cost for the additional measures. As long as it is not an absurd number of additional measures, the load on the AppMon Server is likely negligible. You can monitor the AppMon Self Monitoring measures to see whether the number of measures has increased significantly, and also monitor the volume of data being stored in the PW, etc.

Ok. Thank you 🙂