I'm trying to figure out if this is possible for my customer. With the way our environment is set up, we don't necessarily need an alert if one agent/host goes down. What is worth an alert is if, for example, 3 agents go down in a specific agent group.
The way I attempted to accomplish this is by using the Connected Agents measure in my incident. I was then planning on using the sum aggregation on a 10-second evaluation with a lower severe threshold of 3 (I was testing with 8 agents). The issue I found is that each individual agent sums up its own connection status and compares that to the threshold, rather than the connection status of all agents being summed and then compared to the threshold. That sounds confusing, so I made a table:
What I was hoping would happen: sum the connections across all agents (meaning, sum the column), compare that sum to 3, and alert if it is less than or equal to 3.
What actually happens: each agent sums only its own connections (meaning, each row is summed on its own), each row's sum is compared to 3, and an alert fires if it is less than or equal to 3.
This causes an alert to be triggered for each agent, rather than one alert based on the sum across agents. Does anyone have suggestions for how I can accomplish this?
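To make the difference concrete, here is a minimal Python sketch of the two evaluation modes described above. It is only an illustration of the logic, not the product's implementation: the names `per_agent_alerts` and `group_alert` are hypothetical, and it assumes each agent reports 1 when connected and 0 when disconnected.

```python
# Illustration only: hypothetical names, not a product API.
# Assumption: each agent reports 1 when connected and 0 when disconnected.

THRESHOLD = 3  # lower severe threshold from the question


def per_agent_alerts(status, threshold=THRESHOLD):
    """Observed behavior: every agent's own value is compared to the threshold."""
    return [agent for agent, connected in status.items() if connected <= threshold]


def group_alert(status, threshold=THRESHOLD):
    """Desired behavior: sum across all agents, then compare once."""
    return sum(status.values()) <= threshold


# Eight agents, all connected.
all_up = {f"agent{i}": 1 for i in range(1, 9)}

# Same eight agents, with agent1 through agent5 disconnected (three left).
five_down = {**all_up, **{f"agent{i}": 0 for i in range(1, 6)}}
```

With all eight agents connected, `per_agent_alerts(all_up)` still lists every agent, because each individual value of 1 is below the threshold of 3, while `group_alert(all_up)` stays quiet until the total drops to 3 or fewer.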
1) Copy the measure
2) In the details tab of the copy, uncheck "Create a measure for each agent"
2b) Also set it to evaluate only for your desired agent group
3) Use the copy for your incident condition exactly as you described above
4) Buy me a beer
Follow-up question - would there be a way to sum up agent connections to use in an incident for a subset of agents within an Agent Group? For example, the agent group has 50 agents, 25 for location A and 25 for location B. I want to alert if 5 agents in location A are offline (again, all in the same agent group). Another beer is on the line.
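To spell out the check I am after, here is a rough Python sketch. It only describes the desired logic and is not something the product exposes: the `Agent` class, the `location` field, and the function names are all hypothetical, and how an agent would be tagged with a location is exactly the open question.

```python
# Illustration only: hypothetical names, not a product API.
from dataclasses import dataclass


@dataclass
class Agent:
    name: str
    location: str   # assumed tag; how to derive it is the open question
    connected: bool


def offline_in_location(agents, location):
    """Count disconnected agents within one location of the agent group."""
    return sum(1 for a in agents if a.location == location and not a.connected)


def should_alert(agents, location, min_offline=5):
    """Alert once at least min_offline agents in that location are offline."""
    return offline_in_location(agents, location) >= min_offline


# One agent group of 50: 25 in location A (first five offline), 25 in location B.
agents = [Agent(f"A{i}", "A", connected=(i >= 5)) for i in range(25)]
agents += [Agent(f"B{i}", "B", connected=True) for i in range(25)]
```

Here `should_alert(agents, "A")` would fire while `should_alert(agents, "B")` would not, even though all 50 agents sit in the same agent group.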