Best way to handle dynamic environments for infrastructure monitoring and alerts

andrew_patterso
Advisor

Hi all,

I am working on an AppMon environment where if there's a problem with the application or host, it will be killed and then rebuilt. This is fine from an instrumentation perspective, where the agents map and start collecting data successfully.

However, if the issue caused the process to die, this raises an agent disconnected (unexpected) alert. The problem is that when the agent comes back up, it appears to be on a different host. Especially during testing, where servers are rebuilt frequently, this fills the infrastructure overview with a large number of offline servers and produces alerts that never close (because the agent, even though it came back up, looks like it's on a different server and is therefore treated as a different agent).

Is there a good way to handle this situation? I'd prefer not to disable the alert, because it's still valuable to know when a process has had an issue and needed to be rebuilt. I suspect the offline servers will disappear after 72 hours. Is there any way to reduce this time, maybe to 24 hours?

Thanks,

Andrew

2 REPLIES

waikeat_chan
Mentor

Maybe you can go to Settings, then Server, then Infrastructure, then select the host group and double-click it.

There you can adjust the 72-hour limit.

andrew_patterso
Advisor

Ahh good call Wai. I didn't know about that option.

I will keep an eye on it and see whether that resolves the open alert issue as well.