21 Mar 2024 11:07 AM - last edited on 22 Mar 2024 07:55 AM by MaciejNeumann
Hi Team,
Dynatrace alerts were triggered for a few servers of a custom device. On investigating, we found that these were false alerts and that the devices were actually up and running at the time of the issue.
I would like to know the cause of these false alerts, and how we can reduce their priority from P2 to P4.
Here are the problem notes from the false alert raised for one of the servers:
"ImpactedEntity":"VM availability Test on Custom Device 172.20.86.69", "Tags":"Application-SE_FMC, systemservice-firewall", "ProblemSeverity":"AVAILABILITY", "ProblemDetails":"OPEN Problem P-24021767 in environment REBUS-PRODUCTION\nProblem detected at: 02:24 (UTC) 29.02.2024\n\n1 impacted infrastructure component\n\nCustom Device\n172.20.86.69\n\nVM availability Test\n172.20.86.69 is in down state.
Thanks,
Bharathi
28 Mar 2024 03:44 PM
Issues like this need to be fully understood before they can be fixed. The alert you posted has severity AVAILABILITY. Was this the out-of-the-box availability alert or one based on a custom metric? I ask because custom metric alerts have an option to alert on missing data, which could cause this issue. Timing is also important, in conjunction with networking. Suppose you have a network anomaly where connectivity drops out for the OneAgent every 5 minutes: the agent keeps collecting metric data because it is in fact running, but it cannot communicate with the ActiveGate (AG) or the cluster. Once communication is re-established, the agent will, within the next 5 minutes to 1 hour, send all the metrics it collected while it couldn't communicate.
Network issues can be hard to track down, so having a fundamental understanding of what alerted, the duration of the missing data, and so on will help you pinpoint why something was reported as down when it was actually up.
I once saw a security agent turning off the OneAgent, which caused these availability alerts. We had to track them live to better understand what started up just before the OneAgent reported connectivity issues, how long the outage lasted, and how the data was repopulated.
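If it helps with the "duration of missing data" part, here is a minimal Python sketch of how you could look for gaps in the data points of the availability metric around the time of the problem. It assumes you have already exported the data point timestamps yourself (for example from a chart or an API export); the `find_gaps` helper, the 1-minute reporting interval, and the sample timestamps are all made up for illustration and are not taken from your environment or from any Dynatrace API.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, expected_interval=timedelta(minutes=1), tolerance=2.0):
    """Return (last_seen, next_seen, duration) for each gap in the data.

    A gap is any pair of consecutive data points that are further apart
    than tolerance * expected_interval. Both defaults are assumptions;
    set them to match how often your metric actually reports.
    """
    ordered = sorted(timestamps)
    threshold = expected_interval * tolerance
    gaps = []
    for previous, current in zip(ordered, ordered[1:]):
        delta = current - previous
        if delta > threshold:
            gaps.append((previous, current, delta))
    return gaps


if __name__ == "__main__":
    # Hypothetical data points: regular 1-minute samples with a hole in the middle.
    samples = [
        datetime(2024, 2, 29, 2, 20),
        datetime(2024, 2, 29, 2, 21),
        datetime(2024, 2, 29, 2, 22),
        datetime(2024, 2, 29, 2, 29),
        datetime(2024, 2, 29, 2, 30),
    ]
    for last_seen, next_seen, duration in find_gaps(samples):
        print(f"No data between {last_seen} and {next_seen} ({duration})")
```

If the gaps line up with the time the problem was opened, and the data reappears later, that points towards missing data or a communication issue rather than the device really being down.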