
Memory Saturation Alerts

mrc15816
Advisor

We’re getting Memory saturation alerts on File and MS SQL servers. The Windows Admin and Performance monitoring teams say page faults are normal and consider the Dynatrace alerts false. We have many servers, which strains these teams. Any suggestions other than modifying the alerting profile or disabling Memory saturation configurations are appreciated.

Thanks

Raj

6 REPLIES

Hi @mrc15816 

Have you considered using Auto-adaptive thresholds for anomaly detection?

https://docs.dynatrace.com/docs/platform/davis-ai/anomaly-detection/auto-adaptive-threshold

 

Phani Devulapalli

@p_devulapalli Thank you for your comment. I am not sure whether auto-adaptive thresholds are supported for the OOTB setup; we are on a Managed deployment.

Fin_Ubels
Dynatrace Champion

Hey Raj,
If you use host groups to group these similar hosts together, you could then raise the thresholds across all those hosts in one configuration. If not, you could also modify the alerts on each host. Also, if the teams consider these false alarms, reach out and ask what they would consider a real problem, and use that to guide any threshold changes. If these are short alerts, you could increase the time the metric must stay over the threshold, or, if the team knows how many page faults they would consider an issue, you could raise that threshold.
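
To make the two tuning knobs concrete (the threshold itself and how long it must be exceeded), here is a minimal Python sketch of a sliding-window check. It is not Dynatrace code and does not reflect Dynatrace's actual defaults; the function name, the sample values, and the threshold of 500 page faults are all hypothetical and only illustrate how a longer required duration suppresses short spikes.

```python
from collections import deque


def violates(samples, threshold, window=5, required=3):
    """Return one flag per sample: True once at least `required` of the
    last `window` samples are above `threshold` (all values hypothetical)."""
    recent = deque(maxlen=window)  # rolling record of over/under results
    flags = []
    for value in samples:
        recent.append(value > threshold)
        flags.append(sum(recent) >= required)
    return flags


# Hypothetical one-minute page-fault samples with a short burst.
page_faults = [120, 900, 950, 100, 980, 990, 100, 90, 80, 70]

# 3 of 5 samples over the threshold: the short burst raises an alert.
short = violates(page_faults, threshold=500, window=5, required=3)

# 5 of 5 samples over the threshold: the same burst is ignored.
strict = violates(page_faults, threshold=500, window=5, required=5)
```

With these sample values, the 3-of-5 rule fires during the burst while the 5-of-5 rule never fires, which is the effect of "increasing the time required over the threshold".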


@Fin_Ubels We considered the host group option but are concerned about the complexity and operational challenges involved, especially with hundreds of host groups and multiple environments such as cloud, on-premises, and co-location. We don’t alert immediately when we see memory saturation; we wait to see whether it resolves within 30 minutes. If the problem persists, we open an incident ticket. If this behavior is normal, we wonder whether every customer is changing these thresholds, or whether Dynatrace OneAgent can detect that a server is running MS SQL Server and adapt?
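
The 30-minute wait described here can be written down as a simple rule. The Python sketch below is only an illustration of that rule, not an integration with Dynatrace or any ticketing tool; `needs_ticket` and its parameters are hypothetical names.

```python
from datetime import datetime, timedelta


def needs_ticket(opened_at, closed_at, now, grace=timedelta(minutes=30)):
    """Return True only if the problem is still open after the grace period.

    `closed_at` is None while the problem is open (hypothetical convention).
    """
    still_open = closed_at is None
    return still_open and (now - opened_at) >= grace


opened = datetime(2024, 1, 1, 10, 0)

# Still open after 45 minutes: open an incident ticket.
persisting = needs_ticket(opened, None, datetime(2024, 1, 1, 10, 45))

# Resolved on its own after 20 minutes: no ticket.
self_resolved = needs_ticket(
    opened, datetime(2024, 1, 1, 10, 20), datetime(2024, 1, 1, 10, 45)
)
```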

Fin_Ubels
Dynatrace Champion

I don't believe the OneAgent adapts in these scenarios. In my experience, customers often develop an onboarding strategy for new OneAgent deployments. When agents are deployed, they are assigned host groups, network zones, any custom tags or metadata, custom alerting settings, and any other settings required. This helps prevent scenarios like this one, where going back is difficult, and ensures that the alerts generated are accepted by the teams they are relevant to. This doesn't really help in the current scenario, but it would be good to consider in the future.

The auto-adaptive thresholds that @p_devulapalli suggested would require you to create a custom alert and disable memory alerting on the hosts that the custom alert covers, to ensure there are no duplicate alerts. By the sounds of things, this would be a considerable manual effort as well.

Thank you for your quick response, @Fin_Ubels. I agree that gathering much of this information will help, but at a large scale, where many things come into play, it makes things complicated and operationally very challenging. Custom metric events were not feasible because Dynatrace doesn’t support combining two metrics (Memory Used and Page faults), so we didn’t use them. My bad, and I agree that OneAgent doesn’t do much on the alerting side, but the cluster should be capable of adopting vendor best practices, etc.
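
For reference, the two-metric condition mentioned above (Memory Used and Page faults both over a limit at the same time) looks like this as plain logic. This is a hypothetical Python sketch over already-exported samples, not a Dynatrace feature or API; the function name, the limits (90% and 500 faults per second), and the sample values are assumptions for illustration only.

```python
def combined_violations(memory_used_pct, page_faults_per_s,
                        mem_limit=90.0, fault_limit=500.0):
    """Flag a sample only when BOTH metrics exceed their (hypothetical) limits."""
    return [
        mem > mem_limit and faults > fault_limit
        for mem, faults in zip(memory_used_pct, page_faults_per_s)
    ]


# Hypothetical aligned samples for one host.
memory_used = [70.0, 95.0, 96.0, 97.0]
page_faults = [900.0, 100.0, 800.0, 50.0]

flags = combined_violations(memory_used, page_faults)
```

Only the third sample is flagged, because it is the only one where high memory use and a high page-fault rate coincide; high page faults alone, which the Windows team considers normal, do not trigger it.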

 

 

 

 
