
"Host CPU Unhealthy" incident detail

tameem2
Inactive

Hi All,

Our customer is getting "Host CPU Unhealthy" incident alerts and wants to know what caused them. The alert description states "During the last 15 minutes this host reported a high overall CPU usage of 99%, a load greater than 2 (unix based systems only) or more than 15% system CPU usage."

However, there is no indication of which one of those three conditions was met to trigger the alert. I have checked the Host Health dashboard for the time of the incident, and CPU usage was around 55%.

Can anyone please let me know how to find out which condition triggered the alert, and what its measured value was?

Thanks,

Tameem

 

6 REPLIES

andreas_grabner
Dynatrace Leader

Hi. In this case it is most likely the "System Load" measure, which we currently don't display in the Host Health dashboard. You can chart that measure in a custom chart. It's called "System Load", and you will find it under System Monitoring -> Host Performance.

tameem2
Inactive

 

Thanks Andy. Just to confirm, the system load has to be greater than 2% and remain that way for 15 mins to raise the alert?

 

Hi. It is not a % value. Pages such as http://superuser.com/questions/23498/what-does-load-average-mean-in-unix-linux explain what the load value means.
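
To make the "not a %" point concrete, here is a minimal Python sketch that reads the Unix load averages and compares them with the threshold of 2 quoted in the alert text. The helper names (load_exceeds, load_per_core) are made up for illustration, and this is not how the product evaluates the measure internally; it only shows that the load is a plain number (roughly, how many processes are running or waiting to run), averaged over 1, 5 and 15 minutes.

```python
import os

# Threshold taken from the alert description ("a load greater than 2").
LOAD_THRESHOLD = 2.0


def load_exceeds(load: float, threshold: float = LOAD_THRESHOLD) -> bool:
    """Return True when the load value is strictly greater than the threshold."""
    return load > threshold


def load_per_core(load: float, cores: int) -> float:
    """Normalize the load by the number of cores (hypothetical helper for context)."""
    return load / max(cores, 1)


if __name__ == "__main__":
    # os.getloadavg() is available on Unix-like systems only.
    one_min, five_min, fifteen_min = os.getloadavg()
    cores = os.cpu_count() or 1
    print(f"load averages (1/5/15 min): {one_min:.2f} {five_min:.2f} {fifteen_min:.2f}")
    print(f"15 min load per core: {load_per_core(fifteen_min, cores):.2f}")
    print(f"15 min load above {LOAD_THRESHOLD}: {load_exceeds(fifteen_min)}")
```

A load of 2 on an 8-core host is a light workload, while the same value on a single-core host means processes are queuing, which is why the raw number can exceed the threshold even when CPU usage looks moderate.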

tameem2
Inactive

Thanks Andy.

 

In case "System Load" is the measure that breaches the threshold, are you able to let me know the exact names of the other two measures (referred to as system CPU usage and overall CPU usage in the alert message)?

 

Cheers,

 

Tameem 

georg_schau2
Inactive

Hi.

All measures used for the Host Health Dashboard are in System Monitoring -> Host Performance -> ...
'CPU System Time' and 'CPU Idle Time' are used to calculate the percentages over time; we alert when CPU Idle Time <= 1%.
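
As a rough illustration of how the three conditions from the alert text could be checked against charted values, here is a small Python sketch. The function name breached_conditions and the sample numbers are hypothetical; the thresholds come from the alert description and the idle-time rule above, and how the values are aggregated over the 15-minute window is not covered here.

```python
from typing import List


def breached_conditions(idle_pct: float, system_pct: float, load: float) -> List[str]:
    """Return the names of the alert conditions that the given values breach.

    Thresholds are taken from the alert text; this is an illustrative check,
    not the product's actual evaluation logic.
    """
    breached = []
    if idle_pct <= 1.0:       # CPU Idle Time <= 1%, i.e. overall CPU usage of 99%
        breached.append("overall CPU usage")
    if load > 2.0:            # System Load greater than 2 (Unix-based systems only)
        breached.append("system load")
    if system_pct > 15.0:     # more than 15% system CPU usage
        breached.append("system CPU usage")
    return breached


if __name__ == "__main__":
    # Hypothetical values: 45% idle (about 55% usage), 5% system time, load of 3.2.
    print(breached_conditions(idle_pct=45.0, system_pct=5.0, load=3.2))
    # prints ['system load']
```

With values like these, only the load condition is breached, which would match an incident raised while the dashboard shows CPU usage around 55%.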

 

Regards,

Georg

tameem2
Inactive

Thanks Georg!