Custom alert for physical hosts with low disk space

Elena_Masotta
Observer

I am trying to get alerted when certain machines drop below 200 GB of free disk space. The end goal is to alert when Available Disk drops below 200 GB on any drive with a capacity of 1 TB or more, and only for machines tagged as Physical.

I have tried using the following:

  1. Global Disk Usage Anomaly
  2. Host Disk Usage Anomaly
  3. Create Custom Event for Alerting
  4. Building query in Data Explorer

These options only let me enter a percentage (the percentage of free disk space below which an alert fires), whereas I need to enter an absolute value of 200 GB. The custom event for alerting option does let me enter 200 GB, but I cannot split by host, only by disk, so this won't work. Can anyone help me with this?
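To make the target condition concrete, here is a minimal Python sketch of the rule described above. It is not a Dynatrace configuration; the `Disk` record, the host names, and the choice of binary units (1 TB = 1024^4 bytes) are assumptions made only for illustration.

```python
from dataclasses import dataclass

GB = 1024 ** 3  # assumption: binary units
TB = 1024 ** 4


@dataclass
class Disk:
    """Hypothetical record for one drive on one host."""
    host: str
    name: str
    capacity_bytes: int
    available_bytes: int


def disks_to_alert(disks, physical_hosts, min_capacity=1 * TB, min_available=200 * GB):
    """Return drives on Physical hosts that are >= 1 TB and have < 200 GB free."""
    return [
        d for d in disks
        if d.host in physical_hosts
        and d.capacity_bytes >= min_capacity
        and d.available_bytes < min_available
    ]


sample = [
    Disk("host-a", "C:", 2 * TB, 150 * GB),   # physical, large, low on space
    Disk("host-a", "D:", 500 * GB, 50 * GB),  # smaller than 1 TB, ignored
    Disk("host-b", "C:", 2 * TB, 100 * GB),   # host not tagged Physical, ignored
    Disk("host-c", "E:", 1 * TB, 300 * GB),   # enough free space
]
flagged = disks_to_alert(sample, physical_hosts={"host-a", "host-c"})
print([(d.host, d.name) for d in flagged])
```

Only `host-a` / `C:` satisfies all three conditions in this sample.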


paul_ingle
Dynatrace Helper

I believe the disk available metric "builtin:host.disk.avail" alerts on the sum of disks on a host. If you set a 200 GB threshold on that metric in a custom event for alerting, a problem should be generated for each host (sum of disks) that crosses the threshold.
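If the metric does behave as a per-host sum, the evaluation would look roughly like the Python sketch below. This is only an illustration of the arithmetic, not of how Dynatrace evaluates the event internally; the readings and host names are made up.

```python
from collections import defaultdict

GB = 1024 ** 3  # assumption: binary units
THRESHOLD = 200 * GB


def hosts_below_threshold(disk_avail, threshold=THRESHOLD):
    """disk_avail: iterable of (host, disk, available_bytes) tuples.

    Sums available bytes over all disks of each host and returns the
    hosts whose total is below the threshold.
    """
    per_host = defaultdict(int)
    for host, _disk, avail in disk_avail:
        per_host[host] += avail
    return sorted(h for h, total in per_host.items() if total < threshold)


readings = [
    ("host-a", "C:", 50 * GB), ("host-a", "D:", 100 * GB),  # total 150 GB
    ("host-b", "C:", 20 * GB), ("host-b", "D:", 900 * GB),  # total 920 GB
]
print(hosts_below_threshold(readings))
```

Note that in this sample `host-b` is not flagged even though its `C:` drive has only 20 GB free, because the sum hides the individual drive.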

ChadTurner
DynaMight Legend

You might need to incorporate a few different things. First, I don't think we can leverage auto tags based on drive size; an RFE could be put in for that. What you could do is set a manual tag, "Drive Size:>1TB". I've also manually added a tag called physical for your use case, but you can apply that one via auto tags.

ChadTurner_1-1634930241548.png

 

Once you have (manually) tagged the hosts that have a drive larger than 1TB, you can then set up a Management Zone. You can call it whatever you want; I just call it Drive Space > 1TB.

ChadTurner_2-1634930439131.png

You don't need the extra condition of physical if you want to grab everything > 1TB, but if you want to isolate physical vs. virtual and/or cloud, you can add that condition.

 

This Management Zone can be used in the conventional sense, but that isn't its purpose here. It's mainly there to scope the custom event for alerting.

 

Navigate to the custom event for alerting and set it as the following: 

ChadTurner_3-1634930769283.png

Now feel free to adjust it as you see fit. The core aspect is to set the Management Zone, as this reduces your scope to the hosts included in that MZ. You could also leverage the tag as a rule-based filter, but it's only one rule at a time, so you can't have one with both Physical and >1TB, hence the MZ method.

 

I hope this helps! Granted, it might be a big ask to tag each host one by one. You could always leverage auto tags with a condition such as the host name matching Regex = <Values>, but that's up to you. If it's a handful of servers, manual tagging might not be a problem.

 

 

-Chad

Hi Chad

My request is simple. We have 150 hosts. We need disk space greater than 10% to be alerted as a slowdown event. Could you please provide a step-by-step process here?

 

 

 

Sure thing. Please feel free to add any needed filters for drive letters, etc.

This is a custom metric event that you can create via Settings > Anomaly Detection > Metric Events:

 

ChadTurner_0-1713183987241.png

 

-Chad

Hi Chad - when I apply the setting, alerts are raised for all servers even though they have sufficient disk space. Any reason? What needs to be corrected?

Sounds like you have the over/under for triggering the alert swapped. Select the other one and check the alert preview as shown in the screenshot.

-Chad

Sorry Chad, which one do I need to select from the screenshot above?

My screenshot has Below set.

-Chad

Sorry Chad, I am not able to see any screenshot.

Hi Chad

Please find the attached document, which has the settings. What we need is to send alerts if free disk space is less than the 10% threshold. Please review.

 

As mentioned, your alert criterion is set to Above. You need to change it to Below. Right now your rule is sending alerts for any hosts that have more than 10% free space, so 12%, 20%, 30%, and 80% free space are all alerting. If you select Below, anything below 10% free space will alert, so 9%, 7%, 2%, 1%.
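The difference between the two settings can be sketched in a few lines of Python. The function name and the `mode` strings are hypothetical and only mirror the Above/Below choice in the metric event dialog.

```python
def alerting_values(free_percentages, threshold, mode):
    """Return the free-space percentages that would violate the threshold.

    mode="above" alerts when the value is greater than the threshold,
    mode="below" alerts when the value is less than the threshold.
    """
    if mode == "above":
        return [v for v in free_percentages if v > threshold]
    if mode == "below":
        return [v for v in free_percentages if v < threshold]
    raise ValueError(f"unknown mode: {mode!r}")


observed = [1, 2, 7, 9, 12, 20, 30, 80]  # percent free space per disk
print(alerting_values(observed, 10, "above"))  # healthy disks alert
print(alerting_values(observed, 10, "below"))  # nearly full disks alert
```

With the threshold at 10, "above" flags 12, 20, 30, and 80, while "below" flags 1, 2, 7, and 9, which is the intended behavior for a low-disk-space alert.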

-Chad

Thanks. Can we use CPU Usage % as the metric key to get alerts if CPU usage is above 90%?

Correct, same methodology.

-Chad

Hi Chad - below is the description:

 

Entity Host Name - {dims:dt.entity.host.name}
Disk - {dims:dt.entity.disk} - This is not displaying actual drive
Metric Name - {metricname}
Current value - {severity}
Alert Condition - {alert_condition}
Threshold - {threshold}

 

Disk - {dims:dt.entity.disk} - This is not displaying the actual drive. How do I display the actual drive, like C: and D:?

Hi Chad

Thanks for the help, this issue is resolved. One more request: we want to set up disk alerts.

>85% to 89% - Minor incident - How do we do this?

>90% - Major incident - We already have this set up as per your recommendation.

 

Same as how you set up the other one; however, you cannot set a range from X to Y. It will be from X and above.

-Chad

Thanks. We already have an alert for less than 10%. Could you please share a screenshot showing how to set it up for less than 15%?

Basically two alerts:

One when available space is at 15%

Second when available space is at 10%

 

monique_vanwall
Organizer

Currently, we are struggling with file system alerting. There are 2 possibilities: thresholds on % or thresholds on values.
As we have different types of filesystems (smaller ones and bigger ones), neither option works on its own. A threshold of 85% on a 5 GB filesystem is OK, but on a 1 TB filesystem it means that 150 GB remains free, which is not a good threshold.
For the same reason, a threshold on absolute values does not suit both kinds of filesystems.

Does anyone have a solution for this situation? We get the response "works as designed", but this design does not work for our environment.
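The arithmetic behind this can be checked with a short Python sketch. The helper name is hypothetical, and 1 TB is taken as 1000 GB to match the 150 GB figure quoted above.

```python
def free_gb_at_threshold(capacity_gb, used_pct_threshold):
    """Free space (in GB) left at the moment a used-% threshold is crossed."""
    return capacity_gb * (100 - used_pct_threshold) / 100


for capacity_gb in (5, 1000):  # 5 GB and 1 TB filesystems
    free = free_gb_at_threshold(capacity_gb, 85)
    print(f"{capacity_gb} GB filesystem at 85% used: {free} GB free")
```

The same 85% threshold leaves 0.75 GB free on the small filesystem and 150 GB free on the large one, which is why a single percentage fits neither well.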

 

-Monique-

You could allow Davis to baseline the disks from the settings and/or from a custom event for alerting. 

-Chad

I've merged the two 🙂 
