Solved: Set up Disk Corrupt and Disk Available alerting on host group level

PrateekGupta · ‎08 May 2024

Urgent:

How do I set up Disk Corrupt and Disk Available alerting at the host group level?

PierreGutierrez · ‎09 May 2024

Disk Corrupt: Several symptoms can point to a corrupted disk, for example read errors, write errors, or slow operation.
Useful metrics for this include:

- Disk read time (builtin:host.disk.readTime)
- Disk read operations per second (builtin:host.disk.readOps)
- Disk read bytes per second (builtin:host.disk.bytesRead)
- Disk throughput read (builtin:host.disk.throughput.read)
- Disk write time (builtin:host.disk.writeTime)
- Disk write operations per second (builtin:host.disk.writeOps)
- Disk write bytes per second (builtin:host.disk.bytesWritten)
- Disk throughput write (builtin:host.disk.throughput.write)

Disk Available: I suggest measuring this as a percentage.
Useful metrics for this include:

- Disk available % (builtin:host.disk.free)
- Inodes available % (builtin:host.disk.inodesAvail)

*The marked metrics are the ones most commonly used for this type of scenario; which ones you actually need depends on your use case.

To apply these rules at the host group level:
1.- I think you can first build the query in Data Explorer, filtering the metric down to the host group you want.

2.- Create a Metric Event using the code (the metric selector) that Data Explorer shows in Advanced mode.
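
As a rough illustration of what that Advanced mode code can look like, here is a minimal Python sketch that assembles a metric selector restricted to one host group. The metric keys come from the lists above; the helper name `host_group_selector`, the example group name `prod-db`, and the entity selector expression used to match the host group are my assumptions, so please compare the result with what your own Data Explorer generates before using it in a Metric Event.

```python
import re


def host_group_selector(metric_key: str, host_group: str, aggregation: str = "avg") -> str:
    """Build a metric selector limited to the hosts of one host group.

    Hypothetical helper for illustration only, not part of any Dynatrace SDK.
    """
    # Assumption: the host group name uses only letters, digits, '.', '_' and '-',
    # so it needs no escaping inside the nested selector strings.
    if not re.fullmatch(r"[A-Za-z0-9_.-]+", host_group):
        raise ValueError(f"unsupported host group name: {host_group!r}")

    # Assumption: hosts are linked to their host group through the
    # isInstanceOf relationship; verify this in your environment.
    # Inside a metric selector string, inner double quotes are written as ~".
    entity_selector = (
        "type(HOST),"
        "fromRelationships.isInstanceOf("
        f'type(HOST_GROUP),entityName.equals(~"{host_group}~"))'
    )

    return (
        f"{metric_key}"
        f':filter(in("dt.entity.host",entitySelector("{entity_selector}")))'
        ':splitBy("dt.entity.host","dt.entity.disk")'
        f":{aggregation}"
    )


if __name__ == "__main__":
    # Disk Available: percentage of free disk space per host and disk.
    print(host_group_selector("builtin:host.disk.free", "prod-db"))
    # Disk Corrupt: read time as one possible symptom of a failing disk.
    print(host_group_selector("builtin:host.disk.readTime", "prod-db"))
```

The printed string is what you would paste into the Metric Event query; the threshold and the alert condition (above or below) are then configured in the Metric Event itself and depend on the metric you chose.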

I hope it's helpful 💪

Pierre Gutierrez - LATAM ACE Consultant - Loving Cats! Loving Technology !