Exclude certain large disks from creating Problems

chs-obs-1 · 22 Feb 2024

Hi fellow Dynatracers, please read the following problem and the solution we found. We would like to know if there is a better way to solve it or if there are any holes in our solution.

Problem:

We want to exclude certain large disks from creating Problems and Incidents flowing into our Service Now Integration.

Our Solution thus far after looking through Dynatrace Documentation:

We are using Python and the Dynatrace API to automate this solution so it can be reapplied at scale, with ease and consistency. We’ll do some manual testing before scripting it.

To create a custom alert based on the builtin:host.disk.avail metric in Dynatrace and exclude specific disks, we’ll follow these steps:

Tagging Hosts:

First, manually tag the hosts that have disks with a capacity of 1TB or more. We’ll create a tag like "Drive Size:>1TB" for this purpose.
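Once the manual test works, the host tagging could be scripted roughly as follows. This is a minimal sketch, assuming the custom tags endpoint (POST /api/v2/tags with an entitySelector query parameter) and a token that is allowed to write entities. The function names, the example environment URL, and the host IDs are placeholders, not anything we have in place yet.

```python
import json
import urllib.parse
import urllib.request


def build_tag_request(base_url, host_ids, key="Drive Size", value=">1TB"):
    """Return (url, body) for a custom-tag request covering the given hosts."""
    # entityId(...) accepts several quoted IDs separated by commas.
    quoted_ids = ",".join(f'"{host_id}"' for host_id in host_ids)
    selector = f"entityId({quoted_ids})"
    query = urllib.parse.urlencode({"entitySelector": selector})
    url = f"{base_url.rstrip('/')}/api/v2/tags?{query}"
    body = {"tags": [{"key": key, "value": value}]}
    return url, body


def apply_tag(base_url, api_token, host_ids):
    """Send the tag request. Needs network access and a valid token."""
    url, body = build_tag_request(base_url, host_ids)
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Api-Token {api_token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Keeping the request construction separate from the network call lets us check the selector and payload before anything is sent to the environment.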

Tagging Disks (Automatic Tags with Entity Selector Rules):

We’ll create an automatic tag using an Entity Selector rule:
Download the list of entities (which includes drives) via the Dynatrace API.
Next, explore the available drives and their details.
Use the Entity Selector rule to assign a tag based on specific criteria. For example, if a drive is /opt, tag it with tag1=value1; if it's /opt/wl, tag it with tag1=value2, etc. We’ll also pay special attention to disk size and tag accordingly.
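The download-and-explore part of this step might look like the sketch below. It assumes the Entities API v2 (GET /api/v2/entities), that disks are entities of type DISK, that a disk is linked to its host through an isDiskOf relationship, and that a disk's displayName is its mount point. We have not verified the last two in our environment, so treat them as assumptions. All function names are ours.

```python
import json
import urllib.parse
import urllib.request


def disk_selector(host_tag=None):
    """Build an entity selector for disks, optionally limited to tagged hosts."""
    selector = 'type("DISK")'
    if host_tag:
        # Assumption: disks point at their host via the isDiskOf relationship.
        selector += f',fromRelationships.isDiskOf(type("HOST"),tag("{host_tag}"))'
    return selector


def fetch_page(base_url, api_token, params):
    """Fetch one page of entities. Needs network access and a valid token."""
    url = f"{base_url.rstrip('/')}/api/v2/entities?" + urllib.parse.urlencode(params)
    request = urllib.request.Request(
        url, headers={"Authorization": f"Api-Token {api_token}"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def iter_disks(base_url, api_token, host_tag=None, fetch=fetch_page):
    """Yield every disk entity, following nextPageKey until it runs out."""
    params = {"entitySelector": disk_selector(host_tag), "pageSize": 500}
    while True:
        page = fetch(base_url, api_token, params)
        yield from page.get("entities", [])
        next_key = page.get("nextPageKey")
        if not next_key:
            break
        # Follow-up pages are requested with the page key only.
        params = {"nextPageKey": next_key}


def disks_named(disks, mount_points):
    """Keep only the disks whose display name is one of the given mount points."""
    wanted = set(mount_points)
    return [disk for disk in disks if disk.get("displayName") in wanted]
```

For example, `disks_named(iter_disks(url, token, host_tag="Drive Size:>1TB"), ["/opt", "/opt/wl"])` would give the candidate disks to review before writing the auto-tag rules.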

Create a Management Zone:

We’ll set up a Management Zone (MZ) named "Drive Space > 1TB" (or any other suitable name). Then we’ll include the tagged hosts and/or drives from the two tagging steps above in this MZ.
The MZ will be used for scoping the custom event(s) for alerting. The whole solution depends on this part working well.

Custom Event for Alerting:

In the custom events for alerting settings, we’ll configure the events as follows:
Condition for Warning: Set the threshold to 85 GB using builtin:host.disk.avail
Condition for Critical: Set the threshold to 53 GB using builtin:host.disk.avail
Scope: Use the MZ created above. This ensures that the alert applies only to hosts and/or disks within that MZ.
Adjust other settings as needed.
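Before creating the events, the two thresholds can be dry-run against real data with something like the sketch below. It assumes that builtin:host.disk.avail is reported in bytes, that 1 GB means 1024^3 bytes here, that an event should fire when available space drops below the threshold, and that the input has the shape of a Metrics API v2 query response (result, data, dimensions, values). The function names and labels are ours.

```python
GIB = 1024 ** 3  # assumption: "GB" in the thresholds means 1024^3 bytes

WARNING_BYTES = 85 * GIB
CRITICAL_BYTES = 53 * GIB


def classify(avail_bytes):
    """Map available bytes to the severity our two events would raise."""
    if avail_bytes is None:
        return "no-data"
    if avail_bytes < CRITICAL_BYTES:
        return "critical"
    if avail_bytes < WARNING_BYTES:
        return "warning"
    return "ok"


def latest_values(query_response):
    """Return {dimension tuple: most recent non-null value} for every series."""
    latest = {}
    for result in query_response.get("result", []):
        for series in result.get("data", []):
            values = [v for v in series.get("values", []) if v is not None]
            key = tuple(series.get("dimensions", []))
            latest[key] = values[-1] if values else None
    return latest


def severity_report(query_response):
    """Return {dimension tuple: severity} for a metric query response."""
    return {
        key: classify(value)
        for key, value in latest_values(query_response).items()
    }
```

Running `severity_report` on a query scoped to the MZ should show whether 85 GB and 53 GB are sensible for the large disks, or whether they would open Problems immediately.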