
How to detect plummet / spike in metric like disk free % before they are a problem?

brandon_camp
Frequent Guest

What is the recommended approach to detect a sudden drop or spike in a metric like disk free %?

Some context:

  • My Prod environment has 60k Disks so automation is required
  • I am using a Workflow with a Davis Forecast Task to predict future disk space exhaustion. However, the Workflow runs once a day, so it will not detect sudden spikes in disk usage that occur between Forecast runs.
  • These spikes in disk usage can cause disk free % to plummet within 1-2 hours, yet the value is not low enough to register a Problem in DT, e.g. 80% free to 20% free in an hour (see graph below).

brandon_camp_0-1748528447800.png

Assuming our Workflow Forecast will catch slow disk growth over time, how can we detect sudden dips in disk free %? I was considering creating a calculated metric for rate of change but thought that was too heavyweight. I was also not sure whether Anomaly Detection is appropriate for this use case. Any thoughts?
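To make the rate-of-change idea concrete, here is a minimal sketch in plain Python. It does not use any Dynatrace API. The function name `sudden_drops`, the `(minute, free %)` sample format, the 60-minute window, and the 30-point drop limit are all assumptions chosen for illustration.

```python
from typing import List, Tuple


def sudden_drops(
    samples: List[Tuple[int, float]],
    window_min: int = 60,        # assumed look-back window, in minutes
    max_drop_pct: float = 30.0,  # assumed limit, in percentage points
) -> List[Tuple[int, float]]:
    """Return (timestamp_min, drop) for every sample whose disk free %
    is more than max_drop_pct points below the highest value seen in
    the preceding window_min minutes."""
    alerts = []
    for i, (t, free) in enumerate(samples):
        # Earlier samples that fall inside the look-back window.
        window = [v for (ts, v) in samples[:i] if t - ts <= window_min]
        if not window:
            continue
        drop = max(window) - free
        if drop > max_drop_pct:
            alerts.append((t, drop))
    return alerts


# Hypothetical series resembling the 80% -> 20% drop described above.
samples = [(0, 80.0), (15, 78.0), (30, 60.0), (45, 35.0), (60, 20.0), (75, 20.0)]
print(sudden_drops(samples))  # [(45, 45.0), (60, 60.0), (75, 58.0)]
```

Comparing against the window maximum rather than the previous sample means a steady slide over several samples still trips the check, at the cost of alerting repeatedly until the high value ages out of the window.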

Thanks!

1 REPLY

StrangerThing
DynaMight Advisor

I think anomaly detection, through either a metric event or Davis Anomaly Detection, would solve your issue here. You can use an auto-adaptive (auto-baselined) threshold and set it to alert when the metric is "outside" the baseline bounds, so it fires whether the data points dip or surge. You can also tweak the sensitivity of the auto-adaptive threshold to suit your needs. We use this across all of our servers, with different thresholds for the OS drive versus other drives.
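As a rough illustration of what an "outside the bounds" check does, here is a minimal Python sketch. It is not Dynatrace's auto-adaptive algorithm. It simply builds a band of mean plus or minus a multiple of the standard deviation from recent history. The function name `outside_baseline`, the `sensitivity` multiplier, and the `min_spread` floor are assumptions for illustration.

```python
import statistics
from typing import List


def outside_baseline(
    history: List[float],
    value: float,
    sensitivity: float = 3.0,  # assumed multiplier; a lower value alerts sooner
    min_spread: float = 0.5,   # assumed floor so a flat history still has a band
) -> bool:
    """Return True if value lies outside mean +/- sensitivity * spread,
    where spread is the population standard deviation of history."""
    mean = statistics.fmean(history)
    spread = max(statistics.pstdev(history), min_spread)
    return abs(value - mean) > sensitivity * spread


# Hypothetical recent disk free % readings for one disk.
history = [80.0, 79.0, 81.0, 80.0, 80.0, 79.0, 81.0]
print(outside_baseline(history, 78.5))  # False: within normal variation
print(outside_baseline(history, 60.0))  # True: sudden dip
print(outside_baseline(history, 95.0))  # True: sudden surge
```

Using a different `sensitivity` per drive type would mirror the idea of separate thresholds for the OS drive and other drives.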

Observability Engineer at FreedomPay
