Solved: Advice on alerting when CPU suddenly rises significantly and stays above the normal baseline but below the alerting threshold

JordanGreen · 22 Aug 2025

Hi

On a few of our hosts, CPU usage spiked significantly and stayed at that abnormal level for a few days. Is there a way to detect a sudden increase in CPU above our normal baseline? We didn't get an alert because usage never hit our 95% threshold, but we would like to be notified of a sudden, significant change in CPU, say an increase of 20-30% that lasts for a period of time. Does anyone have a recommendation on the best way to do this?

Mizső · 23 Aug 2025

Hi @JordanGreen

Auto-adaptive thresholds for anomaly detection — Dynatrace Docs

In Managed, you can create a metric event under Anomaly detection that uses an auto-adaptive baseline (experiment with the signal fluctuation and duration settings), based on this metric expression:

builtin:host.cpu.usage:splitBy("dt.entity.host")
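To make the auto-adaptive idea concrete, here is a simplified Python sketch. It assumes the threshold is learned from recent history as a high percentile plus a multiple of the signal fluctuation, with the fluctuation modelled here as the interquartile range. The function name, the default percentile and the multiplier are illustrative assumptions; Dynatrace's exact calculation is described in the Auto-adaptive thresholds documentation referenced above and may differ.

```python
from statistics import quantiles


def adaptive_threshold(history, percentile=99, fluctuations=1.0):
    """Hypothetical helper: threshold = chosen percentile of `history`
    plus `fluctuations` x interquartile range. A simplified model,
    not Dynatrace's exact algorithm."""
    if not 1 <= percentile <= 99:
        raise ValueError("percentile must be between 1 and 99")
    # quantiles(n=100) returns the 1st..99th percentile cut points
    cuts = quantiles(history, n=100, method="inclusive")
    high = cuts[percentile - 1]
    q1, q3 = cuts[24], cuts[74]  # 25th and 75th percentiles
    return high + fluctuations * (q3 - q1)
```

For a host whose history sits between 20% and 40% CPU, `adaptive_threshold(list(range(20, 41)))` gives about 49.8, so a sustained 55% would be flagged even though it never reaches a static 95% threshold.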

In SaaS, you can use Davis Anomaly Detection from a Notebook (here too, experiment with the fluctuation and duration settings).
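The duration settings essentially decide how many samples in a recent window must violate the threshold before an event opens, which is what keeps a single spike from alerting while a sustained rise does. The sketch below assumes a simple "N violating samples out of the last M" rule; `violating_samples` and `sliding_window` are hypothetical parameter names standing in for those settings.

```python
from collections import deque


def sustained_violation(samples, threshold, violating_samples=3, sliding_window=5):
    """Return the index of the first sample at which at least
    `violating_samples` of the last `sliding_window` samples exceeded
    `threshold`, or None if that never happens. Names are illustrative."""
    window = deque(maxlen=sliding_window)  # keeps only the newest flags
    for i, value in enumerate(samples):
        window.append(value > threshold)
        if sum(window) >= violating_samples:  # True counts as 1
            return i
    return None
```

With a threshold of 50, `[30, 90, 30, 30, 30]` never triggers (one isolated spike), while `[30, 60, 31, 62, 63, 64]` triggers at index 4.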

In my experience, these CPU patterns are related to a single process, e.g. antivirus, compression, or GC suspension in a Java process. So you can also try monitoring process CPU usage, using the parents transformation to get the host information:

Managed metric expression:

builtin:tech.generic.cpu.usage:parents:splitBy("dt.entity.process_group_instance","dt.entity.host")

SaaS DQL:

timeseries usage = avg(dt.process.cpu.usage), by: { dt.entity.process_group_instance, dt.entity.host }
| fieldsAdd entityName(dt.entity.process_group_instance), entityName(dt.entity.host)
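Once per-process series are available with their host attached, the next step is usually to find which process accounts for the rise on each host. A minimal sketch, assuming the data has already been exported into a hypothetical dictionary keyed by `(process, host)` with a baseline average and a recent average CPU value:

```python
def top_contributor_per_host(series):
    """series maps (process, host) -> (baseline_avg, recent_avg) CPU %.
    Returns host -> (process, increase) for the process whose average
    CPU rose the most. The input shape is a hypothetical export format."""
    best = {}
    for (process, host), (baseline, recent) in series.items():
        increase = recent - baseline
        if host not in best or increase > best[host][1]:
            best[host] = (process, increase)
    return best
```

For example, an antivirus process going from 2% to 25% on a host would be reported ahead of a Java process that moved from 10% to 12%.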

I have another idea for a metric expression and DQL that you can also try.

Metric expression:

(builtin:host.cpu.usage:splitBy("dt.entity.host"):avg:sort(value(auto,descending)):rollup(avg,15m))-(builtin:host.cpu.usage:splitBy("dt.entity.host"):avg:sort(value(auto,descending)):rollup(avg,15m):timeshift(-1h))

DQL:

timeseries usage = avg(dt.host.cpu.usage), by: { dt.entity.host }
| fieldsAdd usage = arrayMovingAvg(usage, 15)
| sort arraySum(usage) desc
| join [ timeseries usage = avg(dt.host.cpu.usage), by: { dt.entity.host }, shift: -1h
| fieldsAdd usage = arrayMovingAvg(usage, 15)
| sort arraySum(usage) desc ], on: { dt.entity.host }, fields: { operand = usage }
| fieldsAdd expression = usage[] - operand[]
| fieldsRemove usage, operand
| fieldsAdd entityName(dt.entity.host)
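Both queries compare a smoothed CPU value with the same smoothed value one hour earlier. The Python sketch below reproduces that comparison on a plain list of samples, assuming 1-minute resolution (so a 15-sample window is roughly 15 minutes and a 60-sample shift is 1h). It returns None until a full window is available; DQL's own `arrayMovingAvg` may handle the edges differently.

```python
def moving_avg(values, window):
    """Trailing moving average; None until `window` samples are available."""
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the sample leaving the window
        out.append(running / window if i >= window - 1 else None)
    return out


def shifted_delta(values, window=15, shift=60):
    """Moving average now minus the moving average `shift` samples earlier
    (assumes 1-minute samples, so shift=60 means one hour)."""
    ma = moving_avg(values, window)
    delta = []
    for i, current in enumerate(ma):
        past = ma[i - shift] if i >= shift else None
        if current is None or past is None:
            delta.append(None)
        else:
            delta.append(current - past)
    return delta
```

For two hours at 30% followed by an hour at 60%, the delta is 0 during the flat period and reaches +30 once the 15-minute average is fully above the old level.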

Long positive "hills" in the resulting difference can be a good trigger for problem creation.
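Turning a "hill" into a trigger means requiring the difference to stay above some increase for a minimum number of consecutive samples. In this sketch `min_increase` is in percentage points, matching the subtraction in the expressions above, and both thresholds are illustrative. With a 1h shift, a plateau that lasts longer than an hour only shows up as a hill about an hour wide, because the shifted series catches up, so `min_length` should stay well below the shift.

```python
def find_hills(delta, min_increase=20.0, min_length=30):
    """Return (start, end) index pairs (end exclusive) of runs where the
    delta stays at or above `min_increase` for at least `min_length`
    consecutive samples. None values break a run. Thresholds are
    illustrative, e.g. a 20-point rise held for 30 one-minute samples."""
    hills = []
    start = None
    for i, d in enumerate(list(delta) + [None]):  # sentinel closes a trailing run
        if d is not None and d >= min_increase:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_length:
                hills.append((start, i))
            start = None
    return hills
```

A 40-sample run at +25 is reported, while a 10-sample run at +30 is ignored as too short.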

I hope it helps.

Best regards,

János

Dynatrace Community RockStar 2024, Certified Dynatrace Professional