Advice on alerting if CPU suddenly significantly goes and stays above normal baseline but under alerting threshold

JordanGreen — Fri, 22 Aug 2025 12:38:04 GMT

On a few of our hosts, CPU usage spiked significantly and stayed at that abnormal level for a few days. Is there a way to detect a sudden increase in CPU above our normal baseline? We didn't get an alert because usage never hit our 95% threshold, but we would like to be notified of a sudden, significant change in CPU, say an increase of 20-30% sustained for a period of time. Does anyone have a recommendation on the best way to do this?

Re: Advice on alerting if CPU suddenly significantly goes and stays above normal baseline but under alerting threshold

Mizső — Sat, 23 Aug 2025 21:37:37 GMT

Hi @JordanGreen

Auto-adaptive thresholds for anomaly detection — Dynatrace Docs

In Managed, you can experiment with an auto-adaptive baseline metric event (tuning signal fluctuation and duration) under Anomaly detection, using this metric expression:

builtin:host.cpu.usage:splitBy("dt.entity.host")
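To make "signal fluctuation" and "duration" a bit more concrete, here is a rough Python sketch of the general idea. It is a simplified illustration, not Dynatrace's actual implementation: it assumes the baseline is the 99th percentile of recent samples, that fluctuation is a multiple of the interquartile range added on top, and that an alert needs a minimum number of violating samples inside a sliding window. The function names and default values are hypothetical.

```python
from statistics import quantiles


def adaptive_threshold(history, fluctuation_factor=1.0):
    """Assumed model: 99th percentile of history plus N times the IQR."""
    pct = quantiles(history, n=100, method="inclusive")  # 99 cut points
    baseline = pct[98]        # 99th percentile
    iqr = pct[74] - pct[24]   # 75th percentile minus 25th percentile
    return baseline + fluctuation_factor * iqr


def violates(samples, threshold, window=5, min_violations=3):
    """True if any sliding window holds at least min_violations samples above threshold."""
    flags = [s > threshold for s in samples]
    return any(
        sum(flags[i:i + window]) >= min_violations
        for i in range(len(flags) - window + 1)
    )
```

A higher fluctuation factor tolerates noisier hosts, while a larger window with more required violations filters out short spikes and only reacts to a sustained shift.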

In SaaS, you can use Davis Anomaly Detection from a Notebook (again, experiment with fluctuation and duration).

In my experience, these CPU patterns typically come from just one process, e.g. antivirus, compression, or a Java process with GC suspension, so you can try monitoring process CPU usage with the parents transformation (to get the host information):

Managed metric expression:

builtin:tech.generic.cpu.usage:parents:splitBy("dt.entity.process_group_instance","dt.entity.host")

SaaS DQL:

timeseries usage = avg(dt.process.cpu.usage), by: { dt.entity.process_group_instance, dt.entity.host }
| fieldsAdd entityName(dt.entity.process_group_instance), entityName(dt.entity.host)

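I have another idea for a metric expression and a DQL query that you can also try. Both subtract the 15-minute average CPU usage of one hour earlier from the current 15-minute average.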

Metric expression:

(builtin:host.cpu.usage:splitBy("dt.entity.host"):avg:sort(value(auto,descending)):rollup(avg,15m))-(builtin:host.cpu.usage:splitBy("dt.entity.host"):avg:sort(value(auto,descending)):rollup(avg,15m):timeshift(-1h))

DQL:

timeseries usage = avg(dt.host.cpu.usage), by: { dt.entity.host }
| fieldsAdd usage = arrayMovingAvg(usage, 15)
| sort arraySum(usage) desc
| join [ timeseries usage = avg(dt.host.cpu.usage), by: { dt.entity.host }, shift: -1h
| fieldsAdd usage = arrayMovingAvg(usage, 15)
| sort arraySum(usage) desc ], on: { dt.entity.host }, fields: { operand = usage }
| fieldsAdd expression = usage[] - operand[]
| fieldsRemove usage, operand
| fieldsAdd entityName(dt.entity.host)

Long positive "hills" in the result can be a good trigger for problem creation.
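The arithmetic behind this comparison can be sketched in plain Python. This is only an illustration under assumptions, not Dynatrace code: it assumes one sample per minute, a trailing 15-sample moving average, a 60-sample shift, and a hypothetical `min_delta` of 20 percentage points for what counts as a "hill".

```python
def moving_avg(values, window):
    """Trailing moving average; early points average whatever is available."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def shifted_difference(values, window=15, shift=60):
    """Smoothed value minus the smoothed value `shift` samples earlier."""
    smooth = moving_avg(values, window)
    return [
        None if i < shift else smooth[i] - smooth[i - shift]
        for i in range(len(smooth))
    ]


def longest_hill(diff, min_delta=20.0):
    """Length of the longest run where the difference stays at or above min_delta."""
    best = run = 0
    for d in diff:
        run = run + 1 if d is not None and d >= min_delta else 0
        best = max(best, run)
    return best


# Example: CPU steps from 30% to 60% and stays there.
cpu = [30] * 120 + [60] * 120
hill = longest_hill(shifted_difference(cpu))
```

One thing to keep in mind: after a step change, the hill lasts only about as long as the shift, because once the shifted series also contains the new level the difference drops back to zero. The shift should therefore be at least as long as the duration you want to alert on.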

I hope it helps.

Best regards,

János
