22 Aug 2025 01:38 PM
Hi
On a few of our hosts, CPU usage spiked significantly and stayed at that abnormal level for a few days, as shown below. Is there a way to detect a sudden increase in CPU above our normal baseline? We didn't get an alert because usage didn't hit our 95% threshold, but we would like to be notified of a sudden, significant change in CPU, say an increase of 20-30% sustained for a period of time. Does anyone have a recommendation on the best way to do this?
23 Aug 2025 10:36 PM - edited 23 Aug 2025 10:37 PM
Hi @JordanGreen
Auto-adaptive thresholds for anomaly detection — Dynatrace Docs
In Managed, you can create a metric event under Anomaly detection that uses an auto-adaptive baseline (tune the signal fluctuation and duration settings to taste) with this metric expression:
builtin:host.cpu.usage:splitBy("dt.entity.host")
In SaaS, you can use Davis Anomaly Detection from a Notebook (again, tune the fluctuation and duration settings):
In my experience, these CPU patterns are usually caused by a single process, e.g. antivirus, compression, or a Java process suspended by GC, so you can also try monitoring process CPU usage, using the parents transformation to bring in the host information:
Managed metric expression:
builtin:tech.generic.cpu.usage:parents:splitBy("dt.entity.process_group_instance","dt.entity.host")
SaaS DQL:
timeseries usage = avg(dt.process.cpu.usage), by: { dt.entity.process_group_instance, dt.entity.host }
| fieldsAdd entityName(dt.entity.process_group_instance), entityName(dt.entity.host)
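I have another idea, as both a metric expression and DQL, that you can also try: subtract the 15-minute average CPU usage from one hour earlier from the current 15-minute average, per host.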
Metric expression:
(builtin:host.cpu.usage:splitBy("dt.entity.host"):avg:sort(value(auto,descending)):rollup(avg,15m))-(builtin:host.cpu.usage:splitBy("dt.entity.host"):avg:sort(value(auto,descending)):rollup(avg,15m):timeshift(-1h))
DQL:
timeseries usage = avg(dt.host.cpu.usage), by: { dt.entity.host }
| fieldsAdd usage = arrayMovingAvg(usage, 15)
| sort arraySum(usage) desc
| join [ timeseries usage = avg(dt.host.cpu.usage), by: { dt.entity.host }, shift: -1h
| fieldsAdd usage = arrayMovingAvg(usage, 15)
| sort arraySum(usage) desc ], on: { dt.entity.host }, fields: { operand = usage }
| fieldsAdd expression = usage[] - operand[]
| fieldsRemove usage, operand
| fieldsAdd entityName(dt.entity.host)
Long positive "hills" in the resulting difference can be a good trigger for problem creation.
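If you want to try out delta and duration values offline before configuring an event, the "hill" idea can be sketched roughly like this in Python. This is only an illustration, not anything Dynatrace runs: the names `moving_avg`, `find_hills`, `min_delta` and `min_duration` are made up for the example, the input is assumed to be one CPU sample per minute for a single host, and a simple trailing average is used, which is not claimed to match how `arrayMovingAvg` or `rollup` align their windows.

```python
from typing import List, Optional, Tuple


def moving_avg(values: List[float], window: int) -> List[Optional[float]]:
    """Trailing moving average; None until a full window is available."""
    out: List[Optional[float]] = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            # Drop the sample that just left the window.
            running -= values[i - window]
        out.append(running / window if i >= window - 1 else None)
    return out


def find_hills(
    values: List[float],
    window: int = 15,          # smoothing window, in samples (assumed 1 per minute)
    shift: int = 60,           # compare against the value this many samples earlier
    min_delta: float = 20.0,   # hypothetical threshold, in CPU percentage points
    min_duration: int = 30,    # hypothetical minimum hill length, in samples
) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs where the smoothed series stays at least
    min_delta above its own value `shift` samples earlier for at least
    min_duration consecutive samples."""
    smoothed = moving_avg(values, window)
    hills: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i in range(len(values)):
        prev = smoothed[i - shift] if i >= shift else None
        cur = smoothed[i]
        above = prev is not None and cur is not None and (cur - prev) >= min_delta
        if above and start is None:
            start = i
        elif not above and start is not None:
            if i - start >= min_duration:
                hills.append((start, i - 1))
            start = None
    # Close a hill that is still open at the end of the series.
    if start is not None and len(values) - start >= min_duration:
        hills.append((start, len(values) - 1))
    return hills


# Two hours at 20% CPU followed by two hours at 50% CPU.
step_up = [20.0] * 120 + [50.0] * 120
hills = find_hills(step_up)  # [(129, 184)]
```

One thing the sketch makes visible: with a one-hour shift, a step change only produces a hill for roughly an hour, because after that the shifted series has caught up with the new level. So the required duration has to be shorter than the shift, or the shift has to be made longer.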
I hope it helps.
Best regards,
János