Does anyone here have experience with how Dynatrace behaves when applications/services are impacted because one of the services involved has reached its container-defined CPU limit, while being monitored through a Kubernetes application-only agent?
By "behavior" I mean: is Dynatrace able to pinpoint the root cause correctly under these circumstances (i.e. CPU usage has reached the CPU limit and is being throttled)? Does the agent know about the CPU limits at all?
In our case we are currently running Dynatrace Managed 1.162 and using application-only agents for deep monitoring of Java processes running in Docker containers on a PaaS (OpenShift 3.9.60).
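To make clear what I mean by "throttled": as far as I understand, with cgroup v1 the kernel keeps per-container counters in `cpu.stat` (`nr_periods`, `nr_throttled`, `throttled_time` in nanoseconds). Below is a minimal sketch of how those could be read from inside the container, independently of Dynatrace. The path `/sys/fs/cgroup/cpu/cpu.stat` and the class name `CpuThrottleCheck` are assumptions on my side; the mount point can differ between hosts, and this says nothing about what the agent itself reads.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CpuThrottleCheck {

    // Assumed cgroup v1 location as seen from inside the container; may differ per host.
    static final Path CPU_STAT = Paths.get("/sys/fs/cgroup/cpu/cpu.stat");

    /** Parses "key value" lines such as "nr_throttled 25" into a map. */
    static Map<String, Long> parseCpuStat(List<String> lines) {
        Map<String, Long> stats = new HashMap<>();
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length == 2) {
                stats.put(parts[0], Long.parseLong(parts[1]));
            }
        }
        return stats;
    }

    /** Fraction of enforcement periods in which the cgroup was throttled. */
    static double throttledRatio(Map<String, Long> stats) {
        long periods = stats.getOrDefault("nr_periods", 0L);
        long throttled = stats.getOrDefault("nr_throttled", 0L);
        return periods == 0 ? 0.0 : (double) throttled / periods;
    }

    public static void main(String[] args) throws IOException {
        if (!Files.isReadable(CPU_STAT)) {
            System.out.println("cpu.stat not readable at " + CPU_STAT);
            return;
        }
        Map<String, Long> stats = parseCpuStat(Files.readAllLines(CPU_STAT));
        // throttled_time is reported in nanoseconds; convert to milliseconds.
        long throttledMs = stats.getOrDefault("throttled_time", 0L) / 1_000_000L;
        System.out.printf("Throttled in %.1f%% of periods, %d ms in total%n",
                throttledRatio(stats) * 100.0, throttledMs);
    }
}
```

A steadily growing `nr_throttled` while the application is slow would at least confirm that the CPU limit is the cause, whatever the monitoring tool reports.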
Unfortunately it's currently not (easily) possible for us to create a synthetic test scenario where CPU limits are reached in order to see what happens.
Any input is greatly appreciated.
In general, Dynatrace collects CPU data for containers, as in the screenshot I've pasted, so it should know which CPU limit is set per container, and there should be an alert for it. But to be honest, I've never run into this issue on any environment I work with, so I can't tell you for sure 🙂
You're right, there are two deployment options. I've never tried the application-only approach. As I understand it, it monitors only processes and services, not all containers, so that may be the issue here.
From the lack of feedback I conclude that not many Dynatrace customers are using application-only monitoring or anticipating issues due to container CPU limits and throttling...
Anyway, in the meantime I found an interesting article on how to deal with Java application pauses caused by Linux cgroups-induced CPU throttling: