15 Jun 2022 08:12 AM - edited 24 Jun 2022 02:07 AM
The second version of Linux kernel Control Groups (cgroups v2) was introduced in 2015 to improve and simplify how processes are organized and configured. Cgroups v2 has become mainstream, and various Linux distributions have already switched to it, e.g. Fedora 31, Arch Linux 2021.04.01, openSUSE Tumbleweed, and Debian 11. Kubernetes distributions built on a cgroups v2 based OS are appearing more frequently. Google Kubernetes Engine (GKE) may be the first cloud-based Kubernetes distribution to switch as of version 1.25.
Dynatrace is moving too
Dynatrace OneAgent collects metrics, traces, and logs from a variety of sources, including cgroups, which provide data about container CPU, memory, and block I/O usage. OneAgent version 1.243 will support both cgroups v1 and v2.
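To make the v1/v2 distinction concrete, here is a minimal Python sketch of how a tool could tell which cgroups layout a host exposes. It is not how OneAgent does it; the function name `detect_cgroup_version` is hypothetical, and the check assumes the conventional mount point `/sys/fs/cgroup`. It relies on the fact that a unified (v2) hierarchy exposes a `cgroup.controllers` file at its root, while v1 mounts each controller in its own directory.

```python
from pathlib import Path
from typing import Optional


def detect_cgroup_version(root: str = "/sys/fs/cgroup") -> Optional[int]:
    """Return 2 for a unified (v2) hierarchy, 1 for a legacy (v1) layout, else None.

    Hypothetical helper for illustration only; `root` is assumed to be the
    cgroup mount point.
    """
    base = Path(root)
    # cgroups v2 exposes cgroup.controllers at the root of the unified hierarchy.
    if (base / "cgroup.controllers").is_file():
        return 2
    # cgroups v1 mounts each controller in its own directory, e.g. memory/ and cpu/.
    if (base / "memory").is_dir() or (base / "cpu").is_dir():
        return 1
    return None


if __name__ == "__main__":
    print(detect_cgroup_version())
```

On hosts running in hybrid mode, where v1 controllers and a v2 hierarchy are mounted side by side, this sketch reports 1, so a real implementation would need a more careful check.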
Bumps in the road
The tech industry is wary of breaking API changes, and cgroups v2 is no exception. In the Java community, for example, the JDK did not detect memory or CPU limits properly until it was patched in JDK 15. Another important example, from the Kubernetes community, is that the memory, process, and CPU controllers in crun require privileged pods to access the cgroups Linux namespace.
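Part of the reason runtimes such as the JDK needed patching is that cgroups v2 renamed the limit files and changed their contents. The following Python sketch illustrates the difference for a few well-known files; it is not the JDK's or OneAgent's implementation, and the function names are hypothetical. In v2, `memory.max` holds a byte count or the literal `max`, and `cpu.max` holds `<quota> <period>` in microseconds. In v1, the CPU limit is split across `cpu.cfs_quota_us` and `cpu.cfs_period_us`, with a quota of `-1` meaning unlimited.

```python
from typing import Optional


def parse_v2_memory_max(text: str) -> Optional[int]:
    """Parse memory.max (v2): a byte count, or the literal 'max' for no limit."""
    value = text.strip()
    return None if value == "max" else int(value)


def parse_v2_cpu_max(text: str) -> Optional[float]:
    """Parse cpu.max (v2): '<quota> <period>' in microseconds, or 'max <period>'.

    Returns the limit as a number of CPUs, or None when unlimited.
    """
    quota, period = text.split()
    return None if quota == "max" else int(quota) / int(period)


def parse_v1_cpu_limit(quota_text: str, period_text: str) -> Optional[float]:
    """Parse cpu.cfs_quota_us and cpu.cfs_period_us (v1).

    A negative quota (-1) means no limit.
    """
    quota = int(quota_text)
    return None if quota < 0 else quota / int(period_text)
```

A reader written only for the v1 file names finds none of them on a v2 host and typically falls back to host-wide values, which is consistent with the mis-detected limits described above.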
Important: what this means for you
Customers upgrading their Kubernetes environment to a cgroups v2 based distribution, such as GKE 1.24, must consider three temporary issues that prevent diagnosing some container-related problems in Dynatrace. If you can wait to migrate to cgroups v2, each issue is scheduled to be resolved; whether that happens before your migration depends on your timeframe.
Kubernetes customers using the Dynatrace Operator must add a feature flag to their Dynakube manifest to make OneAgent a privileged container. This flag is available in the Dynatrace Operator v0.7.0 and is a temporary workaround. This example gist shows how it works. Our roadmap forecasts the temporary feature flag will be unnecessary in OneAgent v1.247+ coming in August.
Kubernetes OOM Kill events won't be available in Dynatrace for cgroups v2 based Kubernetes clusters. We plan to re-introduce them later this year.
Customers using containerized ActiveGates within Kubernetes clusters with cgroups v2 enabled must set internal memory limits using an environment variable. This example shows a Dynakube configured to set internal memory limits to 4 GB. This is due to the JDK incompatibilities with cgroups v2 mentioned earlier.
Setting an internal memory limit prevents ActiveGates from being OOM killed by Kubernetes. Before cgroups v2, the internal limit was set according to the memory limit configured in the Dynakube. Our roadmap forecasts the environment variable will be unnecessary in ActiveGate v1.245 coming in July.
We're here to help
As always, reach out if you have any questions.