PeVa
Dynatrace Contributor

 

Introduction

This is a troubleshooting guide for using Kubernetes CPU throttling data in Dynatrace. It is intended for customers who have questions or problems when using CPU throttling data.

The guide answers the following questions:

  • Why is my container CPU throttled?
  • Why is my container CPU throttled although its limit is set high enough?
  • Why is my container CPU throttled although its limit is set high enough and the node has enough allocatable CPUs?

The guide begins with general considerations about CPU throttling, continues with an explanation of how Dynatrace processes and displays CPU throttling data, and finally addresses specific scenarios and questions when using CPU throttling data.

 

General Considerations

 

CPU Throttling

A container is considered CPU throttled if it requires more CPU resources than it is granted. At a more technical level, a container is CPU throttled if it is interrupted during a scheduling period even though it is still able to run.

 

An Example

Suppose you have a container in an isolated environment with no CPU limits. Assume further that the process in the container has the following CPU usage behavior over time.

example-throttling-1.png

Each small box represents a 10ms time slice. If the box is green, the process in the container is running and needs CPU; if the box is gray, the process is waiting for IO (storage, network, user input, and so on) and therefore does not need CPU. So phases in which the container needs CPU (green) alternate with phases in which it does not (gray).

Now assume that the same container is running in a real-world environment and a CPU limit of 400 millicores has been set. A CPU limit of 400 millicores means that the process in the container is not allowed to use more than 400 millicores. With this CPU limit the following running behavior would result.

example-throttling-2.png

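Every small red box marks a 10ms slice in which the process could have run (see the first diagram) but was throttled (interrupted) because of the limit. Throttling is generally enforced per 100ms scheduling period: with a limit of 400 millicores, the container may use at most 40ms of CPU time in each 100ms period and is paused for the rest of the period once that quota is used up.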

This is what throttling means. A running container is interrupted due to resource limits. This extends its original runtime.
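The example above can be reproduced with a small simulation. This is only a minimal sketch under simplifying assumptions (a single-threaded process, 10ms slices, a quota that resets at every 100ms period boundary); the function name `simulate` and the trace encoding (`C` = needs CPU, `I` = waits for IO, `T` = throttled) are made up for illustration and are not part of Kubernetes or Dynatrace.

```python
def simulate(trace, limit_millicores, period_ms=100, slice_ms=10):
    """Replay a workload trace under a CPU limit.

    trace: string of 'C' (needs 10ms of CPU) and 'I' (10ms of IO wait).
    Returns the wall-clock timeline, where 'T' marks a throttled slice.
    """
    # A limit of 400 millicores allows 40ms of CPU per 100ms period,
    # which corresponds to 4 slices of 10ms.
    quota_slices = limit_millicores * period_ms // 1000 // slice_ms
    if quota_slices < 1:
        raise ValueError("limit too small for this slice granularity")
    slices_per_period = period_ms // slice_ms

    timeline = []
    position = 0   # next step of the trace that still has to be executed
    used = 0       # CPU slices consumed in the current period
    while position < len(trace):
        if len(timeline) % slices_per_period == 0:
            used = 0  # a new scheduling period starts, the quota is refilled
        step = trace[position]
        if step == "C" and used >= quota_slices:
            timeline.append("T")  # runnable, but the quota is exhausted
        else:
            if step == "C":
                used += 1
            timeline.append(step)
            position += 1
    return "".join(timeline)


unlimited = simulate("CCCCCCIIII", 1000)  # no effective limit for one thread
limited = simulate("CCCCCCIIII", 400)
print(unlimited)  # CCCCCCIIII
print(limited)    # CCCCTTTTTTCCIIII
```

In this sketch the workload that finishes in 100ms without an effective limit takes 160ms with a limit of 400 millicores, because six slices are spent throttled before the quota is refilled.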

 

Reasons and Effects

CPU throttling can happen for a number of reasons, e.g.

  • the container has a CPU limit set and the container has reached that limit
  • the Kubernetes node does not have enough free CPU resources to let all containers run at their full limits

In a Kubernetes cluster where many pods are meant to make the best possible use of the available resources, a certain amount of throttling is normal. To make the best use of cluster resources, it can even make sense to run a low-priority batch job, where response time is not important, with a higher level of throttling.

Excessive throttling can become problematic when a process has to deliver short response times. In this case, care should be taken to ensure that the CPU throttling is not too high.

General advice: CPU throttling occurs when not enough CPU resources are available. At the same time, be careful not to over-provision workloads, which wastes large amounts of resources. For further information, see 'Optimize resource utilization of Kubernetes clusters with SLOs'.

 

CPU Throttling in Dynatrace

The level of detail of CPU usage and throttling data differs between Dynatrace Classic and the new Dynatrace platform. In Dynatrace Classic, CPU usage and throttling data is only available at the workload level. On the new Dynatrace platform, this data is available at the workload, pod, and container level.

 

Native Throttling Metrics provided by Kubernetes

Kubernetes exposes two different throttling metrics through its cAdvisor Prometheus metrics.

| Metric | Kubernetes cAdvisor Metric Key | Description |
|---|---|---|
| throttled_periods_total | container_cpu_cfs_throttled_periods_total | Measures the CPU throttling in periods. This value is increased by one for each scheduling period in which the container is actually throttled. |
| throttled_seconds_total | container_cpu_cfs_throttled_seconds_total | Measures the CPU throttling in seconds. This value is increased by the time the container was actually throttled. |
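Because both metrics are monotonically increasing counters, they are usually evaluated as differences between two samples. The following sketch shows one common way to turn them into a share of throttled periods. It assumes that the total number of elapsed periods is also available, for example from the cAdvisor counter `container_cpu_cfs_periods_total`; the function name `throttled_ratio` and the sample values are illustrative only.

```python
def throttled_ratio(periods_start, periods_end, throttled_start, throttled_end):
    """Share of scheduling periods in which the container was throttled.

    All arguments are raw counter values sampled at the start and at the
    end of the observation window.
    """
    elapsed_periods = periods_end - periods_start
    if elapsed_periods <= 0:
        return 0.0  # no periods elapsed (or counter reset): nothing to report
    return (throttled_end - throttled_start) / elapsed_periods


# Illustrative values: 600 periods elapsed (one minute at 100ms per period),
# 150 of them were throttled.
ratio = throttled_ratio(1000, 1600, 50, 200)
print(f"{ratio:.0%} of the periods were throttled")
```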

 

Throttling Related Metrics provided by Dynatrace

Dynatrace provides the following Kubernetes CPU metrics. In order to make all these metrics easy to combine and compare, Dynatrace stores them with the unit 'core' / 'millicore'.

| Metric | Dynatrace Classic Metric Key | Dynatrace Platform Metric Key | Description |
|---|---|---|---|
| cpu_usage | builtin:kubernetes.workload.cpu_usage | dt.kubernetes.container.cpu_usage | Measures the total CPU consumed (user usage + system usage) by a container in millicores. |
| cpu_throttled | builtin:kubernetes.workload.cpu_throttled | dt.kubernetes.container.cpu_throttled | Measures the total CPU throttling of a container in millicores. This metric is based on the throttled_seconds_total metric mentioned above. |
| requests_cpu | builtin:kubernetes.workload.requests_cpu | dt.kubernetes.container.requests_cpu | Measures the CPU requests of a container in millicores. |
| limits_cpu | builtin:kubernetes.workload.limits_cpu | dt.kubernetes.container.limits_cpu | Measures the CPU limits of a container in millicores. |
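To illustrate how a counter of throttled seconds can be expressed in millicores, here is a minimal sketch. It is an assumption about the general idea (throttled time divided by wall-clock time), not a description of the exact calculation Dynatrace performs; the function name `throttled_millicores` is hypothetical.

```python
def throttled_millicores(throttled_seconds_delta, interval_seconds):
    """Express throttled time within an interval as millicores.

    Assumption: one second of throttling per second of wall-clock time
    corresponds to 1000 millicores (one full core) of throttling.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return throttled_seconds_delta * 1000 / interval_seconds


# Illustrative values: the counter grew by 6 seconds within one minute.
value = throttled_millicores(6, 60)
print(value)  # 100.0
```

Expressed this way, the throttling value can be compared directly with usage, requests, and limits, which is the reason the text gives for storing all of these metrics with the same unit.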

 

CPU Throttling in the UI

In the Dynatrace Kubernetes Classic UI, CPU throttling can be analyzed on workload level in the 'Resources analysis' section of the details screen.

classic-ui-cpu-throttling.png

In the Dynatrace Kubernetes App, CPU throttling can be analyzed on workload, pod or container level in the 'Utilization' section of the details screen.

app-cpu-throttling.png

 

Troubleshooting

 

Why is my container CPU throttled?

If a container that requires fast request/response times is experiencing excessive CPU throttling, the first thing to check is whether the container CPU usage is close to the container CPU limit.

If you are already using the new Dynatrace platform, this data is available on container level. Otherwise, if you are still using Dynatrace Classic, this data is only available on workload level. In this case, it is difficult to find out exactly which container is affected.

If possible, always try to break down the throttling analysis to the container level. A throttling analysis exclusively at the workload level does not reveal the problematic container.

cpu-usage-vs-limit.png

If the CPU usage is close to the limit, the container limit should be increased. For more information, see 'Resource Management for Pods and Containers'. If the CPU usage is not close to the limit, see the following sections for other possible causes.

 

Why is my container CPU throttled although its limit is set high enough?

If the limit of a container is significantly higher than its usage but the container's CPU throttling is high, the node may have too few allocatable CPU resources for the number of pods running on it. You can check this in the Dynatrace UI in the node details by comparing the usage, allocatable, and limits metrics. This data is available in Dynatrace Classic as well as on the new Dynatrace platform.

node-cpu-utilization.png

If the CPU usage is close to the allocatable CPUs, you have the following options.

Recommended measures:

  • Increase your own container’s CPU requests to get more guaranteed CPU resources.
  • Increase your own container’s CPU limits or remove them completely.
  • Check other containers and adapt their CPU requests and limits in a fair way. Do all of them really need high requests and limits?

Further measures:

  • Increase the allocatable CPUs on this node
  • Add more nodes to your cluster
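The node-level check described in this section boils down to two ratios: how much of the allocatable CPU is in use, and how far the sum of all container limits exceeds the allocatable CPU. The sketch below computes both; the function name `node_cpu_summary` and the numbers are illustrative assumptions, and the input values would come from the node metrics shown in the UI.

```python
def node_cpu_summary(usage_mc, allocatable_mc, limits_mc):
    """Summarize CPU pressure on one node (all values in millicores)."""
    if allocatable_mc <= 0:
        raise ValueError("allocatable_mc must be positive")
    return {
        # Share of the allocatable CPU that is currently in use.
        "usage_ratio": usage_mc / allocatable_mc,
        # Values above 1.0 mean the containers cannot all run at their
        # full limits at the same time.
        "limit_overcommit": limits_mc / allocatable_mc,
    }


# Illustrative node: 4 allocatable cores, 3.6 cores in use,
# container limits adding up to 10 cores.
summary = node_cpu_summary(usage_mc=3600, allocatable_mc=4000, limits_mc=10000)
print(summary)
```

A usage ratio close to 1.0 together with a high overcommit value suggests that the measures listed above are worth considering.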

 

Why is my container CPU throttled although its limit is set high enough and the node has enough allocatable CPUs?

Even if the container limit is apparently set high enough and the node has enough allocatable CPU resources, throttling can still occur. The reason may lie in a technical detail of the operating system. In the end, it is a matter of scale (metric timeframe vs. scheduling period).

The Dynatrace CPU usage metric is determined once per minute and represents the average usage over that minute. CPU throttling, in contrast, is enforced by the operating system and generally works with a 100ms scheduling period.

Consider the throttling example from above.

example-throttling-2.png

Although throttling occurs in some of the scheduling periods, the usage is smaller than the limit over the entire period on average. However, the throttling is not decided on the average, but in the small scheduling periods. Even if the average CPU usage of a container is below the CPU limit over a whole minute, it is still possible that the usage would exceed the limit in one of the many small (100ms) throttling periods and the container would therefore be throttled in these periods.
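The gap between the one-minute average and the 100ms periods can be made concrete with a small calculation. The sketch below is a simplified model, not the actual scheduler: it assumes that work which does not fit into a period's quota is carried over to the next period, and the function name `simulate_minute` and the burst pattern are invented for illustration.

```python
def simulate_minute(demand_ms_per_period, limit_millicores, period_ms=100):
    """Return (average usage in millicores, number of throttled periods)."""
    quota_ms = limit_millicores * period_ms / 1000  # 400 millicores -> 40ms
    backlog_ms = 0.0
    used_ms = 0.0
    throttled_periods = 0
    for demand_ms in demand_ms_per_period:
        backlog_ms += demand_ms
        run_ms = min(backlog_ms, quota_ms)  # CPU time granted in this period
        used_ms += run_ms
        backlog_ms -= run_ms
        if backlog_ms > 0:
            throttled_periods += 1  # work was left over when the quota ran out
    total_ms = len(demand_ms_per_period) * period_ms
    return used_ms / total_ms * 1000, throttled_periods


# One minute = 600 periods. Once per second the container wants a full
# 100ms of CPU; the rest of the time it is idle.
demand = [100 if i % 10 == 0 else 0 for i in range(600)]
average_mc, throttled = simulate_minute(demand, limit_millicores=400)
print(round(average_mc), throttled)  # 100 120
```

In this model the container averages about 100 millicores over the minute, a quarter of its 400 millicore limit, and yet it is throttled in 120 of the 600 periods.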

This can happen, for example, when the container has short CPU bursts. High CPU throttling while the limit is not reached may therefore indicate short, high CPU bursts. In this case, you could further increase the CPU limit of the container (or remove it completely) so that the limit is not exceeded so easily within individual scheduling periods. Increasing the CPU requests to a reasonable level can also help.
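The three troubleshooting questions of this guide can be read as a simple decision sequence. The following sketch encodes that sequence; it is illustrative only, and the function name `triage`, the returned labels, and the threshold of 80% for "close to" are assumptions rather than values defined by Dynatrace or Kubernetes.

```python
def triage(container_usage_mc, container_limit_mc,
           node_usage_mc, node_allocatable_mc, near=0.8):
    """Suggest the most likely cause of CPU throttling for one container.

    near: hypothetical threshold above which usage counts as "close to"
    the limit or to the allocatable CPU.
    """
    # 1. Is the container usage close to its own limit (if it has one)?
    if container_limit_mc and container_usage_mc / container_limit_mc >= near:
        return "container_limit"
    # 2. Is the node usage close to its allocatable CPU?
    if node_usage_mc / node_allocatable_mc >= near:
        return "node_capacity"
    # 3. Otherwise short bursts within individual 100ms periods are a
    #    likely explanation.
    return "short_bursts"


print(triage(380, 400, 2000, 4000))  # container_limit
print(triage(100, 400, 3800, 4000))  # node_capacity
print(triage(100, 400, 2000, 4000))  # short_bursts
```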

 

Further Support

If you have any further questions or encounter any issues not listed above, please feel free to contact our support team.

Version history
Last update:
30 Jul 2024 01:35 PM