When looking at the memory chart of my node, I noticed that the memory usage is bigger than the memory limits. How can this happen?
The reason is that every container has memory usage, but not every container has a memory limit set. On the node screens, memory limits, requests, and usage are calculated by summing up the respective values from the application containers of the pods running on the node. Since Kubernetes does not require requests or limits to be set on a container, the node-scope sums can include memory usage from all application containers while the limits sum only covers the containers that actually define one.
Let us inspect this scenario with an example:
Let us assume Pod1, Pod2, and Pod3 are running on our node. Each pod has one application container, and the following table shows the memory limit and memory usage of each container.
Pod | Memory limits | Memory usage
Pod1 | 1 GiB | 500 MiB
Pod2 | - | 1 GiB
Pod3 | 500 MiB | 250 MiB
Now, if we sum up these values at node scope, we end up with the following totals for memory limits and usage:
Memory limits total = 1 GiB + 0 + 500 MiB = 1.5 GiB
Memory usage total = 500 MiB + 1 GiB + 250 MiB = 1.75 GiB
So we can easily see how this scenario can happen: because no limit is defined for the container of Pod2, the total memory usage can be higher than the total memory limits.
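The same arithmetic can be expressed as a minimal Python sketch (purely illustrative; the pod names and values are taken from the table above, rounded to GiB). A container without a limit contributes nothing to the limits total, while its usage is still counted:

# Illustrative only: per-container memory values from the example above, in GiB.
# limit_gib is None when the container has no memory limit configured.
containers = [
    {"pod": "Pod1", "limit_gib": 1.0,  "usage_gib": 0.5},
    {"pod": "Pod2", "limit_gib": None, "usage_gib": 1.0},   # no limit set
    {"pod": "Pod3", "limit_gib": 0.5,  "usage_gib": 0.25},
]

# Node-scope totals: a missing limit adds 0 to the limits sum,
# but the container's usage is still included in the usage sum.
limits_total = sum(c["limit_gib"] or 0.0 for c in containers)
usage_total = sum(c["usage_gib"] for c in containers)

print(f"Memory limits total: {limits_total} GiB")  # 1.5 GiB
print(f"Memory usage total:  {usage_total} GiB")   # 1.75 GiB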
To verify this behavior on a real node, you can run a DQL query (note: the new Kubernetes experience is required to execute this query successfully).
fetch dt.entity.container_group_instance
| fields container.name = entity.name, container.id = id, cluster.id = belongs_to[dt.entity.kubernetes_cluster], node.name = nodeName
| filter cluster.id in [fetch dt.entity.kubernetes_cluster | filter entity.name == "{CLUSTER_NAME}" | fields id]
and matchesPhrase(node.name, "{NODE_NAME}")
| lookup [timeseries {
limits_memory=sum(dt.kubernetes.container.limits_memory)}, by:{k8s.container.name, dt.entity.container_group_instance}
| fields k8s.container.name, limits_memory=arrayLast(limits_memory), dt.entity.container_group_instance]
, sourceField:container.id, lookupField: dt.entity.container_group_instance, fields:{limits_memory}
| lookup [timeseries {
memory_usage=sum(dt.kubernetes.container.memory_working_set)}, by:{k8s.container.name, dt.entity.container_group_instance}
| fields k8s.container.name, memory_usage = arrayLast(memory_usage), dt.entity.container_group_instance]
, sourceField:container.id, lookupField: dt.entity.container_group_instance, fields:{memory_usage}
| fields container.name, node.name, limits_memory, memory_usage
| sort limits_memory
Replace {CLUSTER_NAME} and {NODE_NAME} with the name of your Kubernetes cluster and the node for which you want to execute this query.
Here we see that for our node "ip-10-0-130-92.ec2.internal", only four containers have a memory limit set at all.
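If you want to cross-check outside Dynatrace which containers on the node have a memory limit configured at all, a rough sketch with the official Kubernetes Python client could look like the following. This is an assumption-laden example: it needs a working kubeconfig, reuses the node name from above, only covers the limits side (working-set usage is not available from the core API), and the small quantity parser handles just the common Ki/Mi/Gi suffixes:

from kubernetes import client, config

NODE_NAME = "ip-10-0-130-92.ec2.internal"  # replace with your node

def to_mib(quantity: str) -> float:
    """Tiny helper for the common memory quantity suffixes only."""
    if quantity.endswith("Gi"):
        return float(quantity[:-2]) * 1024
    if quantity.endswith("Mi"):
        return float(quantity[:-2])
    if quantity.endswith("Ki"):
        return float(quantity[:-2]) / 1024
    return float(quantity) / (1024 * 1024)  # plain bytes

config.load_kube_config()
v1 = client.CoreV1Api()
pods = v1.list_pod_for_all_namespaces(field_selector=f"spec.nodeName={NODE_NAME}")

limits_total_mib = 0.0
for pod in pods.items:
    if pod.status.phase != "Running":  # only count running pods
        continue
    for c in pod.spec.containers:      # application containers only
        limits = (c.resources.limits if c.resources else None) or {}
        mem_limit = limits.get("memory")
        if mem_limit is None:
            print(f"{pod.metadata.name}/{c.name}: no memory limit set")
        else:
            limits_total_mib += to_mib(mem_limit)
            print(f"{pod.metadata.name}/{c.name}: limit {mem_limit}")

print(f"Sum of configured memory limits on {NODE_NAME}: {limits_total_mib:.0f} MiB")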
By adding a summarize command at the end of the query, we can also reproduce the values that are displayed on the node screen.
fetch dt.entity.container_group_instance
....
| sort limits_memory
| summarize by:{}, {limits_memory=sum(limits_memory), memory_usage=sum(memory_usage)}
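If you prefer to post-process the per-container result of the first query yourself (for example after exporting the records from a notebook), the same aggregation can be reproduced in a few lines of Python. The record structure below is a hypothetical sketch that mirrors the columns selected in the query; the byte values are made up for illustration:

# Hypothetical shape of the per-container records returned by the query above;
# limits_memory is None for containers without a configured memory limit.
records = [
    {"container.name": "app-a", "limits_memory": 1_073_741_824, "memory_usage": 524_288_000},
    {"container.name": "app-b", "limits_memory": None,          "memory_usage": 1_073_741_824},
    {"container.name": "app-c", "limits_memory": 524_288_000,   "memory_usage": 262_144_000},
]

# Same logic as the summarize step: missing limits count as 0.
limits_total = sum(r["limits_memory"] or 0 for r in records)
usage_total = sum(r["memory_usage"] for r in records)
usage_without_limit = sum(r["memory_usage"] for r in records if r["limits_memory"] is None)

gib = 1024 ** 3
print(f"limits_memory total: {limits_total / gib:.2f} GiB")
print(f"memory_usage total:  {usage_total / gib:.2f} GiB")
print(f"usage from containers without a limit: {usage_without_limit / gib:.2f} GiB")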