When looking at the memory chart of my node, I noticed that the memory usage is bigger than the memory limits. How can this happen?
The reason is that every container has memory usage, but not every container has a memory limit set. On the node screens, memory limits, requests, and usage are calculated by summing up the respective values from the application containers of the pods running on the node. Since Kubernetes does not require requests or limits to be set on a container, the node-scope sums can include memory usage from all application containers while the limits sum only covers the containers that actually define one.
Let us inspect this scenario with an example:
Let us assume Pod1, Pod2, and Pod3 are running on our node. Each pod has one application container, and the following table shows the memory limit and memory usage of each container.
Pod | Memory limits | Memory usage
Pod1 | 1 GiB | 500 MiB
Pod2 | - | 1 GiB
Pod3 | 500 MiB | 250 MiB
Now, if we sum up these values at node scope, we end up with the following totals for memory limits and usage:
Memory limits total = 1 GiB + 0 + 500 MiB = 1.5 GiB
Memory usage total = 500 MiB + 1 GiB + 250 MiB = 1.75 GiB
So we can easily see how this scenario can happen: because no limit is defined for the container of Pod2, the total memory usage can be higher than the total memory limits.
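The same arithmetic can be expressed as a minimal Python sketch (purely illustrative; the pod names and values are taken from the table above, rounded to GiB). A container without a limit contributes nothing to the limits total, while its usage is still counted:

# Illustrative only: per-container memory values from the example above, in GiB.
# limit_gib is None when the container has no memory limit configured.
containers = [
    {"pod": "Pod1", "limit_gib": 1.0,  "usage_gib": 0.5},
    {"pod": "Pod2", "limit_gib": None, "usage_gib": 1.0},   # no limit set
    {"pod": "Pod3", "limit_gib": 0.5,  "usage_gib": 0.25},
]

# Node-scope totals: a missing limit adds 0 to the limits sum,
# but the container's usage is still included in the usage sum.
limits_total = sum(c["limit_gib"] or 0.0 for c in containers)
usage_total = sum(c["usage_gib"] for c in containers)

print(f"Memory limits total: {limits_total} GiB")  # 1.5 GiB
print(f"Memory usage total:  {usage_total} GiB")   # 1.75 GiB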
To verify this behavior on a real node, you can run a DQL query (note: the new Kubernetes experience is required to execute this query successfully).
fetch dt.entity.container_group_instance
| fields container.name = entity.name, container.id = id, cluster.id = belongs_to[dt.entity.kubernetes_cluster], node.name = nodeName
| filter cluster.id in [fetch dt.entity.kubernetes_cluster | filter entity.name == "{CLUSTER_NAME}" | fields id]
and matchesPhrase(node.name, "{NODE_NAME}")
| lookup [timeseries {
limits_memory=sum(dt.kubernetes.container.limits_memory)}, by:{k8s.container.name, dt.entity.container_group_instance}
| fields k8s.container.name, limits_memory=arrayLast(limits_memory), dt.entity.container_group_instance]
, sourceField:container.id, lookupField: dt.entity.container_group_instance, fields:{limits_memory}
| lookup [timeseries {
memory_usage=sum(dt.kubernetes.container.memory_working_set)}, by:{k8s.container.name, dt.entity.container_group_instance}
| fields k8s.container.name, memory_usage = arrayLast(memory_usage), dt.entity.container_group_instance]
, sourceField:container.id, lookupField: dt.entity.container_group_instance, fields:{memory_usage}
| fields container.name, node.name, limits_memory, memory_usage
| sort limits_memory
Replace {CLUSTER_NAME} and {NODE_NAME} with the name of your Kubernetes cluster and the node for which you want to execute this query.
Here we see that for our node "ip-10-0-130-92.ec2.internal", only four containers have a memory limit set at all.
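If you want to cross-check outside Dynatrace which containers on the node have a memory limit configured at all, a rough sketch with the official Kubernetes Python client could look like the following. This is an assumption-laden example: it needs a working kubeconfig, reuses the node name from above, only covers the limits side (working-set usage is not available from the core API), and the small quantity parser handles just the common Ki/Mi/Gi suffixes:

from kubernetes import client, config

NODE_NAME = "ip-10-0-130-92.ec2.internal"  # replace with your node

def to_mib(quantity: str) -> float:
    """Tiny helper for the common memory quantity suffixes only."""
    if quantity.endswith("Gi"):
        return float(quantity[:-2]) * 1024
    if quantity.endswith("Mi"):
        return float(quantity[:-2])
    if quantity.endswith("Ki"):
        return float(quantity[:-2]) / 1024
    return float(quantity) / (1024 * 1024)  # plain bytes

config.load_kube_config()
v1 = client.CoreV1Api()
pods = v1.list_pod_for_all_namespaces(field_selector=f"spec.nodeName={NODE_NAME}")

limits_total_mib = 0.0
for pod in pods.items:
    if pod.status.phase != "Running":  # only count running pods
        continue
    for c in pod.spec.containers:      # application containers only
        limits = (c.resources.limits if c.resources else None) or {}
        mem_limit = limits.get("memory")
        if mem_limit is None:
            print(f"{pod.metadata.name}/{c.name}: no memory limit set")
        else:
            limits_total_mib += to_mib(mem_limit)
            print(f"{pod.metadata.name}/{c.name}: limit {mem_limit}")

print(f"Sum of configured memory limits on {NODE_NAME}: {limits_total_mib:.0f} MiB")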
By adding a summarize command at the end of the query, we can also reproduce the values that are displayed on the node screen.
fetch dt.entity.container_group_instance
....
| sort limits_memory
| summarize by:{}, {limits_memory=sum(limits_memory), memory_usage=sum(memory_usage)}
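If you prefer to post-process the per-container result of the first query yourself (for example after exporting the records from a notebook), the same aggregation can be reproduced in a few lines of Python. The record structure below is a hypothetical sketch that mirrors the columns selected in the query; the byte values are made up for illustration:

# Hypothetical shape of the per-container records returned by the query above;
# limits_memory is None for containers without a configured memory limit.
records = [
    {"container.name": "app-a", "limits_memory": 1_073_741_824, "memory_usage": 524_288_000},
    {"container.name": "app-b", "limits_memory": None,          "memory_usage": 1_073_741_824},
    {"container.name": "app-c", "limits_memory": 524_288_000,   "memory_usage": 262_144_000},
]

# Same logic as the summarize step: missing limits count as 0.
limits_total = sum(r["limits_memory"] or 0 for r in records)
usage_total = sum(r["memory_usage"] for r in records)
usage_without_limit = sum(r["memory_usage"] for r in records if r["limits_memory"] is None)

gib = 1024 ** 3
print(f"limits_memory total: {limits_total / gib:.2f} GiB")
print(f"memory_usage total:  {usage_total / gib:.2f} GiB")
print(f"usage from containers without a limit: {usage_without_limit / gib:.2f} GiB")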