Solved: Creating a dashboard with a formula "percentage of Kubernetes pod usage (requests/limits)"

shakib · 22 Feb 2023

So I am able to see the percentage of CPU requested per pod relative to that pod's CPU limit using the following formula:

(
builtin:cloud.kubernetes.pod.cpuRequests:last:splitBy("dt.entity.cloud_application_instance")
/ builtin:cloud.kubernetes.pod.cpuLimits:last:splitBy("dt.entity.cloud_application_instance")
* 100
):setUnit(Percent):sort(value(sum,descending))

But I am trying to filter this so that I only get data for a specific Kubernetes cluster. When I create a dashboard tile for the formula above and add a dynamic dashboard filter for a specific Kubernetes cluster, the dynamic filter doesn't work.

So I am now trying to put a Kubernetes cluster filter into the formula itself, but I'm running into issues. Any suggestions on how I can achieve this? I tried adding "dt.entity.kubernetes_cluster" to the splitBy of both metrics, as shown below, but it does not work.

(
builtin:cloud.kubernetes.pod.cpuRequests:last:splitBy("dt.entity.cloud_application_instance", "dt.entity.kubernetes_cluster")
/ builtin:cloud.kubernetes.pod.cpuLimits:last:splitBy("dt.entity.cloud_application_instance","dt.entity.kubernetes_cluster")
* 100
):setUnit(Percent):sort(value(sum,descending))
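
For readers who want to see what the ratio in the first formula works out to, here is a minimal Python sketch of the same arithmetic (requests divided by limits, times 100, sorted descending). The pod names and millicore values are made up for illustration and do not come from Dynatrace.

```python
# Hypothetical per-pod CPU values in millicores; names and numbers are invented.
pods = {
    "checkout-7d9f": {"cpu_requests": 250, "cpu_limits": 1000},
    "search-5c2a": {"cpu_requests": 500, "cpu_limits": 500},
    "batch-91be": {"cpu_requests": 100, "cpu_limits": 2000},
}


def requests_vs_limits_percent(pods):
    """Return (pod, percent) pairs sorted descending, skipping pods without a limit."""
    rows = []
    for name, cpu in pods.items():
        if not cpu["cpu_limits"]:
            # A pod without a CPU limit has no meaningful ratio.
            continue
        rows.append((name, cpu["cpu_requests"] * 100 / cpu["cpu_limits"]))
    return sorted(rows, key=lambda row: row[1], reverse=True)


result = requests_vs_limits_percent(pods)
for name, percent in result:
    print(f"{name}: {percent:.1f}%")
```

A pod at 100% has its request set equal to its limit. A low value only means the request is small relative to the limit; it says nothing about actual usage.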

shakib · 22 Feb 2023

In the example below I see the percentage of CPU requested out of the allocatable CPU on a node. I want the same data at the pod level, which is what I am trying to get with my formula above.

(
builtin:kubernetes.node.requests_cpu:last:splitBy("dt.entity.kubernetes_node","dt.entity.kubernetes_cluster"):sum
/ builtin:kubernetes.node.cpu_allocatable:last:splitBy("dt.entity.kubernetes_node","dt.entity.kubernetes_cluster"):sum
* 100
):setUnit(Percent):sort(value(sum,descending))

shakib · 22 Feb 2023

OK, I figured out how to get it to work with an environment tag filter. But the problems Dynatrace reports to me are different from the ones I see with my formula, which now looks like:

(
builtin:cloud.kubernetes.pod.cpuRequests:filter(and(or(in("dt.entity.cloud_application_instance",entitySelector("type(cloud_application_instance),fromRelationship.runsOn(type(KUBERNETES_NODE),tag(~"Environment:TagHERE~"))"))))):splitBy("dt.entity.cloud_application_instance")
/ builtin:cloud.kubernetes.pod.cpuLimits:filter(and(or(in("dt.entity.cloud_application_instance",entitySelector("type(cloud_application_instance),fromRelationship.runsOn(type(KUBERNETES_NODE),tag(~"Environment:TagHERE~"))"))))):splitBy("dt.entity.cloud_application_instance")
* 100
):sort(value(auto,descending)):setUnit(Percent)

I guess this means that the CPU request saturation alerts Dynatrace is showing me are based on some other statistic. Those alerts are worthless to me, as they point to a node and not to an actual pod that I can identify as the problem.
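
The filtered formula above repeats the same long filter for both metrics, and the nested quotes (escaped as ~" inside the entitySelector string) are easy to get wrong by hand. As a rough sketch, the Python below assembles the same expression from a tag value. The function names are hypothetical helpers, not part of any Dynatrace library, and the assumption that a literal ~ is escaped as ~~ is not taken from this thread.

```python
def escape_for_selector(text):
    """Escape text for use inside a double-quoted metric selector argument.

    Assumption: ~ is the escape character, so a literal ~ becomes ~~
    and a double quote becomes ~" (the latter is what the formula above uses).
    """
    return text.replace("~", "~~").replace('"', '~"')


def filtered_pod_metric(metric_key, node_tag):
    """Build one side of the ratio: a pod metric filtered by a tag on its node."""
    # The tag value is inserted as-is; this sketch does not escape quotes inside it.
    entity_selector = (
        "type(cloud_application_instance),"
        f'fromRelationship.runsOn(type(KUBERNETES_NODE),tag("{node_tag}"))'
    )
    condition = (
        'in("dt.entity.cloud_application_instance",'
        f'entitySelector("{escape_for_selector(entity_selector)}"))'
    )
    return (
        f"{metric_key}:filter(and(or({condition})))"
        ':splitBy("dt.entity.cloud_application_instance")'
    )


def requests_vs_limits(node_tag):
    """Combine both filtered metrics into the percentage expression."""
    requests = filtered_pod_metric("builtin:cloud.kubernetes.pod.cpuRequests", node_tag)
    limits = filtered_pod_metric("builtin:cloud.kubernetes.pod.cpuLimits", node_tag)
    return f"({requests} / {limits} * 100):sort(value(auto,descending)):setUnit(Percent)"


print(requests_vs_limits("Environment:TagHERE"))
```

The printed string can be pasted into the expression editor. Generating it this way keeps the two filters identical, so a mismatch between them is ruled out when the numbers differ from what Dynatrace alerts on.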

florian_g · 05 Jun 2023

Hi 👋,

Here are a few thoughts from my end - I hope they help 🙂

The metrics you're using in your expression have since been deprecated. I recommend using the following alternative:
(
builtin:kubernetes.workload.requests_cpu:last:splitBy("dt.entity.cloud_application", "dt.entity.kubernetes_cluster")
/ builtin:kubernetes.workload.limits_cpu:last:splitBy("dt.entity.cloud_application","dt.entity.kubernetes_cluster")
* 100
):setUnit(Percent):sort(value(sum,descending))

With that expression, the filtering also works.
I'm interested in the use case behind the expression: Why do you want to know CPU requests compared to limits on a pod level? What I usually see is people comparing
1. usage to requests on a workload level ("is my workload on average using what it is guaranteed and what it is blocking -> mostly about cost-efficiency by optimizing requests of workloads")
2. OR they compare usage to limits ("is my workload hitting the limits? -> hitting the limits results in CPU throttling in case of CPU and OOM kills in case of memory").
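
To make the two comparisons concrete, here is a small Python sketch with invented millicore values. It only illustrates the arithmetic behind "usage vs. requests" and "usage vs. limits" and is not tied to any specific Dynatrace metric.

```python
def cpu_ratios(usage, requests, limits):
    """Return usage/requests and usage/limits as percentages (None if undefined)."""

    def percent(numerator, denominator):
        return None if not denominator else numerator * 100 / denominator

    return {
        "usage_vs_requests": percent(usage, requests),
        "usage_vs_limits": percent(usage, limits),
    }


# Hypothetical workloads, values in millicores.
over_provisioned = cpu_ratios(usage=120, requests=400, limits=500)
near_limit = cpu_ratios(usage=480, requests=400, limits=500)

print(over_provisioned)
print(near_limit)
```

In this made-up data, the first workload uses well under half of what it requests, which is the cost-efficiency case. The second is close to its limit, which is the throttling case.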

Best,

Florian

Brace yourselves - cloud-native deployments are coming.

PeterDrenth1 · 29 Jul 2024

When someone is running a multitenant environment, you usually put CPU limits on the namespace, not on the workload. When customers need more than their allotted CPU or memory, they can be given more on a per-namespace/tenant basis. Often people request too much per pod and use maybe 1% of what they requested for it. In that case you want to be able to show them that they are simply requesting too much for other pods, and that this is why they run into limits. Again, these are not workload- or node-based limits. So I do understand the original poster's question (even though I don't have the answer either).

Kind regards,
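
As an illustration of the scenario described here, the Python sketch below sums pod requests per namespace against a quota and lists pods that use only a small fraction of what they request. All namespaces, pods, quotas, and the 10% threshold are hypothetical.

```python
from collections import defaultdict

# Hypothetical data, CPU values in millicores.
pods = [
    {"namespace": "tenant-a", "pod": "api-1", "cpu_request": 2000, "cpu_usage": 20},
    {"namespace": "tenant-a", "pod": "worker-1", "cpu_request": 1500, "cpu_usage": 1200},
    {"namespace": "tenant-b", "pod": "web-1", "cpu_request": 500, "cpu_usage": 300},
]
quota = {"tenant-a": 4000, "tenant-b": 2000}


def quota_used_percent(pods, quota):
    """Percentage of each namespace's CPU quota taken up by pod requests."""
    requested = defaultdict(int)
    for pod in pods:
        requested[pod["namespace"]] += pod["cpu_request"]
    return {ns: requested[ns] * 100 / limit for ns, limit in quota.items()}


def over_requesting(pods, max_percent=10):
    """Pods whose usage is below max_percent of their own request."""
    return [
        pod["pod"]
        for pod in pods
        if pod["cpu_usage"] * 100 / pod["cpu_request"] < max_percent
    ]


print(quota_used_percent(pods, quota))
print(over_requesting(pods))
```

Here tenant-a is close to its quota mostly because api-1 requests 2000 millicores while using about 1% of that, which is the kind of pod a tenant would want pointed out.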

Peter

florian_g · 29 Jul 2024

Thanks. If this is about namespace quotas, you can get these insights quickly in our new Kubernetes app on the namespace screens. You can see the total requested CPU/memory vs. actual usage. Further below, we display how much of the defined quota is taken up by CPU/memory requests.
