04 Oct 2023 11:00 PM - last edited on 08 Oct 2023 01:15 PM by MaciejNeumann
Classic dashboard, simple metric expression:
builtin:kubernetes.pods
:last
:filter(eq("pod_phase","Failed"))
:splitBy("k8s.namespace.name","dt.entity.kubernetes_cluster")
:sum
:sort(value(sum,descending))
It shows failed pods in descending order and works great! The timeframe says the last 5 minutes, but those pods never seem to go away from that query. Should they?
Timeframe: 2023-10-04 16:44 - 16:49, Auto (1m), 4 record(s)
The number of records just keeps going up, even though the pods were rebuilt long ago.
thanks
05 Oct 2023 07:40 PM - edited 05 Oct 2023 07:40 PM
Hi @nutsy4sure
I guess it depends. At one of my clients I see a similar situation. My metric expression is:
builtin:kubernetes.pods
:filter(and(eq(pod_phase,Failed)))
:splitBy("dt.entity.cloud_application")
:sort(value(auto,descending))
:limit(10)
The dashboard is filtered for the last 5 minutes (it has not been changed for 9 days because nobody cares about the failed pods; they are still there). 😉
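If it helps to see why: builtin:kubernetes.pods reports the current number of pods per phase in every one-minute slot, and a pod stays in the Failed phase until it is actually deleted from the cluster, so each slot keeps counting it. A minimal sketch to verify this per namespace (just the raw series, with no extra aggregations; the dimension names are the ones from your original expression):
builtin:kubernetes.pods
:filter(eq("pod_phase","Failed"))
:splitBy("k8s.namespace.name")
:sum
If the chart shows a flat non-zero line across the timeframe, the pods still exist as Failed objects in the cluster and the query is behaving as designed.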
I hope it helps.
Best regards,
Mizső
05 Oct 2023 07:51 PM
Thank you for the reply, but it sounds like you are just confirming my experience. The pods failed and VMware reacted the way it should and spun up a new one. At that point, I don't care about the failed pod. Under different circumstances I might not want it to disappear, for investigation purposes, but I want this dashboard to be "real time", and days-old failures that VMware long ago recovered from don't reflect the current status.
05 Oct 2023 08:37 PM
Hi @nutsy4sure,
It is not VMware, it is Kubernetes. And based on the metric expression, this is the actual status for the last five minutes: 1 running pod and 4 failed pods.
You should find another solution. Maybe you could count one of the Kubernetes events which refers to the failed pod (with, e.g., a pod name dimension). Then it can be visualized well.
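For example, something along these lines, as a rough sketch only (the metric key and dimension names below are placeholders, not real built-in identifiers; the actual ones depend on how Kubernetes events are ingested in your environment):
<your.kubernetes.events.metric>
:filter(eq("reason","Failed"))
:splitBy("pod_name")
:sum
:sort(value(sum,descending))
:limit(10)
Because an event is counted only at the moment it happens, the series drops back to zero once the failures stop, which should give the "real time" view you are after.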
I hope it helps.
Best regards,
Mizső