Kubernetes service uptime

nsethurama
Frequent Guest

How can we get the uptime of (backend) services in a k8s cluster?

We have set up our SLO calculations based on the number of error-free requests divided by the total number of requests received.

However, when the service is completely down, we are worried that the outage will affect the actual SLA without changing the calculated value, because no requests are counted during the outage and the ratio does not move.

Hence, we would like to know which metric to use for monitoring the availability or uptime of k8s backend services.

Kindly advise!

Thanks,

Nava

4 REPLIES

tomaxp
Mentor

Hi Nava,

You are right: if you calculate the SLO only as error-free requests / total requests, a complete outage with zero traffic won't change the ratio, even though the service is down. To cover this gap, add a time-based availability metric from Kubernetes.
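To make the gap concrete, here is a minimal Python sketch of the request-based calculation. The function name `request_sli` and the per-minute counts are hypothetical and only serve as an illustration; they are not Dynatrace data or a Dynatrace API.

```python
from typing import Optional


def request_sli(ok_requests: int, total_requests: int) -> Optional[float]:
    """Error-free request percentage, or None when there was no traffic."""
    if total_requests == 0:
        return None  # 0/0: the ratio carries no information
    return 100.0 * ok_requests / total_requests


# Hypothetical per-minute (ok, total) counts; the last three minutes are
# a full outage during which no requests were counted at all.
minutes = [(100, 100), (98, 100), (0, 0), (0, 0), (0, 0)]

ok = sum(m[0] for m in minutes)
total = sum(m[1] for m in minutes)
overall = request_sli(ok, total)  # identical to the value without the outage
```

The three outage minutes add nothing to either sum, so the overall ratio is the same as if the outage had never happened.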

A simple approach is to use the ratio of available replicas vs desired replicas for each deployment:

( k8s.deployment.available:splitBy() / k8s.deployment.desired:splitBy() ) * 100
  • When all desired replicas are running, the value = 100%.

  • If some pods are not available, the percentage drops.

  • If the whole deployment is down (available = 0, desired > 0), the metric = 0%.
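The three cases above can be checked with a small Python sketch. The function name `replica_availability` is hypothetical, and rejecting `desired <= 0` (for example a deployment scaled to zero) is an assumption of this sketch, not something stated in the thread.

```python
def replica_availability(available: int, desired: int) -> float:
    """Available replicas as a percentage of desired replicas."""
    if desired <= 0:
        # Assumption: a workload with no desired replicas is out of scope.
        raise ValueError("desired must be positive")
    return 100.0 * available / desired


all_up = replica_availability(3, 3)    # every desired replica is available
degraded = replica_availability(2, 4)  # some pods are not available
down = replica_availability(0, 3)      # whole deployment is down
```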

You can use this ratio either in a dashboard tile or directly as the numerator/denominator in a metric-based SLO. That way you combine your request-based SLO with a replica-based uptime SLO, and you report the SLA as met only when both conditions are satisfied.
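One possible way to read "both conditions" is to classify each time slice as good or bad and report the share of good slices. The Python sketch below assumes per-minute samples, a 99% request target, and a 100% replica target; these thresholds, the sample layout, and the function names are all hypothetical and would need to match your own SLO definitions.

```python
from typing import List, Tuple

# (ok_requests, total_requests, available_replicas, desired_replicas)
Sample = Tuple[int, int, int, int]


def good_minute(sample: Sample,
                request_target: float = 99.0,
                replica_target: float = 100.0) -> bool:
    """A minute counts as good only if both conditions hold."""
    ok, total, available, desired = sample
    replicas_ok = desired > 0 and 100.0 * available / desired >= replica_target
    # Assumption: a minute without traffic does not fail the request check.
    requests_ok = total == 0 or 100.0 * ok / total >= request_target
    return replicas_ok and requests_ok


def uptime_percent(samples: List[Sample]) -> float:
    """Share of good minutes, as a percentage."""
    if not samples:
        raise ValueError("no samples")
    good = sum(1 for s in samples if good_minute(s))
    return 100.0 * good / len(samples)


samples = [
    (100, 100, 3, 3),  # healthy
    (99, 100, 3, 3),   # healthy, exactly at the request target
    (0, 0, 0, 3),      # outage: no traffic, no replicas
    (0, 0, 0, 3),      # outage
    (90, 100, 3, 3),   # replicas fine, too many failed requests
]
uptime = uptime_percent(samples)
```

Unlike the pure request ratio, the outage minutes now count against the result because the replica condition fails for them.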

We are on Dynatrace Managed and I am not able to see any metrics related to

 k8s.deployment.available

or

k8s.deployment.desired

Let me know if I am missing something here!

Unfortunately, I don’t have a way to test this myself, but you can try the following approach.

Metric keys (Classic):

  • builtin:kubernetes.pods — count of pods (filter by phase “Running”)

  • builtin:kubernetes.workload.pods_desired — desired pods per workload

Metric selector (Advanced mode), grouped by cluster/namespace/workload:

(
  builtin:kubernetes.pods
    :filter(eq(k8s.pod.phase,"Running"))
    :splitBy("k8s.cluster.name","k8s.namespace.name","k8s.workload.name")
    :sum
)
/
(
  builtin:kubernetes.workload.pods_desired
    :splitBy("k8s.cluster.name","k8s.namespace.name","k8s.workload.name")
    :avg
)
*100
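If you want to evaluate the selector outside the UI, a minimal Python sketch for building a Metrics API v2 query URL could look like the following. The base URL and environment ID are placeholders, the function name `build_query_url` is hypothetical, and the default timeframe and resolution are arbitrary choices. The sketch only builds the URL; sending it would additionally need an API token with permission to read metrics, passed in an `Authorization: Api-Token ...` header.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

SPLIT = ':splitBy("k8s.cluster.name","k8s.namespace.name","k8s.workload.name")'

# Same selector as above, written on a single line.
SELECTOR = (
    '(builtin:kubernetes.pods'
    ':filter(eq(k8s.pod.phase,"Running"))'
    + SPLIT + ':sum)'
    '/'
    '(builtin:kubernetes.workload.pods_desired'
    + SPLIT + ':avg)'
    '*100'
)


def build_query_url(base_url: str, selector: str,
                    time_from: str = "now-2h",
                    resolution: str = "1m") -> str:
    """Build a GET URL for /api/v2/metrics/query with an encoded selector."""
    params = urlencode({
        "metricSelector": selector,
        "from": time_from,
        "resolution": resolution,
    })
    return f"{base_url.rstrip('/')}/api/v2/metrics/query?{params}"


# Placeholder Managed-style base URL; replace host and ENV_ID with your own.
url = build_query_url("https://dynatrace.example.invalid/e/ENV_ID", SELECTOR)

# Round-trip check: decoding the query string yields the original selector.
decoded = parse_qs(urlsplit(url).query)["metricSelector"][0]
```

URL-encoding matters here because the selector contains quotes, parentheses, and a slash that would otherwise break the query string.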

 

Many thanks.

The above approach helped!
