Automations
All questions related to Workflow Automation, AutomationEngine, and EdgeConnect, as well as integrations with various tools.

How to correctly model a traffic-based SLO when total request count can be zero?

deni
Pro

Hi,

I’m trying to define a combined, traffic-based availability SLO in Dynatrace, built from multiple calculated service metrics (for example, total requests vs. failed requests across several endpoints).

The basic success formula is the usual one:

success = 100 * (total_requests - failed_requests) / total_requests 

This works correctly as long as there is traffic.

However, I’m struggling with the edge case where total_requests = 0, which can legitimately happen (for example during maintenance windows, upgrades, or when the service is intentionally unavailable and receives no traffic).
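
As a minimal sketch of the edge case in plain arithmetic (not Dynatrace's own evaluation), the formula can be written so that "no traffic" is reported explicitly instead of dividing by zero. The function name `success_rate` and the choice of `None` for "nothing to measure" are assumptions made for illustration only.

```python
from typing import Optional


def success_rate(total_requests: float, failed_requests: float) -> Optional[float]:
    """Return the success percentage, or None when there was no traffic."""
    if total_requests == 0:
        # 0 / 0 is undefined, so report "no data" rather than a number.
        return None
    return 100 * (total_requests - failed_requests) / total_requests


print(success_rate(20, 0))  # traffic, no failures
print(success_rate(20, 5))  # traffic, some failures
print(success_rate(0, 0))   # no traffic at all
```

Whether `None` should then be counted as healthy, unhealthy, or excluded is a policy decision that the formula itself cannot answer.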

Observations so far

  • default(0) works as expected for error metrics (failed request count).

  • Applying default() or value to total request metrics either:

    • is rejected by the metric selector (for example ...total...Requests:count:default(0)), or

    • produces syntactically valid expressions but leads to unexpected SLO values and burn rates (for example ...total...Requests:value:default(0)).

    From a pure math perspective, this does not necessarily imply a division-by-zero problem — the numerator would also be 0, and the denominator can even be guarded with a non-zero default (for example default(1)) to keep the formula mathematically valid.

  • When total requests are zero, the SLO evaluation either becomes N/A or produces non-intuitive results if workarounds are attempted (for example very low percentages like 4%).
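
The arithmetic behind the default-value workaround can be checked outside Dynatrace. The sketch below is plain Python, not a model of how the metric selector applies default(); `None` stands for "no data point", and the function and parameter names are hypothetical. It suggests that a numerator of 0 over a denominator defaulted to 1 yields 0%, not 100%, so "mathematically valid" does not mean "healthy".

```python
def success_with_defaults(total, failed, numerator_default, denominator_default):
    """Evaluate 100 * (total - failed) / total with substitutes for missing data.

    None means "no data point". The numerator and denominator may receive
    different substitutes, which is what changes the result.
    """
    total_in_numerator = numerator_default if total is None else total
    total_in_denominator = denominator_default if total is None else total
    failed_value = 0 if failed is None else failed
    return 100 * (total_in_numerator - failed_value) / total_in_denominator


# No traffic: numerator treated as 0, denominator guarded with 1.
no_traffic_zero_numerator = success_with_defaults(None, None, 0, 1)
# No traffic: the same substitute (1) on both sides of the division.
no_traffic_same_default = success_with_defaults(None, None, 1, 1)
# Traffic present: the substitutes are never used.
with_traffic = success_with_defaults(20, 0, 0, 1)

print(no_traffic_zero_numerator, no_traffic_same_default, with_traffic)
```

Under these assumptions, the result for a no-traffic interval depends entirely on which substitute is chosen, which is one way a workaround can distort the SLO value.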

Example expression

This expression works correctly as long as total requests are greater than zero:

100 *
(
  (
    (
      calc:service.totalService1Requests
      + calc:service.totalService2Requests
    )
    -
    (
      calc:service.errorService1Requests:count:default(0)
      + calc:service.errorService2Requests:count:default(0)
    )
  )
  /
  (
    calc:service.totalService1Requests
    + calc:service.totalService2Requests
  )
)

Questions

  1. Is it by design that traffic-based SLOs in Dynatrace cannot meaningfully evaluate to 100% when total_requests = 0?

  2. Is there any recommended pattern to handle this case within a single combined SLO (for example, treating “no traffic” as healthy)?

I’d like to avoid workarounds that distort the SLO math or burn-rate calculation, and instead understand the officially supported behavior and best practices.

Regards, Deni

Dynatrace Integration Engineer at CodeAttest

Julius_Loman
DynaMight Legend
@deni can you be more specific about the errors and results you get? (Screenshots, please.)
I believe this is caused by the way (classic) SLOs are calculated: the SLO uses Inf resolution. This essentially breaks metric expressions, because single values are used as input for the expression instead of time series. In other words, a division of averages is not the same as the average of divisions.

You can see that, for example, in the playground, where I took a "zero"-only metric as an example.
But your case can be different. Also worth mentioning:
  • The metric selectors in your expression are missing splitBy, which can produce unexpected results. Use splitBy, ideally removing all dimensions if that suits your case, e.g. :splitBy()
  • There is a built-in success rate metric, builtin:service.successes.server.rate, and there are also success count metrics.
  • Generally, if there is no value, you can't measure. If there is zero traffic, you can't say your success rate is 100% - as there was no success.
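
The remark above that a division of averages differs from an average of divisions can be illustrated with plain numbers. This is a minimal sketch with made-up sample data (two intervals, chosen only for illustration); it does not reproduce Dynatrace's SLO evaluation.

```python
from statistics import mean

# Hypothetical per-interval counts: a busy interval and a quiet one.
total = [100, 10]
failed = [0, 5]


def rate(total_requests, failed_requests):
    """Success percentage for one pair of counts (assumes total > 0)."""
    return 100 * (total_requests - failed_requests) / total_requests


# Aggregate each series to a single value first, then divide.
ratio_of_averages = rate(mean(total), mean(failed))

# Divide per interval first, then aggregate the resulting percentages.
average_of_ratios = mean([rate(t, f) for t, f in zip(total, failed)])

print(round(ratio_of_averages, 2), average_of_ratios)
```

With this data the first approach weights the busy interval more heavily, while the second gives both intervals equal weight, so the two results differ noticeably.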
Dynatrace Ambassador | Alanata a.s., Slovakia, Dynatrace Master Partner

Hi @Julius_Loman ,

Thank you for the references — I’ll review them.

Below is some additional context together with screenshots from my test environment.
The setup is intentionally simple: I’m using a small demo application that I wrote only to reproduce a real customer scenario.

I’ve defined calculated service metrics for:

  • total request count

  • failed request count

This setup exists for two endpoints:

  • login

  • register

In total, there are four metrics (2× total, 2× failed), which are combined into a single, traffic-based SLO.

deni_0-1767869666956.png

There is no business use case behind this setup — the goal is only to reproduce and validate the customer’s real production behavior with minimal endpoints.

To generate traffic, I run a script that sends requests for a short period of time.
Once the script finishes, traffic stops completely.

On the customer side, this situation can legitimately happen as well (for example during maintenance windows or upgrades).

Currently:

  • There are no failed requests (error generation is not yet part of the test).

  • Example data looks like:

    • (20 total - 0 failed) / 20 total

    • later, when traffic stops: (0 total - 0 failed) / 0 total

The SLO expression compiles successfully

deni_1-1767870246525.png

but when I press “Evaluate SLO”, the evaluation shows this. I don't understand where these percentages come from.

deni_2-1767870611106.png

If I try to apply a default() to the total request metrics, for example:

deni_3-1767870710939.png

or 

deni_4-1767870764476.png

I don't know why the error differs even though the expression is the same: sometimes I see the first error and sometimes the second one.

 I can write it like this:

deni_5-1767870864555.png

but when I press “Evaluate SLO”, the evaluation shows this. Again, I don't understand where these percentages come from.

deni_6-1767870908852.png

 

Regards, Deni

Dynatrace Integration Engineer at CodeAttest

@deni as I wrote before: add the :splitBy() and :auto transformations. Also add :fold(avg) (see my example above).
Always model this in Data Explorer first (it's easier) and use a single value (don't change the fold aggregation in the chart options).

Dynatrace Ambassador | Alanata a.s., Slovakia, Dynatrace Master Partner
