Solved: Dashboard and Data Explorer - average computation

bernhard_s · ‎17 Oct 2022

Hello,

How is the average aggregation actually computed when designing a Dynatrace dashboard?

Take for example the failure rate and apply an average aggregation in the Data Explorer.

builtin:service.errors.total.rate:splitBy():avg:auto:sort(value(avg,descending)):limit(100)

To me, it seems that the failure rate is determined for each service and is then weighted by the number of requests that the respective service received.

So is the average actually a weighted average?

When I try to reproduce the calculation for a small set of web services, the dashboard / Data Explorer result appears to be the weighted average rather than the plain (unweighted) mean.
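
To make the difference between the two averages concrete, here is a minimal Python sketch. The service names, failure rates and request counts are made up for illustration and are not taken from any real environment; the sketch only shows the arithmetic, not how Dynatrace computes the value internally.

```python
# Hypothetical per-service figures: (name, failure rate in percent, request count).
services = [
    ("svc-a", 10.0, 100),
    ("svc-b", 2.0, 900),
]


def unweighted_mean(rows):
    """Plain mean of the per-service failure rates."""
    return sum(rate for _, rate, _ in rows) / len(rows)


def weighted_mean(rows):
    """Mean of the failure rates, weighted by each service's request count."""
    total_requests = sum(count for _, _, count in rows)
    return sum(rate * count for _, rate, count in rows) / total_requests


print("unweighted:", unweighted_mean(services))
print("weighted:  ", weighted_mean(services))
```

With these numbers the busy service dominates the weighted result, so the two values differ noticeably.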

I could not find any information in the documentation, unfortunately.

Thanks for your help!

Best regards, Bernhard

Mizső · ‎17 Oct 2022

Hi @bernhard_s,

Here is the relevant documentation.

Metrics API - Metric selector | Dynatrace Docs

I hope it helps.

Best regards,

Mizső

Dynatrace Community RockStar 2024, Certified Dynatrace Professional

zietho · ‎14 Dec 2022

Yes, that's right. If you use `metricname:avg` (i.e. apply the `avg` before the splitBy), you should get the mean of means (per time slot) instead. @fcourbon FYI. Please correct me if that's wrong.
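
For reference, a "mean of means per time slot" could be sketched in Python as follows. This assumes each service contributes one already-averaged value per time slot; the service names and values are hypothetical, and the sketch only illustrates the arithmetic, not how Dynatrace evaluates `:avg`.

```python
# Hypothetical response times (ms) per service, one averaged value per time slot.
series = {
    "svc-a": [100.0, 200.0, 300.0],
    "svc-b": [300.0, 400.0, 500.0],
}


def mean_of_means_per_slot(data):
    """For each time slot, take the unweighted mean of the per-service values."""
    slots = zip(*data.values())  # group the values of all services by time slot
    return [sum(slot) / len(slot) for slot in slots]


print(mean_of_means_per_slot(series))
```

Each service counts equally in every slot here, regardless of how many requests it handled.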

bernhard_s · ‎15 Dec 2022

Hi @zietho .

Thanks for your response. I am still a bit confused by it.
To be honest, I did not fully understand how your example yields the mean of means.

I tried different queries in which I selected metrics such as `builtin:service.response.time` and `builtin:service.requestCount.total` to get the response time and request count for every service.
Then I compared my own computation in Excel with the results Dynatrace returns when various aggregations are added. I also tried what @zietho suggested for "get[ting] the mean of means".

That is, I started with this query:

builtin:service.response.time:filter(
  and(
    in("dt.entity.service",entitySelector("type(service),serviceType(~"WEB_SERVICE~")")),
    in("dt.entity.service",entitySelector("type(service),tag(~"[Environment]app.tenant:xx~")")),
    in("dt.entity.service",entitySelector("type(service),tag(~"[Kubernetes]app:yy-services~")"))
))

I also ran a second query that uses `builtin:service.requestCount.total` instead, with the same filter condition.
The resulting table has three columns: the service name, the response time, and the request count.
I exported the table from the Data Explorer to CSV so that I could import it and check the aggregation results.
For the follow-up changes to the response time query, I removed the request count query, i.e. I only used the query for the response time metric.

Interestingly, and somewhat strangely, the Dynatrace query language gives different results and different semantics depending on how the same `avg` operator is chained.

For your information:

1. If you append `:splitBy():avg` to the first query above (the one for the response time), you get the mean weighted by the request count (as pointed out in my original post).
2. If you instead append only `:avg:splitBy()` to the first query, you get some sort of sum over the metrics. In my example you get a single scalar value that is the sum of the response times. Strange behavior in my opinion, since neither `avg` nor `splitBy` implies a sum. 🤔
3. However, when I appended `:avg:splitBy():avg` to the query (as @zietho suggested), I ended up with the (unweighted) mean of the response times, that is, the sum of response times across web services (i.e. the result of point 2) divided by the number of web services. Again, there is room for improvement in the query language syntax. 🤔 But it seems this does not yield the mean of means over the selected time period.
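
The three observed results can be checked outside Dynatrace against an exported table. Below is a minimal Python sketch of that comparison. The column names and values are hypothetical stand-ins for the exported CSV, and the functions only mirror the spreadsheet check; they are not a description of what the metric selector does internally.

```python
# Hypothetical rows standing in for the exported table
# (service name, response time, request count).
rows = [
    {"service": "svc-a", "response_time": 100.0, "requests": 10},
    {"service": "svc-b", "response_time": 200.0, "requests": 30},
    {"service": "svc-c", "response_time": 600.0, "requests": 60},
]


def weighted_mean(table):
    """Matches the value observed for `:splitBy():avg` (request-count weighted)."""
    total_requests = sum(r["requests"] for r in table)
    return sum(r["response_time"] * r["requests"] for r in table) / total_requests


def total(table):
    """Matches the value observed for `:avg:splitBy()` (sum of response times)."""
    return sum(r["response_time"] for r in table)


def unweighted_mean(table):
    """Matches the value observed for `:avg:splitBy():avg` (sum / service count)."""
    return total(table) / len(table)


print("weighted mean:  ", weighted_mean(rows))
print("sum:            ", total(rows))
print("unweighted mean:", unweighted_mean(rows))
```

Comparing these three numbers with the Data Explorer output for each selector variant shows which aggregation was actually applied.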

I got these results by appending each of these three combinations separately and comparing the output with my calculations on the table of response times and request counts that I imported into Excel.

In conclusion, using `avg` does not always mean taking the (weighted/unweighted) average. 😖
Unfortunately, this is not reflected in the documentation yet.

Why is this important? Depending on your services' request count, there may be HUGE differences when comparing the mean and the weighted mean, if the request count distribution across services is skewed.

It would be nice if these findings were reflected in the Dynatrace documentation.
@zietho What is your opinion on this issue? Thank you for pointing me to `:avg:splitBy():avg`.

Best regards,
Bernhard

zietho · ‎21 Dec 2022

Total agreement, and that's why I already mentioned this to @fcourbon (one of our metrics PMs), who is responsible for this part of the documentation.