17 Oct 2022 10:59 AM - last edited on 23 May 2023 02:43 PM by Michal_Gebacki
Hello,
how is the average aggregation actually computed when designing a Dynatrace dashboard?
Take for example the failure rate and apply an average aggregation in the Data Explorer.
builtin:service.errors.total.rate:splitBy():avg:auto:sort(value(avg,descending)):limit(100)
To me, it seems that the failure rate is determined for each service and then weighted by the number of requests that the respective service received.
So is the average actually a weighted average?
When I try to reproduce the calculation for a small set of web services, the dashboard / Data Explorer result appears to be the weighted average rather than the plain (unweighted) mean.
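To make the distinction concrete, here is a minimal sketch of the two candidate calculations. The service names and numbers are invented for illustration and do not come from any real environment, and the code does not claim to reproduce how Dynatrace computes the value internally.

```python
# Hypothetical per-service numbers, invented for illustration only.
services = [
    {"name": "svc-a", "failure_rate": 10.0, "requests": 900},
    {"name": "svc-b", "failure_rate": 50.0, "requests": 100},
]


def unweighted_mean(rows):
    """Plain mean: every service counts equally."""
    return sum(r["failure_rate"] for r in rows) / len(rows)


def weighted_mean(rows):
    """Mean weighted by request count: busy services count more."""
    total_requests = sum(r["requests"] for r in rows)
    return sum(r["failure_rate"] * r["requests"] for r in rows) / total_requests


plain = unweighted_mean(services)    # (10 + 50) / 2
weighted = weighted_mean(services)   # (10*900 + 50*100) / 1000
print(f"unweighted: {plain}, weighted: {weighted}")
```

With these made-up numbers the unweighted mean is 30.0 and the request-weighted mean is 14.0, so comparing a dashboard value against both shows which one it matches.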
I could not find any information in the documentation, unfortunately.
Thanks for your help!
Best regards, Bernhard
17 Oct 2022 12:16 PM
Hi @bernhard_s,
Here is the relevant documentation.
Metrics API - Metric selector | Dynatrace Docs
I hope it helps.
Best regards,
Mizső
14 Dec 2022 05:48 PM - edited 14 Dec 2022 05:48 PM
Yes, that’s right. If you use `metricname:avg` (i.e., before the splitBy), then you should get the mean of means (per time slot) instead. @fcourbon FYI. Please correct me if that’s wrong.
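One possible reading of "mean of means (per time slot)" is sketched below: within each time slot, the per-service averages are averaged again with equal weight. The slot labels, service names, and values are hypothetical, and this is only an illustration of the arithmetic, not a statement of how Dynatrace evaluates `:avg`.

```python
# Hypothetical data: slot -> {service: (average response time, request count)}.
slots = {
    "10:00": {"svc-a": (100.0, 90), "svc-b": (300.0, 10)},
    "10:01": {"svc-a": (200.0, 50), "svc-b": (400.0, 50)},
}


def mean_of_means(slot):
    """Average the per-service averages, ignoring request counts."""
    values = [avg for avg, _ in slot.values()]
    return sum(values) / len(values)


def request_weighted(slot):
    """Average the per-service averages, weighted by request count."""
    total = sum(count for _, count in slot.values())
    return sum(avg * count for avg, count in slot.values()) / total


# One (mean of means, request-weighted mean) pair per time slot.
per_slot = {t: (mean_of_means(s), request_weighted(s)) for t, s in slots.items()}
for t, (mom, weighted) in per_slot.items():
    print(t, mom, weighted)
```

In the first slot the two values differ (200.0 versus 120.0) because the request counts are uneven. In the second slot both are 300.0 because each service handled the same number of requests.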
15 Dec 2022 12:03 PM - edited 15 Dec 2022 12:16 PM
Hi @zietho,
Thanks for your response. I am still a bit confused by it.
To be honest, I did not fully understand how your example yields the mean of means.
I tried different queries in which I selected metrics such as `builtin:service.response.time` and `builtin:service.requestCount.total` to get the response time and request count for every service.
Then I compared my computation in Excel with the results Dynatrace produces when various aggregations are added. I also tried what @zietho suggested for "get[ting] the mean of means".
That is, I started with this query:
builtin:service.response.time:filter(
and(
in("dt.entity.service",entitySelector("type(service),serviceType(~"WEB_SERVICE~")")),
in("dt.entity.service",entitySelector("type(service),tag(~"[Environment]app.tenant:xx~")")),
in("dt.entity.service",entitySelector("type(service),tag(~"[Kubernetes]app:yy-services~")"))
))
and a second one that uses `builtin:service.requestCount.total` instead, with the same filter condition.
The resulting table has 3 columns: one for the service name, one for the response time, and one for the request count.
I exported the table from the Data Explorer to CSV so that I could import it and check the aggregation results.
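The same check can be scripted instead of done in a spreadsheet. The sketch below assumes the exported CSV has columns named `service`, `response_time`, and `request_count`; these names are hypothetical, and the real export headers may differ, so adjust them to match the file.

```python
import csv
import io


def summarize(csv_file, value_col="response_time", weight_col="request_count"):
    """Return (unweighted mean, request-weighted mean) for a CSV export.

    The column names are assumptions, not the actual export headers.
    """
    rows = list(csv.DictReader(csv_file))
    values = [float(r[value_col]) for r in rows]
    weights = [float(r[weight_col]) for r in rows]
    unweighted = sum(values) / len(values)
    weighted = sum(v * w for v, w in zip(values, weights)) / sum(weights)
    return unweighted, weighted


# Inline sample standing in for the exported file (invented numbers).
sample = io.StringIO(
    "service,response_time,request_count\n"
    "svc-a,100,300\n"
    "svc-b,200,100\n"
)
unweighted, weighted = summarize(sample)
print(unweighted, weighted)
```

For a real export, pass an open file instead of the inline sample, for example `summarize(open("export.csv", newline=""))`, where `export.csv` is a placeholder name.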
For the follow-up changes to the response time query, I disabled/removed the request count query, i.e., I used only the query for the response time metric.
Interestingly, and somewhat strangely, the Dynatrace query language gives different results and semantics depending on how the same `avg` operator is chained.
For your information:
I got these results by appending each of these 3 combinations separately and comparing the output to my own calculations on the table of response times and request counts that I imported into Excel.
In conclusion, using `avg` does not always mean taking the (weighted/unweighted) average. 😖
Unfortunately, this is not reflected in the documentation -- yet.
Why is this important? If the request count distribution across services is skewed, there may be HUGE differences between the mean and the weighted mean.
It would be nice if these findings were reflected in the Dynatrace documentation.
@zietho What is your opinion on that issue? Thank you for pointing me to try `:avg:splitBy():avg`.
Best regards,
Bernhard
21 Dec 2022 01:44 PM
Total agreement, and that's why I already mentioned this to @fcourbon (one of our metrics PMs), who is responsible for this part of the documentation.