07 Aug 2025 05:33 PM
Need DQL Query to get failure rate by service instance or process group instance
I tried the query below:
timeseries { failedRequests = sum(dt.service.request.count, scalar: true, filter: { failed == true }), totalRequests = sum(dt.service.request.count, scalar: true) }
, by: { dt.entity.service, dt.entity.process_group }
, filter: { in(dt.entity.service, classicEntitySelector("type(service),entityName.equals(\"prod-service\")")) }
| fieldsAdd rate = failedRequests / totalRequests
| fieldsAdd serviceName = entityName(dt.entity.service)
| fieldsRename dt.entity.process_group, ProcessGroup
| sort rate desc
| fieldsRemove totalRequests, failedRequests
| lookup [
fetch dt.entity.process_group_instance
| fields entity.name, instance_of[dt.entity.process_group]
| fieldsRename `instance_of[dt.entity.process_group]`, processGroupName
], sourceField:ProcessGroup, lookupField:processGroupName
It returns only the failure rate and does not split it by process group instance.
See the screenshot for reference.
07 Aug 2025 06:04 PM - edited 07 Aug 2025 06:11 PM
Add the summarize command to the end of your DQL and you may get what you need:
|summarize avg(rate), by:{ProcessGroup}
Edit: I re-read your post and understood that you need the failure rate per instance, not per group.
In this case, the only option we have is through MDA, choosing failure rate as the metric and setting the dimension to service instance. Sadly, we cannot create a metric with these settings; we can only save the MDA configuration for quick future reference.
I will keep watching this thread, since I am also interested in this, just in case someone knows a better way to do it.
07 Aug 2025 06:31 PM
This is impossible with the built-in metrics (dt.service.*), as those are calculated per service.
You can do that either by querying spans (be careful with query costs if it will be on a dashboard). You can use something like this:
fetch spans
| filter in(dt.entity.service, classicEntitySelector("type(service),entityName.equals(\"prod-service\")")) and request.is_root_span == true
| summarize { failed=countIf(request.is_failed == true), total=count() }, by:{dt.entity.process_group_instance}
| fieldsAdd failure_rate=toDouble(failed)/toDouble(total)
You can see another example here.
Or you can create a metric in the OpenPipeline for spans, calculate your custom metric there and then use the metric in the dashboard.
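As a rough sketch of the metric-based option, the query below assumes OpenPipeline already extracts two counter metrics from root spans and that both keep dt.entity.process_group_instance as a dimension. The metric keys span.custom.requests.failed and span.custom.requests.total are hypothetical placeholders, not built-in metrics; replace them with whatever keys you define in your pipeline.

```dql
// Hypothetical metric keys: substitute the ones created in your OpenPipeline configuration.
// Assumes both metrics carry dt.entity.process_group_instance as a dimension.
timeseries { failed = sum(span.custom.requests.failed, scalar: true), total = sum(span.custom.requests.total, scalar: true) }
, by: { dt.entity.process_group_instance }
// Failure rate per process group instance over the selected timeframe
| fieldsAdd failure_rate = failed / total
// Resolve the entity ID to a readable name
| fieldsAdd instanceName = entityName(dt.entity.process_group_instance)
| sort failure_rate desc
```

Compared with querying spans directly, a query like this reads pre-aggregated metric data, which is why it tends to suit dashboards that refresh often.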
08 Aug 2025 05:22 PM
Thank you Julius_Loman