16 Apr 2026 04:50 PM - last edited on 17 Apr 2026 07:35 AM by MaciejNeumann
Hello,
Is there a way I can see service failure rate, but split by host? We have the same services on multiple different hosts and want to see all services on all hosts and their failure rate.
Maybe a DQL query?
For example:
| Host | Service | Failure Rate |
|---|---|---|
| Server111 | Service111 | |
| Server111 | Service222 | |
| Server222 | Service111 | |
| Server222 | Service222 | |
16 Apr 2026 05:36 PM
You can see that directly in the Services app on the Failures tab. Playground link.
You can adjust the grouping as necessary. You can also grab the DQL from there.
17 Apr 2026 09:09 AM
I've had a look at this, but when I go to find "Host" under Group by, it doesn't appear.
17 Apr 2026 08:25 AM
Hi,
try this:
fetch spans, from:now()-2h, to:now()
| filter request.is_root_span == true
| filter isNotNull(dt.entity.host) and isNotNull(dt.entity.service)
| summarize
failed = countIf(request.is_failed == true),
total = count(),
by: { dt.entity.host, dt.entity.service }
| fieldsAdd
host = entityName(dt.entity.host),
service = entityName(dt.entity.service),
failure_rate = 100.0 * failed / if(total == 0, 1, else: total)
| fields host, service, failure_rate
| sort failure_rate desc

You should receive this:
17 Apr 2026 09:20 AM
Hi,
Thank you for this. It has worked, but is there a way to convert this to a timeseries, please? I have tried but I get no records coming back.
Thanks
17 Apr 2026 10:04 AM
Yes. Because your working query starts from fetch spans, you should use makeTimeseries, not timeseries.
timeseries is the starting command for querying metrics, while makeTimeseries is the command used after fetching records such as spans. Dynatrace also notes that for spans, makeTimeseries uses the start_time field automatically.
try this:
fetch spans, from:now()-2h, to:now()
| filter request.is_root_span == true
| filter isNotNull(dt.entity.host) and isNotNull(dt.entity.service)
| makeTimeseries {
failed = countIf(request.is_failed == true),
total = count()
},
by:{dt.entity.host, dt.entity.service},
interval: 5m
| fieldsAdd host = entityName(dt.entity.host),
service = entityName(dt.entity.service),
failure_rate = if(total[] == 0, 0.0, else: 100.0 * failed[] / total[])
| fields timeframe, interval, host, service, failure_rate
17 Apr 2026 10:41 AM - edited 17 Apr 2026 10:51 AM
Hi,
Thank you for this, it has worked and I can put it on a graph. Is there a way to reduce the cost by doing this differently? At the minute it is scanning over 100 GB and will need to run quite often.
I tried this query, but it doesn't give me all services on all hosts like yours does.
timeseries {
    operand1 = sum(dt.service.request.failure_count),
    operand2 = sum(dt.service.request.count)
}, by: { dt.entity.service }, nonempty: true
| fieldsAdd failureRate_timeseries = operand1[] / operand2[] * 100
| fieldsRemove operand1, operand2
| fieldsAdd metricName = "Failure rate"
| fieldsAdd service_name = entityName(dt.entity.service)
| lookup [fetch dt.entity.host
    | fields hostid = id, hostname = entity.name, serviceid = runs[dt.entity.service]
    | expand serviceid], lookupField: serviceid, sourceField: dt.entity.service
| fieldsRemove dt.entity.service, metricName, lookup.hostid, lookup.serviceid
17 Apr 2026 10:50 AM
Hmmmm,
IMO, the best way to reduce cost is to stop querying raw spans repeatedly and instead extract a metric from the spans, then query that metric afterward.
Right now this query works, but it scans a lot of trace data each time. Dynatrace's own recommendation for this kind of use case is OpenPipeline metric extraction from spans. That gives you a metric you can later query with a cheap timeseries command instead of repeatedly scanning large amounts of span data.
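Once such a metric exists, the follow-up query becomes a plain metric read. A minimal sketch, assuming you configured OpenPipeline to extract two hypothetical metrics named span.request.failed_count and span.request.total_count with dt.entity.host and dt.entity.service kept as dimensions (the metric keys and dimensions are whatever you define in your extraction rule, not fixed names):

```
// query the pre-aggregated metrics instead of raw spans
timeseries {
    failed = sum(span.request.failed_count),
    total = sum(span.request.total_count)
  },
  by: { dt.entity.host, dt.entity.service },
  interval: 5m
| fieldsAdd host = entityName(dt.entity.host),
    service = entityName(dt.entity.service),
    failure_rate = if(total[] == 0, 0.0, else: 100.0 * failed[] / total[])
| fields timeframe, interval, host, service, failure_rate
```

Because this reads a pre-aggregated metric, the scanned volume no longer grows with your trace traffic, which is what drives the 100 GB scans today.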
17 Apr 2026 10:53 AM - edited 17 Apr 2026 10:56 AM
Perfect, thank you for your help on this. Much appreciated. Would you be able to show me how to do this via OpenPipeline, please?
17 Apr 2026 12:02 PM
Thank you for this, appreciate it.