DQL
Questions about Dynatrace Query Language
cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 

Service Failure rate by host

jmistry
Observer

Hello,

Is there a way where i can see service failure rate but have it split by host. 
we have the same services on multi different host and want to see all services on all host and their failure rate?

Maybe a DQL query?

for example:

Host Service Failure Rate
Server111 Service111  
Server111 Service222  
Server222 Service111  
Server222 Service222  
10 REPLIES 10

Julius_Loman
DynaMight Legend
DynaMight Legend

You can see that directly in the services app in Failures tab.  Playground link .

You can adjust grouping as necessary. You can also grab SQL from there.

Dynatrace Ambassador | Alanata a.s., Slovakia, Dynatrace Master Partner

Ive had a look at this but when i go to find "Host" in Group by it doesnt appear. 

t_pawlak
Leader

Hi,
try this:

fetch spans, from:now()-2h, to:now()
| filter request.is_root_span == true
| filter isNotNull(dt.entity.host) and isNotNull(dt.entity.service)
| summarize 
    failed = countIf(request.is_failed == true),
    total = count(),
  by: { dt.entity.host, dt.entity.service }
| fieldsAdd 
    host = entityName(dt.entity.host),
    service = entityName(dt.entity.service),
    failure_rate = 100.0 * failed / if(total == 0, 1, else: total)
| fields host, service, failure_rate
| sort failure_rate desc

you should recive this:

host,service,failurate.jpg

Hi, 

Thank you for this. this has worked but Is there a way you can convert this to timeseries please? i have tried but i get no records coming.

Thanks

Yes, because your working query starts from fetch spans, you should use makeTimeseries, not timeseries.
timeseries is the starting command for metrics, while makeTimeseries is the command used after fetching records such as spans. Dynatrace also notes that for spans, makeTimeseries uses the start_time field automatically.
try this:

fetch spans, from:now()-2h, to:now()
| filter request.is_root_span == true
| filter isNotNull(dt.entity.host) and isNotNull(dt.entity.service)
| makeTimeseries {
    failed = countIf(request.is_failed == true),
    total = count()
  },
  by:{dt.entity.host, dt.entity.service},
  interval: 5m
| fieldsAdd host = entityName(dt.entity.host),
            service = entityName(dt.entity.service),
            failure_rate = if(total[] == 0, 0.0, else: 100.0 * failed[] / total[])
| fields timeframe, interval, host, service, failure_rate

Hi, 

Thank you for this, this has worked and can put it as a graph. Is there a way we can help reduce the cost for this by doing it a different way as at the minute it is scanning over 100gb and will require it to run quite often?

I tried this query but it doesnt give me all services on all hosts like yours does. 

timeseries {
operand1 = sum(dt.service.request.failure_count),
operand2 = sum(dt.service.request.count)
}, by: { dt.entity.service}, nonempty:true
| fieldsAdd failureRate_timeseries = operand1[] / operand2[] * 100
| fieldsRemove operand1, operand2 | fieldsAdd metricName = "Failure rate"
| fieldsadd service_name = entityName(dt.entity.service)
| lookup [fetch dt.entity.host
| fields hostid = id, hostname = entity.name, serviceid = runs[dt.entity.service]
| expand serviceid], lookupField:serviceid, sourceField:dt.entity.service
| fieldsRemove dt.entity.service, metricName, lookup.hostid, lookup.serviceid

Hmmmm,
IMO, the best way to reduce cost is to stop querying raw spans repeatedly and instead extract a metric from spans and query that metric afterward.
Right now this query works, but it scans a lot of trace data each time. Dynatrace’s own direction is that for this kind of use case, you should use OpenPipeline metric extraction from spans. That gives you a metric you can later query with cheap timeseries instead of repeatedly scanning large amounts of span data.

Perfect, Thank you for your help on this. Much appreciated.  Will you be able to show me how to do this via openpipeline please?

Thank you for this. appreciate it 

Featured Posts