cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 

How to Aggregate (Sum) Success vs Failure Counts Totals per Time Interval in DQL

Raul_rrr
Frequent Guest

We are trying to build a metric to monitor the success vs failure rate of an API over time.

Requirement:

  • Treat all status codes between 400–599 as Failure
  • Treat all other status codes as Success
  • Display this in a bar/line chart with:
    • X-Axis: Timeframe
    • Y-Axis: Count of Success and Failure
  • Created a Custom Metric which Captures Endpoint Name and Client ID used for Dashboard Filtering.

Issue:
Using the below DQL, we are able to generate the data. However, the output splits into multiple Success and Failure counts. For example, within a 5-minute interval, each unique combination of endpoint, client, service, and status is treated as a separate record.

Our expectation is to see only two records per timeframe (Success and Failure totals), but instead we are getting multiple entries.

We attempted to use dedup status to consolidate the records, but this results in loss of data and incorrect counts.

Looking for an alternate approach that allows us to aggregate Success and Failure counts properly and visualize them as a bar/line graph.

"If we try to summarize success and failure, then Line Graph goes unsupported."

timeseries {
  value = sum(service.muleapi.response.count)
},
by: {dt.entity.service, http.response.status_code},
filter: {
  in(endpoint.name, array($Endpoint)) AND in(`request_attribute.Client-Id`, array($Client_ID))
}
| lookup [
    data record(service_id = "SERVICE-EXXXFB", api_name = "ABC"),
    record(service_id = "SERVICE-9XXXX448", api_name = "DEF"),
    record(service_id = "SERVICE-FXXXX926", api_name ="XYZ")
  ], sourceField: dt.entity.service, lookupField: service_id
| fieldsAdd API = lookup.api_name
| fieldsRemove lookup.service_id, lookup.api_name
| filter in(API, array($API_Name))
| fieldsAdd status = if(condition: http.response.status_code >= 400 AND http.response.status_code < 600,
   then: "Failure", else: "Success" )
| sort status desc
//| dedup status

Screenshots: Line Graph with geometry as bar graph

Without dedup (with Legend) – multiple Success/Failure bars per timeframe

Raul_rrr_0-1759204168782.png

Without dedup (no Legend) – Graph looks fine visually, but bars still represent multiple records instead of aggregated totals

Raul_rrr_3-1759204294119.png

With dedup – Inaccurate data as multiple records are dropped

Raul_rrr_2-1759204257435.png
Please suggest an alternate approach (without dedup) to  aggregate Success vs Failure counts per timeframe, so that we only see two entries (Success, Failure) per interval?

5 REPLIES 5

dannemca
DynaMight Guru
DynaMight Guru

Try to add a |maketimeseries sum(value), by:{status}

https://docs.dynatrace.com/docs/discover-dynatrace/platform/grail/dynatrace-query-language/commands/... 

Site Reliability Engineer @ Kyndryl

Hi @dannemca ,

Thanks for the suggestion! I tried adding  | maketimeseries sum(value), by:{status}, but it’s still not working as expected. I’m getting the following error:
"Please specify the parameter `time` explicitly, as the implicit default timestamp doesn't exist."

timeseries {
  value = sum(service.muleapi.response.count)
},
by: {dt.entity.service, http.response.status_code},
filter: {  dt.entity.service == "SERVICE-A123" OR dt.entity.service == "SERVICE-B123" OR 
dt.entity.service == "SERVICE-C123"}
| fieldsAdd status = if(condition: http.response.status_code >= 400 AND http.response.status_code < 600,
   then: "Failure", else: "Success" )
| maketimeseries sum(value), by:{status}

Tried adding a time column and feeding it to makeTimeseries, but it fails: | fieldsAdd time = bin(interval, 1m) then | makeTimeseries sum(value), by:{status}, time: time

Execution returns an INCOMPATIBLE_DATA_TYPES

Any suggestions or example snippets that extract/bucket a timestamp from an interval/timeframe and work with makeTimeseries would be greatly appreciated.

Replace the "| fieldsAdd time = bin(interval, 1m) " by "| fieldsadd time = timeframe[start]" and try again.

It should works now.

Site Reliability Engineer @ Kyndryl

you can aggregate timeseries using this syntax:

| summarize value=sum(value[]), by: {dt.entity.service, status, interval, timeframe}

full query based on your exampe above:

timeseries {
  value = sum(service.muleapi.response.count)
},
by: {dt.entity.service, http.response.status_code},
filter: {  dt.entity.service == "SERVICE-A123" OR dt.entity.service == "SERVICE-B123" OR 
dt.entity.service == "SERVICE-C123"}
| fieldsAdd status = if(condition: http.response.status_code >= 400 AND http.response.status_code < 600,
   then: "Failure", else: "Success" )

| summarize value=sum(value[]), by: {dt.entity.service, status, interval, timeframe}

 

of couse if you de not need to have breakdown by service, just ommit it in by: clause

| summarize value=sum(value[]), by: {status, interval, timeframe}

 

@krzysztof_hoja  - | summarize value=sum(value[]), by: {status, interval, timeframe} , worked perfectly. Thank you. Much appreciated!!

@dannemca - Thank you as well for the makeTimeseries suggestion, it was very helpful.

Featured Posts