DQL
Questions about Dynatrace Query Language
cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 

Filtering high-cardinality timeseries data by value

Rudolph_Sedlin
Participant

Hey all.

I have very-high-cardinality timeseries data that I send over to Dynatrace. To prevent the dashboard tiles from crashing and to allow alerting workflows to complete in a timely manner, I use iAny filter statements like "timeseries TPS=avg(TPS_BY_DIMENSION), interval:1m, by: dimension | filter iAny(TPS[] > 10)" to only account for larger and more relevant metrics without filtering directly for specific dimension values. The issue is that iAny is rather slow as it has to scan each and every timeseries by value rather than just once by dimension key like contains() for instance would do, and allows timeseries with just one high value to be admitted into my results. I wonder if other DQL exists that better addresses my need, though I doubt it given all timeseries values would realistically have to be scanned.

4 REPLIES 4

krzysztof_hoja
Dynatrace Champion
Dynatrace Champion

This query will act in similar way:

timeseries { TPS=avg(TPS_BY_DIMENSION), TPS_MAX=max(TPS_BY_DEMNSION, scalar: true) }, interval:1m, by: {dimension} 
| filter TPS_MAX > 10

 

Can you explain what do you mean exactly "very-high-cardinality timeseries?. Dynatrace metrics are not designed to deal with high cardinality dimensions (dimension tuples), which may result in incomplete queries.

Pretty much exactly that, while I have some metrics that are low-cardinality, others have hundreds of thousands of metrics each minute due to a high granularity (dimension tuples with many fields) and high throughput, which certainly stresses the Dynatrace metrics ingest and Davis.

t_pawlak
Leader

Hi,

In practice the most reliable and typically faster alternative to iAny() is to compute a per-series “gate” as a scalar derived from the already computed timeseries array. iAny(TPS[] > 10) is expensive because it scans the value array for every series and it also admits series with a single spike. Instead I do:

timeseries TPS = sum(dt.service.request.count, rate: 1m),
  from: -2h, interval: 1m, by: {endpoint.name}
| fieldsAdd TPS_MAX = arrayMax(TPS)
| filter TPS_MAX > 10
| sort TPS_MAX desc
| limit 20

scalar1.jpg

This filters on one number per series (max over the full window), without iAny().

I also tested what krzysztof suggest. The scalar:true variant

timeseries {
  TPS     = sum(dt.service.request.count, rate: 1m),
  TPS_MAX = max(dt.service.request.count, rate: 1m, scalar: true)
}, from: -2h, interval: 1m, by: {endpoint.name}
| filter TPS_MAX > 10

scalar_empty.jpg

…but in my environment it returned 0 records, while arrayMax(TPS) worked correctly. It seems scalar:true doesn’t always bind to the by:{...} series as expected (TPS_MAX ends up NULL per series), so filtering removes everything.

The reason is different.

iAny as well as arrayMax work on the already calculated results. What I proposed work on detailed data picking max from detailed timeseries. 
It will work identical for timeseries when all available (meaningful)  dimensions are listed in by:{}. Also when timeseries is aggregated with avg(), the result may not be very different. But when you use sum() and have many dimensions like for dt.service.request.count, the results will be way very different.

Featured Posts