01 Aug 2025 09:52 PM - last edited on 07 Aug 2025 07:16 AM by MaciejNeumann
We've been exploring Dynatrace's dashboarding for the last couple of months, and I keep running into the same problem over and over again.
In Grafana we are able to put in a metric, `x_timestamps` or `x_status_codes`, and it displays that metric unaltered, in a way we can then process as needed. Timestamps can be translated from epoch, and status codes can be mapped to strings on our graphs for quick glances / easy alert setup. For example, when an alert goes off we link to a dashboard that shows the current running process count and the latest status code of all running processes in one location.
I have been unable to figure out how to show this raw data in any form in Dynatrace. We don't want to aggregate timestamps. We don't want the max status code. We want the last reported raw metric data: unaltered timeseries data to confirm all processes are running when they should and ending when they should. We need raw timeseries data to show the timeline of given status codes, to report the actual number of successful jobs, and to report an accurate number of currently connected hosts, not the average, the max, or the min. Is this not possible within DQL?
We have a custom Prometheus exporter running logic against our services and spitting out these custom metrics, as this is the only way we are able to get this data.
For example, this query functions but jumbles our dates (avg/max/min/count/sum):
timeseries last_run_date_gauge = max(last_run_date_gauge), by: { tenant }
| fieldsAdd arr = `last_run_date_gauge`
| fieldsAdd val = timestampFromUnixMillis(arrayLast(arr))
but this query doesn't work at all, as an aggregation is required:
timeseries last_run_date_gauge = last_run_date_gauge, by: { tenant }
| fieldsAdd arr = `last_run_date_gauge`
| fieldsAdd val = timestampFromUnixMillis(arrayLast(arr))
Ideally it would be as simple as `timestampFromUnixMillis(Last(last_run_date_gauge)), by: { tenant }`?
Am I missing something simple where we can display this data in a way that it doesn't lose its meaning?
01 Aug 2025 11:07 PM
Dynatrace metrics do not store the raw data. Metrics are stored with 1-minute resolution, including the aggregates avg/min/max/count/sum. So if you send 10 datapoints in a minute, Dynatrace will calculate the aggregates accordingly. This lets you display various aggregations.
So the approach of using max should actually work if you need the latest epoch timestamp and you are sending it as a gauge. Can you share a detailed example including values?
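For instance, a minimal sketch along these lines (assuming the gauge comes in with a tenant dimension; arrayRemoveNulls guards against an empty trailing bin):
timeseries last_run = max(last_run_date_gauge), by: { tenant }
| fieldsAdd val = timestampFromUnixMillis(arrayLast(arrayRemoveNulls(last_run)))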
I'm not familiar with your data source, but it seems like bizevents or logs could be a better fit conceptually than a gauge representing epoch timestamp.
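For example, if each job run emitted a business event, a rough sketch could look like this (the event.type value and the job.status field are hypothetical placeholders for whatever you would ingest):
fetch bizevents
| filter event.type == "job.run.finished"
| sort timestamp asc
| summarize lastStatus = takeLast(job.status), lastRun = takeLast(timestamp), by: { tenant }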
04 Aug 2025 06:26 PM - edited 04 Aug 2025 06:34 PM
Most of our processes for this particular product are on legacy infrastructure, which is mostly made up of MSSQL stored procedures. We have a custom Prometheus exporter that we created to poll the DBs and transform that data into status codes, runtimes, and active session data without bogging down the DB's memory or disk. That data is then sent out as Prometheus metrics that we are able to graph/display/store in Grafana as is.
That Prometheus exporter lives on an AKS cluster with other supporting services and runs in a Docker container. We are able to import these metrics via the Dynatrace Operator. But when trying to display them in a similar way in Dynatrace, we've been running into several issues, namely the one stated above: data is being changed.
These are the dates we are seeing with the max query:
timeseries date_gauge = max(date_gauge), by: { tenant }
| fieldsAdd arr = `date_gauge`
| fieldsAdd val = timestampFromUnixMillis(arrayLast(arr))
Found something interesting when looking at the 'raw' metric.
This is Grafana:
{environment="prod",instance="<Rd>",job="<Rd>",tenant="<Rd> PROD"} 1754265600000
{environment="prod",instance="<Rd>",job="<Rd>",tenant="<Rd> PROD"} 1754006400000
{environment="prod",instance="<Rd>",job="<Rd>",tenant="<Rd> PROD"} 1751241600000
{environment="prod",instance="<Rd>",job="<Rd>",tenant="<Rd> PROD"} 1754092800000
{environment="prod",instance="<Rd>",job="<Rd>",tenant="<Rd> PROD"} 1754092800000
04 Aug 2025 07:34 PM - edited 04 Aug 2025 08:10 PM
It looks like your query does not match your data dimensions. You are selecting the maximum by tenant, while in Grafana you also have environment, instance, and job. In this case, Dynatrace returns the maximum value from the interval for each tenant.
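For a quick check, a sketch like this (reusing your metric name, and assuming the instance and job labels come through as dimensions) should keep the series separate:
timeseries date_gauge = max(date_gauge), by: { tenant, environment, instance, job }
| fieldsAdd val = timestampFromUnixMillis(arrayLast(date_gauge))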
Can you share a piece of the raw data? (Prometheus exporter sample from your environment)
05 Aug 2025 05:22 PM
I can't share the raw Prometheus data, but adding a *1000 and also filtering by env (the job and instance are the same for all metrics) seems to have fixed that particular graph (see the sketch at the end of this post). However, we are still getting aggregated data on our status codes. And if Dynatrace aggregates data on ingest, then we will simply have to live with what we've done below with 'close enough' data and go back to manual checks for our on-calls:
This is the best we were able to do (example):
timeseries status = avg(status), by: { tenant }, filter: { environment == "prod" }
| fieldsAdd arr = `status`
| fieldsAdd val = arrayLast(arr)
| fieldsAdd Status = if(val <= 1, "In progress", else: if(val <= 2, "Succeeded", else: if(val <= 3, "Cancelled", else: if(val <= 4, "Waiting", else: "Failed"))))
Meaning we'll have to extend this if statement for each status code we have; since the data is aggregated, we can't just say "val == status code" but have to assume it's larger than the status code before it, which isn't all that great. For example, if we have a run that goes from "In progress"/1 to "Cancelled"/3, it will report that it was "Succeeded"/2, as that is the average, which means we can't rely on our graphs like we used to.
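For reference, the date graph fix mentioned above ended up looking roughly like this (a sketch reconstructed from the earlier max query; the *1000 converts the gauge value to milliseconds):
timeseries last_run_date_gauge = max(last_run_date_gauge), by: { tenant }, filter: { environment == "prod" }
| fieldsAdd val = timestampFromUnixMillis(arrayLast(last_run_date_gauge) * 1000)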
06 Aug 2025 08:06 AM
Dynatrace does some aggregation on the input, but only if there are multiple values in a minute. Since you are using Prometheus exporters, which are scraped every minute, that aggregation does not happen.
In your example, you have explicitly selected the avg aggregation. This means Dynatrace will calculate the average value from the bins in the timeseries. You can select either the number of bins or the interval for the buckets, so I'd say you need to add the additional parameter interval: 1m in your timeseries command. Since you have not specified the interval or bins parameters, Dynatrace automatically determines them based on the timeframe you have selected. If your timeframe is too large, Dynatrace might automatically adjust the interval parameter so that the number of bins returned stays below 1500.
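For example, a sketch of your status query with a fixed interval (with one datapoint per minute, each 1-minute bin holds a single raw value, so the avg is the raw value and exact matching works):
timeseries status = avg(status), by: { tenant }, interval: 1m, filter: { environment == "prod" }
| fieldsAdd val = arrayLast(arrayRemoveNulls(status))
| fieldsAdd Status = if(val == 1, "In progress", else: if(val == 2, "Succeeded", else: if(val == 3, "Cancelled", else: if(val == 4, "Waiting", else: "Failed"))))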
See the timeseries command documentation for more details.
06 Aug 2025 05:24 PM
Our Prometheus exporter has some metrics that report every 15 seconds, as well as a select few that report every 30 seconds because we need quick response times on them. I think there are 3 or 4 total that report more than once per minute. So this data will be aggregated, correct?
06 Aug 2025 07:38 PM
@j03 Prometheus metrics are scraped. There are two common options to get metrics data from Prometheus exporters. In Kubernetes, metrics are scraped by the ActiveGate every minute (you enable that via annotations). Otherwise, you need to create your own Extension 2.0 and use Prometheus as the data source. There you can define your custom interval, but AFAIK it can only be one minute or longer.
Nevertheless, unless you used some other option, your data won't be aggregated on ingest, as it is collected every minute.