11 Feb 2025 04:11 PM
Hi!
I have a values-type metric. I'd like to display a timeseries that only counts datapoints whose value is greater than (or less than) an arbitrary value. I don't want to filter an aggregated result, but to analyze each value and remove the ones I don't want.
How can I achieve this?
Thanks for your help!
13 Feb 2025 07:53 AM
I am not sure if I got the requirement correctly. Here are my thoughts:
To get a timeseries (CPU usage in my example) where at least one value is greater than the predefined threshold, the iAny function can help:
timeseries cpu=avg(dt.host.cpu.usage), by: {dt.entity.host}
| filter iAny(cpu[]>80)
and to see only the datapoints matching this condition in the final result, an "iterative expression" is useful:
timeseries cpu=avg(dt.host.cpu.usage), by: {dt.entity.host}
| filter iAny(cpu[]>80)
| fieldsAdd cpu = if(cpu[]>80, cpu[])
17 Feb 2025 04:09 PM
Hi @krzysztof_hoja !
Thanks for your help. But you are filtering the result of avg(dt.host.cpu.usage). I would like to filter each data point independently, before any aggregation. I would like to build a response time SLO, for example 🙂 I can do it easily with fetch logs, but I can't find a solution with the timeseries command ...
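For context, this is roughly the log-based approach I mean (just a minimal sketch; response_time is a placeholder for whatever field actually holds the duration in my log records, and the 500 ms threshold is arbitrary):
fetch logs
| filter toLong(response_time) <= 500                  // placeholder field: every raw record is kept or dropped on its own
| makeTimeseries good_requests = count(), interval: 1m
Each record is evaluated individually and only then aggregated into a timeseries, which is exactly what I'd like to do with metric datapoints.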
18 Feb 2025 08:43 PM
You cannot find it, because it does not exist in the generic case for timeseries 🙂
I used dt.host.cpu.usage with a breakdown by host sort of on purpose. Let's consider this query:
timeseries { cpu=avg(dt.host.cpu.usage), cpu_t=sum(dt.host.cpu.usage, rollup:total) } , by: {dt.entity.ec2_instance, dt.entity.host}
| filter dt.entity.host == "HOST-937E3C790B64E8B5"
Besides the plain average I added a second timeseries: a sum with rollup:total. This additional metric tells us how many contributions, i.e. raw measurements, happened. The result looks like this:
The value of cpu_t is constantly 6 every minute because the CPU usage of a host is read every 10 seconds. But these individual measurements are not stored. What is stored, and is in fact the most granular "data point", is a statistical description of what happened, containing in the basic case 4 values: min, max, sum and count (sum and count allow calculating the average). From these components bigger aggregates can be calculated, for example for host groups or for longer intervals.
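To make the relation between these components concrete, here is a rough sketch that reconstructs the stored sum from the two series above, using the same element-wise [] arithmetic as in the if(...) example earlier (purely an illustration, assuming rollup:total really returns the count of contributions as described):
timeseries { cpu = avg(dt.host.cpu.usage), cpu_cnt = sum(dt.host.cpu.usage, rollup:total) }, by: {dt.entity.host}
| fieldsAdd cpu_sum = cpu[] * cpu_cnt[]        // average times count gives back the sum component stored per bucket
These same components are what get combined for bigger aggregates: sums and counts are added up first and only then divided, rather than averaging the per-minute averages.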
If we take a look at a similar query:
timeseries { rt=avg(dt.service.request.response_time), rt_t=sum(dt.service.request.response_time, rollup:total) } , by: {dt.entity.service}
, filter: dt.entity.service == "SERVICE-CB0AFF6C5BC4EABE"
and the result:
you can see that the number of contributions is variable: these are the actual requests. But this metric also has an additional dimension which allows looking at it in a more granular way. Adding "endpoint.name" to the split gives a more detailed view:
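The query for that more granular view could look roughly like this (the same query as above, just with endpoint.name added to the by: clause):
timeseries { rt=avg(dt.service.request.response_time), rt_t=sum(dt.service.request.response_time, rollup:total) } , by: {dt.entity.service, endpoint.name}
, filter: dt.entity.service == "SERVICE-CB0AFF6C5BC4EABE"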
You can even look deeper by splitting requests into successful and failed, but in the generic case you will not get to the point where a datapoint represents a single request. You may just get that by chance, when only one request fell into a specific bucket.
The basic idea of a metric is to have an aggregated view of a process (bucketized time, selected dimensions only): you lose details but you gain easy and fast access. If details are needed, for some cases we have spans to look deeper (service.request.response_time can be recreated from spans if no sampling occurs), but for some we simply do not keep the details.
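For the response-time SLO case, if spans are kept for your services, the "filter each raw observation first" idea can be approximated on them. A rough sketch (the duration field, the 500 ms threshold and the 1-minute interval are assumptions to adapt to your environment):
fetch spans
| filter dt.entity.service == "SERVICE-CB0AFF6C5BC4EABE"
| filter duration <= 500ms                     // each span, i.e. each request, is judged individually before any aggregation
| makeTimeseries good_requests = count(), interval: 1m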
20 Mar 2025 03:24 PM
Hi @krzysztof_hoja ! Thanks for the explanation. Is there a roadmap item for implementing filtering on raw datapoints, even if it results in slower performance?