27 Oct 2025 03:32 PM - last edited on 06 Nov 2025 01:58 PM by GosiaMurawska
I'm trying to alert on unhealthy Argo CD applications in Dynatrace. My approach is to put a label on the Argo CD application, something like dynatrace_alert: true, and then have a single alert in Dynatrace that fires when any labeled application is not healthy. That way I don't need to keep updating the alert in Dynatrace: I just add a label, and I can see which apps I'm alerting on by searching for that label in my repo.
I build the alert by running one query to get the list of apps that should be alerted on (dynatrace_alert: true), then feeding that list into a second query that shows which apps from that list are unhealthy. The problem I am running into is that this works fine in a notebook, but when I put it into a Davis anomaly detector I get 'Query does not result in a valid timeseries: No valid time series records found'. Any suggestions?
Result in notebook: (screenshot)
What I am using in the Anomaly detector
timeseries {alert_count = avg(argocd_app_labels, default: 0)}, by:{name}, filter:label_dynatrace_alert == "true", interval:1m
| join
[ timeseries {unhealthy_count = avg(argocd_app_info, default: 0)}, by:{health_status, name, k8s.cluster.name}, filter:health_status != "Healthy", interval:1m
| filter k8s.cluster.name != "eks-enterprise-dev"], on:{name}
28 Oct 2025 10:45 PM - edited 28 Oct 2025 10:45 PM
Hey!
fetch logs
| filter label_dynatrace_alert == "true"
| summarize unhealthy_count = avg(argocd_app_info), by:{name, k8s.cluster.name, health_status}
| filter health_status != "Healthy"
| makeTimeseries unhealthy_count = avg(unhealthy_count), by:{name}, interval:1m
29 Oct 2025 12:13 PM
Thanks for the input, @juan_mesa. What I am alerting on are Prometheus metrics, not logs. Even if it were possible to get this information from logs, I'd rather not, since I would then have to customize the logic even more instead of using the natively provided Prometheus metrics.
29 Oct 2025 01:05 PM - edited 29 Oct 2025 01:06 PM
Got it, that makes sense if you're working with Prometheus metrics. I believe the issue comes from the join, which is producing empty intervals. I'll keep digging to see if there's a workaround.
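In the meantime, one variation that might be worth trying is an untested sketch that swaps the join for a lookup. The idea is to keep the unhealthy series from argocd_app_info as the main timeseries, use the labeled apps only as an allow-list, and then drop the extra lookup field so the result holds a single metric series per record. The metric and dimension names are copied from your query. Whether the anomaly detector accepts this shape is an assumption I have not verified.

```
// Untested sketch: metric and dimension names are taken from the original query.
// Main series: unhealthy apps outside the dev cluster.
timeseries unhealthy_count = avg(argocd_app_info, default: 0),
    by: {name, k8s.cluster.name, health_status},
    filter: health_status != "Healthy" and k8s.cluster.name != "eks-enterprise-dev",
    interval: 1m
// Allow-list: names of apps that carry the dynatrace_alert label.
| lookup [
    timeseries alert_count = avg(argocd_app_labels),
        by: {name},
        filter: label_dynatrace_alert == "true"
    | fields name
  ], sourceField: name, lookupField: name, prefix: "alert."
// Keep only apps found in the allow-list, then remove the helper field.
| filter isNotNull(alert.name)
| fieldsRemove alert.name
```

If the detector still reports no valid time series, running the same query in a notebook over a window where you know an app was unhealthy should at least show whether the lookup is matching on name as expected.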
29 Oct 2025 02:24 PM
Thank you for your time here. I've tried a handful of things but can't get anything to work. Then again, I'm nowhere near proficient with DQL when it comes to queries that are, for me at least, this complex.