10 Jan 2025 10:14 PM
We need to create a log metric that tracks the number of errors per minute. We can then create a metric event that triggers when the error count is above some threshold for X of the last Y minutes. The metric event should open a problem card when errors occur, and we can then create alerts from that problem card.
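To make the threshold condition concrete, here is a simplified Python model of how such a metric event could be evaluated. It assumes each per-minute error count is compared against a static threshold and that a problem opens once a given number of violating samples fall inside a sliding window of recent samples. The function and parameter names are hypothetical; this is a sketch of the idea, not Dynatrace's implementation, and de-alerting is not modelled.

```python
from collections import deque


def first_alert_minute(counts_per_minute, threshold, violating_samples, sliding_window):
    """Return the index of the first minute at which a problem would open, or None.

    Simplified model (assumption): a sample violates when its error count is
    strictly greater than `threshold`, and a problem opens as soon as at least
    `violating_samples` of the last `sliding_window` samples violate.
    """
    if not 1 <= violating_samples <= sliding_window:
        raise ValueError("violating_samples must be between 1 and sliding_window")
    window = deque(maxlen=sliding_window)  # keeps only the most recent samples
    for minute, count in enumerate(counts_per_minute):
        window.append(count > threshold)  # True marks a violating sample
        if sum(window) >= violating_samples:
            return minute
    return None


# Errors per minute; threshold 10, alert when 3 of the last 5 minutes violate.
print(first_alert_minute([0, 12, 15, 3, 20, 18, 2], threshold=10, violating_samples=3, sliding_window=5))
# prints 4
```

With the same 3-of-5 rule, a single spike such as [50, 0, 0, 0, 0, 50] never opens a problem, which is the reason for using a sliding window instead of alerting on every sample.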
Query:
fetch logs //, scanLimitGBytes: 500, samplingRatio: 1000
| filter matchesValue(dt.kubernetes.cluster.name, "dynakube-prd-centralus-plum-app-k8s") or matchesValue(dt.kubernetes.cluster.name, "dynakube-prd-eastus2-plum-app-k8s")
| filter matchesValue(k8s.deployment.name, "claims-ai-svc-claims-ai-helm-*")
| filter k8s.container.name == "claims-ai-helm"
| sort timestamp desc
| parse content, "JSON:json"
| filter json[level] == "error"
| fields timestamp, json[level], json[data][msg], json[context], json[aiClaimId], content, json[hostname]
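The filtering and per-minute counting that the log metric has to reproduce can be sketched in Python as below. Assumptions: records are plain dicts keyed by the attribute names used in the query, timestamps are ISO 8601 strings without a timezone suffix, and `matches_value()` is a hypothetical helper that only approximates DQL's `matchesValue()` as a case-insensitive wildcard match. It illustrates the logic only; it is not a Dynatrace matcher or processor definition.

```python
import json
from collections import Counter
from datetime import datetime
from fnmatch import fnmatchcase

# Values copied from the query above.
CLUSTERS = ("dynakube-prd-centralus-plum-app-k8s", "dynakube-prd-eastus2-plum-app-k8s")
DEPLOYMENT_PATTERN = "claims-ai-svc-claims-ai-helm-*"
CONTAINER = "claims-ai-helm"


def matches_value(value, pattern):
    # Hypothetical helper: approximates matchesValue() as a case-insensitive
    # wildcard match against the whole value.
    return isinstance(value, str) and fnmatchcase(value.lower(), pattern.lower())


def is_matching_error(record):
    # The same filters as the query, applied to one record (a dict).
    if not any(matches_value(record.get("dt.kubernetes.cluster.name"), c) for c in CLUSTERS):
        return False
    if not matches_value(record.get("k8s.deployment.name"), DEPLOYMENT_PATTERN):
        return False
    if record.get("k8s.container.name") != CONTAINER:
        return False
    try:
        payload = json.loads(record.get("content") or "")
    except json.JSONDecodeError:
        return False  # content is not JSON, so there is no level to check
    return isinstance(payload, dict) and payload.get("level") == "error"


def errors_per_minute(records):
    # Count matching error records in 1-minute buckets (timestamp truncated to the minute).
    counts = Counter()
    for record in records:
        if is_matching_error(record):
            ts = datetime.fromisoformat(record["timestamp"])
            counts[ts.replace(second=0, microsecond=0)] += 1
    return dict(sorted(counts.items()))
```

All conditions are combined with a logical AND, with the two cluster names OR-ed together, which is the same shape a single combined matcher would need.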
I am creating the new log processing rule, but I can't work out the correct processor definition or the matcher, because my query above has several filters. Can someone help me correct them? I'm following the documentation here:
Create anomaly detection metric — Dynatrace Docs
Currently I'm using only the last matchesValue() condition:
matchesValue(k8s.deployment.name, "claims-ai-svc-claims-ai-helm-*")
11 Jan 2025 01:09 AM
Hi @rpeng,
As I understand your query, you should use summarize to count the errors and then split the count by dimensions.
Below is a sample:
fetch logs
| filter loglevel == "ERROR"
| summarize count(), by: {dt.entity.host,log.source}
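What `summarize count(), by: {dt.entity.host, log.source}` does to the filtered records can be pictured with this small Python sketch: it keeps the ERROR records, groups them by the two dimensions and counts each group, so every host and log source pair gets its own count. The host IDs and log paths are made-up sample data, and treating a missing dimension as its own None group is an assumption.

```python
from collections import Counter


def summarize_error_count(records):
    # Keep ERROR records, then count them per (dt.entity.host, log.source) pair.
    # Assumption: a record missing a dimension is grouped under None.
    return dict(Counter(
        (r.get("dt.entity.host"), r.get("log.source"))
        for r in records
        if r.get("loglevel") == "ERROR"
    ))


records = [
    {"dt.entity.host": "HOST-A", "log.source": "/var/log/app.log", "loglevel": "ERROR"},
    {"dt.entity.host": "HOST-A", "log.source": "/var/log/app.log", "loglevel": "ERROR"},
    {"dt.entity.host": "HOST-A", "log.source": "/var/log/app.log", "loglevel": "INFO"},
    {"dt.entity.host": "HOST-B", "log.source": "/var/log/app.log", "loglevel": "ERROR"},
]
print(summarize_error_count(records))
# {('HOST-A', '/var/log/app.log'): 2, ('HOST-B', '/var/log/app.log'): 1}
```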
Later, when you configure the alert definition, you can set a 1-minute interval for checking the error count.
https://docs.dynatrace.com/docs/shortlink/lma-e2e-create-anomaly-detection-metric#create-alert