Convert Log query to log metric event for alerting

rpeng
Frequent Guest
Hello,
 
 

We need to create a log metric that tracks the number of errors per minute, and then a metric event that triggers when the error count exceeds some value for more than X minutes within the last X minutes. That should open a problem card whenever there are errors, and we can then create alerts from that problem card.

Query:

fetch logs //, scanLimitGBytes: 500, samplingRatio: 1000
| filter matchesValue(dt.kubernetes.cluster.name, "dynakube-prd-centralus-plum-app-k8s") or matchesValue(dt.kubernetes.cluster.name, "dynakube-prd-eastus2-plum-app-k8s")
| filter matchesValue(k8s.deployment.name, "claims-ai-svc-claims-ai-helm-*")
| filter k8s.container.name == "claims-ai-helm" 
| sort timestamp desc
| parse content, "JSON:json"
| filter json[level] == "error"
| fields timestamp, json[level], json[data][msg], json[context], json[aiClaimId], content, json[hostname]

 

I am creating the new log processing rule, but I am unable to set the correct processor definition and Matcher because my query above has several filters. Can someone assist me with corrections? I am following the documentation here:

Create anomaly detection metric — Dynatrace Docs

Currently I'm using only the last matchesValue:

matchesValue(k8s.deployment.name, "claims-ai-svc-claims-ai-helm-*")
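
What I think I need is a single matcher that combines all of those filters, roughly like this (I'm not sure whether this is valid matcher syntax, and the error-level condition is left out because it only exists inside the parsed JSON content):

(matchesValue(dt.kubernetes.cluster.name, "dynakube-prd-centralus-plum-app-k8s") or matchesValue(dt.kubernetes.cluster.name, "dynakube-prd-eastus2-plum-app-k8s"))
and matchesValue(k8s.deployment.name, "claims-ai-svc-claims-ai-helm-*")
and matchesValue(k8s.container.name, "claims-ai-helm")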

 
1 REPLY

RohitBisht
Dynatrace Advisor

Hi @rpeng,
As far as I understand your query, you should use summarize to get the count of errors and then split it by dimensions.
Below is a sample:

fetch logs
| filter loglevel == "ERROR"
| summarize count(), by: {dt.entity.host, log.source}
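
Applied to your query, a per-minute error count could look roughly like this (just a sketch that reuses your filters and JSON parsing as posted, so adjust the field names if they differ in your environment):

fetch logs
| filter matchesValue(dt.kubernetes.cluster.name, "dynakube-prd-centralus-plum-app-k8s") or matchesValue(dt.kubernetes.cluster.name, "dynakube-prd-eastus2-plum-app-k8s")
| filter matchesValue(k8s.deployment.name, "claims-ai-svc-claims-ai-helm-*")
| filter k8s.container.name == "claims-ai-helm"
| parse content, "JSON:json"
| filter json[level] == "error"
| summarize error_count = count(), by: { bin(timestamp, 1m), dt.kubernetes.cluster.name }

Grouping by bin(timestamp, 1m) gives one error count per minute, and keeping dt.kubernetes.cluster.name as a dimension lets you split the alert per cluster.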

Later, while setting the alert definition, you can set a 1-minute interval to check the count of errors.
https://docs.dynatrace.com/docs/shortlink/lma-e2e-create-anomaly-detection-metric#create-alert 
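
If you end up creating the metric event through the settings API rather than the UI, the configuration object is roughly shaped like the example below. This is only a sketch of the builtin:anomaly-detection.metric-events settings object; treat the field names, the log.claims_ai_errors metric key, and the sample values as assumptions to verify against your environment's schema.

{
  "enabled": true,
  "summary": "claims-ai error rate",
  "queryDefinition": {
    "type": "METRIC_KEY",
    "metricKey": "log.claims_ai_errors",
    "aggregation": "SUM"
  },
  "modelProperties": {
    "type": "STATIC_THRESHOLD",
    "threshold": 10,
    "alertCondition": "ABOVE",
    "alertOnNoData": false,
    "violatingSamples": 3,
    "samples": 5,
    "dealertingSamples": 5
  },
  "eventTemplate": {
    "title": "claims-ai error rate is too high",
    "description": "Error count per minute exceeded the configured threshold",
    "eventType": "CUSTOM_ALERT"
  }
}

With 1-minute samples, the samples and violatingSamples fields are what express the "more than X minutes in the last X minutes" condition from the original post.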

RB
