
How to identify the list of servers exhibiting a specific high-volume logging behaviour?

Aaron_L
Newcomer

We are seeing that, on a subset of our servers, a specific process is generating a very large number of log records containing the keyword "bypass".

How would I produce a list or chart of only the servers that have averaged more than 1000 lines/hour containing this keyword over the past 5 days, along with each server's total number of matching log lines?

Thanks,

Aaron

4 REPLIES

PedroSantos
Advisor

Hey @Aaron_L ,

A quick way that might help you achieve your objective is setting up a notebook and running this DQL query:

 

fetch logs
| filter matchesPhrase(content, "bypass")
| summarize count(), by:{Host = dt.entity.host}
| sort `count()` desc

 

This will fetch all logs, keep only the ones with "bypass" in the content, and display the number of matching log lines per host, sorted in descending order.

It will provide a table similar to this:

(screenshot: example results table, taken from the demo live environment)

Note that this does not provide an average of lines/hour, but the total count of lines with "bypass" per host within your selected timeframe. Naturally, the hosts producing more lines per hour will still very likely rank higher by total count, so this can still be used to identify the servers exhibiting this specific high-volume logging behavior.
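
If you do want the average lines/hour directly (per your original 5-day requirement), a sketch along these lines might work - the column names are just examples, it assumes a fixed 5-day window (so the 5 * 24 divisor is hard-coded to match), and toDouble keeps integer division from truncating the average:

fetch logs, from:now() - 5d
| filter matchesPhrase(content, "bypass")
| summarize TotalLogLines = count(), by:{host.name}
| fieldsAdd AvgLinesPerHour = toDouble(TotalLogLines) / (5 * 24)
| filter AvgLinesPerHour > 1000
| sort AvgLinesPerHour desc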

 

And keep in mind that, in Notebooks, you can always use the timeframe selector to adjust the query window to your needs and specific hours:

(screenshot: the Notebooks timeframe selector, set to the last 2 hours)
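
Alternatively, the timeframe can also be pinned in the query itself rather than in the UI - fetch accepts a from: parameter, for example:

fetch logs, from:now() - 2h
| filter matchesPhrase(content, "bypass")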

Lastly, if you're planning on having a dashboard with this query, it might be worth looking into bucket assignment and optimizing the query to read only from specific buckets.
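
For example, assuming these logs land in a dedicated bucket (the bucket name below is just a placeholder for one of yours), filtering on dt.system.bucket restricts the scan to that bucket:

fetch logs
| filter dt.system.bucket == "my_app_logs" // placeholder bucket name
| filter matchesPhrase(content, "bypass")
| summarize count(), by:{Host = dt.entity.host}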

Let me know if this helps you 🙂

To make an error is human. To spread the error across all servers in an automated way is DevOps.

Hi @PedroSantos ,

Thanks for your suggestion.  I've updated the query a bit, as I want to calculate the percentage of log lines containing the specific keyword that come from one executable.  Unfortunately, the new column I'm adding with fieldsAdd does not appear to function as I expect - I always get a 0 in the column.
Here is my current query:

fetch logs
| filter contains(content, "bypass")
| summarize TotalLogLines = count(), MyAppLogLines = countIf(dt.process.name == "MyApp.exe"), by:{host.name}
| fieldsAdd MyAppPercent = round((MyAppLogLines/TotalLogLines)*100, decimals:2)

Any thoughts?

Thanks,
Aaron

Hey @Aaron_L ,
One thing to keep in mind is that in Dynatrace Query Language, dividing two integer values performs integer division, which discards any decimal portion - so a ratio that is less than 1 before the multiplication by 100 comes out as 0.
A straightforward solution is to cast both variables to double:

fetch logs
| filter contains(content, "bypass")
| summarize TotalLogLines = count(), MyAppLogLines = countIf(dt.process.name == "MyApp.exe"), by:{host.name}
| fieldsAdd MyAppPercent = round((toDouble(MyAppLogLines)/toDouble(TotalLogLines))*100, decimals:2)

 

A different, less spaghetti-code solution is to simply multiply one of the variables by 1.0, which promotes the result to floating point:

fetch logs
| filter contains(content, "bypass")
| summarize TotalLogLines = count(), MyAppLogLines = countIf(dt.process.name == "MyApp.exe"), by:{host.name}
| fieldsAdd MyAppPercent = round(( (1.0*MyAppLogLines)/TotalLogLines)*100, decimals:2)

 

See if either works for you 🙂

To make an error is human. To spread the error across all servers in an automated way is DevOps.

Hi @PedroSantos ,

Thanks for the quick response.  The toDouble() option worked well, and as with the *1.0 option, it only needs to be applied to one of the variables to convert the result to floating point.  Problem solved!

As I get a bit more experience with DQL, I may come back to the original problem of calculating average lines/hr, but this will do for now.

Aaron
