Pro Tip: Keep Your Log Management Budget Under Control

Maheedhar_T

Log Optimization

Log ingestion is one of the most expensive parts of Dynatrace licensing. That doesn't mean we should stop using it! It means we need to keep an eye on the expenses and use it wisely. Let's break it down and see how we can manage it efficiently. The cost has three parts:
1. Ingest costs
2. Retain costs
3. Query costs

1. Ingest Costs:

OneAgent, with its log auto-discovery, will already detect log files and ingest them based on the log ingest rules. So how do we save costs here? By not ingesting at all? Certainly not. Log ingestion is something we cannot skip. Imagine an application that writes business transaction statuses to log files, a web application that logs user logins and web page errors, or a critical situation in a multi-layered infrastructure where we need to find where a transaction is stuck between applications based on a tracking header. So we do need to ingest logs, but in a controlled manner. Think of it like going to a restaurant and ordering only what we can eat rather than wasting the available budget.

 

So how do we efficiently use the log ingest rules?

 

First, decide what needs to be ingested by narrowing down to the log files that are actually needed.
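To see where the volume actually comes from before writing any rules, you can summarize the logs that are already flowing in per source file. This is a minimal DQL sketch (the dt.entity.host dimension is optional and assumes the logs were picked up by OneAgent):

fetch logs, from: now() - 24h
// count records per source file and host to spot the noisy ones
| summarize record_count = count(), by: { log.source, dt.entity.host }
| sort record_count desc
| limit 20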

 

Suppose you want to monitor only the log files from the /var/log folder. You just go to the log ingest rules and add a matcher like this:

[Screenshot: log ingest rule with a path matcher for /var/log]

 


But /var/log is a very generic folder, and it holds a ton of logs! If we use only this path matcher, it will ingest log lines of every level: "INFO", "WARN", "ERROR", and plenty of very generic content. Line or paragraph separators made of whitespace cause even more trouble. As a developer, I want my log file to be readable, so I will keep using padding in my logging: a few padding characters are written after each set of log lines. But that padding produces lines consisting only of whitespace or null characters, which there is no point ingesting; Dynatrace, however, doesn't treat them any differently (or at least not yet). So whenever you ingest, combine the path matcher with content matchers too. Let's say we added padding and we do not want it included when ingesting logs. We can simply add one more rule like this:

[Screenshot: log ingest rule with log level matchers excluding NONE]

 

You can add all log levels except NONE, since NONE is the level assigned to the padding lines.
If you want to be more specific about your logs, you can add content matchers as well: for example, ingest a line only if it contains "ERROR", or only if it contains "SESSION", using the log content matcher attribute. This way we can save a ton on log ingest.
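Before committing to a content matcher, you can preview what it would keep by running the equivalent filter as a DQL query over logs that are already ingested. A rough sketch, assuming the standard loglevel and content attributes:

fetch logs, from: now() - 2h
// mimic the ingest rule: drop NONE (padding) lines, keep only relevant content
| filter loglevel != "NONE"
| filter matchesPhrase(content, "ERROR") or matchesPhrase(content, "SESSION")
| summarize kept_records = count()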

2. Retain Costs:
Now that we've ingested the logs, the next part is log retention. Though retention doesn't cost as much as ingest, we can still save a bit here, and OpenPipeline provides a brilliant path for it!
After a log record arrives in Dynatrace, we either use it to count occurrences or we store the record itself. If we are storing log records so we can come back to them during critical issues or a wild deployment failure, we need to decide how long they are needed. Is 7 days of storage enough, or do we need as long as 6 months?

Secondly, if we only need to count the occurrences of certain log records, it is better to convert them into a metric and drop the log records themselves. Metrics have 462 days of retention by default, which is included in the metric ingest license, so the retention period is covered. Since your log is now a metric, you can set up alerting, dashboards and everything else from the metric itself and simply drop the logs from Dynatrace, which means you pay nothing for log retention!

(NOTE: The same applies to log attributes. If you define your processing rules well, you can extract numeric attributes from your log records and then drop the records as well. I'm not covering that in detail here, as this article focuses on cost saving.)

Let's see at a high level how to achieve this. Say I have a use case where I want to see how many times "SESSION EXPIRED" shows up in my logs.
We go to OpenPipeline and, in the metric extraction stage, create a counter metric like this:
[Screenshot: OpenPipeline metric extraction rule creating a counter metric for SESSION EXPIRED]
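In rough terms the rule boils down to three pieces (the metric key log.session_expired.count is my own placeholder, and the exact field labels in the UI may differ slightly):

Matcher: matchesPhrase(content, "SESSION EXPIRED")
Metric key: log.session_expired.count
Measure: occurrence of matching records (a counter that adds 1 per record)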

 


Now every log record containing SESSION EXPIRED is counted by this metric. Since we only need the number of expired sessions, we can drop all log records that contain SESSION EXPIRED in their content. For that, we go to Storage and select "No storage assignment", which drops the logs as soon as they are converted to the metric.
[Screenshot: OpenPipeline storage stage with "No storage assignment" selected]
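Once the metric exists, dashboards and alerts work from it directly and never touch log storage. For example, a notebook or dashboard tile could chart it with a query along these lines (again using my placeholder metric key):

timeseries expired_sessions = sum(log.session_expired.count), from: now() - 7d
// only the extracted metric is read here; no log records are scanned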

 


(NOTE: Processor and storage assignment rules are evaluated top to bottom, so if another rule above this drop rule, say one matching isNotNull(content), sends the records to a storage bucket, the logs will not be dropped. Make sure you order your rules carefully.)
3. Query Costs:
The next part, after ingesting and storing the logs, is visualising them.
There may be dashboards where you want to visualise your data, and queries you want to run to check the status of your application. To make it all efficient, you need to use log buckets in every query.
Let's say I have an application called easy_travel and I want to check its logs. I know it writes to a file called error.log, so my query would look like:

fetch logs
| filter matchesPhrase(log.source, "error.log")

Is this efficient enough? If I run this query, it returns a ton of content, because many other applications on the same tenant might write to a file with the same name! The results would mix my application's logs with other applications' logs. You can use an even better filter:

fetch logs
| filter matchesPhrase(log.source, "error.log") and matchesPhrase(dt.host_group.id, "easy_travel")

This definitely narrows the results down to my specific application, but it still scans records across the whole tenant. If instead I create a bucket for my application, route all of my application's logs into it, and then reference that bucket in my query, the scanned records will only include my application's logs, which is a brilliant way to save on expenses!
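Assuming the bucket is named easy_travel, the query pins the scan to it with the dt.system.bucket field, so only records stored in that bucket are read:

fetch logs
| filter dt.system.bucket == "easy_travel"
| filter matchesPhrase(log.source, "error.log")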

One more noteworthy thing: even if you have buckets added for each of your applications, if you don't reference them in your query, Dynatrace by default scans all buckets in alphabetical order! That means if my bucket is named easy_travel, it comes after buckets named app_easy_travel, central_bucket and so on, so Dynatrace scans the records in app_easy_travel, then central_bucket, and so on until it finally reaches my easy_travel bucket. By then the query may already have hit the limits (500 GiB of scanned data, 1,000 log records, or 1 MB of log record data) and will not serve its purpose.
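To review which buckets exist in your environment and how long each one keeps data, you can list them with DQL. A sketch; the exact fields returned may vary:

fetch dt.system.buckets
| fields name, table, retentionDays
| sort name asc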

Here's another Pro Tip from my previous post on how to save when querying data in Dynatrace dashboards:
Pro Tip: Optimise your dashboards by querying less and Visualising More - Dynatrace Community


P.S.: Dynatrace is powerful, but it takes some study and patience, because with great power comes great responsibility 🕷


(Let me know if anything needs to be added or corrected. Also, do let me know if you've learnt something new from this. Happy Monitoring)

Regards,
Maheedhar Talluri ( @Maheedhar_T )

1 REPLY

theharithsa
Dynatrace Champion

This is a very interesting topic for customers who are looking to migrate their platform to Grail, the most powerful Dynatrace offering yet.

You explained how an organization can utilize Dynatrace's powerful Log Analytics and Management engine efficiently. It is great work, @Maheedhar_T. Thanks for sharing. It helps a lot. 

Love more, hate less; Technology for all, together we grow.
