27 Sep 2023 05:24 PM - last edited on 28 Sep 2023 09:08 AM by MaciejNeumann
Hello.
I get the `Ingested log data is trimmed` message in the Dynatrace Managed CMC events, so I went to the related FAQ, which says:
6. Inspect 1-minute intervals of log events ingest.
- If you see that log events are trimmed to the Maximum ingest of Log Events limit set for this environment, you need to increase it.
- If log ingest was below the limit in subsequent intervals, your log entries will be re-ingested and should be available later, but you could consider increasing the limit to avoid a delay in data processing.
But how do I do that? How do I know which case I am in? In my environment's Log Viewer, am I supposed to find something related to "dt.ingest.warnings" or not? Or is it supposed to be investigated through the "Format Table" field named "trimmed"? If so, what am I supposed to find or not find?
Regards.
27 Sep 2023 09:49 PM - edited 27 Sep 2023 09:54 PM
You can check and allocate the maximum number of ingested log events per minute in the CMC, in the environment settings.
By clicking "refresh cluster limit" you get the overall CLUSTER-level limit, which can be split across the environments. In this example the cluster limit is ~168k per minute (based on memory and CPU capacity): Env1 has 140k/minute, Env2 has 20k and Env3 has 8k.
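In the example above the three environment allocations add up to the whole cluster limit (140k + 20k + 8k = 168k), so there is no headroom left for raising one environment without lowering another. A tiny sketch of that arithmetic, assuming the example values (168k treated as exactly 168,000) and a hypothetical helper `unallocated_capacity`; it is only a sanity check, not a description of how the CMC validates allocations:

```python
# Example allocations from the reply above, in log events per minute.
# The real values come from the CMC environment settings.
CLUSTER_LIMIT = 168_000

env_limits = {
    "Env1": 140_000,
    "Env2": 20_000,
    "Env3": 8_000,
}


def unallocated_capacity(cluster_limit: int, limits: dict[str, int]) -> int:
    """Return how many events/minute of the cluster limit are not yet assigned.

    Hypothetical helper: raises ValueError if the environments together
    exceed the cluster limit.
    """
    allocated = sum(limits.values())
    if allocated > cluster_limit:
        raise ValueError(f"allocated {allocated} exceeds cluster limit {cluster_limit}")
    return cluster_limit - allocated


print(unallocated_capacity(CLUSTER_LIMIT, env_limits))  # 0
```

A result of 0 means any increase for one environment has to be taken from another one (or the cluster capacity has to grow).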
With these two metrics you can monitor and alert on incoming or rejected log events:
dsfm:server.log_and_events_monitoring.events_incoming_count:splitBy():sum:sort(value(sum,descending))
dsfm:server.log_and_events_monitoring.events_rejected_count:splitBy():sum:sort(value(sum,descending))
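As a rough illustration of the alerting idea, here is a minimal sketch that assumes you have already exported the two metrics as per-minute values (the `MinuteSample` shape, the `rejection_alerts` helper and the sample numbers are all hypothetical, not Dynatrace output). It flags every minute whose rejected count is above a threshold:

```python
from dataclasses import dataclass


@dataclass
class MinuteSample:
    """One 1-minute data point exported from the two metrics (hypothetical shape)."""
    minute: str    # interval label, e.g. "10:22"
    incoming: int  # events_incoming_count for that minute
    rejected: int  # events_rejected_count for that minute


def rejection_alerts(samples: list[MinuteSample], threshold: int = 0) -> list[str]:
    """Return one alert message per minute whose rejected count exceeds threshold."""
    return [
        f"{s.minute}: {s.rejected} rejected ({s.incoming} incoming)"
        for s in samples
        if s.rejected > threshold
    ]


# Made-up values, only to show the output format.
samples = [
    MinuteSample("10:21", 150_000, 0),
    MinuteSample("10:22", 175_000, 7_000),
    MinuteSample("10:23", 160_000, 0),
]
for alert in rejection_alerts(samples):
    print(alert)  # 10:22: 7000 rejected (175000 incoming)
```

Keeping the threshold at 0 alerts on any rejection; a higher value only alerts on larger bursts.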
I hope it helps.
Best regards,
Mizső
28 Sep 2023 08:27 AM
Thanks, I knew about that already. Anything on the actual question?
28 Sep 2023 09:27 AM
You should probably consider increasing the maximum number of log events per minute regardless, as per Mizső's message, but to answer your original question:
The message you see about logs being trimmed doesn't always mean that your logs are actually being trimmed. It is generated by DAVIS AI, which can over-react to a spike in log ingest and conclude that logs will need to be trimmed even when there is no data loss. So we first need to determine whether this is the case or whether data is really being trimmed.
The way to do this is to check, as in the first case of the FAQ, whether your log events are being trimmed to the maximum set for the environment. If they are, you need to increase the maximum to avoid data loss. If log ingest does not actually reach the environment maximum (or reaches it only once or twice in a row), then DAVIS AI has projected, because of a spike, that there will be too many logs when that is not really the case. In this last case there will be a delay in log ingestion (which is why increasing the maximum is still advised), but no data will be lost.
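The decision described above can be sketched as a small function over per-minute ingest counts. This is an illustration under stated assumptions, not Dynatrace logic: the `classify_trimming` name is hypothetical, "at the limit" is assumed to mean within 2% of the configured maximum, and "only once or twice in a row" is encoded as a run of at most 2 minutes:

```python
def classify_trimming(per_minute_counts: list[int], limit: int,
                      tolerance: float = 0.02, max_short_run: int = 2) -> str:
    """Return "trimmed" if ingest sits at the limit for a sustained run of minutes,
    otherwise "delayed" (ingest may lag, but no data is lost).

    per_minute_counts: ingested log events per 1-minute interval
    limit: "Maximum number of log events per minute" set for the environment
    tolerance: how close to the limit counts as "at the limit" (assumption)
    max_short_run: longest run still treated as a transient spike (assumption)
    """
    threshold = limit * (1 - tolerance)
    longest_run = current_run = 0
    for count in per_minute_counts:
        # Extend the run while the minute is at the limit, reset it otherwise.
        current_run = current_run + 1 if count >= threshold else 0
        longest_run = max(longest_run, current_run)
    return "trimmed" if longest_run > max_short_run else "delayed"


print(classify_trimming([90_000, 140_000, 140_000, 140_000, 80_000], 140_000))  # trimmed
print(classify_trimming([90_000, 140_000, 140_000, 80_000], 140_000))           # delayed
```

Tuning `tolerance` and `max_short_run` is a judgment call; the point is only that a sustained plateau at the limit and a brief spike lead to different conclusions.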
Hopefully that helps you understand which one is your case.
28 Sep 2023 09:53 AM - edited 28 Sep 2023 09:53 AM
Hello.
Interesting point about Davis. Thanks.
After much chatting, here is my understanding. Say the trimming warning in the CMC has the timestamp 10:23:12. In the Log Viewer, set the time frame to exactly one hour, from 09:23 to 10:23. Then look at the graph (i.e. the plot). Don't search or filter for any particular kind of log line or log event: the "dt.ingest.warnings" attribute and the "trimmed" field in the table format are not where to look or what to search for.
On the graph, check how high the bars reach. Each bar shows the number of log events ingested during a one-minute interval, and this is what can be compared with the "Maximum number of log events per minute" set in the CMC:
- If the bars consistently (many minutes in a row) reach approximately this maximum (not higher, not lower), there may indeed be a problem.
- If the bars are higher than this maximum, it shows (as stated by @victor_balbuena) that all events were eventually ingested, but with a delay.
- If the bars are lower, there is no problem.
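To make the time-frame and bucketing step concrete, here is a minimal sketch assuming you have the event timestamps as Python `datetime` values (for example from an export; the helpers `analysis_window` and `per_minute_counts` are hypothetical). It derives the 09:23 to 10:23 window from the 10:23:12 warning and counts events per one-minute bucket, mirroring the one-minute bars described above:

```python
from collections import Counter
from datetime import datetime, timedelta


def analysis_window(warning_ts: datetime) -> tuple[datetime, datetime]:
    """One-hour window ending at the minute of the CMC warning."""
    end = warning_ts.replace(second=0, microsecond=0)
    return end - timedelta(hours=1), end


def per_minute_counts(event_times: list[datetime], start: datetime,
                      end: datetime) -> list[tuple[datetime, int]]:
    """Count events per 1-minute bucket in [start, end)."""
    buckets = Counter(
        t.replace(second=0, microsecond=0) for t in event_times if start <= t < end
    )
    result = []
    minute = start
    while minute < end:
        result.append((minute, buckets[minute]))  # Counter yields 0 for empty minutes
        minute += timedelta(minutes=1)
    return result


start, end = analysis_window(datetime(2023, 9, 28, 10, 23, 12))
print(start.time(), end.time())  # 09:23:00 10:23:00
```

Each `(minute, count)` pair corresponds to one bar, and the counts are what you would compare with the CMC maximum.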
Let me know if this is an accurate statement (otherwise I'll correct it so as not to confuse readers 🙂).
I feel the FAQ could be amended, perhaps with annotated screenshots, stating explicitly where to look, what to expect and what not to expect, and what would confirm the problem or confirm that there is none.
Regards.