30 Aug 2024 01:30 PM
Log data is critical for many reasons, including issue resolution, compliance, and audit. It is therefore essential that all logs are captured and available for analysis. In a loss-of-connectivity scenario, how long is log data cached to ensure no data is lost during temporary outages? Are there features or configurations to adjust the maximum retention period or size to accommodate a longer disruption of service? If not, is there a roadmap to provide such features?
05 Sep 2024 12:41 PM
Hi Ron.
If logs are ingested via OneAgent and connectivity is lost, logs that have not yet been sent to Dynatrace are stored in the local cache. When the OneAgent cache limit is reached, OneAgent stops reading data from the source. As soon as connectivity is restored, it resumes reading from the position of the last ingested log line. In that case, log rotation can be problematic: if log records that have not been sent yet are removed or compressed (deflated) before connectivity is restored, OneAgent will not be able to ingest them. This should be addressed in the log producer's configuration.
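To make the rotation concern concrete, here is a rough back-of-the-envelope sketch, not anything provided by Dynatrace. It assumes size-based rotation (the oldest file is deleted on each rollover), a steady write rate, and rotated files that stay uncompressed and readable. The function name and parameters are hypothetical.

```python
def min_outage_window_seconds(max_bytes: int, backup_count: int,
                              write_rate_bytes_per_s: float) -> float:
    """Worst-case time a freshly written record survives before rotation deletes it.

    Assumption: size-based rotation keeps `backup_count` rotated files of
    `max_bytes` each and deletes the oldest on every rollover. In the worst
    case the active file rolls over right after the record is written, so
    only the rotated files count toward the window.
    """
    if write_rate_bytes_per_s <= 0:
        raise ValueError("write rate must be positive")
    return backup_count * max_bytes / write_rate_bytes_per_s


# Example: 100 MiB files, 5 rotated copies kept, about 50 KiB/s of log output.
window = min_outage_window_seconds(100 * 1024 * 1024, 5, 50 * 1024)
print(f"records survive at least {window / 3600:.1f} hours")
```

If the outage you need to tolerate is longer than this window, the producer would have to keep more (or larger) uncompressed files.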
The cache size is not configurable by the OneAgent component owner. However, where there are specific requirements or network conditions, Dynatrace Support can adjust this setting for a given tenant based on the details of the use case.
We are constantly improving log ingest characteristics. We plan to introduce a way to ingest compressed files via OneAgent and to invest further in elastic OneAgent retransmission handling, to improve resilience to changing connectivity conditions.
The cache mechanism is also used when logs are integrated via the log ingestion API exposed on an Environment ActiveGate. If network issues prevent log forwarding to Dynatrace and the cache is full, the API returns the error "503 Service Unavailable: Usable space limit reached", so that the API client can retransmit the log batches that were rejected during the outage.
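On the client side, handling that 503 response could look roughly like the sketch below. It is only an illustration: `send` is a hypothetical callable that posts one batch to the log ingestion API and returns the HTTP status code, and the exponential backoff values are assumptions, not Dynatrace recommendations.

```python
import time
from typing import Callable, Sequence


def send_with_retry(send: Callable[[Sequence[dict]], int],
                    batch: Sequence[dict],
                    max_attempts: int = 5,
                    base_delay: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> bool:
    """Retry a log batch while the endpoint answers 503.

    `send` is a hypothetical wrapper around the HTTP POST to the log
    ingestion API; it must return the HTTP status code.
    Returns True once the batch is accepted (any 2xx status), or False
    if every attempt was rejected with 503.
    """
    for attempt in range(max_attempts):
        status = send(batch)
        if 200 <= status < 300:
            return True
        if status != 503:
            # Other errors (for example 4xx) will not be fixed by retrying.
            raise RuntimeError(f"non-retryable status {status}")
        if attempt < max_attempts - 1:
            # Exponential backoff: base_delay, 2x, 4x, ...
            sleep(base_delay * 2 ** attempt)
    return False
```

When this returns False, the client still holds the batch, so it needs its own buffer (memory or disk) sized for the outage it is expected to survive.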
Disk queue size is configurable. Please refer to the documentation: