17 Sep 2020 04:05 AM - last edited on 19 Jun 2023 09:56 AM by Karolina_Linda
When a Dynatrace Managed server is restarted ungracefully, there is a high chance that it fails to start up again because of a corrupted Cassandra commit log. The quick fix is to delete the offending commit log before starting the Dynatrace Managed service again.
Is there a way to make Cassandra more resilient to unexpected shutdowns? Or to automatically delete the offending commit log and move on?
Unexpected shutdowns don't happen frequently, but when one does, the monitoring downtime lasts even longer because someone has to remove the corrupted commit log manually.
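For anyone who wants to script the quick fix described above, here is a minimal sketch in Python. It moves the named commit log files into a separate directory instead of deleting them, so they can still be inspected later. The function name, the quarantine directory, and both paths are my own assumptions for illustration, not something Dynatrace ships. You still have to find out which commit log is the offending one (for example from the Cassandra startup error), and the service should be stopped while you do this.

```python
import shutil
from pathlib import Path


def quarantine_commit_logs(commitlog_dir, quarantine_dir, segment_names):
    """Move the named commit log files out of commitlog_dir.

    Hypothetical helper: returns the destination paths of the files that
    were actually moved. Names that do not exist are silently skipped.
    """
    src = Path(commitlog_dir)
    dst = Path(quarantine_dir)
    dst.mkdir(parents=True, exist_ok=True)

    moved = []
    for name in segment_names:
        # Keep only the final path component so a name cannot point outside src.
        candidate = src / Path(name).name
        if candidate.is_file():
            target = dst / candidate.name
            shutil.move(str(candidate), str(target))
            moved.append(target)
    return moved
```

You would call it with the commit log directory of your own installation and the file name reported as corrupted, then start the Dynatrace Managed service again. Moving rather than deleting is only a precaution; the data in the moved commit log is not replayed either way.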
17 Sep 2020 11:54 AM
The machine must have been hard-stopped in the middle of Cassandra writing a commit log onto disk. That's a known issue that we'll fix in a future release. No ETA yet.
18 Nov 2021 02:06 PM
Radek, anything new for this issue? I've just found the same problem on one Managed environment.
18 Nov 2022 07:15 AM
Yes, in Dynatrace Managed version 1.210 we've set the Cassandra JVM option "-Dcassandra.commitlog.ignorereplayerrors=true".
With this option, Cassandra ignores corrupted commit logs during replay, so you can restart the node without having to identify each corrupt commit log and move or delete it, which is especially tedious when there are many of them.
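If you want to verify that a node is actually running with this option, one approach is to look at the JVM arguments of the Cassandra process and check for the property. Below is a minimal sketch in Python that does only the checking part. The function name is hypothetical, how you obtain the argument list (process listing, options file) is up to you, and I am assuming the property is read as a Java boolean, so any value other than "true" (case-insensitive) counts as off.

```python
PROPERTY = "cassandra.commitlog.ignorereplayerrors"


def replay_errors_ignored(jvm_args):
    """Return True if the last -D setting of PROPERTY in jvm_args is 'true'."""
    prefix = "-D" + PROPERTY
    enabled = False
    for arg in jvm_args:
        arg = arg.strip()
        if arg == prefix:
            # A bare -Dname sets the property to an empty string, which is not 'true'.
            enabled = False
        elif arg.startswith(prefix + "="):
            # Later occurrences override earlier ones, as on a JVM command line.
            enabled = arg.split("=", 1)[1].lower() == "true"
    return enabled
```

For example, passing a list that contains "-Dcassandra.commitlog.ignorereplayerrors=true" returns True, while a list without the property returns False.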