When a Dynatrace Managed server is restarted ungracefully, there is a high chance that it fails to start up again because of a corrupted Cassandra commit log. The quick fix is to delete the offending commit log before starting the Dynatrace Managed service again.
Is there a way to make Cassandra more resilient to unexpected shutdowns, or to have it automatically delete the offending commit log and move on?
Unexpected shutdowns don't happen frequently, but when one does, the monitoring downtime lasts even longer, until someone manually removes the corrupted commit log.
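For anyone who needs the manual workaround in the meantime, here is a minimal sketch of it in Python. It moves the offending commit log segments into a quarantine directory instead of deleting them, so they can still be inspected later. The function name, both directory paths, and the segment file name are hypothetical; use the commit log directory of your own installation and the file name reported in the Cassandra log, and run it only while the Dynatrace Managed service is stopped.

```python
import shutil
from pathlib import Path


def quarantine_commit_logs(commitlog_dir, quarantine_dir, names):
    """Move the named commit log files out of commitlog_dir.

    Returns the list of new paths. Names that do not exist as regular
    files in commitlog_dir are skipped.
    """
    src = Path(commitlog_dir)
    dst = Path(quarantine_dir)
    dst.mkdir(parents=True, exist_ok=True)

    moved = []
    for name in names:
        segment = src / name
        if segment.is_file():
            target = dst / name
            shutil.move(str(segment), str(target))
            moved.append(target)
    return moved


if __name__ == "__main__":
    # Hypothetical paths and file name, for illustration only.
    moved = quarantine_commit_logs(
        "/path/to/cassandra/commitlog",
        "/path/to/quarantine",
        ["CommitLog-6-1234567890.log"],
    )
    print(f"moved {len(moved)} file(s)")
```

Moving rather than deleting is a deliberate choice in this sketch: the original post deletes the file, but a quarantined copy costs little and keeps the evidence around.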
The machine must have been hard-stopped in the middle of Cassandra writing a commit log onto disk. That's a known issue that we'll fix in a future release. No ETA yet.
Yes, in Dynatrace Managed version 1.210 we've set the Cassandra JVM option "-Dcassandra.commitlog.ignorereplayerrors=true".
With this option, Cassandra ignores any corrupted commit logs during replay, so you can restart the node without having to identify each corrupt commit log and move it out or delete it, which is especially helpful when there are a large number of them.
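If you want to confirm that a node is actually running with this option, a small check over the JVM arguments can help. The following is only a sketch: the function name is made up for illustration, and it assumes you already have the Cassandra process arguments as a list of strings. It follows the usual JVM behavior that the last "-D" definition of a property wins and that a boolean property counts as enabled only when its value is "true" (case-insensitive).

```python
PROPERTY = "cassandra.commitlog.ignorereplayerrors"


def ignores_replay_errors(jvm_args):
    """Return True if the last -D definition of PROPERTY in jvm_args is 'true'."""
    value = None
    for arg in jvm_args:
        if arg == f"-D{PROPERTY}":
            # Property defined without a value: treated as an empty string.
            value = ""
        elif arg.startswith(f"-D{PROPERTY}="):
            value = arg.split("=", 1)[1]
    return value is not None and value.lower() == "true"


if __name__ == "__main__":
    # Example argument list, for illustration only.
    args = ["-Xmx4g", "-Dcassandra.commitlog.ignorereplayerrors=true"]
    print(ignores_replay_errors(args))
```

On Linux, the argument list of a running process can be read from /proc/<pid>/cmdline by splitting its contents on NUL bytes.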