When a Dynatrace Managed server is restarted ungracefully, there is a high chance that it fails to start up again because of a corrupted Cassandra commit log. The quick fix is to delete the offending commit log before starting the Dynatrace Managed service again.
Could there be a way to make Cassandra more resilient to unexpected shutdowns? Or to automatically delete the offending commit log and move on?
Unexpected shutdowns do not happen frequently, but when one does, the monitoring downtime lasts even longer, until someone manually removes the corrupted commit log.
The machine must have been hard-stopped in the middle of Cassandra writing a commit log onto disk. That's a known issue that we'll fix in a future release. No ETA yet.
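Until a fix is available, the manual workaround from the question could be scripted. The following is only a minimal sketch, not an official Dynatrace or Cassandra tool: the function name `quarantine_commit_log` and all directory paths are hypothetical, and it assumes you have already stopped the service and identified the offending segment file name from the Cassandra startup log. It moves the file aside instead of deleting it, so the segment can still be inspected later.

```python
import shutil
from pathlib import Path


def quarantine_commit_log(commitlog_dir, segment_name, quarantine_dir):
    """Move one commit log segment out of commitlog_dir (hypothetical helper).

    commitlog_dir and quarantine_dir are assumptions; use the directories
    configured for your own installation.
    """
    # Refuse anything that is not a plain file name, to avoid path traversal.
    if Path(segment_name).name != segment_name:
        raise ValueError(f"expected a bare file name, got: {segment_name}")
    # Only touch files that look like Cassandra commit log segments.
    if not (segment_name.startswith("CommitLog-") and segment_name.endswith(".log")):
        raise ValueError(f"not a commit log segment: {segment_name}")

    source = Path(commitlog_dir) / segment_name
    if not source.is_file():
        raise FileNotFoundError(str(source))

    target_dir = Path(quarantine_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / segment_name
    # Move rather than delete, so the corrupted segment is kept for analysis.
    shutil.move(str(source), str(target))
    return target
```

Run it only while the service is stopped, then start the service again. Any writes that existed only in the removed segment and had not yet been flushed to disk are lost, which is the same trade-off as deleting the file by hand.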