Solved: Cassandra service unable to start due to commit log corruption

mengsuan_koe1 · 17 Sep 2020

When a Dynatrace Managed server is restarted ungracefully, there is a high chance that it fails to start up again because of a corrupted Cassandra commit log. The quick fix is to delete the offending commit log before starting the Dynatrace Managed service again.
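
As a minimal sketch of that quick fix, the Python helper below moves one named commit log segment into a quarantine directory instead of deleting it outright, so the file can still be inspected later. The function name `quarantine_commit_log` and both directory arguments are hypothetical; the actual commit log location depends on the installation and is not stated in this thread. It should only be run while Cassandra is stopped.

```python
import shutil
from pathlib import Path


def quarantine_commit_log(commitlog_dir, segment_name, quarantine_dir):
    """Move one commit log segment out of commitlog_dir (hypothetical helper).

    commitlog_dir and quarantine_dir are assumptions supplied by the caller;
    this thread does not say where Dynatrace Managed stores commit logs.
    Returns the new path of the moved file.
    """
    source = Path(commitlog_dir) / segment_name
    if not source.is_file():
        raise FileNotFoundError(f"no such commit log segment: {source}")

    target_dir = Path(quarantine_dir)
    target_dir.mkdir(parents=True, exist_ok=True)

    target = target_dir / segment_name
    if target.exists():
        raise FileExistsError(f"refusing to overwrite {target}")

    # shutil.move also works across filesystems (copy, then delete).
    shutil.move(str(source), str(target))
    return target
```

Moving rather than deleting keeps the corrupt segment available for later analysis; once the node is healthy again, the quarantine directory can be cleaned up by hand.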

Could there be a way to make Cassandra more resilient to unexpected shutdowns? Or to automatically delete the offending commit log and move on?

Not that unexpected shutdowns happen frequently, but when one does, the monitoring downtime lasts even longer, until someone manually removes the corrupted commit log.

Radoslaw_Szulgo · 17 Sep 2020

The machine must have been hard-stopped in the middle of Cassandra writing a commit log onto disk. That's a known issue that we'll fix in a future release. No ETA yet.

Senior Product Manager,
Dynatrace Managed expert

tomasz_plonski2 · 18 Nov 2021

Radek, anything new for this issue? I've just found the same problem on one Managed environment.

Radoslaw_Szulgo · 18 Nov 2022

Yes, in Dynatrace Managed version 1.210 we've set the Cassandra JVM option "-Dcassandra.commitlog.ignorereplayerrors=true".

With this option, Cassandra ignores errors when replaying corrupted commit logs, so you can restart the node without having to identify each corrupt commit log and move or delete it, which matters when there are many of them.
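
To check whether a node already carries this setting, something like the following sketch could scan the text of a JVM options file. It assumes the stock Cassandra `jvm.options` layout (one option per line, `#` for comments) and the usual `-D` prefix for JVM system properties. Where Dynatrace Managed keeps its Cassandra JVM options is not stated in this thread, so reading the file is left to the caller, and `has_jvm_option` is a hypothetical helper name.

```python
# The system property named in this thread, written as a JVM -D flag.
IGNORE_REPLAY_ERRORS = "-Dcassandra.commitlog.ignorereplayerrors=true"


def has_jvm_option(options_text, option=IGNORE_REPLAY_ERRORS):
    """Return True if `option` appears as an active (uncommented) line.

    Assumes one option per line with '#' comments, as in a stock
    Cassandra jvm.options file; other layouts would need other parsing.
    """
    for raw_line in options_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line == option:
            return True
    return False
```

Calling `has_jvm_option(text)` on the contents of the options file tells you whether the flag is active; a commented-out copy of the flag does not count.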
