We are looking to add scheduled Cassandra nodetool repair tasks to our managed cluster and were wondering if anyone else is currently doing this. What schedule do you use for your cluster nodes? Does it run on each cluster node at the same time?
Colin my friend,
This blog post might be interesting to you:
Repair in Apache Cassandra is a maintenance operation that restores data consistency throughout a cluster. It is advised to run repair operations at least every gc_grace_seconds to ensure that tombstones get replicated consistently, which avoids zombie records if you perform DELETE statements on your tables.
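To make the "at least every gc_grace_seconds" advice concrete, here is a rough Python sketch of how repairs could be staggered across nodes so that a full pass finishes well inside that window. The function name, the node names, and the safety_factor parameter are all hypothetical, invented for illustration; this is not something the blog post or the product prescribes.

```python
from datetime import timedelta


def staggered_repair_offsets(nodes, gc_grace_seconds=864000, safety_factor=0.5):
    """Return {node: start offset} for one repair cycle.

    The whole cycle is squeezed into safety_factor * gc_grace_seconds
    (an assumed margin), and each node gets an equal slot in that cycle.
    """
    if not nodes:
        raise ValueError("need at least one node")
    if not 0 < safety_factor <= 1:
        raise ValueError("safety_factor must be in (0, 1]")

    cycle_seconds = int(gc_grace_seconds * safety_factor)
    slot_seconds = cycle_seconds // len(nodes)
    return {
        node: timedelta(seconds=index * slot_seconds)
        for index, node in enumerate(nodes)
    }


if __name__ == "__main__":
    # Hypothetical three-node cluster using the default gc_grace_seconds.
    for node, offset in staggered_repair_offsets(["node1", "node2", "node3"]).items():
        print(f"{node}: start repair {offset} after the cycle begins")
```

The slot length only helps if a single node's repair actually completes within it, so it is worth measuring how long a repair takes on your data volume before settling on a schedule.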
This is also useful:
And the DataStax recommendations:
I believe we use the default. The default value of gc_grace_seconds is 864000 seconds (10 days).
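As a quick sanity check on those numbers, a small Python sketch can convert gc_grace_seconds to days and test whether a planned repair interval fits inside it. Both helper names are made up for this example.

```python
SECONDS_PER_DAY = 86400


def gc_grace_days(gc_grace_seconds):
    """Convert a gc_grace_seconds value to days."""
    return gc_grace_seconds / SECONDS_PER_DAY


def repair_interval_ok(interval_days, gc_grace_seconds=864000):
    """True if repairing every interval_days stays within gc_grace_seconds."""
    return interval_days * SECONDS_PER_DAY <= gc_grace_seconds


if __name__ == "__main__":
    print(gc_grace_days(864000))   # default value expressed in days
    print(repair_interval_ok(7))   # weekly repair against the default
    print(repair_interval_ok(14))  # fortnightly repair against the default
```

With the default, a weekly schedule fits and a fortnightly one does not; tables with a shorter gc_grace_seconds need a correspondingly tighter interval.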
I’ll try to double check tomorrow.
The nodetool script provided with the installation is just a wrapper for Cassandra's nodetool. By default, if no node is specified, it runs on all nodes.
For some tables it's 3 days. Given the additional constraint that you have a large volume of data and the repair can take a long time, I'd say it would be better to run repair only after a node has been down or disconnected from the cluster for 3 days.