19 Feb 2020 05:48 PM
We are looking to add Cassandra nodetool repair scheduled tasks to our managed cluster and were wondering whether anyone else is currently doing this. What schedule do you use for your cluster nodes? Does it run on each cluster node at the same time?
19 Feb 2020 06:31 PM
Colin my friend,
This blog post might be interesting for you:
https://thelastpickle.com/blog/2017/12/14/should-you-use-incremental-repair.html
Repair in Apache Cassandra is a maintenance operation that restores data consistency throughout a cluster. It is advised to run repair operations at least once every gc_grace_seconds to ensure that tombstones are replicated consistently, which avoids zombie records if you perform DELETE statements on your tables.
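On the question of whether every node should repair at the same time: one possible approach, sketched here in Python, is to stagger the start times so that each node gets one repair per cycle and the whole cycle fits inside gc_grace_seconds. The function name, the node names, and the `safety_factor` of 0.5 are my own assumptions for illustration, not anything Dynatrace Managed or Cassandra prescribes.

```python
from datetime import timedelta


def staggered_repair_offsets(nodes, gc_grace_seconds, safety_factor=0.5):
    """Spread one repair per node evenly over a cycle shorter than gc_grace_seconds.

    safety_factor is a hypothetical margin: 0.5 means the full cycle
    takes half of gc_grace_seconds, leaving room for slow repairs.
    """
    if not nodes:
        return {}
    cycle = timedelta(seconds=gc_grace_seconds * safety_factor)
    slot = cycle / len(nodes)  # time between two consecutive node start times
    return {node: slot * i for i, node in enumerate(nodes)}


# Hypothetical three-node cluster with the default gc_grace_seconds (10 days).
offsets = staggered_repair_offsets(["node1", "node2", "node3"], 864000)
for node, offset in offsets.items():
    print(node, "starts", offset, "into each cycle")
```

With these inputs the cycle is 5 days, so the nodes start roughly 1 day 16 hours apart. The offsets could then be turned into cron entries or whatever scheduler you already use.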
This is also useful:
https://stackoverflow.com/questions/37921042/cassandra-nodetool-repair-best-practices
And the DataStax recommendations:
https://docs.datastax.com/en/archived/cassandra/2.2/cassandra/operations/opsRepairNodesWhen.html
19 Feb 2020 06:47 PM
Radoslaw, Hey Buddy! Is the default behavior of nodetool repair a full repair with the version of Cassandra in Dynatrace Managed? How would I check what the value of gc_grace_seconds is set to?
Colin
19 Feb 2020 07:14 PM
I believe we use the default. The default value of gc_grace_seconds is 864000 seconds (10 days).
I’ll try to double check tomorrow.
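In the meantime, a rough way to check it yourself, assuming the bundled Cassandra is version 3.0 or later (where table options are stored in `system_schema.tables`) and that you can reach a `cqlsh` client on the node. Both of those are assumptions on my side, and the helper names are made up for this sketch. It only builds the command and does not run it.

```python
# Assumption: Cassandra 3.0+, where per-table options live in system_schema.tables.
GC_GRACE_QUERY = (
    "SELECT keyspace_name, table_name, gc_grace_seconds "
    "FROM system_schema.tables;"
)


def cqlsh_command(host="127.0.0.1"):
    """Build (but do not run) a cqlsh invocation that executes the query above."""
    return ["cqlsh", host, "-e", GC_GRACE_QUERY]


def seconds_to_days(seconds):
    """Convert a gc_grace_seconds value into days for easier reading."""
    return seconds / 86400


print(" ".join(cqlsh_command()))
print(seconds_to_days(864000), "days is the Cassandra default")
```

The result lists gc_grace_seconds per table, so you can see which tables differ from the 10-day default.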
The nodetool script provided in the installation is just a wrapper for Cassandra nodetool. By default, if a node is not specified, it runs on all nodes.
24 Feb 2020 12:36 PM
For some tables it's 3 days. Given the additional constraint that you have a large volume of data and the repair can take a long time, I'd say it would be better to run repair only after a node has been down or disconnected from the cluster for 3 days.
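As a minimal sketch of that rule of thumb: take the smallest gc_grace_seconds across your tables and compare it with how long the node was unreachable. The table names and values below are hypothetical placeholders, not the real Dynatrace Managed schema.

```python
DAY = 86400  # seconds in a day


def repair_needed(downtime_seconds, table_gc_grace_seconds):
    """Return True if the node was away at least as long as the tightest gc_grace_seconds."""
    tightest = min(table_gc_grace_seconds.values())
    return downtime_seconds >= tightest


# Hypothetical tables: one on the 10-day default, one on a 3-day setting.
tables = {
    "example_keyspace.table_a": 10 * DAY,
    "example_keyspace.table_b": 3 * DAY,
}

print(repair_needed(1 * DAY, tables))  # node was away for one day
print(repair_needed(4 * DAY, tables))  # node was away for four days
```

The table with the shortest setting is the one that decides, which is why the 3-day tables matter more here than the 10-day default.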