Node synchronization in a cluster

moffatrt
Contributor

We have two Dynatrace Managed nodes in a cluster. However, they currently appear to be out of sync due to a node being down for a few days. Is there a way to force a resync and/or verify that sync is working?

8 REPLIES 8

Radoslaw_Szulgo
Inactive

Sure. If you mean data synchronization in Cassandra, then you can check the status by executing

/opt/dynatrace-managed/utils/cassandra-nodetool.sh

More details:

https://docs.datastax.com/en/cassandra/2.1/cassand...
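
As a rough sketch of such a check: nodetool has a `status` subcommand whose node lines begin with a two-letter state code (`UN` for up/normal, `DN` for down/normal, and so on) followed by the node address. The helper below, `down_nodes`, is a hypothetical name; it only filters that output for nodes reported as down. It assumes the wrapper script passes subcommands through to nodetool, as the `repair` call later in this thread suggests.

```shell
#!/bin/sh
# Wrapper path as given in this thread; override via the environment if yours differs.
NODETOOL="${NODETOOL:-/opt/dynatrace-managed/utils/cassandra-nodetool.sh}"

# down_nodes (hypothetical helper): reads `nodetool status` output on stdin
# and prints the address of every node whose state code starts with D (down).
down_nodes() {
  awk '$1 ~ /^D[NLJM]$/ { print $2 }'
}

# Usage (not executed here):
#   "$NODETOOL" status | down_nodes
```

An empty result only means that no node is currently reported as down; it does not prove that the data on all nodes is consistent.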

Senior Product Manager,
Dynatrace Managed expert

Hi Radoslaw,

Thank you very much for this info!

Does this get detected and rectified automatically, or does it have to be started manually?

Cheers!

What gets detected? Whether data synchronization is in progress?

Cassandra nodes synchronize automatically, unless there is an issue e.g. connectivity.

Senior Product Manager,
Dynatrace Managed expert

Whether nodes are out of sync. And does the cluster recover automatically?

I was wondering the same thing, as the sizes of the nodes still do not match. One is at 31.96GB and the other is at 43.96GB. Should these numbers always match?

thomas_steinma1
Dynatrace Advocate

Cassandra has a feature called "hinted handoff". With it, the Cassandra node serving a write request temporarily stores the writes missed by a down node, for a window of 3 hours. If a node is down for longer than 3 hours, it will effectively be out of sync once it recovers (that is, once the Cassandra process has started up again) and needs a "repair" from a Cassandra low-level perspective.

This can be invoked with the Cassandra command-line tool "nodetool" and the proper options. The shell script mentioned above is just our wrapper script around nodetool. In this particular scenario, where a node recovers from more than 3 hours of downtime while still being part of the same cluster as before, the correct nodetool invocation on the recovering node is a full repair via:

/opt/dynatrace-managed/utils/cassandra-nodetool.sh repair

This is best executed in a dedicated Linux screen session, as it may take hours depending on the data volume.
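
A minimal sketch of running the repair in a detached screen session follows. The session name `cassandra-repair` and the function name `start_repair` are illustrative choices, not something Dynatrace prescribes; `screen -dmS` starts a named session in detached mode. The script does nothing if the wrapper or `screen` is not found.

```shell
#!/bin/sh
# Wrapper path as given in this thread; override via the environment if yours differs.
NODETOOL="${NODETOOL:-/opt/dynatrace-managed/utils/cassandra-nodetool.sh}"
# Hypothetical session name, chosen for illustration only.
SESSION="${SESSION:-cassandra-repair}"

# Start the full repair inside a detached, named screen session so that it
# keeps running after the SSH connection is closed.
start_repair() {
  screen -dmS "$SESSION" "$NODETOOL" repair
}

if [ -x "$NODETOOL" ] && command -v screen >/dev/null 2>&1; then
  start_repair
  echo "Repair started; reattach with: screen -r $SESSION"
else
  echo "nodetool wrapper or screen not found; nothing started" >&2
fi
```

Run it on the recovering node. Reattach with `screen -r cassandra-repair` to see the progress, and detach again with Ctrl-a d.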

Regarding the size reported as "Load" by nodetool: the values don't necessarily need to match. The details are beyond the scope of this comment. For an active repair, there is usually a sign in the Cassandra log (cassandra.log), and "nodetool compactionstats" may also report compactions of type "Validation" on the recovering node.
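
To check for such validation compactions from the command line, something along these lines may help. The helper name `has_validation_compaction` is hypothetical, and the exact column layout of `compactionstats` differs between Cassandra versions, so the sketch only searches the output case-insensitively for the word stem rather than parsing columns.

```shell
#!/bin/sh
# Wrapper path as given in this thread; override via the environment if yours differs.
NODETOOL="${NODETOOL:-/opt/dynatrace-managed/utils/cassandra-nodetool.sh}"

# has_validation_compaction (hypothetical helper): reads `nodetool compactionstats`
# output on stdin and succeeds (exit 0) if any line mentions a validation compaction.
has_validation_compaction() {
  grep -qi 'validat'
}

# Usage (not executed here):
#   "$NODETOOL" compactionstats | has_validation_compaction \
#     && echo "validation compaction running, a repair is probably active"
```

No match does not necessarily mean that no repair is running; validation is only one phase of a repair, so cassandra.log remains the more complete source.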

moffatrt
Contributor

In addition, if one of the nodes is offline for longer than 7 days, the other nodes remove any reference to it and remove it from the cluster. If that node is then reactivated, it will never sync up with the cluster, as it is now effectively orphaned. This is basically what happened to our node and why it never synced up. Thanks for the info, everyone!

kristof_renders
Dynatrace Champion

Thanks, all, for your great and detailed answers. So if I can summarise:

  • Nodes out of sync (as described by Thomas) are not automatically repaired and a manual command has to be initiated
  • If your node is offline for too long (7d+), it will be orphaned and no longer part of the cluster. I guess the best solution is to add a new node to the cluster? Would we still be able to access PP data from that node?
  • Still an open question: does Dynatrace actively report on nodes being out of sync? Will we see it in the CMC? Or even in the debug UI?
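
The two thresholds mentioned in this thread (3 hours for hinted handoff, 7 days before a node is removed) can be sketched as a small decision helper. The function name `classify_downtime` and the labels it prints are made up for illustration, and the thresholds are simply the values quoted above; they may differ with configuration or version.

```shell
#!/bin/sh
# Thresholds as quoted in this thread; treat them as assumptions, not guarantees.
HINT_WINDOW_S=$((3 * 3600))        # hinted handoff window: 3 hours
REMOVAL_S=$((7 * 24 * 3600))       # node removed from the cluster: 7 days

# classify_downtime (hypothetical helper): takes a downtime in seconds and
# prints what the thread suggests will be needed once the node is back.
classify_downtime() {
  downtime_s=$1
  if [ "$downtime_s" -le "$HINT_WINDOW_S" ]; then
    echo "hinted-handoff"          # missed writes are replayed automatically
  elif [ "$downtime_s" -le "$REMOVAL_S" ]; then
    echo "manual-repair"           # run the nodetool repair on the recovering node
  else
    echo "removed-from-cluster"    # node is orphaned and will not sync up again
  fi
}

# Example: a node that was down for two days
classify_downtime $((2 * 24 * 3600))
```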

Again, thanks all for the great answers!

Cheers,
Kristof
