Given these scenarios:
Cluster A:

| Rack 1 | Rack 2 | Rack 3 |
|---|---|---|
| Node 1 | Node 2 | Node 3 |

Cluster B:

| Rack 1 | Rack 2 | Rack 3 |
|---|---|---|
| Node 1 | Node 3 | Node 5 |
| Node 2 | Node 4 | Node 6 |
In Cluster A, only one node may be lost (one rack), while in Cluster B, two nodes may be lost (up to one whole rack).
If so, that is no different from a cluster of five or more nodes without rack-awareness, which may also lose up to two nodes.
Therefore, is it correct to state that the benefits of rack-awareness are not realised until a cluster size of nine (9) nodes or above?
However, if there is a plan to add rack-awareness later, would it be best to assign three rack names across the nodes above from the start, thereby avoiding a later migration to rack-awareness?
Is there any downside, say, to assigning one rack per availability zone, with two nodes each, in a 6-node cluster deployed on a public cloud? Does it cause any additional overhead? What happens when two nodes from different AZs (racks) are lost? Does it matter?
Additionally: is there any time limit for how long a disconnect from a node or an availability zone is allowable? Will those nodes be rejected by the cluster / not synced if reconnecting after some time?
Rack-aware deployment ensures that no two replicas of the same data are stored in a single rack; replicas are spread across racks. If one rack goes down, the other two full replicas remain available, preserving data consistency and availability.
Regarding availability and resilience, a rack-aware deployment of 6 nodes in 3 racks is equivalent to 5 or 6 nodes in a single rack. You're right that the real benefits start with 9 nodes deployed evenly across 3 racks.
We recommend starting with a rack-aware deployment if you can afford three physical locations, for example availability zones in the AWS cloud. There's no side effect to this approach, and as you highlight, you're already prepared for that scenario and avoid an additional migration later.
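For orientation, this is what rack naming looks like in open-source Cassandra: with GossipingPropertyFileSnitch, each node declares its datacenter and rack in cassandra-rackdc.properties, and availability zones map naturally onto rack names. The dc/rack values below are illustrative assumptions, and Dynatrace Managed ships and manages its own Cassandra configuration, so this is a sketch rather than something to edit by hand:

```properties
# conf/cassandra-rackdc.properties on a node in the first availability zone
# (illustrative values; the dc/rack names are assumptions)
dc=us-east-1
rack=us-east-1a
```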
> What happens when two nodes from different AZs (racks) are lost?
Assuming there are 6 or more nodes in total, this is an unhealthy cluster state. See the example below:
| Rack 1 | Rack 2 | Rack 3 |
|---|---|---|
| [DOWN] Node 1 - 50% of replica 1 | Node 2 - 50% of replica 2 | Node 3 - 50% of replica 3 |
| Node 4 - 50% of replica 1 | [DOWN] Node 5 - 50% of replica 2 | Node 6 - 50% of replica 3 |
It's now clear that you've lost half of replica 1 and half of replica 2, which means some data may be left with just one replica, in rack 3. While the data is not lost (one replica still exists), it may be in an inconsistent state (no more majority). In this state the cluster should still be able to write data, but there may be issues retrieving data that requires a higher consistency level. A Cassandra repair should help.
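The failure mode above can be sketched in a few lines of Python. This is a toy model, not Dynatrace or Cassandra code: the node and rack names follow the table, and rack-aware placement is modelled as exactly one replica per rack with a replication factor of 3:

```python
from itertools import product

# Toy model of the table above: two nodes per rack, RF=3,
# rack-aware placement keeps exactly one replica in each rack.
racks = {"rack1": ["node1", "node4"],
         "rack2": ["node2", "node5"],
         "rack3": ["node3", "node6"]}
# Every possible replica set: one node chosen from each rack.
replica_sets = [set(combo) for combo in product(*racks.values())]

down = {"node1", "node5"}  # the two failed nodes from the table
worst = min(len(rs - down) for rs in replica_sets)
print(worst)  # 1 -> some partitions keep only a single live replica,
              # so QUORUM (2 of 3) requests against them fail until repaired
```

Losing both nodes of a single rack instead (say node1 and node4) leaves every replica set with two live replicas, which is exactly the "up to one rack" guarantee.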
> Is there any time limit for when a disconnect from a node or an availability zone may be allowable? Will those nodes be rejected by the cluster / not synced if re-connecting after some time?
In such a situation the Dynatrace Managed cluster leverages Cassandra's hinted handoff. This Cassandra feature optimizes the consistency and anti-entropy process when a replica-owning node is unavailable, due to network issues or other problems, to accept a replica from a successful write operation. Hints are stored for 72 hours; after that time the oldest hints are overwritten by new ones. To learn more about how hinted handoff works, visit https://docs.datastax.com/en/cassandra-oss/2.1/cassandra/dml/dml_about_hh_c.html#Howhintedhandoffwor...
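For reference, in open-source Cassandra the hint retention window is controlled by `max_hint_window_in_ms` in cassandra.yaml; the 72-hour window mentioned above corresponds to the value below. This is a sketch for orientation only, since Dynatrace Managed owns its Cassandra configuration and the value should not be changed by hand:

```yaml
# cassandra.yaml (managed by Dynatrace -- shown for orientation only)
max_hint_window_in_ms: 259200000   # 72 h * 3600 s/h * 1000 ms/s
```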