Hi, we have a client who runs out of two data centers and wants to know whether a 4-node Managed Cluster would be a viable option, with two nodes residing in each data center (DC) and each node being relatively small. The DCs are connected via high-speed links, and VPlex is used for their current HA.
From a redundancy perspective, will the Cluster survive an outage of 1 of the 2 DCs, and will the data remain accessible and intact while running on only 2 of the 4 nodes until the DC is restored? Or would it be better to run a bigger single-node Cluster and rely on VPlex to handle the HA?
Any tips or pointers would be greatly appreciated!
We don't currently support multi-DC deployment with DC awareness of the Cluster nodes. That means the cluster won't survive a DC outage, because we require at least (N+1)/2 nodes to be running (in this case, 3 of the 4).
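The quorum arithmetic behind this can be sketched in a few lines (my own illustration, not Dynatrace code; the function names are made up):

```python
import math

def quorum(total_nodes: int) -> int:
    """Minimum number of nodes that must stay up: ceil((N + 1) / 2)."""
    return math.ceil((total_nodes + 1) / 2)

def survives(total_nodes: int, nodes_lost: int) -> bool:
    """True if the cluster still has the required majority after losing nodes."""
    return total_nodes - nodes_lost >= quorum(total_nodes)

# A 2+2 split: losing one DC takes out 2 of 4 nodes, but 3 are required.
print(survives(4, 2))  # False

# For comparison, a 3-node cluster losing a single node keeps its
# majority (2 of the required 2 remain).
print(survives(3, 1))  # True
```

This is also why the 2+2 layout is not equivalent to a 3-node cluster losing one node: being "left with 2" means different things relative to the quorum of each cluster size.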
To sum up, splitting cluster nodes across 2 DCs does not make you more reliable yet. Multi-DC support is planned for Q4/2018 - Q1/2019.
For now, I think you can use backup/restore as a rescue solution.
@Andre van der V., maybe the way I positioned the 2 data centres was not quite correct.
The bank actually has only 1 data centre, which is split into 2 areas in the same location. Within one building, wing 1 is considered data centre 1 and wing 2 is data centre 2. They are fed by separate power feeds, and each has its own network switch architecture and vCenter environment. These are then linked over a high-speed network. You can have a server in data centre 1 and a server in the same VLAN or subnet in data centre 2 (so very different from, say, 2 data centre providers in the cloud).
Based on this, having 4 nodes where two are in data centre 1 and two are in data centre 2 just means that if power is lost in 1 centre, 2 nodes go down and 2 remain up.
In my mind this is the same as having 3 nodes in 1 DC and having 1 of the 3 nodes fail: you are still left with 2...
Am I missing something here?
Lastly, the further question was around adding VPlex on top of the Cassandra-based 2+2 arrangement for HA. In that case, if 1 DC fails, another set of nodes fires up in the still-working DC and you still have 4 nodes (just in 1 DC now). I just don't know if that will work well, because the data could be slightly out of date given the few seconds needed to spin up the new VMs. And if that compromises the overall integrity, then I would still rather stay with the 2+2 scenario: if I lose a DC I need to get it back up ASAP, but I remain up in the meantime.
Any additional explanation of which of the above may or may not work, and why, would be greatly appreciated.
Your details don't change much. No matter how we distribute the nodes across both data centres/locations/networks, we still need to keep a majority up. This is required by the database replication in Cassandra and Elasticsearch.
If a majority of nodes is down, you cannot quickly launch new nodes to replace the missing ones: at that moment the data is inconsistent and you have no means to recreate it. The only viable option at that point is to recreate the whole cluster from a backup.
Once we have datacenter awareness in place, data will be fully replicated within each datacenter instead of across the cluster as a whole.
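A toy sketch of the difference (my own illustration under the assumptions above, not Dynatrace code): with cluster-wide replication you need a majority of all nodes, whereas with DC-aware full replication a single surviving DC can still serve the data.

```python
import math

def cluster_wide_ok(total_nodes: int, nodes_up: int) -> bool:
    """Cluster-wide replication: a majority of all nodes must be up."""
    return nodes_up >= math.ceil((total_nodes + 1) / 2)

def dc_aware_ok(dcs_up: int) -> bool:
    """DC-aware replication: each DC holds a full copy of the data,
    so one intact DC is enough to keep it available."""
    return dcs_up >= 1

# 2+2 split, one DC lost:
print(cluster_wide_ok(4, 2))  # False - today's behaviour
print(dc_aware_ok(1))         # True - once DC awareness ships
```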
Active/passive multi-DC might ship this year. Active/passive means that the nodes in one DC will run normally, while the nodes in the passive DC will be shut down, on stand-by. In case of a disaster you will be able to start the nodes in the other DC instantly.
Active/active multi-DC - probably early 2020.
For all who are interested in HA support for Dynatrace Managed - here's a product idea to follow: https://answers.dynatrace.com/spaces/483/dynatrace-product-ideas/idea/200534/rfe-dynatrace-managed-s...