Hi, we have a client who runs out of two data centers and wants to know whether a 4-node Managed Cluster would be a viable option, with two nodes residing in each data center (DC) and each node being relatively small. The DCs are connected via high-speed links, and VPlex is used for their current HA.
From a redundancy perspective, will the Cluster survive an outage of 1 of the 2 DCs, and will the data remain accessible and intact while running on only 2 of the 4 nodes until the DC is restored? Or would it be better to run a bigger single-node Cluster and rely on VPlex to handle the HA?
Any tips or pointers would be greatly appreciated!
We don't currently support multi-DC deployment with DC awareness of the Cluster nodes. That means the cluster won't survive a DC outage, because we require at least (N+1)/2 nodes to be running (in this case, 3 of the 4).
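The quorum arithmetic behind this can be sketched in a few lines (my own illustration, not Dynatrace code; the function names are made up):

```python
import math

def quorum(total_nodes: int) -> int:
    """Minimum number of nodes that must stay up: ceil((N + 1) / 2)."""
    return math.ceil((total_nodes + 1) / 2)

def survives(total_nodes: int, nodes_lost: int) -> bool:
    """True if the cluster still has the required majority after losing nodes."""
    return total_nodes - nodes_lost >= quorum(total_nodes)

# A 2+2 split: losing one DC takes out 2 of 4 nodes, but 3 are required.
print(survives(4, 2))  # False

# For comparison, a 3-node cluster losing a single node keeps its
# majority (2 of the required 2 remain).
print(survives(3, 1))  # True
```

This is also why the 2+2 layout is not equivalent to a 3-node cluster losing one node: being "left with 2" means different things relative to the quorum of each cluster size.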
To sum up, splitting cluster nodes across 2 DCs does not make you more reliable yet. Multi-DC support is planned for Q4/2018 - Q1/2019.
For now, I think you can use backup/restore as a rescue solution.
@Andre van der V., maybe the way I positioned the 2 data centres was not quite correct.
The bank actually has only 1 data centre, which is split into 2 areas in the same location. Within one building, wing 1 is considered data centre 1 and wing 2 is data centre 2. They are fed by separate power feeds, and each has its own network switch architecture and vCenter environment. These are then linked over a high-speed network. You can have a server in data centre 1 and a server in the same VLAN or subnet in data centre 2 (so very different from, say, 2 data centre providers in the cloud).
Based on this, having 4 nodes where two are in data centre 1 and two are in data centre 2 just means that if power is lost in 1 centre, 2 nodes go down and 2 remain up.
In my mind this is the same as having 3 nodes in 1 DC and having 1 of the 3 nodes fail: you are still left with 2...
Am I missing something here?
Lastly, the further question was around adding VPlex on top of the Cassandra-based 2+2 arrangement for HA. In that case, if 1 DC fails, another set of nodes fires up in the still-working DC and you still have 4 nodes (just in 1 DC now). I just don't know if that will work well, because the data could be slightly out of date given the few seconds needed to spin up the new VMs. And if that compromises the overall integrity, then I would still rather stay with the 2+2 scenario: if I lose a DC I need to get it back up ASAP, but I remain up in the meantime.
Any additional explanation of which of the above may or may not work, and why, would be greatly appreciated.
Your details don't change much. No matter how we distribute the nodes across both data centres/locations/networks, we still need to keep a majority up. This is required by the database replication in Cassandra and Elasticsearch.
If a majority of nodes is down, you cannot quickly launch new nodes to replace the missing ones: at that moment the data is inconsistent and you have no means to recreate it. The only viable option at that point is to recreate the whole cluster from a backup.
Once we have datacenter awareness in place, data will be fully replicated within each datacenter instead of across the cluster as a whole.
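A toy sketch of the difference (my own illustration under the assumptions above, not Dynatrace code): with cluster-wide replication you need a majority of all nodes, whereas with DC-aware full replication a single surviving DC can still serve the data.

```python
import math

def cluster_wide_ok(total_nodes: int, nodes_up: int) -> bool:
    """Cluster-wide replication: a majority of all nodes must be up."""
    return nodes_up >= math.ceil((total_nodes + 1) / 2)

def dc_aware_ok(dcs_up: int) -> bool:
    """DC-aware replication: each DC holds a full copy of the data,
    so one intact DC is enough to keep it available."""
    return dcs_up >= 1

# 2+2 split, one DC lost:
print(cluster_wide_ok(4, 2))  # False - today's behaviour
print(dc_aware_ok(1))         # True - once DC awareness ships
```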
Active/passive multi-DC might ship this year. Active/passive means that the nodes in one DC will run normally, while the nodes in the passive DC will be shut down, on stand-by. In case of a disaster you will be able to start the nodes in the other DC instantly.
Active/active multi-DC - probably early 2020.
For all who are interested in HA support for Dynatrace Managed - here's a product idea to follow: https://answers.dynatrace.com/spaces/483/dynatrace-product-ideas/idea/200534/rfe-dynatrace-managed-s...