Clustering/Failover data recovery

cosmin_gherghel
Dynatrace Pro

Hi,

I am setting up a new DCRUM environment at my client's site. The new environment will feature clustering and failover. I will have 2 data centers, each with an Endace probe and 2 CASes. Only 1 data center will be primary; the other will be the failover.

My questions are:

2) Do the failover CASes always process data just like the primary, since they use their own database? How does the failover process work? Can we use one database for 2 CASes (primary and failover)?

 

1) In a DR situation where, let's say, CAS1 goes down in data center 1, and the failover CAS (CAS2) in the second data center takes over its responsibilities and stays in that role for 1 week: when we fail back after that week, how does CAS1 sync its database? Does it go to the AMD and try to pull back what it can, or is there another way to sync its database with the failover database that was running while the primary CAS was down?

6 REPLIES

chris_v
Dynatrace Pro

2) Yes, both the primary and the failover CAS process the same data at the same time. They need separate databases.

1) AMDs keep 10 days of data, so as long as the primary CAS is back up within 10 days, it will collect the data it missed and process it to catch up.

 

cosmin_gherghel
Dynatrace Pro

Hi Chris,

One other question. Let's say that CAS1 is down for 15 days. When it comes back online and pulls the last 10 days of data, does that mean CAS1 will have a 5-day gap in its data when it goes back to being the primary node?

 

Yes, if the CAS is down longer than the AMD's retention (8 days by default, though configurable as long as the AMD's storage space allows), the CAS will have a hole in its data covering the days of the outage for which the AMD no longer holds data.

 

-- Erik
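To make the arithmetic above concrete, here is a minimal sketch of the gap calculation. It is not a DCRUM tool or API; the function name `data_gap_days` is made up for illustration, and it simply assumes the AMD holds a fixed number of days of data that the CAS can replay once it is back online.

```python
def data_gap_days(downtime_days: float, amd_retention_days: float) -> float:
    """Days of data the CAS cannot recover after an outage.

    Assumption: the AMD keeps exactly `amd_retention_days` of data,
    and the CAS can re-process all of it once it is back online.
    """
    # Anything older than the AMD's retention window is gone, so the gap
    # is the part of the outage that falls outside that window.
    return max(0.0, float(downtime_days) - float(amd_retention_days))


# CAS down for 15 days, AMD keeps 10 days: 5 days are missing.
print(data_gap_days(15, 10))  # 5.0
# CAS down for 7 days, AMD keeps 10 days: no gap, the CAS catches up fully.
print(data_gap_days(7, 10))   # 0.0
# Same 15-day outage with the 8-day default retention: 7 days are missing.
print(data_gap_days(15, 8))   # 7.0
```

In other words, the gap depends on the retention actually configured on the AMD, so it is worth checking that setting before planning how long a primary can stay down.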

Just to follow up on this: what would be the process for retaining full administrative abilities if the primary node's data center went down? Would you break the farm and make the failover nodes primary? Would swapping primary and failover work while the primary is down (to regain full admin abilities)? If we keep the previously set farm configuration, there seems to be a gap in this situation, because the primary CAS's failover does not have full functionality.

Swapping the primary with its failover gives you full functionality. Once the original primary is back up and in sync with the new primary, you can swap back.

Is it common practice for licensing to provide permanent licenses for the failover nodes (in case they become primary, i.e. the inverse of our current licensing)? I was going to swap the primary, as I know this would provide the administrative abilities. However, licensing would have been an issue, and I know that in the past there has been a limit on the number of AMDs covered by the emergency license (I don't know whether this has been removed or resolved).