AG machine itself goes down

deyisanupam
Visitor

I have only one AG to monitor Azure cloud services. What if the AG machine itself goes down? How will the OAs then communicate with the DT Server in a SaaS environment?
Please note: the OAs are not open for HTTPS on port 443; they are open only on port 9999 to connect to the AG.

12 REPLIES

victor_balbuena
Dynatrace Mentor

The OAs will try to find another available AG to communicate with. If there is none, they will try to communicate with the DT Server directly. Since that is also blocked, you will lose the monitoring data of your Azure cloud services, as the OAs will not be able to send the data to DT. This is why it's a good idea to set up more than one ActiveGate, so that the OneAgents can switch to another one if the main one goes down.

So, can the mechanism below work?
In a failover case, where the primary ActiveGate VM is unavailable, the secondary ActiveGate VM should be used for communication. Until then, can I keep the secondary AG VM stopped or powered off to save its cost, and have it start automatically when the primary AG VM goes down?
Will there be any implications for connections or traffic?

Unfortunately, it does not work like that. Dynatrace will not start an AG by itself. So, if you want the above, you have to handle the workflow yourself: monitor the state of the main AG and, when it goes down, trigger a Jenkins pipeline or similar that starts the second AG. If you do that, it could work, and the OA will change AG automatically whenever the second one comes online. Just make sure the AGs are in the same Network Zone as the OneAgents.
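As a rough illustration of the workflow described above, here is a minimal Python sketch of one polling cycle of such a watchdog. It is not a Dynatrace feature: the function names are hypothetical, the health check is only a plain TCP connect to port 9999, and `start_secondary` / `stop_secondary` are placeholders for whatever actually starts or stops the VM (a Jenkins job, a cloud CLI call, and so on). It only covers starting and stopping the VM; whether the OneAgents reconnect still depends on them knowing the secondary AG's address.

```python
import socket
from typing import Callable


def ag_is_reachable(host: str, port: int = 9999, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the ActiveGate port succeeds.

    Assumption: an open TCP port is treated as "AG is up". A real check
    might need to be stricter than this.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def failover_action(primary_up: bool, secondary_running: bool) -> str:
    """Decide what the watchdog should do in this polling cycle."""
    if not primary_up and not secondary_running:
        return "start_secondary"
    if primary_up and secondary_running:
        return "stop_secondary"
    return "none"


def run_cycle(
    primary_host: str,
    secondary_running: bool,
    start_secondary: Callable[[], None],  # hypothetical hook, e.g. trigger a pipeline
    stop_secondary: Callable[[], None],   # hypothetical hook
    probe: Callable[[str], bool] = ag_is_reachable,
) -> bool:
    """Run one cycle and return the new 'secondary running' state."""
    action = failover_action(probe(primary_host), secondary_running)
    if action == "start_secondary":
        start_secondary()
        return True
    if action == "stop_secondary":
        stop_secondary()
        return False
    return secondary_running
```

A scheduler (cron, a pipeline timer, or a simple loop with `time.sleep`) would call `run_cycle` every minute or so and keep the returned state for the next cycle.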

It is also good practice to keep both AGs running so that OA traffic is load-balanced between them.

Thank you for your response, Victor.
Kindly clarify the doubts below as well.
1.) So, the fallback mode will not be used here, as both AGs will be in the same network zone. Right?
2.) Failover may happen rarely, and until then the secondary AG VM, with ActiveGate installed and configured, will be stopped or powered off. Will the OAs have any issue discovering this secondary AG when it comes back online?
3.) Will it work for Azure services monitoring, such as PaaS services where OneAgent is installed as an extension, or through the Azure Monitor integration with Dynatrace SaaS?

Happy to be of help.

  1. Exactly. You can also put the secondary AG in the fallback network zone, and that would also work: if there are no available AGs in the main network zone, the OAs will look for an AG in the fallback network zone.
  2. Not really. There might be a small delay of a couple of minutes, but the OA stores the monitored data for those few minutes and sends it all at once (with the correct timestamps) as soon as it finds an AG.
  3. Yes, but the failover system works differently in these cases, because they don't use network zones. Instead, the Azure integration, for example, uses any ActiveGate that has the Azure monitoring module enabled, regardless of network zone or group.
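To make point 2 concrete, the following Python sketch models the general "buffer with timestamps, flush when an endpoint is back" pattern. It is only an illustration of the idea, not OneAgent's actual implementation: `BufferedSender`, the sample tuple layout, and the buffer limit are all assumptions made for the example.

```python
import time
from collections import deque
from typing import Callable, Deque, List, Optional, Tuple

# (timestamp in seconds, metric name, value); layout is illustrative only
Sample = Tuple[float, str, float]


class BufferedSender:
    """Keep samples locally while no endpoint is reachable, then send them together."""

    def __init__(self, send: Callable[[List[Sample]], None], max_samples: int = 10_000):
        self._send = send  # hypothetical transport; raises ConnectionError on failure
        # Bounded buffer: once full, the oldest samples are dropped first.
        self._buffer: Deque[Sample] = deque(maxlen=max_samples)

    def record(self, metric: str, value: float, timestamp: Optional[float] = None) -> None:
        """Store a sample with the time it was measured, not the time it is sent."""
        ts = time.time() if timestamp is None else timestamp
        self._buffer.append((ts, metric, value))

    def flush(self) -> int:
        """Try to send everything buffered; return how many samples were sent."""
        if not self._buffer:
            return 0
        batch = list(self._buffer)
        try:
            self._send(batch)
        except ConnectionError:
            return 0  # endpoint still down, keep the data for the next attempt
        self._buffer.clear()
        return len(batch)
```

Because each sample carries its own measurement timestamp, a late flush still places the data at the right point on the timeline, which matches the behaviour described in the answer.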

Thank you for answering my queries.
So, all I need is two AG VMs that are exact replicas in the same network zone. The primary AG VM will be active and remain on. The secondary AG VM will be passive and remain off until a failover is triggered; the OneAgents or the Azure integration will then connect automatically to the secondary AG VM as soon as it comes online.
I also need to ensure that a trigger switches the secondary AG VM off again when the primary AG VM comes back online.
Questions:
1.) How do the OAs or the Azure integration connect to the AG on port 9999? Is it based on IP address, hostname, or tokens? The IP address or hostname might change when the secondary AG VM is provisioned during the failover trigger. Will they still discover and connect to the secondary AG VM?

2.) As both AG VMs will be provisioned in Azure, is there any way to achieve the above from within Azure only, without using Jenkins? Do you have any ideas on this?

So let me backtrack a bit here. After giving it more thought (and thanks to your question), I think your solution would not work. Since the OneAgent only has connectivity to one AG, it knows only the IP of that AG (it also knows the IP of the DT Server, but it cannot connect to it). Once that AG is down, the OA cannot communicate with anything else. Even if you spin up a secondary AG and the connectivity is there, the OA does not know its IP, because it cannot retrieve it from the DT Server or from anywhere else, so it will not connect to it. The only solution would be to log in to the machine where the OA is running and configure it with the IP of the new ActiveGate via the --set-server parameter.
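For illustration, the manual reconfiguration could be scripted along these lines. This is a sketch under assumptions, not a verified procedure: the `oneagentctl` path is the usual Linux default, the `--restart-service` flag and the endpoint URL format should be checked against the documentation for your OneAgent version, and the helper name `set_server_command` is made up for this example.

```python
from typing import List
from urllib.parse import urlparse

# Assumption: default Linux install location of oneagentctl; adjust for your hosts.
ONEAGENTCTL = "/opt/dynatrace/oneagent/agent/tools/oneagentctl"


def set_server_command(endpoint: str, restart: bool = True) -> List[str]:
    """Build the oneagentctl command that points a OneAgent at a new ActiveGate.

    `endpoint` is the full URL of the new AG (or of a load balancer in front
    of several AGs), for example a hypothetical https://10.0.0.5:9999/communication.
    """
    parsed = urlparse(endpoint)
    if parsed.scheme != "https" or not parsed.hostname:
        raise ValueError(f"expected an https:// endpoint URL, got {endpoint!r}")
    cmd = [ONEAGENTCTL, f"--set-server={endpoint}"]
    if restart:
        # Assumption: the new endpoint takes effect after the agent service restarts.
        cmd.append("--restart-service")
    return cmd
```

On the OneAgent host, the resulting list could be passed to `subprocess.run(cmd, check=True)` with root privileges. Doing this on every host during an outage is exactly the manual effort that makes the cold-standby approach unattractive.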

That's why it's a good idea to have both ActiveGates running, so you have actual failover and high availability. What I've seen at other customers is an auto-scaling group behind a load balancer, with the load balancer as the AG endpoint. The OA then only needs to know one IP, that of the load balancer, and it doesn't care how many instances are running behind it.

As for the Azure integration, this is not an issue, because it's the AG that reaches out to Azure Monitor for data and not the other way around. If the AG used for the Azure integration is down, the DT Server can send the connection information to a second AG that you have just spun up, and that second AG will reach out to Azure Monitor to continue monitoring.

Thank you, Victor, for the solution.
As I understand it: have an auto-scaling group behind a load balancer, with the load balancer as the AG endpoint, so that the OA only needs to know one IP, that of the load balancer, and doesn't care how many instances are running behind it.
May I know how to set up the load balancer IP as the AG endpoint for the OAs? Kindly share a link if any documentation is available.
Now that I will go for the load balancer IP as the AG endpoint, will I need to make any changes for the Azure integration as well? Or can it be done as described in this link: https://www.dynatrace.com/support/help/shortlink/azure-monitoring-guide

Will this load balancer solution work with a Windows AG VM, or only with a Linux AG VM?
Link: https://www.dynatrace.com/support/help/shortlink/oneagent-reverse-proxy

deyisanupam
Visitor

If my primary ActiveGate VM goes down in a Dynatrace SaaS environment, how can I achieve high availability when the OneAgent host machines cannot connect over HTTPS on port 443?

Hi @deyisanupam, you can achieve HA using Network Zones: https://www.dynatrace.com/support/help/manage/network-zones

Even if no network zone is set, the OneAgents will by default try every AG present in your environment, so HA is already there by default. You just need to make sure the network flows between the OAs and the AGs exist and are allowed.

Site Reliability Engineer @ Kyndryl
