
OneAgent failed to connect to Managed cluster after the update to version 1.141.217.20180503-140959 on May 07

CharlesWu
Organizer

This happened right after the agent update. The installer log shows the following error, indicating a timeout when OneAgent tried to connect to each of the cluster members:

DynatraceOneAgent failed to connect to Dynatrace
Cluster Node https://.........

I ran "wget https://Managed-cluster:8443/communication" on the agent node and I see "connected" along with a "no cert matched" error, which is expected. I also ran "netstat -an | grep 8443" on both the agent node and the cluster node, and I see "established" connections. I ran tcpdump on the cluster, and I can see traffic flowing in both directions. I restarted both OneAgent and the cluster, but it still does not work. What else can we check to get OneAgent connecting to the Managed cluster again without timing out?
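To make the netstat check above repeatable, here is a minimal sketch that counts only the ESTABLISHED TCP connections whose remote address ends in a given port. The function name `count_established` is made up for illustration, and it assumes the usual Linux "netstat -an" column layout (remote address in column 5, state in column 6).

```shell
#!/bin/sh
# Hypothetical helper, not part of OneAgent.
# Reads `netstat -an` style output on stdin and prints how many TCP
# connections are ESTABLISHED towards the given remote port.
count_established() {
    port="$1"
    awk -v port="$port" '
        $1 ~ /^tcp/ && $6 == "ESTABLISHED" {
            # The port is the last colon-separated part of the remote address.
            n = split($5, parts, ":")
            if (parts[n] == port) count++
        }
        END { print count + 0 }
    '
}

# Usage on the agent node (assumes netstat is installed):
#   netstat -an | count_established 8443
```

On the cluster node the port 8443 appears in the local address column instead, so this sketch only fits the agent side as written. An established TCP session only shows that the transport layer is up; it says nothing about whether the agent's requests are answered.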

5 REPLIES

Julius_Loman
Leader

Did you try to reinstall the agent? (after uninstalling it)

CharlesWu
Organizer

I did not try to uninstall the agent. Agent updates are set to automatic. The "established" sessions are there, so communication seems OK at the network layer. The "failed to connect" error may be caused by a failure at the application layer. We would rather find what caused the failure than uninstall and reinstall the agent; otherwise we may have to repeat this on a future agent update. Is there any way to turn on debug mode on both sides to get more information?

You will have to open a support ticket. Debug options are turned on by the Dynatrace DevOps team.

Is there a proxy involved in your environment?

Please keep in mind that there are multiple connections from a host to the cluster. The Linux OneAgent has 2 or 3 connections, each representing one module. In addition, each deeply instrumented application has its own connection. So the communication you see may belong to deeply instrumented processes rather than to the OneAgent itself. (You will have to restart the monitored process to update the instrumentation.)
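One way to see which process owns each connection is to group the established sessions by owner. This is only a sketch: `connections_by_process` is a made-up name, and it assumes the Linux "netstat -tnp" layout, where column 7 holds "PID/Program name" (root is usually needed to see other users' processes).

```shell
#!/bin/sh
# Hypothetical helper, not part of OneAgent.
# Reads `netstat -tnp` style output on stdin and prints, for each owning
# process, the number of ESTABLISHED connections to the given remote port.
connections_by_process() {
    port="$1"
    awk -v port="$port" '
        $1 ~ /^tcp/ && $6 == "ESTABLISHED" {
            n = split($5, parts, ":")
            # Column 7 is "PID/Program name" in netstat -p output.
            if (parts[n] == port) count[$7]++
        }
        END { for (p in count) print count[p], p }
    ' | sort -k2
}

# Usage on the agent node (assumes netstat is installed, run as root):
#   netstat -tnp | connections_by_process 8443
```

If every listed owner is a monitored application such as Apache, and none is an agent process, that would fit the explanation above that the visible sessions belong to instrumented processes.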

If you restarted the OneAgent, check the configuration the agent is starting with. It is at the beginning of the logs/os/ruxitagent_host_* file.
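A small sketch for that log check: it prints the first lines of the most recently modified ruxitagent_host_* file in a directory you pass in. The function name `show_agent_startup` and the default of 40 lines are arbitrary choices for illustration. The full path to the log directory depends on your installation and is not given here, so it is a required argument.

```shell
#!/bin/sh
# Hypothetical helper, not part of OneAgent.
# $1: directory holding the ruxitagent_host_* files (installation specific)
# $2: number of lines to show from the start of the file (default 40)
show_agent_startup() {
    log_dir="$1"
    lines="${2:-40}"
    # Pick the most recently modified matching file.
    newest=$(ls -t "$log_dir"/ruxitagent_host_* 2>/dev/null | head -n 1)
    if [ -z "$newest" ]; then
        echo "no ruxitagent_host_* file found in $log_dir" >&2
        return 1
    fi
    head -n "$lines" "$newest"
}

# Usage (replace the placeholder with your actual log directory):
#   show_agent_startup /path/to/logs/os
```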

CharlesWu
Organizer

We are not using a proxy. We have this problem on all hosts running "Red Hat Enterprise Linux Server release 6.9 (Santiago)" (kernel 2.6.32-696.16.1.el6.x86_64), while RHEL 7.5 is fine. The one I am focusing on runs Apache, where I always see 2 established connections, and once in a while 2 new dynamic sessions show up. Does this prove that network connectivity is OK? I will try restarting Apache as well as OneAgent again to see what happens to these established connections.

Julius_Loman
Leader

I don't know how many instrumented processes you have, but if possible, I'd recommend restarting all of them or rebooting the system completely.