This question is similar to this other one, but with a slight difference. As you can see below, we have a collector group made up of 5 collectors and a WebLogic agent trying to start up. The agent connects to a collector on port 9998 and receives the list of collectors from it. It then tries to connect to a different collector instance on the same host (on port 9997) and apparently cannot connect within the 20 seconds. The strange part is that it then says "Unable to register with Server/Collector" and that it is continuing without instrumentation. Then, about three minutes later, it says "Instrumentation channel connected successfully" and reports that it has connected to the original collector on port 9998. Unfortunately, this agent appears as connected, but with "Instrumentation disabled since the agent could not connect during startup".
2016-06-15 02:25:32 [54dc9886] info [native] Detected application server: BEA/Oracle WebLogic
2016-06-15 02:25:32 [54dc9886] info [native] Reading LoadBalancing configuration from V8KSHDYN:9998
2016-06-15 02:25:32 [54dc9886] info [native] Storing collector peer list (5 entries) to /aplicaciones/dynatrace62/app/agent/conf/collectorlist.gidb_sliro1215_WL_B
2016-06-15 02:25:32 [54dc9886] info [native] Trying to connect to Server/Collector for up to 20 seconds
2016-06-15 02:26:02 [54dc9886] severe [native] Exception while connecting to collector, info:<receiveExact() ... not connected, 70007, Connection timed out>
2016-06-15 02:26:02 [54dc9886] warning [native] Unable to register with Server/Collector 10.64.221.95:9997, CONTINUING WITHOUT INSTRUMENTATION.
2016-06-15 02:26:02 [54dc9886] info [native] Platform .................... Linux 2.6.32-358.6.2.el6.x86_64, amd64
2016-06-15 02:29:22 [0d210886] info [native] Instrumentation channel connected successfully
2016-06-15 02:29:22 [0d210886] info [native] Connected to Server/Collector 10.64.221.95:9998
2016-06-15 02:29:22 [d1666885] info [native] Control channel connected successfully
2016-06-15 02:29:22 [d1666885] info [native] Agent ID .................... 366fe815
2016-06-15 02:29:22 [d1666885] info [native] Process ID .................. 13957
2016-06-15 02:29:22 [d1666885] info [native] Hot Sensor Placement ........ not available
2016-06-15 02:29:22 [d1666885] info [native] Hypervisor .................. unknown
2016-06-15 02:29:32 [abff8885] info [native] Eventsender channel connected successfully
2016-06-15 02:29:32 [abff8885] info [native] Event channel connected successfully
2016-06-15 02:29:32 [d2065885] info [native] Sampling Cache (0 methods) cleared ...
2016-06-15 02:29:43 [d1666885] info [native] License = license ok;
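To check the timing in the excerpt above, here is a small Python sketch that computes the elapsed time between log lines. It assumes every line starts with a "YYYY-MM-DD HH:MM:SS" timestamp, as in this excerpt; the helper names `timestamp` and `gaps_in_seconds` are hypothetical.

```python
from datetime import datetime

# Three lines copied from the agent log excerpt above.
LOG_LINES = [
    "2016-06-15 02:25:32 [54dc9886] info [native] Trying to connect to Server/Collector for up to 20 seconds",
    "2016-06-15 02:26:02 [54dc9886] warning [native] Unable to register with Server/Collector 10.64.221.95:9997, CONTINUING WITHOUT INSTRUMENTATION.",
    "2016-06-15 02:29:22 [0d210886] info [native] Instrumentation channel connected successfully",
]


def timestamp(line: str) -> datetime:
    """Parse the leading 'YYYY-MM-DD HH:MM:SS' timestamp (first 19 characters) of a log line."""
    return datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")


def gaps_in_seconds(lines):
    """Return the elapsed seconds between each pair of consecutive log lines."""
    stamps = [timestamp(line) for line in lines]
    return [(later - earlier).total_seconds() for earlier, later in zip(stamps, stamps[1:])]


if __name__ == "__main__":
    for seconds in gaps_in_seconds(LOG_LINES):
        print(f"{seconds:.0f} s")
```

For these three lines it reports 30 s between the start of the connection attempt and the "CONTINUING WITHOUT INSTRUMENTATION" warning (longer than the 20 seconds announced in the log), and 200 s before the instrumentation channel finally connected.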
Can anyone shed some light as to what is happening, and why it's not able to be instrumented correctly? Very much appreciated!
It's possible that the agent just needs a bit more time: the timeout expires right when it is actually starting to connect. Try increasing the wait time for the connection using the wait=xx parameter.
Hi Joseph H.,
I am getting the same error as in the logs above. I also used the wait parameter when instrumenting the application with the agent.
But after an application restart it is still not instrumented, and again and again we have to ask the application team to restart the application.
Logs: Exception while connecting to Server/Collector x.x.x.43, info:<receiveExact() ... error reading, 70007, Connection timed out>
Ashutosh Kumar Singh
Hello @Ashutosh S.
First of all, make sure the ports are open from the application to the collectors. If there is a firewall between the application and the collector, also try increasing the wait time for the connection using the wait=xx parameter. It specifies the initial wait timeout, i.e. the maximum time in seconds to wait for a connection to an AppMon Collector. If the connection cannot be established within this timeframe, the application continues uninstrumented.
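Both points (open ports and the wait timeout) can be illustrated with a small Python sketch. It models wait as a deadline: keep trying to open a TCP connection to the collector port until the time budget is used up, and give up otherwise. The function name `wait_for_collector`, the retry interval, and the 5-second cap per attempt are hypothetical; this is not how the AppMon agent is implemented internally, only a quick way to check from the application host whether a collector port is reachable in time.

```python
import socket
import time


def wait_for_collector(host: str, port: int, wait_seconds: float, retry_interval: float = 1.0) -> bool:
    """Try to open a TCP connection to host:port until wait_seconds have elapsed.

    Returns True as soon as one connection succeeds, False once the deadline passes.
    A simplified model of a connect timeout, not the agent's actual logic.
    """
    deadline = time.monotonic() + wait_seconds
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False  # budget used up: the agent would continue uninstrumented
        try:
            # Cap each attempt so a single hanging connect cannot consume the whole budget.
            with socket.create_connection((host, port), timeout=min(remaining, 5.0)):
                return True
        except OSError:
            # Refused, unreachable or timed out: pause briefly, then retry while time is left.
            time.sleep(min(retry_interval, max(deadline - time.monotonic(), 0.0)))
```

For example, `wait_for_collector("10.64.221.95", 9997, wait_seconds=60)` run on the application host would show whether that collector port can be reached within 60 seconds. A port that is silently dropped by a firewall shows up as repeated timeouts, while a closed port is usually refused immediately.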
As suggested, the firewall ports are open from the application to the collector, and I set the wait time to 30 seconds in the agent parameters. But this happens with every restart of the application service, and we have to request one more restart to get the agent connected.
Exception while connecting to Server/Collector 10.4.132.43, info:<receiveExact() ... error reading, 70007, Connection timed out>
I'll try increasing the wait time. But is there any other workaround for this? Kindly let me know.
Ashutosh Kumar Singh
Hello @Ashutosh S.
A firewall introduces latency into the calls between the Agent and the Collector. This is often the reason for slow application start-up. The Agent needs to make several tens of thousands of round trips to the Collector at application start-up, so even 1 ms of firewall latency per round trip adds up to a noticeable delay (10,000 round trips × 1 ms = 10 s). Therefore, either use a firewall with very low latency or put the Collector in the same subnet as the Agents.
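To put the latency argument into numbers, here is a Python sketch: the added start-up time is simply the number of round trips multiplied by the latency each one pays. The helper name `added_startup_delay` is hypothetical, and the round-trip counts and latencies are illustrative assumptions, not measurements from either environment in this thread.

```python
def added_startup_delay(round_trips: int, latency_ms: float) -> float:
    """Extra start-up time in seconds when every Agent-Collector round trip pays latency_ms."""
    return round_trips * latency_ms / 1000.0


if __name__ == "__main__":
    # Illustrative values for "several tens of thousands" of round trips.
    for trips in (10_000, 30_000, 50_000):
        for latency in (0.5, 1.0, 2.0):
            print(f"{trips:>6} round trips x {latency} ms = {added_startup_delay(trips, latency):.0f} s")
```

Because the delay grows linearly in both factors, shaving latency per round trip (a faster firewall or the same subnet) helps as much as reducing the number of round trips.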
Have a look at the link below for the Collector best practices.