
This product reached the end of support date on March 31, 2021.

Instrumentation disabled since the agent could not connect during startup

rajanikanthanka
Inactive

Hi,

Two Java agents in the agents overview are showing "Instrumentation disabled since the agent could not connect during startup". This is a production environment.
I am seeing the following in the logs:

2017-06-13 05:45:08 [aa7bac15] info [native] Control channel connected successfully

2017-06-13 05:45:08 [aa7bac15] info [native] Transaction sampling started.

2017-06-13 05:45:08 [aa7bac15] info [native] Process metrics pattern ..... not set

2017-06-13 05:45:08 [aa7bac15] info [native] Agent ID .................... f3e1d15c

2017-06-13 05:45:08 [aa7bac15] info [native] Process ID .................. 18299

2017-06-13 05:45:08 [aa7bac15] info [native] Hot Sensor Placement ........ not available

2017-06-13 05:45:08 [aa7bac15] info [native] Hypervisor .................. VMware

2017-06-13 05:45:08 [aa5b5c15] info [native] Sampling Cache (0 methods) cleared ...

2017-06-13 05:45:20 [aa7bac15] info [native] License = license ok;
2017-06-13 09:26:16 [aa7bac15] info [native] No special logfiles are harvested because the java agent is not present.

Previously all agents showed a connected status, but after a restart these two show "invalid instrumentationstate".

Thanks,

Rajani

11 REPLIES

martin_chochke1
Advisor

Could you include more of the log file? Does anything in it mention the words "instrumentation disabled"?

Hi Martin,

I have attached the complete log file. I didn't see anything mentioning "instrumentation disabled".
agw-log-file.txt

This is the relevant section - it was unable to connect to the collector specified at startup:

2017-06-13 05:44:21 [7208cc15] info    [native] Trying to connect to Server/Collector for up to 20 seconds
2017-06-13 05:44:31 [7208cc15] info [native] Could not retrieve LoadBalancing configuration from 192.168.178.118:9999 (70007): The timeout specified has expired
2017-06-13 05:44:31 [7208cc15] info [native] Reading LoadBalancing configuration from 172.27.16.223:9999
2017-06-13 05:44:31 [7208cc15] info [native] Storing collector peer list (2 entries) to /opt/dynatrace/dynatrace-6.3/agent/conf/collectorlist.AGW_Java_Prod
2017-06-13 05:44:31 [7208cc15] info [native] Server/Collector ............ 192.168.178.118:9999|172.27.16.223:9999|dca-pro1013b2.dteco.com:9999|dca-pro1013.dteco.com:9999
2017-06-13 05:44:31 [7208cc15] info [native] Server/Collector ............ dca-pro1093.dteco.com:9998|162.9.163.4:9998|dca-pro1093b2.dteco.com:9998|172.27.17.185:9998
2017-06-13 05:44:31 [7208cc15] info [native] Server/Collector ............ dca-pro1093.dteco.com:9998
2017-06-13 05:44:41 [7208cc15] severe [native] Exception while connecting to Server/Collector 192.168.178.118, info:<connect()/apr_socket_connect(), 70007, Connection timed out>
2017-06-13 05:44:41 [7208cc15] warning [native] Unable to register with Server/Collector dca-pro1093.dteco.com:9998, CONTINUING WITHOUT INSTRUMENTATION.
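
When many agents are affected, it can help to scan each agent log for this pattern rather than reading it by eye. The following is a minimal Python sketch, assuming the line layout seen in the excerpt above (timestamp, thread ID in brackets, level, `[native]`, message). The function names `find_startup_failures` and `instrumentation_disabled` are hypothetical and not part of any Dynatrace tooling.

```python
import re

# Assumed line layout, inferred from the excerpt above:
# 2017-06-13 05:44:41 [7208cc15] warning [native] <message>
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"\[(?P<thread>[0-9a-f]+)\]\s+(?P<level>\w+)\s+\[native\]\s+(?P<msg>.*)$"
)


def find_startup_failures(lines):
    """Return (timestamp, level, message) for every warning or severe line."""
    hits = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and match.group("level") in ("warning", "severe"):
            hits.append((match.group("ts"), match.group("level"), match.group("msg")))
    return hits


def instrumentation_disabled(lines):
    """True if any line carries the marker text seen in the excerpt above."""
    return any("CONTINUING WITHOUT INSTRUMENTATION" in line for line in lines)
```

Feeding it the lines of an agent log (for example `open(path).read().splitlines()`) should surface the `severe` and `warning` entries around the failed registration.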

Hi James,

But in our JVM startup file we defined the options as follows:
DYNATRACE_OPTIONS="-agentpath:/opt/dynatrace/dynatrace-6.3/agent/lib64/libdtagent.so=name=AGW_Java_Prod,server=dca-pro1093.dteco.com:9998,sotimeout=300,wait=300,ctimeout=300"
JAVA_OPTIONS="${JAVA_OPTIONS} ${DYNATRACE_OPTIONS}"

Even with these timeouts set for connection issues, it is still not connecting.

It is trying to connect to that server, but the connection appears to have taken longer than the timeout the agent waits for (I believe the default is 20 seconds), so the agent continued without instrumentation rather than block the application from starting. It looks like it did eventually connect, but only after the timeout had expired. This may have been a temporary issue. If it recurs (possibly because the agent is located too far from the collector, or because of other connectivity issues), you may need to either move the collector closer to the agent or increase this timeout. Be careful with the latter: a longer timeout can delay application startup, so only change it if absolutely necessary.

See the 'wait' parameter on this page. The only way to resume instrumentation at the moment is to restart the application.

2017-06-13 05:44:41 [7208cc15] severe [native] Exception while connecting to Server/Collector 192.168.178.118, info:<connect()/apr_socket_connect(), 70007, Connection timed out>
2017-06-13 05:44:41 [7208cc15] warning [native] Unable to register with Server/Collector dca-pro1093.dteco.com:9998, CONTINUING WITHOUT INSTRUMENTATION.
2017-06-13 05:44:41 [7208cc15] info [native] Platform .................... Linux 2.6.32-431.29.2.el6.x86_64, amd64
2017-06-13 05:45:08 [8dc26c15] info [native] Instrumentation channel connected successfully
2017-06-13 05:45:08 [8dc26c15] info [native] Connected to Server/Collector 172.27.17.185:9998
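
To put a number on "after the timeout", the timestamps in this excerpt can be subtracted. This is a small Python sketch that assumes every log line starts with a 19-character `YYYY-MM-DD HH:MM:SS` timestamp, as in the lines above; `seconds_between` is a hypothetical helper name.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def seconds_between(earlier_line, later_line):
    """Seconds between the timestamps that prefix two agent log lines."""
    start = datetime.strptime(earlier_line[:19], TIMESTAMP_FORMAT)
    end = datetime.strptime(later_line[:19], TIMESTAMP_FORMAT)
    return (end - start).total_seconds()


gave_up = (
    "2017-06-13 05:44:41 [7208cc15] warning [native] Unable to register with "
    "Server/Collector dca-pro1093.dteco.com:9998, CONTINUING WITHOUT INSTRUMENTATION."
)
connected = (
    "2017-06-13 05:45:08 [8dc26c15] info [native] "
    "Instrumentation channel connected successfully"
)
print(seconds_between(gave_up, connected))  # 27.0
```

In this excerpt the instrumentation channel came up 27 seconds after the agent had already decided to continue without instrumentation.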

We already increased the timeout to 300 seconds a long time ago. So why did the agent try for only the first 20 seconds and then continue without instrumentation? Why didn't it keep trying to connect for 300 seconds?

This is the agent path that is shown in the logs:

-agentpath:/opt/dynatrace/dynatrace-6.3/agent/lib64/libdtagent.so=name=AGW_Java_Prod,server=dca-pro1093.dteco.com:9998 weblogic.Server

It does not show the values for wait and the other timeouts that you posted in your comment. Are you sure you're using the correct configuration string? If you need to look more into this you may want to open a support case.
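
One way to make this comparison concrete is to parse both strings and list the options that are configured but absent from what the agent logged. The sketch below is a minimal Python example and assumes only the `-agentpath:<library>=key=value,key=value` layout visible in the two strings in this thread. The helper names `parse_agent_options` and `missing_options` are hypothetical.

```python
def parse_agent_options(agentpath):
    """Split '-agentpath:<lib>=k1=v1,k2=v2 [extra args]' into (lib, options)."""
    # Keep only the first token so trailing arguments such as
    # 'weblogic.Server' are ignored.
    body = agentpath.strip().split()[0]
    prefix = "-agentpath:"
    if body.startswith(prefix):
        body = body[len(prefix):]
    library, _, option_text = body.partition("=")
    options = {}
    for pair in option_text.split(","):
        if pair:
            key, _, value = pair.partition("=")
            options[key] = value
    return library, options


def missing_options(configured, effective):
    """Options present in 'configured' that are absent or different in 'effective'."""
    _, wanted = parse_agent_options(configured)
    _, actual = parse_agent_options(effective)
    return {key: value for key, value in wanted.items() if actual.get(key) != value}


configured = (
    "-agentpath:/opt/dynatrace/dynatrace-6.3/agent/lib64/libdtagent.so="
    "name=AGW_Java_Prod,server=dca-pro1093.dteco.com:9998,"
    "sotimeout=300,wait=300,ctimeout=300"
)
effective = (
    "-agentpath:/opt/dynatrace/dynatrace-6.3/agent/lib64/libdtagent.so="
    "name=AGW_Java_Prod,server=dca-pro1093.dteco.com:9998 weblogic.Server"
)
print(missing_options(configured, effective))
# {'sotimeout': '300', 'wait': '300', 'ctimeout': '300'}
```

For the two strings posted here, the difference is exactly the three timeout options, which would suggest the JVM was not started with the `DYNATRACE_OPTIONS` value quoted earlier.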

I checked with the Middle Tier group; they usually add the agent path in the startup file. I will probably open a ticket to look into this more deeply, and I will share the results. Thanks, James.

martin_chochke1
Advisor

If it's an option, you can fix this by disabling collector groups since the timeout is occurring during the load balancing.

bernardo_varand
Inactive

Is there a solution for this case? I have the same problem with a lot of agents (.NET and Java). Instrumentation was disabled and the message is "invalid instrumentationstate".

Hi Bernardo,

In my situation, it was because the agents and the collectors were in different datacenters. Although the collectors were in collector groups, I had to point the agent configuration to a collector in the same datacenter and restart. That fixed it.

Thanks,
Rajani
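
Before repointing an agent, the connect time from the agent host to each candidate collector can be measured to see which one is closest. This is a rough Python sketch using only the standard library. It measures plain TCP connect time, which is an assumption about what matters here and does not exercise the Dynatrace agent protocol. The names `connect_time` and `rank_collectors` are hypothetical.

```python
import socket
import time


def connect_time(host, port, timeout=5.0):
    """Return seconds taken to open a TCP connection, or None on failure."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return None


def rank_collectors(candidates, timeout=5.0):
    """Sort reachable (host, port) pairs by connect time, fastest first."""
    timed = [(connect_time(host, port, timeout), host, port) for host, port in candidates]
    return sorted(entry for entry in timed if entry[0] is not None)
```

Run from the agent host with the collector addresses from the log, for example `rank_collectors([("dca-pro1093.dteco.com", 9998), ("172.27.17.185", 9998)])`. Unreachable collectors are dropped from the result, and the first entry is the quickest to accept a connection.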