This product reached the end of support date on March 31, 2021.

Agents ambiguously mapped and disconnected

Hello,

We have a few agents that keep throwing incidents for being ambiguously mapped. All three agents have a similar pattern in the log files. They either attempt to connect to a collector repeatedly and throw exceptions:

2015-11-03 18:57:39 [00003637] severe [native] Exception in controller: Exception java.nio.BufferUnderflowException. Retrying every 10 seconds.
2015-11-03 18:57:47 [00003738] warning [native] Exception occurred while sending events: sendExactOnce(), 32, There is no process to read data written to a pipe.
2015-11-03 18:57:49 [00003637] warning [native] Control channel connection to tldntcoltr13p:9997 lost. Retrying every 10 seconds.
2015-11-03 18:57:57 [00003738] warning [native] Event channel reconnect failed: Exception while connecting to collector, info:<Could not register agent with collector as instrumentation channel is not yet connected>
2015-11-03 18:58:14 [00003334] warning [native] ... last message repeated 1 time ...
2015-11-03 18:58:14 [00003334] warning [native] Instrumentation channel disconnected: server did not reply to ping request
2015-11-03 18:58:25 [00003334] info [native] Instrumentation channel connected successfully
2015-11-03 18:58:25 [00003334] info [native] Connected to Server/Collector tldntcoltr13p:9997
2015-11-03 18:58:27 [00003738] info [native] Eventsender channel connected successfully
2015-11-03 18:58:27 [00003738] info [native] Event channel connected successfully
2015-11-03 18:58:29 [00003637] info [native] Control channel connected successfully
2015-11-03 18:58:30 [00003637] severe [java ] Exception java.nio.BufferUnderflowException while reading command:
java.nio.BufferUnderflowException
at java.nio.DirectByteBuffer.get(DirectByteBuffer.java:265)
at com.dynatrace.diagnostics.agent.j.read(Unknown Source)
at java.io.DataInputStream.readFully(DataInputStream.java:206)
at java.io.DataInputStream.readUTF(DataInputStream.java:620)
at java.io.DataInputStream.readUTF(DataInputStream.java:575)
at com.dynatrace.diagnostics.agent.i.c(Unknown Source)
at com.dynatrace.diagnostics.agent.Controller.i(Unknown Source)
at com.dynatrace.diagnostics.agent.Controller.handleCommand(Unknown Source)
2015-11-03 18:58:30 [00003637] severe [native] Exception in controller: Exception java.nio.BufferUnderflowException. Retrying every 10 seconds.

Or the agents repeatedly attempt to connect to various other collectors in their collector groups:

2015-11-04 03:52:59 [00003637] warning [native] Control channel connection to tldntcoltr15p:9998 lost. Retrying every 10 seconds.
2015-11-04 03:53:05 [00003334] warning [native] Instrumentation channel disconnected: server did not reply to ping request
2015-11-04 03:53:12 [00003738] warning [native] Event channel reconnect failed: Exception while connecting to collector, info:<connect()/apr_socket_connect(), 70007, Connection timed out>
2015-11-04 03:54:55 [00003334] info [native] Instrumentation channel connected successfully
2015-11-04 03:54:55 [00003334] info [native] Connected to Server/Collector tldntcoltr14p:9998
2015-11-04 03:54:55 [00003637] info [native] Control channel connected successfully
2015-11-04 03:54:55 [00003738] info [native] Eventsender channel connected successfully
2015-11-04 03:54:55 [00003738] info [native] Event channel connected successfully
2015-11-04 03:54:55 [00003637] info [native] Agent ID .................... 21892727
2015-11-04 03:54:55 [00003637] info [native] Process ID .................. 5439760
2015-11-04 03:54:55 [00003637] info [native] Capture ..................... disabled
2015-11-04 03:54:55 [00003637] info [native] License ..................... not licensed
2015-11-04 03:54:55 [00003637] info [native] Capture CPU times ........... disabled
2015-11-04 03:54:55 [00003637] info [native] Hot Sensor Placement ........ not available
2015-11-04 03:54:55 [00003536] info [native] Sampling Cache (0 methods) cleared ...
2015-11-04 03:55:03 [00003637] info [native] License = skipped by license check - agent did not match system profile;
2015-11-04 04:06:43 [00003637] severe [native] Exception in controller: util::SocketConnection::receiveExact(), 70014, End of file found. Retrying every 10 seconds.
2015-11-04 04:06:56 [00003334] warning [native] Instrumentation channel disconnected: server did not reply to ping request
2015-11-04 04:07:01 [00003738] warning [native] Exception occurred while sending events: sendExactOnce(), 32, There is no process to read data written to a pipe.
2015-11-04 04:07:03 [00003637] warning [native] Control channel connection to tldntcoltr14p:9998 lost. Retrying every 10 seconds.
2015-11-04 04:08:46 [00003334] info [native] Instrumentation channel connected successfully
2015-11-04 04:08:46 [00003334] info [native] Connected to Server/Collector tldntcoltr12p:9997
2015-11-04 04:08:46 [00003637] info [native] Control channel connected successfully
2015-11-04 04:08:46 [00003738] info [native] Eventsender channel connected successfully
2015-11-04 04:08:46 [00003738] info [native] Event channel connected successfully
2015-11-04 04:08:46 [00003637] info [native] Agent ID .................... 21892727
2015-11-04 04:08:46 [00003637] info [native] Process ID .................. 5439760
2015-11-04 04:08:46 [00003637] info [native] Capture ..................... disabled
2015-11-04 04:08:46 [00003637] info [native] License ..................... not licensed
2015-11-04 04:08:46 [00003637] info [native] Capture CPU times ........... disabled
2015-11-04 04:08:46 [00003637] info [native] Hot Sensor Placement ........ not available
2015-11-04 04:08:46 [00003536] info [native] Sampling Cache (0 methods) cleared ...
2015-11-04 04:08:56 [00003637] info [native] License = skipped by license check - agent did not match system profile;
2015-11-05 03:47:24 [00003637] severe [native] Exception in controller: util::SocketConnection::receiveExact(), 70014, End of file found. Retrying every 10 seconds.
2015-11-05 03:47:42 [00003738] warning [native] Exception occurred while sending events: sendExactOnce(), 32, There is no process to read data written to a pipe.
2015-11-05 03:47:44 [00003637] warning [native] Control channel connection to tldntcoltr12p:9997 lost. Retrying every 10 seconds.
2015-11-05 03:47:47 [00003334] warning [native] Instrumentation channel disconnected: server did not reply to ping request
2015-11-05 03:49:37 [00003334] info [native] Instrumentation channel connected successfully
2015-11-05 03:49:37 [00003334] info [native] Connected to Server/Collector teesmawds02p:9998
2015-11-05 03:49:37 [00003637] info [native] Control channel connected successfully
2015-11-05 03:49:37 [00003738] info [native] Eventsender channel connected successfully
2015-11-05 03:49:37 [00003738] info [native] Event channel connected successfully
2015-11-05 03:49:37 [00003637] info [native] Agent ID .................... 21892727
2015-11-05 03:49:37 [00003637] info [native] Process ID .................. 5439760
2015-11-05 03:49:37 [00003637] info [native] Capture ..................... disabled
2015-11-05 03:49:37 [00003637] info [native] License ..................... not licensed
2015-11-05 03:49:37 [00003637] info [native] Capture CPU times ........... disabled
2015-11-05 03:49:37 [00003637] info [native] Hot Sensor Placement ........ not available
2015-11-05 03:49:37 [00003536] info [native] Sampling Cache (0 methods) cleared ...
2015-11-05 03:49:45 [00003637] info [native] License = skipped by license check - agent did not match system profile;
2015-11-05 08:41:10 [00003637] info [native] Sending log files (including previous)
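
To see how often an agent hops between collectors, excerpts like the ones above can be summarized with a short script. This is only a sketch: it assumes the line format shown in these excerpts, and `summarize_agent_log` is a hypothetical helper, not part of any Dynatrace tooling.

```python
import re
from collections import Counter

# Patterns assume the agent log wording shown in the excerpts above.
CONNECTED = re.compile(r"Connected to Server/Collector (\S+):(\d+)")
LOST = re.compile(r"Control channel connection to (\S+):(\d+) lost")


def summarize_agent_log(lines):
    """Count successful connects and control-channel losses per collector.

    Returns two Counters keyed by 'host:port'.
    """
    connects, losses = Counter(), Counter()
    for line in lines:
        match = CONNECTED.search(line)
        if match:
            connects[f"{match.group(1)}:{match.group(2)}"] += 1
            continue
        match = LOST.search(line)
        if match:
            losses[f"{match.group(1)}:{match.group(2)}"] += 1
    return connects, losses
```

Feeding it the lines of an agent log (for example `summarize_agent_log(open(path))`, where `path` is wherever the agent log lives in your installation) gives a quick picture of which collectors the agent connected to and which connections it lost.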

Some agents say they didn't match a system profile, but one just says agent disconnected:


Also the one that says "Agent is disconnected" still appears on the transaction flow, and has host/health info.

graeme_william1
Inactive

Jacob,

This may require submitting a support ticket, but first I'd check a few things, like the ping time between the agents and the collector, and whether the collector is overloaded. You can check the collector in the Collector Health dashboard available in the Start Center -> Monitoring, but I'd also check the system it's running on for memory and CPU utilization.

-- Graeme
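
The ping-time check suggested above can be approximated from the agent host with a short script. This is a rough sketch, not a Dynatrace tool: it times TCP connection setup to the collector port rather than sending an ICMP ping, and `measure_connect_ms` is a hypothetical helper name. The host and port in the usage note are taken from the log excerpts earlier in the thread.

```python
import socket
import time


def measure_connect_ms(host, port, attempts=3, timeout=5.0):
    """Time TCP connection setup to host:port, in milliseconds per attempt.

    Connection setup takes roughly one network round trip, so this is only
    a proxy for ping time. Raises OSError if the port cannot be reached.
    """
    samples = []
    for _ in range(attempts):
        start = time.perf_counter()
        # Open the connection and close it again immediately.
        with socket.create_connection((host, port), timeout=timeout):
            pass
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples
```

For example, `measure_connect_ms("tldntcoltr13p", 9997)` run on an affected agent host returns one timing per attempt. A timeout or refused connection here would point at the network path or the collector itself rather than at the agent.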

prudhviraj_koni
Newcomer

We are facing the same problem in our environment. Kindly share your findings on this issue.

Hello Prudhvi,

Please add the following column in the Agents Overview Dashboard and check the average time of instrumentation for those agents.

Open the Start Center, select Monitoring, then select the dynaTrace Collector Sizing to check the Buffer Saturation, CPU, and Memory Usage.

Regards,

Babar

rajeshwar_vadhe
Participant

Babar, I did as mentioned above. The Buffer Saturation graph shows nothing, and CPU and memory are okay. However, my deployment size is Medium, which per the documentation caters for 250 agents, but I have 300 agents connected. Is that the reason?

Hello Raj,

I believe that is just an agent count; the number of transactions per second also matters in the sizing.

Do you have any firewall between agents and collectors?

A firewall introduces latency in the calls between the Agent and the Collector, which is often the reason for slow application start-up. The Agent needs to make several tens of thousands of round trips to the Collector at application start-up, so even 1 ms of firewall latency adds up to a noticeable delay. Therefore, either use a firewall with very low latency or put the Collector in the same subnet as the Agents.
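
As a back-of-the-envelope illustration of that arithmetic, the added start-up time is roughly the number of round trips multiplied by the extra latency per round trip. The round-trip counts below are assumed values chosen for illustration, not measurements, and `startup_overhead_seconds` is a hypothetical helper.

```python
def startup_overhead_seconds(round_trips, added_latency_ms):
    """Extra start-up time caused by added per-round-trip latency."""
    return round_trips * added_latency_ms / 1000.0


# Assumed round-trip counts, each with 1 ms of added firewall latency.
for trips in (10_000, 30_000, 50_000):
    print(f"{trips} round trips: +{startup_overhead_seconds(trips, 1.0)} s")
```

With 1 ms of added latency, 10,000 round trips already cost about 10 seconds of extra start-up time, and the cost grows linearly from there.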

I would recommend having a look at the link below for the collector best practices.

https://www.dynatrace.com/support/doc/appmon/installation/deployment-guide/additional-deployment-best-practices/collector-best-practices/

Regards,

Babar