■ About the various JavaAgent timeouts (ClassicAgent)
JavaAgent has the following three timeout settings.
⇒This is the timeout value for the process of confirming arrival at the collector port, which is always performed before each socket communication. (Default: 20 seconds)
⇒Timeout for each socket communication. (Default: 30 seconds)
⇒This is the interval between reconnection attempts when the agent cannot connect to the collector. (Default: 10 seconds)
I ran the following test to check the WAIT parameter on the JavaAgent of the AppMonAgent platform.
■ Verification details
Start JavaAgent (Tomcat) with the collector server stopped.
The following message is written in the log, and it takes 20 seconds before the next message is written.
2019-11-13 01:05:06 [60d2b8d1] info [native] Trying to connect to Server/Collector for up to 20 seconds
2019-11-13 01:05:26 [60d2b8d1] severe [native] Exception while connecting to Server/Collector 10.253.106.90, info:<receiveExact() ... not connected, 70007, Connection timed out>
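The 20-second gap can be checked directly from the two timestamps. This is a minimal Python sketch, not an agent tool; the function name elapsed_seconds is made up for illustration, and it assumes each line starts with a 19-character "YYYY-MM-DD HH:MM:SS" timestamp as in the log above.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

def elapsed_seconds(first_line: str, second_line: str) -> float:
    """Return the seconds between the leading timestamps of two log lines."""
    # Assumption: the first 19 characters are 'YYYY-MM-DD HH:MM:SS'.
    t1 = datetime.strptime(first_line[:19], FMT)
    t2 = datetime.strptime(second_line[:19], FMT)
    return (t2 - t1).total_seconds()

start = ("2019-11-13 01:05:06 [60d2b8d1] info [native] "
         "Trying to connect to Server/Collector for up to 20 seconds")
fail = ("2019-11-13 01:05:26 [60d2b8d1] severe [native] "
        "Exception while connecting to Server/Collector 10.253.106.90")

print(elapsed_seconds(start, fail))  # 20.0
```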
I can see that the agent is sending heartbeats, but I cannot tell whether the wait timeout is working.
2020-02-05 04:57:28.741 UTC [7f3c5882] warning [native] Exception sending async messages: Not sent because last heartbeat failed.
2020-02-05 04:57:29.932 UTC  info [java ] [reflector ] Reflector caching mode: on
2020-02-05 04:57:29.940 UTC  info [java ] [reflector ] Reflector access mode: memberIterating
2020-02-05 04:57:29.941 UTC  info [java ] [agent ] Detected Tomcat version : Tomcat 126.96.36.199
2020-02-05 04:57:36.151 UTC [6fff8882] info [native] Exception handling periodic task protocols.messages.shared.ClusterTimeRequest: Not sent because last heartbeat failed.
2020-02-05 04:57:36.154 UTC [6fff8882] info [native] DispatcherPeriodicMessageCenter::sendHeartbeatMessage: Heartbeat failed: Try again.
2020-02-05 04:57:36.172 UTC [6fff8882] info [native] Exception handling periodic task protocols.messages.shared.ClusterRuntimeInfoRequest: Not sent because last heartbeat failed.
2020-02-05 04:57:36.219 UTC [6fff8882] info [native] Exception handling periodic task protocols.messages.configuration.ConfigurationRequest: Not sent because last heartbeat failed.
2020-02-05 04:57:46.156 UTC [6fff8882] info [native] DispatcherPeriodicMessageCenter::sendHeartbeatMessage: Heartbeat failed: Try again.
2020-02-05 04:57:46.229 UTC [6fff8882] info [native] Exception handling periodic task protocols.messages.plugins.PluginsUpdateRequest: Not sent because last heartbeat failed.
2020-02-05 04:57:56.157 UTC [6fff8882] info [native] DispatcherPeriodicMessageCenter::sendHeartbeatMessage: Heartbeat failed: Try again.
2020-02-05 04:58:26.089 UTC [5cdfd882] info [native] ... last message repeated 2 times ...
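As a rough check of the retry cadence in this log, the timestamps of the sendHeartbeatMessage lines can be compared. The sketch below is plain Python and not part of the agent; the function name heartbeat_intervals is hypothetical, and it assumes the "YYYY-MM-DD HH:MM:SS.mmm UTC" prefix seen above.

```python
import re
from datetime import datetime

# Assumption: lines start with 'YYYY-MM-DD HH:MM:SS.mmm UTC' as in the log above.
TS = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) UTC")

def heartbeat_intervals(lines):
    """Return the gaps in seconds between consecutive sendHeartbeatMessage lines."""
    stamps = []
    for line in lines:
        if "sendHeartbeatMessage" not in line:
            continue  # skip the 'Not sent because last heartbeat failed' lines
        m = TS.match(line)
        if m:
            stamps.append(datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f"))
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]

log = [
    "2020-02-05 04:57:36.154 UTC [6fff8882] info [native] DispatcherPeriodicMessageCenter::sendHeartbeatMessage: Heartbeat failed: Try again.",
    "2020-02-05 04:57:36.172 UTC [6fff8882] info [native] Exception handling periodic task protocols.messages.shared.ClusterRuntimeInfoRequest: Not sent because last heartbeat failed.",
    "2020-02-05 04:57:46.156 UTC [6fff8882] info [native] DispatcherPeriodicMessageCenter::sendHeartbeatMessage: Heartbeat failed: Try again.",
    "2020-02-05 04:57:56.157 UTC [6fff8882] info [native] DispatcherPeriodicMessageCenter::sendHeartbeatMessage: Heartbeat failed: Try again.",
]

print(heartbeat_intervals(log))  # two gaps of roughly 10 seconds each
```

The heartbeat attempts are about 10 seconds apart, which matches the 10-second default listed above for reconnection, although the log itself does not name the parameter behind that interval.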
The WAIT value defines how long the native agent waits for the initial connection to the collector. If this time value expires without establishing a connection to the collector, then the agent continues without performing any instrumentation. This value is only used during the initial connection to the collector. Once a connection is made, then the CTIMEOUT and SOTIMEOUT parameters are relevant.
CTIMEOUT is used only for the Java agent.
SOTIMEOUT defines the native agent's socket timeout for sending/receiving data in seconds. This is relevant when the connection to the collector is lost during communication, but only after the initial connection was successful.
If WAIT expires without a connection, SOTIMEOUT is not relevant.
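To make the description above concrete, here is a small Python model of a WAIT-style deadline. It is only an illustration of the behavior described in this reply, not the agent's actual code; connect_within_wait, try_connect, and retry_pause are hypothetical names, and the pause between attempts is an assumption.

```python
import time
from typing import Callable

def connect_within_wait(try_connect: Callable[[], bool],
                        wait_seconds: float,
                        retry_pause: float = 1.0,
                        clock: Callable[[], float] = time.monotonic,
                        sleep: Callable[[float], None] = time.sleep) -> bool:
    """Retry try_connect() until it succeeds or wait_seconds have passed.

    True models 'initial connection made' (the socket timeouts matter from
    here on). False models 'WAIT expired' (the agent continues without
    instrumentation and SOTIMEOUT never comes into play).
    """
    deadline = clock() + wait_seconds
    while True:
        if try_connect():
            return True
        if clock() >= deadline:
            return False
        # Hypothetical pause between attempts, never sleeping past the deadline.
        sleep(min(retry_pause, max(0.0, deadline - clock())))

class FakeClock:
    """Deterministic clock so the sketch can be run without real waiting."""
    def __init__(self) -> None:
        self.now = 0.0
    def __call__(self) -> float:
        return self.now
    def sleep(self, seconds: float) -> None:
        self.now += seconds

fake = FakeClock()
# Collector stopped: every attempt fails, so the WAIT deadline (20 s) expires.
print(connect_within_wait(lambda: False, 20, 1.0, fake, fake.sleep))  # False
```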
Thank you for your reply.
I understand the timeout you described.
Increasing the log level shows that each timeout is working.
2020-02-13 04:19:53 UTC [c16b1810] info [native] Trying to connect to Collector for up to 20 seconds
2020-02-13 04:19:53 UTC [c16b1810] fine [native] util::SocketConnection::connect() ... set socket timeout to 10000000 microseconds
2020-02-13 04:19:53 UTC [c16b1810] fine [native] util::SocketConnection::connect() ... set socket timeout to 30000000 microseconds
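The fine-level lines report the socket timeouts in microseconds. A short Python sketch (the function name socket_timeouts_seconds is made up for illustration) converts them to seconds so they can be compared with the defaults listed earlier.

```python
import re

PATTERN = re.compile(r"set socket timeout to (\d+) microseconds")

def socket_timeouts_seconds(lines):
    """Extract 'set socket timeout' values from log lines, converted to seconds."""
    values = []
    for line in lines:
        m = PATTERN.search(line)
        if m:
            values.append(int(m.group(1)) / 1_000_000)
    return values

log = [
    "2020-02-13 04:19:53 UTC [c16b1810] info [native] Trying to connect to Collector for up to 20 seconds",
    "2020-02-13 04:19:53 UTC [c16b1810] fine [native] util::SocketConnection::connect() ... set socket timeout to 10000000 microseconds",
    "2020-02-13 04:19:53 UTC [c16b1810] fine [native] util::SocketConnection::connect() ... set socket timeout to 30000000 microseconds",
]

print(socket_timeouts_seconds(log))  # [10.0, 30.0]
```

So the log shows a 20-second connect window plus socket timeouts of 10 and 30 seconds.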
I assume the AppMonAgent is loaded in the following order:
1. "libdtagent.so" is called according to the Java startup options.
2. "liboneagentloader.so" is called from "libdtagent.so".
3. "liboneagentjava.so" is called from "liboneagentloader.so".
4. JavaAgent of AppMonAgent platform starts.
"1." and "2." are ClassicAgent.
If you increase the log level, the log described above will be output.
(Log under /opt/dynatrace-7.2/log)
"3." and "4." are AppMonAgent.
Even if the log level is increased, the log described above is not output. (Log under /opt/dynatrace-7.2/agent/downloads/one/log/java)
So I cannot tell whether the timeout parameters work in AppMonAgent.
I want to know if the timeout parameter is valid in AppMonAgent.
I'm a bit confused as to your question. Is there something in the product that's not working correctly? Does your agent not disconnect/reconnect properly when Collector connections are lost/recovered? In general, these parameters should not be modified or set to non-default values unless directed by Dynatrace support.
The context for my question is as follows.
Due to a change in the customer environment, communication between the agent and the collector became abnormal, and a socket timeout occurred for the agent.
As a result, the AP server failed to start because the startup timeout of the AP server itself was exceeded.
Therefore, we plan to adjust the timeout parameter of the agent so that the AP server can be started even if the same event occurs.
This is where we will get into the main topic.
ClassicAgent is running in the above environment.
However, due to another problem, we have to change to AppMonAgent.
Therefore, we are investigating whether various timeout parameters are valid on the AppMonAgent platform.
It sounds like JVM startup is the area of concern, not connection failure after startup. These three parameters existed long before the new agent technology, so I'm confident they're in the classic agent. I don't know if the two timeout values were removed for the AppMon agent, but WAIT is definitely still there in both agents.
I would suggest you open a support case or reach out to DynatraceOne team for further clarification of this detail.