During a disaster recovery test, we stopped our Dynatrace Managed Cluster. As a result, we encountered issues on all hosts with OneAgents installed. All processes showed a lag, and in the various logs we could see connection attempts to the ActiveGates and the Dynatrace Managed Cluster. We therefore decided to stop all OneAgents. However, the processes continue to log and keep trying to connect to the ActiveGates and the Dynatrace Managed Cluster.
Has anybody experienced similar behaviour and found a solution?
What kind of issues did you encounter? The only thing you should expect is some missing data points when the server downtime is longer. Connection attempts are expected.
If you only stop the OneAgent processes, the agents injected into the application processes keep running until those application processes are restarted.
Thank you for this information. We already guessed this.
We currently run Nagios in parallel. We discovered the behaviour when the Nagios checks for the application processes started timing out, so it appears this has a negative effect. We are not sure whether our application users also experience lag.
So the questions are:
- Why does this affect the checks made by Nagios?
- Do the application users also experience delays?
- How can we prevent this from happening?
I'm not sure what your Nagios process check actually does. Can you post more details? App users are not expected to experience any lag when the Dynatrace server is down.
Nagios executes a Python script which runs the following for the Java processes:
java -cp /opt/sense/server/communitynode_tomcat/lib/sensecommonlib/h2-1.4.197.jar org.h2.tools.Shell -url "jdbc:h2:file:/opt/sense/atnaCache/cn/logging;AUTO_SERVER=TRUE" -user sense -password sense -sql "select * from public.messages"
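The Python script itself was not posted, so the following is only a hypothetical sketch of how such a check might wrap the command above. It assumes the usual Nagios plugin exit codes (0 OK, 2 CRITICAL, 3 UNKNOWN); the function name `run_check` and the 30-second default timeout are illustrative choices, not taken from the thread. Reporting a timeout separately from a non-zero exit code may help show whether the check hangs (for example while the JVM starts) or whether the H2 query itself fails.

```python
import subprocess
import sys

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# The H2 shell command from the thread; jar path, JDBC URL and credentials are as posted.
# Passed as a list (no shell), so the ';' in the URL needs no quoting.
H2_QUERY_CMD = [
    "java",
    "-cp", "/opt/sense/server/communitynode_tomcat/lib/sensecommonlib/h2-1.4.197.jar",
    "org.h2.tools.Shell",
    "-url", "jdbc:h2:file:/opt/sense/atnaCache/cn/logging;AUTO_SERVER=TRUE",
    "-user", "sense",
    "-password", "sense",
    "-sql", "select * from public.messages",
]


def run_check(cmd, timeout_s=30.0):
    """Run cmd with a timeout (hypothetical helper) and map the outcome to (exit code, message)."""
    try:
        # subprocess.run kills the child process if the timeout expires.
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return CRITICAL, f"CRITICAL - command timed out after {timeout_s}s"
    except OSError as exc:
        # For example, the java binary is not on PATH.
        return UNKNOWN, f"UNKNOWN - could not start command: {exc}"
    if result.returncode != 0:
        return CRITICAL, f"CRITICAL - exit code {result.returncode}: {result.stderr.strip()}"
    return OK, "OK - query completed"


if __name__ == "__main__":
    code, message = run_check(H2_QUERY_CMD)
    print(message)
    sys.exit(code)
```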
OK, this looks like a check of another tool's log messages (from an H2 DB). Did the timeouts appear in the messages selected from H2? I'm not aware of a tool named sense. Have you contacted Dynatrace support about this?
Yes, we currently have a case open with Dynatrace Support.
I think we have to exclude all Nagios processes from monitoring.
This is about the tool named sense, not Nagios. The check seems to just read messages that are already logged in the H2 DB: