Re: Stopped OneAgent processes keep logging and trying to connect to ActiveGates and Dynatrace Managed Cluster

patmis · ‎21 Jan 2022

During a disaster recovery test, we stopped our Dynatrace Managed Cluster. As a result, we encountered issues on all hosts with OneAgents installed. All processes showed lag, and in the various logs we could see connection attempts to the ActiveGates and the Dynatrace Managed Cluster. As a consequence, we decided to stop all OneAgents. However, the processes continue logging and trying to connect to the ActiveGates and the Dynatrace Managed Cluster.

Has anybody experienced similar behaviour and found a solution?

rastislav_danis · ‎21 Jan 2022

What kind of issues did you encounter? When the server is down for a longer period, you should only expect to miss some data points. Connection attempts are expected.

If you just stop the OneAgent processes, the agents injected into the app processes keep running until the app processes are restarted.

Alanata a.s.

patmis · ‎21 Jan 2022

Thank you for this information. We had already guessed this.
We currently use Nagios in parallel. We discovered the behaviour when the Nagios checks for the application's processes resulted in timeouts. So it appears this has a negative effect. We are not sure whether our application users also experience lag.

So the questions are:

- Why does this affect the checks made by Nagios?

- Do the application users also experience delays?

- How can we prevent this from happening?

rastislav_danis · ‎21 Jan 2022

I'm not sure what your Nagios actually performs as a process check. Can you post more details? App users are not expected to experience any lag when the Dynatrace server is down.

patmis · ‎21 Jan 2022

Nagios executes a Python script which issues the following for the Java processes:

java -cp /opt/sense/server/communitynode_tomcat/lib/sensecommonlib/h2-1.4.197.jar org.h2.tools.Shell -url "jdbc:h2:file:/opt/sense/atnaCache/cn/logging;AUTO_SERVER=TRUE" -user sense -password sense -sql "select * from public.messages"
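Since each run of this check starts a new `java` process, a wrapper with an explicit timeout makes it easier to tell a slow start from a failed query. The following is a minimal sketch, not the script used in this thread: the function name `run_check` and the 10-second default timeout are assumptions for illustration, while the exit codes follow the standard Nagios plugin convention.

```python
import subprocess
import sys

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def run_check(cmd, timeout_s=10.0):
    """Run cmd (a list of arguments) and return (nagios_code, message).

    Hypothetical helper: the name and the default timeout are assumptions.
    """
    if not cmd:
        return UNKNOWN, "UNKNOWN - no command given"
    try:
        proc = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising TimeoutExpired.
        return CRITICAL, f"CRITICAL - check timed out after {timeout_s}s"
    except OSError as exc:
        # For example, the executable was not found.
        return UNKNOWN, f"UNKNOWN - could not start check: {exc}"
    if proc.returncode != 0:
        return CRITICAL, f"CRITICAL - exit code {proc.returncode}"
    return OK, "OK - check completed"
```

A plugin script would call `run_check([...])` with the `java` command above split into a list of arguments, print the returned message, and pass the returned code to `sys.exit()`.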

rastislav_danis · ‎21 Jan 2022

OK, this looks like a check of another tool's log messages (from an H2 DB). Did the timeouts appear as messages selected from H2? I'm not aware of a tool named sense. Have you contacted Dynatrace support about this?

patmis · ‎21 Jan 2022

Hi rastislav_danis.

Yes, we currently have a case open with Dynatrace Support.

patmis · ‎21 Jan 2022

I think we have to exclude all Nagios processes from monitoring.

rastislav_danis · ‎21 Jan 2022

This concerns the tool named sense, not Nagios. The Nagios check seems to just read messages that were already logged in the H2 DB:

/opt/sense/atnaCache/cn/logging