I'm looking for clues on how to diagnose this problem.
We have a fairly large installation with up to 4 collector servers monitoring 60+ JVMs in production. It had been working fine until recently.
Yesterday, without any change in profile or agent group configuration, we rebooted our collectors to increase a partition size. Since then, DynaTrace has stopped collecting almost all PurePaths in our infrastructure. I'm not sure whether the partition resize or the reboot of all collectors is related, but I need help diagnosing what's happening.
Agents Overview shows all agents are connected and happy. We tried restarting the collector services, restarting the dynaTrace server, and even restarting some applications. We can see a few PurePaths collected, but overall monitoring dropped from 44 thousand requests per minute to a mere 60.
Thank you in advance for any help.
I guess this is the first time this issue has been posted on the community. It would be better to open a support ticket in parallel while we try to fix it here.
Did you increase the partition size of the drives where the collectors are installed on all the servers?
What is the status of Events Count, Class Loaded, PurePaths in the agents overview dashboard?
Check the server and collector logs; you can also share them here.
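If it helps with the log check, here is a minimal sketch for pulling out the lines worth a closer look. It is not a dynaTrace tool, just a generic filter: the severity keywords, the `*.log` file pattern, and the function names `find_suspicious_lines` and `scan_log_dir` are all assumptions for illustration, and you would pass in whatever directory your server and collector logs actually live in.

```python
import re
from pathlib import Path

# Assumed severity keywords; adjust to whatever your log files actually use.
SEVERITY_PATTERN = re.compile(r"\b(WARNING|SEVERE|ERROR)\b", re.IGNORECASE)


def find_suspicious_lines(lines, pattern=SEVERITY_PATTERN):
    """Return (line_number, text) pairs for every line matching the pattern."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if pattern.search(line)
    ]


def scan_log_dir(log_dir, glob="*.log"):
    """Scan each file matching `glob` in `log_dir` (both supplied by the caller).

    Returns a dict mapping file name to its list of matching lines.
    Files without any match are left out.
    """
    results = {}
    for path in sorted(Path(log_dir).glob(glob)):
        # errors="replace" keeps the scan going if a file has odd encoding.
        with path.open(errors="replace") as handle:
            hits = find_suspicious_lines(handle)
        if hits:
            results[path.name] = hits
    return results
```

Comparing the output for the hours just before and just after the reboot is usually the quickest way to spot anything that changed.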
We were using 6.5.12 and, as a desperate attempt, we decided to apply the latest update, 6.5.13.
After the upgrade (restart and collector sync), PurePaths and all events are back to normal. I'm passing this on as a possible bug resolved by the upgrade.
As a note, we have a dedicated separate installation to test upgrades before moving on to Production. Curiously, this behavior did not happen on our staging environment.
Thank you for the response!
I would check the logs for this period as suggested to see if there is any obvious issue, and I would definitely recommend opening a support case in case this is something that needs to be looked into. Good that it's working now, though.
Thanks, James. We checked the logs, but were unable to find anything conclusive. All collectors seemed to be properly communicating with the dynatrace server, and all agents seemed to be properly registered with a lot of sensors deployed to them, as usual.
Something must've gone haywire with the recent upgrade to 6.5.12 and compounded with the reboots from storage partition modifications. Once we upgraded to the newest 6.5.13, the issue magically disappeared.