We have a Java application in production that uses a lot more heap memory since we activated the agent. I'm talking about an increase of nearly 20% compared to before (without an agent loaded). The result is completely different GC behavior and increased suspension times (we use the CMS GC).
After reading the comments on a somewhat related question, the observed behavior does not seem normal at all.
Is it possible for the agent to create such a significant memory overhead? We have reviewed and tested the instrumentation and came to the conclusion that it is production-ready, based on load tests done to the best of our capabilities (average PurePath size < 100, no increase in response time, normal memory usage). It's possible that the production load is qualitatively different and higher than during testing, but still, I think a 20% memory increase after agent activation is beyond anything I would have expected.
We are thinking about taking heap memory snapshots. Is there anything we should look out for that would indicate abnormal memory allocated by the agent?
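For the snapshot itself, the standard JDK tooling works alongside whatever the monitoring product provides; a sketch (the PID and file name are placeholders):

```shell
# Class histogram of live objects (triggers a full GC first) --
# a sudden dominance of agent-internal classes would stand out here
jmap -histo:live <pid> | head -n 30

# Full heap dump of live objects for offline analysis
# (e.g. in Eclipse MAT or a memory diagnostics tool)
jmap -dump:live,format=b,file=heap.hprof <pid>
```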
I would be grateful for any explanations and hints for troubleshooting the problem.
We use Dynatrace version 220.127.116.1171.
Is this agent loaded with OOTB settings? If not, some initial things to look at regarding high heap would be which sensor packs are placed, the auto-sensor resolution, and the event capturing settings.
It's pretty much OOTB, with even some default sensor packs unplaced. There is only one custom method sensor placed and active for a legacy CORBA stub class.
The auto-sensors use default resolution.
We have scheduled a leak analysis snapshot for later this weekend (shortly before the scheduled auto-restart). The application is still responsive and can/will be used until then. Even after we disabled all event capturing on the agent, memory usage remains more or less stable but at an unusually high level, causing lots of concurrent GC collections and thus increased CPU consumption.
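For reference, the increased concurrent collection activity is easy to confirm with GC logging; a minimal sketch for a CMS-era JVM (Java 8, pre-unified logging; paths are placeholders):

```shell
# Append GC logging to the existing JVM options; each CMS cycle and its
# phases (initial mark, concurrent mark, remark, concurrent sweep) will
# show up in the log with timestamps.
JAVA_OPTS="$JAVA_OPTS \
  -XX:+UseConcMarkSweepGC \
  -verbose:gc \
  -XX:+PrintGCDetails \
  -XX:+PrintGCDateStamps \
  -Xloggc:/var/log/app/gc.log"
```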
I can't really speak much to this one... it seems peculiar. The snapshot analysis should give you some pretty solid information, especially after a garbage collection, in terms of what is being retained in memory. I don't know if it's related to the technologies being used or to a setting, but all of your settings seem fine.
I would open a support ticket, but please share your results. 😄
After analyzing the memory snapshot we have most likely found the root cause: due to the way we enable the Dynatrace agent on our platform, a previously configured (different) Java agent was no longer loaded (i.e. the VM option -javaagent was simply replaced by -agentpath). The purpose of that agent was to clean up unused objects in the web service stack (a workaround for a bug in some framework API, AFAIK).
The final confirmation is pending as we need to restart the application and make sure both agents are loaded and active.
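The two options are not mutually exclusive: a JVM can load a native (JVMTI) agent and a Java agent at the same time. A sketch of the combined startup line (all paths and the agent option string are placeholders, not our actual configuration):

```shell
# -agentpath loads a native agent library (e.g. the Dynatrace agent);
# -javaagent loads a java.lang.instrument agent jar. Both can coexist.
java \
  -agentpath:/opt/dynatrace/agent/lib64/libdtagent.so=name=myapp,server=dtserver \
  -javaagent:/opt/app/patch-agent.jar \
  -jar myapp.jar
```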
The reason we didn't catch that behavior during testing was most likely a different load profile (far fewer backend web service calls than in production).
Lesson learned 🙂
Thanks for letting us know. Would you also be willing to share details on that 3rd-party agent you loaded to clean up these objects? That way, other users of the same agent will know what to look for.
It's not a 3rd-party agent. It's a custom "patch agent" that fixes a memory leak in our internal generic web service stack. No idea why it was decided to use the agent (instrumentation?) API to patch some buggy code; I'd have to ask our platform engineers...
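For illustration, such a "patch agent" does not necessarily need bytecode instrumentation at all; a minimal, hypothetical sketch of how a cleanup agent can hook in via premain (class and thread names are made up, not the actual internal agent):

```java
import java.lang.instrument.Instrumentation;

// Hypothetical sketch only -- not the actual internal agent from this thread.
public class PatchAgent {

    // Invoked by the JVM before main() when the jar is loaded via
    // -javaagent:patch-agent.jar (requires a Premain-Class manifest entry).
    public static void premain(String agentArgs, Instrumentation inst) {
        // A cleanup workaround need not rewrite bytecode: a daemon thread
        // that periodically releases leaked objects is sufficient.
        Thread cleaner = new Thread(PatchAgent::cleanupLoop, "patch-agent-cleaner");
        cleaner.setDaemon(true);
        cleaner.start();
    }

    private static void cleanupLoop() {
        while (!Thread.currentThread().isInterrupted()) {
            try {
                // Here the real agent would e.g. clear a static cache in the
                // buggy framework, typically via reflection on its internals.
                Thread.sleep(60_000L);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }
}
```

The jar's manifest would need a `Premain-Class: PatchAgent` entry; dropping the `-javaagent` option silently disables all of this, which is exactly what happened in our case.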
Hope that helps 🙂