on 12 May 2023 09:55 AM - edited on 12 May 2023 11:00 AM by MaciejNeumann
While a lot of dumps may indeed be created, OneAgent is not necessarily the cause. In fact, OneAgent does not create memory dumps at all; the underlying kernel/operating system does.
Whenever a process crashes, the kernel creates a memory dump for you to analyze. Every operating system has a standard routine for creating and storing memory dumps, and your host will create a dump of a crashing application regardless of whether OneAgent is installed.
OneAgent simply analyzes the dumps created by the kernel to check whether any of its own processes caused the crash. Once it has analyzed a dump, OneAgent calls the system's actual dump-handling routine and passes control to it.
It is true that installing OneAgent on Linux changes the system's dump-handling routine, but only to the extent needed for OneAgent to analyze the dump, after which the original routine is called.
During installation, the contents of /proc/sys/kernel/core_pattern are saved to /opt/dynatrace/oneagent/agent/conf/original_core_pattern.
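To check this on a given host, you can compare the kernel's current core_pattern with the copy saved during installation. The shell sketch below is a minimal, hypothetical helper (the function names compare_core_patterns and describe_core_pattern are made up for illustration). It relies on the documented Linux behavior (see core(5)) that a core_pattern starting with "|" makes the kernel pipe the dump to a program. It does not verify which program that is; it only reports whether the current value still matches the saved original.

```shell
# Hypothetical helpers, not part of OneAgent.

# Describe one core_pattern value. Per core(5), a leading '|' means the
# kernel pipes the dump to the named program instead of writing a file.
describe_core_pattern() {
  case "$1" in
    '|'*) printf 'piped to: %s\n' "${1#\|}" ;;
    *)    printf 'file pattern: %s\n' "$1" ;;
  esac
}

# Compare the live core_pattern with the copy saved at install time.
# Both paths can be overridden (useful for testing on plain files).
compare_core_patterns() {
  local current_file="${1:-/proc/sys/kernel/core_pattern}"
  local saved_file="${2:-/opt/dynatrace/oneagent/agent/conf/original_core_pattern}"
  local current saved
  current=$(cat "$current_file") || return 1
  saved=$(cat "$saved_file") || return 1
  printf 'current:  %s\n' "$(describe_core_pattern "$current")"
  printf 'original: %s\n' "$(describe_core_pattern "$saved")"
  if [ "$current" = "$saved" ]; then
    echo 'status: original core_pattern in place'
  else
    echo 'status: core_pattern differs from original'
  fi
}
```

Run compare_core_patterns with no arguments to read the default paths, or pass two file paths to compare other copies.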
https://www.dynatrace.com/support/help/shortlink/crash-analysis#linux-core-dump-handling
Troubleshooting steps:
If you do not want OneAgent to analyze the memory dumps created by the kernel, you can disable dump capture with the following parameter.
On new installations (pass it to the OneAgent installer):
--set-dump-capture-enabled=false
On existing installations:
sudo /opt/dynatrace/oneagent/agent/tools/lib64/oneagentctl --set-dump-capture-enabled=false
Dump capture is then disabled, but the change only takes effect with the next OneAgent update.
Another workaround on an existing installation is to read the contents of /opt/dynatrace/oneagent/agent/conf/original_core_pattern and write them to /proc/sys/kernel/core_pattern, replacing its current contents. Note that this workaround only lasts until the next OneAgent update.
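The workaround above can be scripted roughly as follows. This is a minimal sketch that assumes original_core_pattern holds the single line copied from /proc/sys/kernel/core_pattern at install time; the function name restore_original_core_pattern is hypothetical. It refuses to write an empty value, and it must run as root when targeting the real /proc path.

```shell
# Hypothetical helper, not part of OneAgent.
# Copies the saved original core_pattern back into the kernel setting.
restore_original_core_pattern() {
  local saved_file="${1:-/opt/dynatrace/oneagent/agent/conf/original_core_pattern}"
  local target="${2:-/proc/sys/kernel/core_pattern}"
  # Never blank out the kernel's core_pattern by accident.
  if [ ! -s "$saved_file" ]; then
    echo "error: $saved_file is missing or empty" >&2
    return 1
  fi
  # The redirection replaces the previous value entirely.
  cat "$saved_file" > "$target" || return 1
  printf 'core_pattern now: %s\n' "$(cat "$target")"
}
```

For example, from a root shell, load the function and call restore_original_core_pattern with no arguments. As noted above, the change only lasts until the next OneAgent update; a value written under /proc/sys also does not persist across a reboot on its own.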
Any other method of disabling OneAgent's dump analysis will adversely affect the system's dump-handling routine, to a degree that depends on the method chosen.
Great write-up! An incorrect setting often causes this effect, and it's great that you have documented it.
Thanks for the cool article. In our case, though, it is oneagentnetwork that is crashing and producing core dumps. This started happening after the OneAgent update to version 1.297.53.20240820-121511 or 1.295.70.20240820-160945. We have a case open with DT Support, but it would be good to understand what to do in such a case.
file core.309
core.309: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style, from 'oneagentnetwork -Dcom.compuware.apm.WatchDogTimeout=900 -Dcom.compuware.apm.Wat', real uid: 0, effective uid: 0, real gid: 0, effective gid: 0, execfn: 'oneagentnetwork', platform: 'x86_64'
The error from the system messages log:
Aug 31 22:28:28 <HOSTNAME> kernel: traps: oneagentnetwork[309] trap divide error ip:55d0515de327 sp:7ffe432763e8 error:0 in oneagentnetwork[55d051578000+2fe000]
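When triaging crashes like this one, it can help to pull the crashing executable out of the file output and the fault reason out of the kernel's "traps:" lines. The two filters below are hypothetical sed-based helpers (core_execfn and trap_summary are made-up names) that assume the exact line formats quoted in this post; other versions of file or of the kernel may format these lines differently.

```shell
# Hypothetical triage filters; both read from stdin.

# Print the execfn field from `file` output on a core dump.
core_execfn() {
  sed -n "s/.*execfn: '\([^']*\)'.*/\1/p"
}

# Print "process[pid] reason" from kernel "traps:" log lines.
trap_summary() {
  sed -n 's/.*traps: \([^ ]*\) trap \(.*\) ip:.*/\1 \2/p'
}
```

For example: file core.309 | core_execfn, or journalctl -k | trap_summary on hosts using systemd-journald.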
@DatPuddn We have been working with DT Support after opening a request; you could consider doing the same. Share the details from your crash dumps, and Support will analyze them and advise on next steps.