23 Jun 2023 06:40 PM
I'm seeing a very unusual memory trend on a Windows server. Memory usage keeps rising steadily, and the culprit appears to be the Windows kernel!
Could it be a memory leak? I have checked the system logs and several other metrics, but haven't been able to find what is causing this rise in memory consumption. I'm also investigating at the server level... Any ideas on how to dig into this further?
07 Aug 2023 04:42 PM
Any luck looking deeper into this, @AntonioSousa?
07 Aug 2023 04:52 PM
No. We have upgraded the Windows OS, but memory keeps climbing... Since it's happening at the kernel level, we don't have many clues. We'll report back if we find a solution.
15 Aug 2023 10:31 AM - edited 15 Aug 2023 12:38 PM
The issue doesn't seem to be with the kernel. Dynatrace indicates that MS SQLSERVER is using 2.6GB of memory, but the actual problem lies in the inconsistent memory reporting by the Windows OS. This inconsistency is also visible in both Task Manager and Resource Monitor (working set). We've opened an internal ticket to analyze this in depth.
Fortunately, because this database uses a substantial amount of memory, the issue was relatively easy to identify; on servers with lower memory usage it would be much harder to spot. This inconsistency affects the accuracy of memory analysis in Dynatrace.
15 Aug 2023 11:36 AM
Accounting for memory usage has never been an easy task. Device drivers, for instance, can do very weird things. And when I discovered "ballooning", then I knew that even the OS didn't stand a chance 😉
15 Aug 2023 10:49 AM
A long time ago we had a similar issue, which was triggered by OneAgent but caused by a faulty Fibre Channel driver that leaked memory whenever certain performance counters were queried. It took quite a while to investigate. Since the driver was out of support, the only workaround was to disable OneAgent's network monitoring module on the affected hosts.
As far as I remember, we used poolmon to inspect the kernel pool allocations and then identified the pool and the driver causing the leak.
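One way to narrow down a leak with poolmon-style data is to compare two readings taken some hours apart and rank pool tags by how much they grew. The sketch below assumes you have already transcribed the per-tag byte counts into dictionaries (tag -> bytes); the tag names and values are hypothetical, and mapping the top tag back to a driver is a separate step not shown here.

```python
from typing import Dict, List, Tuple


def rank_pool_growth(
    before: Dict[str, int],
    after: Dict[str, int],
    top: int = 5,
) -> List[Tuple[str, int]]:
    """Return the pool tags whose byte counts grew the most between two snapshots.

    Tags that only appear in the later snapshot are treated as growing from 0.
    """
    growth = {tag: after[tag] - before.get(tag, 0) for tag in after}
    # Keep only tags that actually grew, largest growth first.
    ranked = sorted(
        ((tag, delta) for tag, delta in growth.items() if delta > 0),
        key=lambda item: item[1],
        reverse=True,
    )
    return ranked[:top]


if __name__ == "__main__":
    # Hypothetical snapshots (tag -> bytes), e.g. copied by hand from
    # two poolmon readings taken several hours apart.
    snapshot_1 = {"AbcD": 10_000_000, "Wxyz": 2_000_000, "Qrst": 500_000}
    snapshot_2 = {"AbcD": 10_100_000, "Wxyz": 9_000_000, "Qrst": 400_000, "Newt": 50_000}
    for tag, delta in rank_pool_growth(snapshot_1, snapshot_2):
        print(f"{tag}: +{delta:,} bytes")
```

A tag that keeps topping this list across several readings is a good candidate for the leaking driver; a single pair of snapshots can be misleading if allocations fluctuate.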
15 Aug 2023 11:34 AM
The sysadmin tried poolmon, but was unable to get useful data... We tried several tricks, including installing a new server with the same software. Memory kept climbing in a perfectly linear fashion.
But now that I have checked again, the system seems to have been stable for a week now. I'll have to check with him what was done 😉
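Because the growth was perfectly linear, it can be quantified with an ordinary least-squares fit over exported samples, which gives a leak rate and a rough estimate of when a memory limit would be reached. This is a minimal sketch assuming samples of (hours since start, kernel memory in MB) exported from whatever monitoring you use; the sample values and the 16000 MB limit are hypothetical.

```python
from typing import List, Optional, Tuple

Sample = Tuple[float, float]  # (hours since start, memory in MB)


def linear_fit(samples: List[Sample]) -> Tuple[float, float]:
    """Ordinary least-squares fit of memory = slope * hours + intercept."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("timestamps must not all be equal")
    cov = sum((t - mean_t) * (y - mean_y) for t, y in samples)
    slope = cov / var_t  # growth rate in MB per hour
    intercept = mean_y - slope * mean_t
    return slope, intercept


def hours_until_limit(samples: List[Sample], limit_mb: float) -> Optional[float]:
    """Hours after the last sample until the fitted line reaches limit_mb.

    Returns None if memory is not growing according to the fit.
    """
    slope, intercept = linear_fit(samples)
    if slope <= 0:
        return None
    t_limit = (limit_mb - intercept) / slope
    return max(0.0, t_limit - samples[-1][0])


if __name__ == "__main__":
    # Hypothetical hourly samples of kernel memory (MB).
    samples = [(0, 4000), (1, 4100), (2, 4200), (3, 4300)]
    print(linear_fit(samples))                # growth rate (MB/h) and intercept
    print(hours_until_limit(samples, 16000))  # hours left before hitting 16000 MB
```

Running the same fit separately on periods where a suspect component (a driver, an agent, an antivirus) is enabled and disabled is a simple way to check whether it drives the ramp-up.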
17 Oct 2024 09:22 AM
What was the solution in this case? We have the same problem on a Windows server and have installed Dynatrace OneAgent for analysis. We can now see that the “kernel memory” is increasing, but we can't see what exactly is going on here.
17 Oct 2024 09:38 AM
In our client's case, it was an antivirus solution. We never clearly understood the trigger, but starting and stopping it would start and stop the ramp-up...
These are very specific situations, and the root cause will be difficult to find from Dynatrace alone. You may have some luck with the event log, but you will probably have to dive in with specialized tools. Besides poolmon, which @Julius_Loman mentioned, I have also used tools such as Process Explorer, WPA (Windows Performance Analyzer) and RAMMap in these scenarios, all from Microsoft.
18 Oct 2024 10:20 AM
@AntonioSousa Many kudos for your answer! We will look into the kernel with the other tools you mentioned (I had already thought of using RAMMap and you have just confirmed it, so I guess I'm on the right track).