16 Nov 2021 02:26 PM - last edited on 31 Aug 2022 03:44 AM by MaciejNeumann
I'm trying to find a way to alert on Java heap memory issues, specifically the message below. I originally tried taking the first metric below, dividing it by the second, and multiplying the result by 100 to get the "percentage of Java heap used". However, it seems this Java heap space error is logged even when the percentage is not at 100%?
We are not using Dynatrace log analytics, so looking for that error message in the logs is not an option. When we get this heap memory error in the application logs, we want to see it in Dynatrace as well.
Something unexpected happened; the data (if any) is <null> and the exception (if any) is java.lang.OutOfMemoryError: Java heap space
This is off the top of my head, but would this work?
((builtin:tech.jvm.memory.pool.used:filter(and(eq(poolname,"Eden Space"))))/(builtin:tech.jvm.memory.pool.committed:filter(and(eq(poolname,"Eden Space")))))*(100)
Go to Settings > Anomaly detection > Custom events for alerting.
Switch to the code tab and paste the expression above into the box. Hit `TAB` to validate.
Then set your static threshold below it.
It is better to use the max value of the pool, instead of the committed.
See also the metrics expression API
Why is it better to use the max value? There is no guarantee that memory up to that value can be committed for the JVM. It's basically saying "you will get no more than the max value", not "you will for sure get the max value". Committed, on the other hand, shows the amount of memory the JVM is guaranteed to have. In that sense, committed would be the better metric to follow.
Let's say you have tons of JVMs on a server which is running out of RAM capacity. If you follow the committed value, you'd see that the usage is stuck at 100 % and you'd get an alert about it. If you follow the max value, everything appears to be working fine since the used value is still not close to max possible value. But the JVMs are already out of memory by then.
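To put numbers on that scenario, here is a minimal Python sketch. The byte values are made up for illustration (they are not taken from any real JVM), and `heap_usage_percent` is just a hypothetical helper name.

```python
def heap_usage_percent(used_bytes: int, base_bytes: int) -> float:
    """Return used memory as a percentage of the chosen base (committed or max)."""
    if base_bytes <= 0:
        raise ValueError("base must be positive")
    return 100.0 * used_bytes / base_bytes


MIB = 1024 * 1024

# Hypothetical JVM on a host that cannot hand out any more RAM.
used = 512 * MIB        # heap currently in use
committed = 512 * MIB   # memory the JVM has actually been given
max_heap = 2048 * MIB   # upper limit the JVM is allowed to ask for

pct_of_committed = heap_usage_percent(used, committed)
pct_of_max = heap_usage_percent(used, max_heap)

print(f"used/committed = {pct_of_committed:.0f} %")  # base is what the JVM really has
print(f"used/max       = {pct_of_max:.0f} %")        # base is only an upper bound
```

With these assumed values, the committed-based percentage is pinned at 100 while the max-based one sits at 25, which is the mismatch described above.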
Depends on what you want to achieve. If you want to alert on committed memory, this is perfectly fine. But be aware that committed memory changes over time (it increases and decreases), so the basis for the percentage calculation changes too. This will cause jumps in your charts. Usually committed memory starts low and the JVM increases it when a certain high-water mark is reached. This repeats until the max memory is reached (the -Xmx setting).
If you know exactly how much memory your application needs, you can set -Xms to the same value as -Xmx, so everything is reserved at startup.
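A rough Python sketch of why the chart jumps, using invented sample values in MiB. The growth steps of the committed memory are an assumption for illustration only; real JVMs resize the heap according to their own GC policy.

```python
def usage_percent_series(used_mib, committed_mib):
    """Percentage of committed memory in use, sample by sample."""
    return [100.0 * u / c for u, c in zip(used_mib, committed_mib)]


# Hypothetical samples: used memory grows steadily.
used_mib = [192, 240, 256, 384, 480, 512]

# Case 1: committed memory grows in steps as the JVM reserves more.
growing_committed_mib = [256, 256, 512, 512, 512, 1024]

# Case 2: -Xms equals -Xmx, so committed is fixed from startup (assumed 1024 MiB).
fixed_committed_mib = [1024] * len(used_mib)

growing = usage_percent_series(used_mib, growing_committed_mib)
fixed = usage_percent_series(used_mib, fixed_committed_mib)

print("growing committed:", growing)
print("fixed committed:  ", fixed)
```

In the first case the percentage drops every time the JVM commits another chunk, even though used memory never went down. In the second case the base stays constant, so the percentage only follows the used memory.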
Yeah, that's a good point. Maybe for alerting purposes the committed memory is more important, and for charting, it's more interesting to follow how close the used memory goes in relation to the max value.
By the way, the out-of-the-box graphs at Dynatrace UI show only the used and committed memory. Max values are not shown at all, you have to create your own data explorer chart to see that.
Also, for alerting, be aware that it is usual for used memory to equal committed memory until the next chunk can be reserved. So you might get alerts when the JVM is actually healthy.
I've now had the custom event configured for a couple of days which alerts when heap space usage (used/committed mem) is over 97 % for 10 minutes. So far zero alerts, so I think it's working as intended.
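For anyone wondering why the "for 10 minutes" part matters, here is a small Python sketch of that kind of rule. It assumes one sample per minute and simply looks for 10 consecutive samples above 97 %. It only illustrates the idea and is not how Dynatrace evaluates custom events internally; `sustained_breach` is a hypothetical name.

```python
def sustained_breach(samples, threshold=97.0, window=10):
    """Return True if `window` consecutive samples are above `threshold`."""
    run = 0
    for value in samples:
        # Extend the current run on a violating sample, otherwise reset it.
        run = run + 1 if value > threshold else 0
        if run >= window:
            return True
    return False


# Hypothetical one-minute samples of used/committed in percent.
short_spikes = [98.0] * 9 + [90.0] + [98.0] * 9   # never 10 in a row
stuck_high = [95.0, 96.0] + [98.5] * 10           # 10 in a row above 97

print(sustained_breach(short_spikes))
print(sustained_breach(stuck_high))
```

Short periods where used equals committed (just before the JVM reserves the next chunk) do not trigger the rule, while a heap that stays pinned near the committed size does.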
What is your custom event looking for? Wondering if it's the same as what I am trying to accomplish or something different.
It's basically trying to detect if we're unable to allocate more memory for the JVM. It could be due to hitting the max value, it could be due to running out of memory resources on the host, but nonetheless it's something worth investigating so that the JVM doesn't run out of memory. That's pretty much it. Is that also your scenario?
Hello, yes I believe that would work for my situation. I'd have to get back with our application team who was having the issue. Are you able to supply the code portion of this within the custom event? Or if you used the build section, what that looks like.
Sure, I can post it here. There are two filters used, one for including only specific Java processes tagged "WAS", and another one for the poolname "Java heap".
(builtin:tech.jvm.memory.pool.used:filter(and(in("dt.entity.process_group_instance",entitySelector("type(process_group_instance),tag(~"WAS~")")),eq(poolname,"Java heap"))))/(builtin:tech.jvm.memory.pool.committed:filter(and(in("dt.entity.process_group_instance",entitySelector("type(process_group_instance),tag(~"WAS~")")),eq(poolname,"Java heap"))))*(100)
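If you need the same expression for several tags or pools, a small Python helper can assemble it so the two filters stay identical on both sides of the division. This is only a sketch: `heap_usage_expression` is a hypothetical name, and it does no escaping of special characters inside the tag or pool name.

```python
def heap_usage_expression(tag: str, pool: str) -> str:
    """Build the used/committed percentage expression for one tag and pool name."""
    # Quotes inside the entity selector string are written as ~" in the expression.
    entity_selector = f'type(process_group_instance),tag(~"{tag}~")'
    metric_filter = (
        ':filter(and(in("dt.entity.process_group_instance",'
        f'entitySelector("{entity_selector}")),'
        f'eq(poolname,"{pool}")))'
    )
    used = f"builtin:tech.jvm.memory.pool.used{metric_filter}"
    committed = f"builtin:tech.jvm.memory.pool.committed{metric_filter}"
    return f"({used})/({committed})*(100)"


# Reproduces the expression posted above for tag "WAS" and pool "Java heap".
print(heap_usage_expression("WAS", "Java heap"))
```

The output can be pasted into the code tab of the custom event in the same way as the hand-written version.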
Not sure if I understand the question correctly, but this should happen out of the box.
There is a built-in alert (Settings / Anomaly detection / Infrastructure) "Detect Java out-of memory problem" which should create a problem every time an out-of-memory exception is being detected by the OneAgent.
OutOfMemoryError is usually too late. When you have an alert earlier, you have a chance to gracefully stop the JVM and then restart it.