We have a set of JBoss services that all run on the same set of Linux machines. There are 8 machines in total, and each of them hosts around 8 services. For a long time now we have had custom JMX measures that run against all of those services. What has changed recently is that the CPU on these machines started to spike during load testing. After taking a CPU sample, we see that the following seems to be to blame.
The getStackAccessControlContext() method is taking a lot of CPU time. A child of that method is the JMXIntrospection class, and we can see that it sometimes takes 35 seconds of CPU time during a 1-minute window.
Today I started to remove some of the JMX metrics in hopes of finding the problem child.
1. What I want to know is whether there is a better way to determine which measure is actually causing the overhead.
2. Best practices for creating and using JMX?
3. Any known differences between JMX in AppMon 6.5 vs 6.3?
The attached image shows the stack from a CPU sample done during one of the tests.
This is interesting. We're having similar issues since we updated our JBoss EAP from 6.4.7 to 6.4.13 on RHEL6. What version do you use?
We haven't done any CPU sampling yet, as the problem is fairly new and seems to be intermittent, but the symptoms we're experiencing are transient drop-outs in the JMX measure series and correlating CPU spikes. In our case we only have one custom JMX measure, which retrieves the number of active sessions.
I will try to start a CPU sampling session next time the problem occurs and report back.
Unfortunately, the obfuscation of the introspection code works quite well in this case, and I can only guess what happens here.
One possible root cause of this problem is the WebSphere PMI support. The JMX/PMI introspection cyclically checks for a JMX/PMI MBean until one is found. In a JBoss environment such an MBean is not available, so the check keeps repeating. Normally this is no issue, but it could become one if MBean queries are slow. You may try to switch off PMI support with the agent option "debugSkipWebSpherePmiCheckJava".
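To get a rough idea of whether MBean queries are slow in your JVM, something like the following sketch may help. It only uses the standard javax.management API and times queryMBeans for a few ObjectName patterns. The class name MBeanQueryTimer, the method name and the patterns are made up for illustration and are not part of AppMon. It also assumes the code runs inside the JVM you want to check (for example, deployed with one of the services); run standalone, it only measures its own platform MBean server.

```java
import java.lang.management.ManagementFactory;
import java.util.Set;
import javax.management.MBeanServer;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectInstance;
import javax.management.ObjectName;

public class MBeanQueryTimer {

    /**
     * Runs queryMBeans once for the given ObjectName pattern, prints the
     * number of matches and the elapsed time, and returns the elapsed
     * time in nanoseconds.
     */
    public static long timeQueryNanos(MBeanServer server, String pattern) {
        ObjectName name;
        try {
            name = new ObjectName(pattern);
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException("Invalid ObjectName pattern: " + pattern, e);
        }
        long start = System.nanoTime();
        // A null QueryExp means "no additional filter beyond the name pattern".
        Set<ObjectInstance> found = server.queryMBeans(name, null);
        long elapsed = System.nanoTime() - start;
        System.out.printf("%-20s %6d MBeans %10.3f ms%n",
                pattern, found.size(), elapsed / 1_000_000.0);
        return elapsed;
    }

    public static void main(String[] args) {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        // Shows which MBeanServer implementation is actually being queried.
        System.out.println("MBeanServer class: " + server.getClass().getName());

        // Illustrative patterns (assumptions): everything, the JVM's own
        // MBeans, and a WebSphere domain that should match nothing on JBoss.
        String[] patterns = {"*:*", "java.lang:*", "WebSphere:*"};
        for (String pattern : patterns) {
            timeQueryNanos(server, pattern);
        }
    }
}
```

If a query that matches nothing (such as the WebSphere pattern here) still takes a noticeable amount of time, that would be consistent with the repeated PMI check being expensive. A single timing is noisy, so it is worth repeating the run a few times under load before drawing conclusions.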
You may also try to add an exclude rule for class "PluggableMBeanServerImpl" to the JMX sensor rules. With this exclude rule, the agent would no longer instrument "PluggableMBeanServerImpl" and as a consequence, the JMX introspection would not call the "queryMBeans" method on it. If you do this, you need to check if you still receive all desired JMX measurements.
If this does not help, please file a support ticket containing at least a support archive and a sampling session showing the problem.