One of the applications monitored by AppMon has developed a high CPU load.
The application owner is very keen to find the cause through AppMon, yet AppMon shows status "OK" despite the high spike (snapshot.jpg attached).
Please help me drill down to the exact cause of this kind of load.
Since the blue line (Instrumented Processes) spikes along with the User CPU and System CPU, one or more monitored processes have CPU spikes. Look at the monitored processes on the right, see which ones spike, and then drill into them in more detail. Did the load for the process spike as well? If so, the CPU may simply be symptomatic of higher load. If not, use the Response Time Hotspots and Method Hotspots to look for methods with high CPU time.
You can always drill into the PurePaths and add the CPU time columns as well and sort by them.
Note that if the CPU comes from methods in threads we are not capturing PurePaths for, you will not see many clues in these dashlets. In that case, you may need to resort to CPU sampling and/or Thread Dumps to see what is consuming the CPU.
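As a rough illustration of that fallback (outside AppMon itself): on a Linux host you can correlate an OS-level per-thread CPU reading with a thread dump by hand. The PID and thread id below are placeholders; the key detail is that top/ps report thread ids in decimal while jstack labels threads with hex "nid=0x..." values.

```shell
# Sketch: mapping a hot OS thread to its Java thread (assumes a Linux
# host with JDK tools on the path; PID and thread id are placeholders).
# Convert the decimal LWP id from `top -H -p <pid>` into the hex "nid"
# string that jstack prints for each thread:
to_nid() { printf 'nid=0x%x' "$1"; }

to_nid 4242    # -> nid=0x1092
# If `top -H -p <pid>` shows thread 4242 as the CPU hog, locate it in
# the dump with:
#   jstack <pid> | grep -A 5 "$(to_nid 4242)"
```

The stack frames under the matching "nid=" line tell you which code the hot thread was executing.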
There are different ways to reach the level shown in the following screenshot, but you can start from the monitoring dashboard overview.
Click 'Processes', then right-click your desired process and choose 'Open'.
Before doing this, please also keep in mind what @Dave M. mentioned in his answer.
The green box by CPU means it is currently "healthy".
The red sections in the green bar above the chart mean it was "unhealthy" for the periods shown in red and "healthy" for the periods shown in green.
These are Eclipse-IDE-style icons marking public (green), protected (yellow), and private (red) classes and methods. A blue M with a green circle and tick means that method is instrumented.
When you took the CPU sample, was the spike still in process or had it ended by then? Did you verify that the traffic was indeed low at the time of the spike? Did you narrow down the CPU spike to a particular process and take the CPU sample on it while the spike was going on? Have you checked with the Weblogic admins to see if any changes were made recently before the spike started?
If no changes were made regarding the muxers by your admins, I would agree with Harald that contacting Oracle would be a good next step.
Also, I saw some comments about the muxers taking more resources when traffic is low, which I thought was odd. Have you compared prior days CPU activity at the same time frame to make sure it does not happen frequently during off hours?
Do you know if the number of muxers was increased recently? There seem to be more "weblogic.socket.Muxer" threads running than I would expect based on my google searches for this name and the class "weblogic.socket.DevPollSocketMuxer" (which is the main hot spot I see).
I also wonder if perhaps there was a burst of client calls that resulted in many open sockets. This muxer class is all about listening on sockets. There are only 3 threads allocated to these by default based on what I saw here:
Another link of interest:
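To sanity-check whether the number of muxer threads actually changed, you can count them directly in a captured thread dump. This is only a sketch: the heredoc below stands in for your real dump file, and the thread names are illustrative samples of the usual WebLogic naming pattern.

```shell
# Sketch: count WebLogic muxer threads in a thread dump. In real use,
# point count_muxers at your captured dump instead of sample_dump.txt.
count_muxers() { grep -c "weblogic.socket.Muxer" "$1"; }

# Illustrative sample of thread-dump header lines:
cat > sample_dump.txt <<'EOF'
"ExecuteThread: '0' for queue: 'weblogic.socket.Muxer'" daemon prio=5
"ExecuteThread: '1' for queue: 'weblogic.socket.Muxer'" daemon prio=5
"ExecuteThread: '2' for queue: 'weblogic.socket.Muxer'" daemon prio=5
"main" prio=5
EOF
count_muxers sample_dump.txt   # -> 3
```

Comparing this count against a dump taken before the spike (if one exists) would show whether the muxer pool grew.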
Thanks very much Dave,
But this CPU load and slowness of the application began on 23/04/2017 at 03:30 am. That means there was little traffic on the application at the time! Please advise me on this.
Please also clarify/confirm for me whether the increase in these muxers was the main cause of this problem of "slowness and CPU load of the application". I have appreciated this investigation; it is what I will take back to the application owner.
"When you took the CPU sample, was the spike still in process or had it ended by then?"
Yes, the spike was still going on.
"Did you verify that the traffic was indeed low at the time of the spike?"
I had a meeting with the app owners and verified that.
"Did you narrow down the CPU spike to a particular process and take the CPU sample on it while the spike was going on?"
Yes, I did that.
"Have you checked with the Weblogic admins to see if any changes were made recently before the spike started?"
They confirmed to me that no change had been made around that time.
"Also, I saw some comments about the muxers taking more resources when traffic is low, which I thought was odd. Have you compared prior days CPU activity at the same time frame to make sure it does not happen frequently during off hours?"
Well, this problem has happened before; it is not the first time, and it mostly shows as "slowness of the application". Please share with me the link with those comments.
According to the attached snapshot, this WebLogic server seems to be using "Java muxers", based on this link and http://stackoverflow.com/questions/1623692/what-is... Regarding the "RMI client": if they change to the "Native muxer", won't they lose remote connections to other JVMs' methods?
This is the quote:
It comes from the link in your comment above (the 2nd link I mentioned above).
Switching to native muxers may help if you don't have RMI clients, but I would check with Oracle next to confirm if the muxers are really even the issue.
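For reference, switching the muxer is typically done with a JVM system property in the server start script. The class name below is only an illustration of a UNIX native muxer; the correct value depends on your WebLogic version, so confirm it with Oracle before changing anything.

```shell
# Illustrative fragment for setDomainEnv.sh -- confirm the correct
# muxer class for your WebLogic version with Oracle first.
JAVA_OPTIONS="${JAVA_OPTIONS} -Dweblogic.MuxerClass=weblogic.socket.PosixSocketMuxer"
```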
Why not just go back and bring up the process health dashboard again and take a look at the passing transactions chart?
Then go back in time to a similar time period when there was not a spike and compare the transaction counts?
I am running 7.0 at the moment, and it looks like they locked this backdoor trick in this version, but it may work in 6.5 still.
You set DEBUG mode, then open a cockpit dashboard, then go back to Monitoring and the "Funnel Icon" should appear to let you set another time period. Sometimes the new time period works fine, other times it can be quirky.
If this does not work for you, you can chart the "Backend Count" measure for the agent in question in a normal line or bar chart and set any time period you like.
Regarding this below:
"You set DEBUG mode, then open a cockpit dashboard, then go back to Monitoring and the 'Funnel Icon' should appear to let you set another time period. Sometimes the new time period works fine, other times it can be quirky. If this does not work for you, you can chart the 'Backend Count' measure for the agent in question in a normal line or bar chart and set any time period you like."
Ctrl-Shift-F9 to enter DEBUG mode (look at bottom right of UI, to the left of the server name, for the word "DEBUG" to appear).
Then open some dashboard from the cockpit, like Agents Overview, or PurePaths.
Then click Monitoring on the top right menu and see if there is now a funnel icon (see the 9th icon on a cockpit based dashboard for an example).
If it is there, click the funnel to set the desired time frame. There are many options. "Last 30 days" should work to start; use "Custom" to get really specific.
If that doesn't work, do Ctrl-N or Dashboard/New from the menu, click "use for analysis", double-click the chart, select "add series", type "backend" (without quotes), select "Count Backend", optionally select a specific application, and click Add.
Note that the blue line does not spike in conjunction with the shaded host CPU spike. This means the CPU is not coming from a monitored process but from some other process. In a case like this, I would log in to the host in question and see which processes are taking the most CPU. Do you know why there was a sudden drop in transactions, followed by a sudden spike? I think the spike in transaction volume likely caused the CPU spike...
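As a quick illustration of that host-level check (a sketch, not an AppMon feature): a plain ps pipeline shows the heaviest CPU consumers on most UNIX-like hosts.

```shell
# List the top 10 processes by CPU usage (portable ps options; the
# header row will sort with the 0.0 entries, which is fine for a
# quick look).
ps -eo pcpu,pid,comm | sort -rn | head -10
```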
You could try implementing the Top Process Monitor plugin to help you with pinpointing the source of the issue.
Thanks for your support.
I tried to go through the link you provided, but I have realised that the plugin supports only Linux and Windows boxes. The destination boxes we have here are Solaris. Please advise on this.
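One hedged workaround for Solaris: run a small collector script from a custom/generic monitor instead of the Top Process Monitor plugin. The sketch below uses Solaris's prstat when present and falls back to ps elsewhere; the function name and the idea of wiring it into a custom monitor are my assumptions, not part of the plugin.

```shell
# Sketch: Solaris-friendly "top CPU processes" collector that could be
# invoked from a custom monitor. prstat exists only on Solaris, so a
# ps fallback keeps the script usable (and testable) on other systems.
top_cpu_processes() {
    if command -v prstat >/dev/null 2>&1; then
        prstat -s cpu -n 5 1 1          # one sample, top 5 by CPU
    else
        ps -eo pcpu,pid,comm | sort -rn | head -5
    fi
}
top_cpu_processes
```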