How to find out what causes the high CPU Load for an application in AppMon?

christopher_teb
Organizer

Hi all,

One of the applications monitored by AppMon is showing a high CPU load.

The application owner is very keen to find the cause using AppMon, yet AppMon shows the status "OK" alongside a high spike. snapshot.jpg

Please help me drill down to the exact cause of this kind of load.

Br,

Chris

41 REPLIES 41

dave_mauney
Dynatrace Champion

Hi Chris,

Since the blue line (Instrumented Processes) spikes along with the User CPU and System CPU, one or more monitored processes have CPU spikes, so look at the monitored processes on the right and see which ones have spikes. You can then drill into them in more detail. Look at the load for the process: did it spike? If so, is the CPU just symptomatic of more load? If not, use the Response Time Hotspots and Method Hotspots to look for methods with high CPU time.

You can always drill into the PurePaths and add the CPU time columns as well and sort by them.

Note that if the CPU comes from methods in threads we are not capturing PurePaths for, you will not see many clues in these dashlets. In that case, you may need to resort to CPU sampling and/or Thread Dumps to see what is consuming the CPU.
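As a rough illustration of what "which threads consume the CPU" means outside of PurePaths, here is a minimal sketch that ranks the threads of the current JVM by consumed CPU time using the standard `java.lang.management.ThreadMXBean`. The class and method names (`ThreadCpuTop`, `topThreads`) are hypothetical, this is not an AppMon API, and it assumes the JVM supports per-thread CPU time measurement.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not part of AppMon: ranks live threads of this JVM by CPU time.
class ThreadCpuTop {

    // One row per live thread: its name and the CPU time it has consumed so far.
    static final class Row {
        final String name;
        final long cpuNanos;
        Row(String name, long cpuNanos) { this.name = name; this.cpuNanos = cpuNanos; }
    }

    // Returns up to `limit` threads, highest CPU time first.
    // Returns an empty list if the JVM cannot measure per-thread CPU time.
    static List<Row> topThreads(int limit) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<Row> rows = new ArrayList<>();
        if (!mx.isThreadCpuTimeSupported() || !mx.isThreadCpuTimeEnabled()) {
            return rows;
        }
        for (long id : mx.getAllThreadIds()) {
            long cpu = mx.getThreadCpuTime(id);      // -1 if the thread is no longer alive
            ThreadInfo info = mx.getThreadInfo(id);  // null if the thread is no longer alive
            if (cpu >= 0 && info != null) {
                rows.add(new Row(info.getThreadName(), cpu));
            }
        }
        rows.sort(Comparator.comparingLong((Row r) -> r.cpuNanos).reversed());
        return new ArrayList<>(rows.subList(0, Math.min(limit, rows.size())));
    }

    public static void main(String[] args) {
        for (Row r : topThreads(5)) {
            System.out.printf("%-40s %8d ms%n", r.name, r.cpuNanos / 1_000_000);
        }
    }
}
```

The values are cumulative since each thread started, so for a spike it is more telling to call this twice and compare the two results.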

HTH,

dave

BabarQayyum
Leader

Hello Chris,

Take a 'CPU Sampling' and also a 'Thread Dump' of that process to find out which method is consuming too much CPU.
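To make the idea behind CPU sampling concrete, here is a minimal sketch that repeatedly captures all thread stacks and counts how often each method sits on top of a RUNNABLE thread. It only illustrates the principle and is not how AppMon implements sampling; `TinySampler` and `sample` are hypothetical names. One caveat to keep in mind: a thread blocked in a native socket call can also be reported as RUNNABLE, so a frequent top frame is a hint, not proof of CPU use.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of stack sampling; not the AppMon implementation.
class TinySampler {

    // Takes `samples` snapshots of all thread stacks, `intervalMillis` apart, and
    // counts how often each method is the top frame of a RUNNABLE thread.
    static Map<String, Integer> sample(int samples, long intervalMillis) {
        Map<String, Integer> hits = new HashMap<>();
        for (int i = 0; i < samples; i++) {
            for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
                StackTraceElement[] stack = e.getValue();
                if (e.getKey().getState() != Thread.State.RUNNABLE || stack.length == 0) {
                    continue;
                }
                String top = stack[0].getClassName() + "." + stack[0].getMethodName();
                hits.merge(top, 1, Integer::sum);
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt(); // keep the interrupt flag and stop sampling
                break;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Print the ten most frequently seen top frames.
        sample(50, 20).entrySet().stream()
            .sorted((a, b) -> b.getValue() - a.getValue())
            .limit(10)
            .forEach(e -> System.out.println(e.getValue() + "  " + e.getKey()));
    }
}
```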

Regards,

Babar

Thanks to you all,

Could you please share some screenshots for more clarity? The app owner is here with me.

Br,

Chris

Hello Chris,

There are different ways to reach the view in the following screenshot, but you can start from the monitoring dashboard overview.

Click on 'Processes', then right-click on your desired process and select 'Open'.

Please also keep in mind what @Dave M. mentioned in his answer.

Regards,

Babar

christopher_teb
Organizer

Thanks Babar,

After CPU sampling, I got the results below. Please advise.

snapshot.jpg

Hello Chris,

The method 'wait(long)' of the class 'Object' in the java.lang package is taking the most CPU time.

Drill down into this method for more specific information.

Regards,

Babar

Hi Chris,

There is an unusually high amount of CPU burnt in String.toLowerCase() (and other XMLBeans methods). Group by thread and drill down to see where the calls are coming from.

Object.wait() usually does not cause any problems, as waiting threads are not scheduled.
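The point about waiting threads can be checked with a small experiment. The sketch below, with hypothetical names (`WaitVsBusy`, `measure`), parks one thread in `Object.wait()` and lets another spin, then reads the CPU time of both through the standard `ThreadMXBean`. It assumes the JVM supports per-thread CPU time; exact numbers vary by machine, so no output is shown.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Hypothetical experiment: compares CPU time of a waiting thread and a spinning thread.
class WaitVsBusy {

    // Returns { cpuNanos of the waiting thread, cpuNanos of the spinning thread }
    // after both have been alive for roughly `millis` milliseconds.
    static long[] measure(long millis) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        final Object lock = new Object();

        Thread waiter = new Thread(() -> {
            synchronized (lock) {
                try {
                    while (true) {
                        lock.wait(); // loop guards against spurious wakeups
                    }
                } catch (InterruptedException e) {
                    // interrupted by measure(): exit the thread
                }
            }
        }, "waiter");

        Thread spinner = new Thread(() -> {
            long x = 0;
            while (!Thread.currentThread().isInterrupted()) {
                x++; // busy loop that burns CPU
            }
        }, "spinner");

        waiter.start();
        spinner.start();
        try {
            Thread.sleep(millis);
            long waitCpu = mx.getThreadCpuTime(waiter.getId());
            long busyCpu = mx.getThreadCpuTime(spinner.getId());
            return new long[] { waitCpu, busyCpu };
        } catch (InterruptedException e) {
            throw new IllegalStateException("measurement interrupted", e);
        } finally {
            waiter.interrupt();
            spinner.interrupt();
        }
    }

    public static void main(String[] args) {
        long[] r = measure(500);
        System.out.println("waiting thread CPU ms:  " + r[0] / 1_000_000);
        System.out.println("spinning thread CPU ms: " + r[1] / 1_000_000);
    }
}
```

A sampler that reports elapsed time per frame will still show `wait()` prominently, which is why it can look like a hotspot even though the thread is idle.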

Best

Harry

Can anyone help me understand what the RED and GREEN colors next to the methods denote in snapshot.jpg?

BR

Soorya Mohan

The green box by CPU means it is currently "healthy".

The red sections in the green bar above the chart mean it was "unhealthy" for the duration of time with red and "healthy" for the time with green.

HTH,

dave

Thanks for the reply, Dave. I also noticed a blue color on the methods; what does that denote?

BR

Soorya Mohan

These are Eclipse IDE style icons that mark public (green), protected (yellow) and private (red) classes and methods. A blue M with a green circle and tick means that the method is instrumented.

Hi,

These icons denote the visibility of a method! This has nothing to do with CPU health.

http://stackoverflow.com/questions/3957321/what-do-the-icons-for-methods-in-eclipse-mean

Best

Harry

Thanks Harald! It is really helpful!

christopher_teb
Organizer

Hi All,

A high CPU load has occurred again, and I have taken a CPU sample. Below are the results I got. How can I drill down to the exact problem?

snapshotjpg.png

Hi Chris,

Please provide the full CPU sample, or open a support case. It is impossible to analyze it via a screenshot.

best

Harry

christopher_teb
Organizer

Ok Harald but how to extract it?

Hi Chris,

Just right click on the CPU sampling session in the session browser and use the "Export..." menu item.

Best

Harry

christopher_teb
Organizer

Hi Harry,

Please find the CPU sampling attached.

cpu-samples.zip

Hi Chris,

Hard to tell. It seems to be a WebLogic cluster communication problem. Class loading is also slow (some CPU is burnt on file access). I suggest contacting Oracle.

Best

Harry

Hi Chris,

When you took the CPU sample, was the spike still in process or had it ended by then? Did you verify that the traffic was indeed low at the time of the spike? Did you narrow down the CPU spike to a particular process and take the CPU sample on it while the spike was going on? Have you checked with the Weblogic admins to see if any changes were made recently before the spike started?

If no changes were made regarding the muxers by your admins, I would agree with Harald that contacting Oracle would be a good next step.

Also, I saw some comments about the muxers taking more resources when traffic is low, which I thought was odd. Have you compared prior days CPU activity at the same time frame to make sure it does not happen frequently during off hours?

HTH,

dave

dave_mauney
Dynatrace Champion

Hi Chris,

Do you know if the number of muxers was increased recently? There seem to be more "weblogic.socket.Muxer" threads running than I would expect based on my google searches for this name and the class "weblogic.socket.DevPollSocketMuxer" (which is the main hot spot I see).

I also wonder if perhaps there was a burst of client calls that resulted in many open sockets. This muxer class is all about listening on sockets. There are only 3 threads allocated to these by default based on what I saw here:

http://jojovedder.blogspot.com/2009/05/weblogic-so...

Another link of interest:

http://stackoverflow.com/questions/1623692/what-is...

HTH,

dave

Thanks very much Dave,

But this CPU load and the slowness of the application began on 23/04/2017 at 03:30 am, which means there was little traffic on the application at the time. Please advise me on this.

Please confirm whether the increase in these muxers was the main cause of the "slowness and CPU load of the application". I appreciate this investigation; this is what I am taking back to the application owner.

Br,

Chris

christopher_teb
Organizer

Hi Dave,

"When you took the CPU sample, was the spike still in process or had it ended by then?"

Yes, the spike was still going on.

"Did you verify that the traffic was indeed low at the time of the spike?"

I had a meeting with the app owners and verified that.

"Did you narrow down the CPU spike to a particular process and take the CPU sample on it while the spike was going on?"

Yes, I did that.

"Have you checked with the Weblogic admins to see if any changes were made recently before the spike started?"

They confirmed to me that no changes had been made by then.

"Also, I saw some comments about the muxers taking more resources when traffic is low, which I thought was odd. Have you compared prior days CPU activity at the same time frame to make sure it does not happen frequently during off hours?"

Well, this problem has happened before; it is not the first time, mostly the "slowness of the application". Please share the link with those comments.

According to the attached snapshot, this WebLogic server seems to be using "Java muxers". Going by this link and this one, http://stackoverflow.com/questions/1623692/what-is... ("RMI client"): if they change to the "Native muxer", won't they lose remote connections to other JVMs' methods?

snapshot.jpg

Br,

Chris

This is the quote:


  • Blocks on reads until there is data to be read from a socket. This behavior does not scale well when there are a large number of sockets and/or when data arrives infrequently at sockets. This is typically not an issue for clients, but it can create a huge bottleneck for a server.

It comes from the link in your comment above (the 2nd link I mentioned above):

http://stackoverflow.com/questions/1623692/what-is-weblogic-socket-muxer

Switching to native muxers may help if you don't have RMI clients, but I would check with Oracle next to confirm if the muxers are really even the issue.

Thanks,

dave

dave_mauney
Dynatrace Champion

BTW: "I had a meeting with the app owners and verified that."

Why not just go back and bring up the process health dashboard again and take a look at the passing transactions chart?

Then go back in time to a similar time period when there was not a spike and compare the transaction counts?

HTH,

dave

Hi Dave,

The problem I have is that Dynatrace limits the home/monitoring dashboard to only 72 hours, and I cannot go back further than that. Refer to the attached snapshot snapshot.jpg. Is there any way I can go back further?

I am running 7.0 at the moment, and it looks like they locked this backdoor trick in this version, but it may work in 6.5 still.

You set DEBUG mode, then open a cockpit dashboard, then go back to Monitoring and the "Funnel Icon" should appear to let you set another time period. Sometimes the new time period works fine, other times it can be quirky.

If this does not work for you, you can chart the "Backend Count" measure for the agent in question in a normal line or bar chart and set any time period you like.

HTH,

dave

christopher_teb
Organizer

Hi Dave,

Please help me with some screenshots.

Hi Chris,

Not sure what you mean... what help can I provide?

dave

christopher_teb
Organizer

Hi Dave,

About this below,

"You set DEBUG mode, then open a cockpit dashboard, then go back to Monitoring and the "Funnel Icon" should appear to let you set another time period. Sometimes the new time period works fine, other times it can be quirky.

If this does not work for you, you can chart the "Backend Count" measure for the agent in question in a normal line or bar chart and set any time period you like."

Ctrl-Shift-F9 to enter DEBUG mode (look at bottom right of UI, to the left of the server name, for the word "DEBUG" to appear).

Then open some dashboard from the cockpit, like Agents Overview, or PurePaths.

Then click Monitoring on the top right menu and see if there is now a funnel icon (see the 9th icon on a cockpit based dashboard for an example).

If there, click the funnel to set the desired time frame. There are many options. Last 30 days should work to start. Use custom to get really specific.

If that doesn't work, do Ctrl-N or Dashboard/New from the menu, click use for analysis, double click chart, select add series, type "backend" (without quotes), select "Count Backend", optionally select a specific application, click Add.

HTH,

dave

christopher_teb
Organizer

Hi Dave,

Hope you are well. The CPU load had gone down on Saturday, but surprisingly it went up at around 12:30 PM today and then went down again. See below:

snapshot.jpg

They have asked me which process causes this spike. See the attached CPU sampling:

tue-may-02-14-14-15-utc-2017.dts

Br,

Chris

christopher_teb
Organizer

Hi,

Can anybody help with the above request?

dave_mauney
Dynatrace Champion

Note that the blue line does not spike in conjunction with the shaded host CPU spike. This means the CPU is not coming from a monitored process, but from some other process. In a case like this, I would log in to the host in question and see which processes are taking the most CPU. Do you know why there was a sudden drop in transactions, followed by a sudden spike? I think the spike in transaction volume likely caused the CPU spike...
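If an OS tool is not at hand on the host, the same question ("which process is burning CPU right now?") can be approximated from a JVM. The sketch below is only an assumption-laden illustration: it needs Java 9 or later for `ProcessHandle`, the names `CpuDelta`, `snapshot` and `topDeltas` are hypothetical, and whether `totalCpuDuration()` is populated for other users' processes depends on the OS and on permissions. It takes two snapshots of cumulative CPU time per process and ranks the processes by the difference.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: ranks processes by CPU time consumed between two snapshots.
class CpuDelta {

    // Cumulative CPU time per pid, for every process that exposes it.
    static Map<Long, Duration> snapshot() {
        Map<Long, Duration> cpuByPid = new HashMap<>();
        ProcessHandle.allProcesses().forEach(p ->
            p.info().totalCpuDuration().ifPresent(d -> cpuByPid.put(p.pid(), d)));
        return cpuByPid;
    }

    // CPU time consumed per pid between the two snapshots, largest first.
    // Processes missing from `before` (started in between) are skipped.
    static List<Map.Entry<Long, Duration>> topDeltas(
            Map<Long, Duration> before, Map<Long, Duration> after, int limit) {
        List<Map.Entry<Long, Duration>> deltas = new ArrayList<>();
        for (Map.Entry<Long, Duration> e : after.entrySet()) {
            Duration start = before.get(e.getKey());
            if (start != null) {
                deltas.add(Map.entry(e.getKey(), e.getValue().minus(start)));
            }
        }
        deltas.sort(Map.Entry.<Long, Duration>comparingByValue().reversed());
        return new ArrayList<>(deltas.subList(0, Math.min(limit, deltas.size())));
    }

    public static void main(String[] args) throws InterruptedException {
        Map<Long, Duration> before = snapshot();
        Thread.sleep(5_000); // observation window
        Map<Long, Duration> after = snapshot();
        for (Map.Entry<Long, Duration> e : topDeltas(before, after, 5)) {
            String cmd = ProcessHandle.of(e.getKey())
                .flatMap(p -> p.info().command())
                .orElse("?");
            System.out.println(e.getKey() + "  " + e.getValue().toMillis() + " ms  " + cmd);
        }
    }
}
```

Using the difference rather than the cumulative value matters here, because a long-running process can have a large total without being responsible for the current spike.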

christopher_teb
Organizer

Hi Dave,

Below are the updates from the application owners

snapshot.jpg

Please advise

Br,

Chris


Hi Dave,

Below are the updates from the application owner about the "Muxers"

snapshot.jpg

Please advise

Br,

Chris

Hi Chris,

I don't have any further suggestions.

You might want to check with Oracle if the problem persists and see what diagnostics they suggest.

HTH,

dave

david_n
Inactive

Hello Chris,

You could try implementing the Top Process Monitor plugin to help you with pinpointing the source of the issue.

https://community.dynatrace.com/community/display/DL/Top+Process+Monitoring+Plugin

Thanks,

David Nicholls

christopher_teb
Organizer

Hi David,

Thanks for your support.

I went through the link you provided, but I realised that the plugin supports only Linux and Windows boxes. The target boxes we have here are Solaris. Please advise on this.

Br,

Chris

Hi Chris,

I couldn't find any options for monitoring the Solaris processes more in depth than what the AppMon client already provides (CPU Samples).

Thanks,

David Nicholls