Assistance with observations in server health.

james_kitson
Dynatrace Leader

Hello,

We've been troubleshooting some issues such as missing data and would appreciate some input from others who may have had more experience with this.

Above is 12 hours of server health data. One thing that stands out to me is, of course, the skipped events and skipped (non-analyzed) PurePaths. I have also been investigating the high, spiking PurePath lengths.

I am also interested in the spikes across several measurements that seem to occur every hour at about 40 minutes past the hour. The spikes seem to correlate across CPU usage, MPS, memory usage and, to some extent, suspension time.
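One way to confirm the hourly pattern outside the dashboard is to export a metric and average it by minute past the hour. The sketch below is a minimal illustration, assuming the chart data can be exported as (timestamp, value) pairs; the function names and the synthetic series are hypothetical and not part of any Dynatrace API.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean


def mean_by_minute_of_hour(samples):
    """Average a metric by minute past the hour.

    samples: iterable of (datetime, float) pairs, for example a hypothetical
    CSV export of the CPU or memory chart.
    """
    buckets = defaultdict(list)
    for ts, value in samples:
        buckets[ts.minute].append(value)
    return {minute: mean(values) for minute, values in sorted(buckets.items())}


def peak_minute(samples):
    """Return the minute past the hour with the highest average value."""
    averages = mean_by_minute_of_hour(samples)
    return max(averages, key=averages.get)


if __name__ == "__main__":
    # Synthetic 12-hour series, one sample per minute, spiking at :40.
    start = datetime(2000, 1, 1)
    samples = [
        (start + timedelta(minutes=i), 90.0 if i % 60 == 40 else 20.0)
        for i in range(12 * 60)
    ]
    print(peak_minute(samples))  # 40
```

If the same minute wins for CPU, MPS and memory alike, that points toward something scheduled rather than random load.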

I would like some input on ways I can drill into some of these issues or, if anything jumps out, some ideas as to what is occurring.


michael_taylor
Dynatrace Advisor

One thing I have seen support do in the past is look at charts in the Dynatrace Self-Monitoring system profile and chart the bytes per agent. This gives you an indication of where the traffic spikes are coming from. Then look at the PurePaths for those agents to see which transactions are coming in at that point in time. There you will find clues about volume and potential configuration tweaks you can make to reduce some of the spikes you are seeing. I understand this is just a start, but looking at this information will unlock more.
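The "bytes per agent" step can be sketched as ranking agents by traffic in a short window around a spike. This is a minimal sketch, assuming the Self-Monitoring chart can be exported as (agent, timestamp, bytes) rows; the export layout, the agent names and the `top_agents` helper are all hypothetical.

```python
from datetime import datetime, timedelta


def top_agents(samples, spike_time, window=timedelta(minutes=5), limit=3):
    """Rank agents by total bytes sent within +/- window of spike_time.

    samples: iterable of (agent_name, datetime, bytes_sent) tuples, a
    hypothetical export of the "bytes per agent" chart.
    """
    totals = {}
    for agent, ts, sent in samples:
        if abs(ts - spike_time) <= window:
            totals[agent] = totals.get(agent, 0) + sent
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:limit]


if __name__ == "__main__":
    spike = datetime(2000, 1, 1, 10, 40)
    samples = [
        ("app-agent-01", spike, 500),
        ("batch-agent-02", spike + timedelta(minutes=1), 5000),
        ("batch-agent-02", spike + timedelta(minutes=30), 99999),  # outside window
    ]
    print(top_agents(samples, spike))
    # [('batch-agent-02', 5000), ('app-agent-01', 500)]
```

The agents at the top of the list are the ones whose PurePaths are worth opening for that time range.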

Mike

That's a good idea, and I've been looking at that now. I don't know if it is related to what I was originally looking at, but using that data we found a sensor configuration that was capturing EVERY method in a pretty big package.

However, I didn't see any traffic spikes that occurred every hour, so I'm thinking it might be something scheduled on the server side. One thing we're investigating is excessive business transaction baselining, so possibly that calculation runs on an hourly basis.

kyle_kowalski
Dynatrace Pro

Hi James,

Something I am noticing is that your server memory keeps climbing and then drops when the CPU spikes come in, so it could be a memory leak. I can try to take a closer look at it this weekend; feel free to send me a session at kyle.kowalski@dynatrace.com if you can. If possible, I would capture 10 minutes before and after the spike.
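A rough way to tell a healthy GC sawtooth from a leak is to check whether the memory left over after each collection trends upward. The sketch below is a minimal illustration, assuming heap samples taken at a fixed interval (a hypothetical export); it treats each local minimum as a post-GC floor and fits a least-squares slope through those floors. The helper names and sample data are hypothetical.

```python
def local_minima(values):
    """Indices where a value is lower than both neighbours (post-GC floors)."""
    return [
        i for i in range(1, len(values) - 1)
        if values[i] < values[i - 1] and values[i] < values[i + 1]
    ]


def least_squares_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


def floor_trend(memory_mb):
    """Slope of the post-GC memory floors, in MB per sample.

    Roughly 0 for a healthy sawtooth; clearly positive means the memory that
    survives GC keeps growing, which is what a leak would look like.
    """
    floors = local_minima(memory_mb)
    if len(floors) < 2:
        raise ValueError("need at least two GC cycles in the series")
    return least_squares_slope(floors, [memory_mb[i] for i in floors])


if __name__ == "__main__":
    # Synthetic heap samples (MB): climb by 100 per sample, GC every 10 samples.
    healthy = [1000 + 100 * k for _ in range(5) for k in range(10)]
    # Same pattern, but each GC leaves 50 MB more behind than the last.
    leaky = [1000 + 50 * c + 100 * k for c in range(5) for k in range(10)]
    print(floor_trend(healthy))  # 0.0
    print(floor_trend(leaky))    # 5.0
```

Over a short window, noise can produce a small nonzero slope either way, so a longer series (days rather than hours) gives a more trustworthy answer.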

andreas_grabner
Dynatrace Leader

Here is my quick analysis.

One of the reasons for the skipped events and skipped PurePaths is that you have PurePaths that exceed the 10k node limit. The agents keep sending events for PurePaths that the Dynatrace Server has already closed due to that limit, which explains some of your skipped events. These PurePaths are also skipped from analysis because they are truncated and incomplete.

As for the CPU spikes: that is just garbage collection time (= suspension time). You can easily see the correlation between CPU, suspension time and the drop in memory; that's when the GC cleans up memory. That is totally normal and nothing to worry about. You would only worry if it happened more frequently, consumed much more CPU and stalled your server, OR if overall memory increased over a longer period of time and led to an out-of-memory situation. But I don't think it does.
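The correlation between CPU, suspension time and memory drops can be put into numbers with a Pearson coefficient. This is a minimal sketch, assuming aligned, equally spaced samples of the three metrics (a hypothetical export; the sample values are synthetic). A coefficient near 1 supports the GC explanation.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series.

    Raises ZeroDivisionError if either series is constant.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def memory_drops(memory):
    """Amount freed between consecutive samples (0 when memory grew)."""
    return [max(0, prev - cur) for prev, cur in zip(memory, memory[1:])]


if __name__ == "__main__":
    # Synthetic, aligned samples: CPU %, suspension ms, heap MB.
    cpu = [20, 20, 85, 20, 20, 90]
    suspension = [0, 0, 400, 0, 0, 420]
    memory = [1500, 1600, 900, 1000, 1100, 400]
    print(round(pearson(cpu, suspension), 3))
    # Drops are measured between samples, so align with suspension[1:].
    print(round(pearson(memory_drops(memory), suspension[1:]), 3))
```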

I suggest you first focus on the PurePaths that exceed the 10k limit. Maybe these are PurePaths of transactions you are not interested in, e.g. long-running backend jobs, or maybe you have over-instrumentation that leads to the problem. I think if you solve this issue you will see these problems go away.

Andi

It is good to hear that the spikes and garbage collection are not out of the ordinary. I have indeed noticed that the skipped data comes up when there are big spikes in PurePath size, and I was able to make a chart that showed where these increases came from in certain cases. Over-instrumentation is definitely something we are looking at as we go through the process of cleaning up our Dynatrace deployment. It is also possible that some of this is related to my previous inquiry here:

https://answers.dynatrace.com/questions/138970/bas...

Thanks!

James