
Method used for CPU sampling/profiling in dynatrace

patrick_ditmar
Newcomer

One of our developers noticed a large amount of CPU time attributed to Unsafe.park(). This could be related to a "bug" in JVMTI-based profilers: they mark wait time as CPU time even though the thread is not actually consuming CPU.

According to the developer, there are other methods (partial thread dumps or non-safepoint profiling). The following paper describes them: http://ssw.jku.at/Teaching/PhDTheses/Hofer/PhD.pdf... It even lists "Dynatrace Austria" as a sponsor.

Which profiling method does Dynatrace use? Could it be that the Unsafe.park() time the developer is seeing is actually not CPU time?
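One way to check the premise independently of any profiler is to compare a thread's own CPU time with wall-clock time while it is parked. The sketch below uses the standard `java.lang.management.ThreadMXBean` API; it is not how Dynatrace measures CPU time, and the class and method names (`ParkCpuCheck`, `measurePark`) are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.LockSupport;

public class ParkCpuCheck {

    /**
     * Parks the calling thread for at least parkMillis and returns
     * {cpuNanos, wallNanos} measured around the park.
     */
    static long[] measurePark(long parkMillis) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (!mx.isCurrentThreadCpuTimeSupported()) {
            throw new UnsupportedOperationException("per-thread CPU time is not supported by this JVM");
        }
        if (!mx.isThreadCpuTimeEnabled()) {
            mx.setThreadCpuTimeEnabled(true);
        }
        long cpuStart = mx.getCurrentThreadCpuTime();
        long wallStart = System.nanoTime();
        long deadline = wallStart + TimeUnit.MILLISECONDS.toNanos(parkMillis);
        // parkNanos may return early (spurious wakeup), so keep parking until the deadline.
        long remaining;
        while ((remaining = deadline - System.nanoTime()) > 0) {
            LockSupport.parkNanos(remaining);
        }
        long cpuNanos = mx.getCurrentThreadCpuTime() - cpuStart;
        long wallNanos = System.nanoTime() - wallStart;
        return new long[] {cpuNanos, wallNanos};
    }

    public static void main(String[] args) {
        long[] r = measurePark(500);
        System.out.printf("CPU: %d ms, wall clock: %d ms%n",
                TimeUnit.NANOSECONDS.toMillis(r[0]), TimeUnit.NANOSECONDS.toMillis(r[1]));
    }
}
```

On a typical HotSpot JVM the reported CPU time stays close to zero while the wall-clock time is about 500 ms, which is consistent with parked time being wait time rather than CPU time.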


peter_karlhuber
Dynatrace Pro

Hi Patrick,

JVMTI offers these two methods, which (as far as I can tell) we use to get the stack traces and the timings:

http://docs.oracle.com/javase/7/docs/platform/jvmt...

http://docs.oracle.com/javase/7/docs/platform/jvmt...

So, if there's a bug in these methods, we're probably affected by it too. Normally (speaking from memory), the park/unpark nodes would show up as "wait" time in the CPU samples. Assuming our reported CPU time is not a bug, could the cause be too many threads, excessive context switching, insufficient memory, or something similar?
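For context on why park nodes would normally count as wait time: the JVM itself reports a thread blocked in LockSupport.park() as WAITING, with Unsafe.park as its top stack frame. The sketch below shows this with the plain `Thread.getState()` and `Thread.getStackTrace()` APIs; it is only an illustration, not the JVMTI-based agent, and `ParkedThreadState` / `sampleParkedThread` are hypothetical names.

```java
import java.util.concurrent.locks.LockSupport;

public class ParkedThreadState {

    /** Returns "<state> <topFrame>" for a thread blocked in LockSupport.park(). */
    static String sampleParkedThread() throws InterruptedException {
        Thread worker = new Thread(() -> {
            // Park until interrupted; park() returns immediately once the interrupt flag is set.
            while (!Thread.currentThread().isInterrupted()) {
                LockSupport.park();
            }
        }, "parked-worker");
        worker.setDaemon(true);
        worker.start();

        // Poll until the worker has actually parked.
        while (worker.getState() != Thread.State.WAITING) {
            Thread.sleep(10);
        }
        Thread.State state = worker.getState();
        StackTraceElement[] stack = worker.getStackTrace();
        String top = stack.length == 0 ? "<no frames>"
                : stack[0].getClassName() + "." + stack[0].getMethodName();

        worker.interrupt();
        worker.join();
        return state + " " + top;
    }

    public static void main(String[] args) throws InterruptedException {
        // Java 8 reports sun.misc.Unsafe.park; Java 9+ reports jdk.internal.misc.Unsafe.park.
        System.out.println(sampleParkedThread());
    }
}
```

A sampler that records the thread state together with each stack snapshot could use WAITING / TIMED_WAITING to classify such samples as wait time.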

Disclaimer: I'm not an agent developer, so my statement is only valid until someone more knowledgeable corrects me.

Best regards,

Peter

c_schwarzbauer
Dynatrace Champion

Hi Patrick,

Peter already mentioned the JVMTI methods that we use for Java CPU sampling.

The behavior you mention is actually a "known behavior". It results from the way we calculate timings in the AppMon Server based on the stack trace snapshots we get from the Agent, which is not as trivial as it may sound. Basically, a snapshot is considered "active" (= consuming CPU) if some CPU time was spent during it. However, if the snapshot interval is long and most of it is spent in a method such as Unsafe.park() or LockSupport.park(), but some of it is spent actively, then chances are high that those methods are shown as CPU hotspots.
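The effect can be illustrated with a deliberately simplified model of the "active snapshot" rule described above. This is not the actual AppMon Server algorithm; the `Sample` fields, the rule names, and the demo numbers are all hypothetical. Each sample holds the top frame seen at snapshot time, the interval it covers, and the CPU time the thread consumed during that interval.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SampleAttribution {

    /** One snapshot: top frame at snapshot time, covered interval, CPU time consumed in it. */
    static final class Sample {
        final String topFrame;
        final long intervalMs;
        final long cpuMs;

        Sample(String topFrame, long intervalMs, long cpuMs) {
            this.topFrame = topFrame;
            this.intervalMs = intervalMs;
            this.cpuMs = cpuMs;
        }
    }

    /** Simplified rule: any CPU in an interval makes the whole interval CPU time of the top frame. */
    static Map<String, Long> activeIntervalRule(List<Sample> samples) {
        Map<String, Long> cpuByFrame = new LinkedHashMap<>();
        for (Sample s : samples) {
            if (s.cpuMs > 0) {
                cpuByFrame.merge(s.topFrame, s.intervalMs, Long::sum);
            }
        }
        return cpuByFrame;
    }

    /** For comparison: credit the top frame only with the CPU time measured in the interval. */
    static Map<String, Long> measuredCpuRule(List<Sample> samples) {
        Map<String, Long> cpuByFrame = new LinkedHashMap<>();
        for (Sample s : samples) {
            if (s.cpuMs > 0) {
                cpuByFrame.merge(s.topFrame, s.cpuMs, Long::sum);
            }
        }
        return cpuByFrame;
    }

    /** Hypothetical thread: mostly parked, with short bursts of work, sampled once per second. */
    static List<Sample> demoSamples() {
        List<Sample> samples = new ArrayList<>();
        samples.add(new Sample("Unsafe.park", 1000, 20));    // ~980 ms parked, 20 ms of work elsewhere
        samples.add(new Sample("Unsafe.park", 1000, 0));     // fully parked
        samples.add(new Sample("Worker.compute", 1000, 900)); // genuinely busy
        return samples;
    }

    public static void main(String[] args) {
        List<Sample> samples = demoSamples();
        System.out.println("active-interval rule: " + activeIntervalRule(samples));
        System.out.println("measured-CPU rule:    " + measuredCpuRule(samples));
    }
}
```

Running `main` prints `{Unsafe.park=1000, Worker.compute=1000}` for the active-interval rule and `{Unsafe.park=20, Worker.compute=900}` for the measured-CPU rule: a 20 ms burst of work turns a whole second of parking into an apparent CPU hotspot. In this model, a shorter sampling interval bounds the misattributed time per burst by the interval length, which matches the suggestion to sample more often.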

I don't remember all the details right now, but there are already some ideas on how we could improve this (e.g., by handling specific well-known methods differently). However, priority-wise we have not spent much time on this topic recently.

Also, the research you found was initiated by us and provided some interesting approaches. However, we concluded that implementing it so that it works across multiple JVM vendors and operating systems, in a supported and minimally invasive way, would not be possible, or at least not with reasonable effort so far.

So right now I would suggest:

  • increase the CPU sampling frequency to the highest setting (i.e., the shortest sampling interval), which should reduce this behavior
  • if you see methods like Unsafe.park() or LockSupport.park() with high CPU time, assume that it is actually wait time
  • create an RFE to raise awareness and collect votes
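The second suggestion, treating park methods as wait time, can be applied mechanically to hotspot data you have exported yourself. The sketch below is hypothetical throughout: the method-name to CPU-milliseconds map stands in for whatever export format you actually have, and the list of blocking frames is an assumption to extend for your JVM version. It is not a Dynatrace feature.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class ParkReclassifier {

    // Hypothetical list of well-known blocking frames; which ones appear depends on the JVM version.
    static final Set<String> WAIT_FRAMES = new HashSet<>(Arrays.asList(
            "sun.misc.Unsafe.park",                          // Java 8 and earlier
            "jdk.internal.misc.Unsafe.park",                 // Java 9+
            "java.util.concurrent.locks.LockSupport.park",
            "java.util.concurrent.locks.LockSupport.parkNanos"));

    /** Splits reported CPU hotspots (method -> ms) into real CPU time and assumed wait time. */
    static void reclassify(Map<String, Long> reportedCpuMs,
                           Map<String, Long> cpuMs,
                           Map<String, Long> waitMs) {
        for (Map.Entry<String, Long> e : reportedCpuMs.entrySet()) {
            if (WAIT_FRAMES.contains(e.getKey())) {
                waitMs.merge(e.getKey(), e.getValue(), Long::sum);
            } else {
                cpuMs.merge(e.getKey(), e.getValue(), Long::sum);
            }
        }
    }

    public static void main(String[] args) {
        // Hypothetical hotspot export: method name -> reported CPU milliseconds.
        Map<String, Long> reported = new LinkedHashMap<>();
        reported.put("jdk.internal.misc.Unsafe.park", 1000L);
        reported.put("com.example.Worker.compute", 900L);

        Map<String, Long> cpu = new LinkedHashMap<>();
        Map<String, Long> wait = new LinkedHashMap<>();
        reclassify(reported, cpu, wait);
        System.out.println("cpu=" + cpu + " wait=" + wait);
    }
}
```

Running `main` prints `cpu={com.example.Worker.compute=900} wait={jdk.internal.misc.Unsafe.park=1000}`. Matching on exact method names keeps the rule predictable; a frame that is missing from the list simply stays in the CPU bucket.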

Best, Christian