
How to determine root cause for corrupted purepaths?

jared_kemp
Organizer

We are currently running Dynatrace 6.1.0.8105 and are getting a lot of corrupted purepaths. They do not show up in every profile, only in our heavier ones.

What is my best way to determine root cause for corrupted purepaths?

Things I know:


  1. We have a single collector, and the agent overview shows 26 agents. The collector sizing dashboard shows 36 agents, however. I know the recommendation for our version is around 20 agents per collector; can going over this cause corrupted purepaths?
  2. We have an onsite and offsite data center and our collector is at the offsite. However, all but our lowest usage app are at the external datacenter in the same room as the collector without a firewall between them.
  3. I have done my best to minimize instrumentation in each profile though it is possible some things are extra. There is not a lot of knowledge on the development side as to what is/isn't used by applications here. Is there a good way for me to analyze by sensor type and know what I can disable?
  4. The collector CPU usage averages only about 25%, but there are some periods where the CPU appears pegged (the max value goes above 100%). Memory for the collector averages around 50-60%, maxing out around 75%. Buffer saturation sits at a max of 1.75%.
8 REPLIES 8

Joseph_Hoffman
Champion

I'm curious why you state that you should limit yourself to 20 agent/collector. A 'normal' system should be able to handle more agents than 20/collector.

I would not focus on reducing instrumentation until you have reviewed the problem and found a root cause.

For example, from the Collector Sizing dashboard (Start Center -> Monitoring), is the Collector overdriven? Are CPU levels too high? Are Buffer Saturation levels high? If so, an additional collector would help, or additional cores, etc. Then dig into the dT Server dashboards and see whether the server seems overdriven. Maybe it's squeezed for memory?

If you're seeing pegged CPU usage on the Collector or dT Server, then that is a likely cause of messed up purepaths. Dynatrace is trying to process all the data but eventually gets backed up while the data keeps coming. Perhaps you're on a shared VM environment and the CPU cores are oversubscribed. However, I wasn't sure which CPU you were referring to as pegged.

As for being overinstrumented, if you've not added additional custom sensors, then I doubt you're overinstrumented. What is the depth/size of the purepaths? (Purepath dashlet)

Bottom line, it sounds like you're resource starved somewhere, not necessarily over instrumented.

Sorry, it turns out it was actually 25, not 20. We are a .NET shop and this was a number I had pulled from one of Andi's webinar videos.

I misread your original post, I thought you were running 6.2. But there's no magic drop-off, even with 6.1. 25 .NET agents is a guideline, YMMV. But I would still explore the things mentioned.

Of course another option is to simply put in another collector, and share the load and see if that resolves your problem. But I doubt it. Please post what you do and the results.

The CPU usage I am talking about is from the Start Center>Deployment Health>Dynatrace Collector Sizing dashboard. It is a virtual server on vmware.

Joseph_Hoffman
Champion

If you plot the corrupted purepath events, do they line up with the CPU spikes on the collector? If so, then I think you've found your root cause. Maybe another core is sufficient for the Collector without having to install a whole second collector.
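If the two charts are hard to line up by eye, the same check can be done offline once the timestamps are copied out of the client. Below is a minimal Python sketch, not a Dynatrace feature: the function name `fraction_near_spikes`, the 90% threshold, the one-minute window, and all of the sample data are assumptions made up for illustration.

```python
from datetime import datetime, timedelta


def fraction_near_spikes(incident_times, cpu_samples,
                         cpu_threshold=90.0,
                         window=timedelta(minutes=1)):
    """Return the share of incidents that occur within `window`
    of a CPU sample at or above `cpu_threshold`."""
    if not incident_times:
        return 0.0
    # Timestamps at which the collector CPU counts as "pegged".
    spike_times = [t for t, cpu in cpu_samples if cpu >= cpu_threshold]
    near = sum(
        1 for inc in incident_times
        if any(abs(inc - spike) <= window for spike in spike_times)
    )
    return near / len(incident_times)


# Made-up data: one CPU sample per minute, spikes at minutes 2, 3 and 6.
base = datetime(2000, 1, 1, 12, 0)
cpu_values = [20, 25, 98, 100, 30, 22, 95, 24]
cpu_samples = [(base + timedelta(minutes=i), v)
               for i, v in enumerate(cpu_values)]

# Made-up corrupted purepath incident timestamps.
incident_times = [
    base + timedelta(minutes=2, seconds=30),  # close to a spike
    base + timedelta(minutes=4, seconds=45),  # not close to any spike
    base + timedelta(minutes=6, seconds=10),  # close to a spike
]

fraction = fraction_near_spikes(incident_times, cpu_samples)
print(f"{fraction:.0%} of incidents fall near a CPU spike")
```

A fraction close to 1 would suggest the incidents cluster around the spikes; a low fraction would point away from collector CPU as the cause.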

What is the easiest way to compare this? The heat field does not seem to show corrupted purepaths and the collector monitoring on the same dashlet, even though it lets me check the boxes. The closest I have been able to come is to filter the incidents to corrupted purepaths and then stretch out the history column to line up with the CPU usage chart below it, but that is not working that well.

Hey Jared,

You can create a dashboard containing charts within the Dynatrace Self-Monitoring Profile with the metrics you want.

The Measures: CPU of the Collectors & Skipped/Early discarded/unrecorded Purepaths should show the pattern.

Cheers,

Sanj
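If the two measures are read off the chart at the same intervals, the pattern can also be put into a single number. The following is a rough Python sketch that computes a Pearson correlation by hand; the series values are invented, and the variable names `collector_cpu` and `skipped_purepaths` are placeholders rather than actual Dynatrace measure names.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Invented values, one per chart interval.
collector_cpu = [22, 25, 97, 100, 31, 24]      # percent
skipped_purepaths = [0, 0, 14, 19, 1, 0]       # count per interval

r = pearson(collector_cpu, skipped_purepaths)
print(f"correlation between CPU and skipped purepaths: {r:.2f}")
```

A value near 1 would mean the skipped purepaths rise and fall with collector CPU. It does not prove causation, but it supports the resource-starvation theory above.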

Ah, thanks a lot, I was not looking for those on the collector side.