Right now we are using version 220.127.116.1105 of Dynatrace and are getting a lot of corrupted purepaths. We are not getting them in every profile, just in our heavier ones.
What is the best way to determine the root cause of corrupted purepaths?
Things I know:
I'm curious why you state that you should limit yourself to 20 agents per collector. A 'normal' system should be able to handle more than 20 agents per collector.
I would not focus on reducing instrumentation until you have reviewed what the problem is and found a root cause.
For example, on the Collector Sizing dashboard (Start Center -> Monitoring), is the Collector overdriven? Are CPU levels too high? Are buffer saturation levels high? If so, an additional collector or additional cores would help. Then dig into the dT Server dashboards and see whether the server seems overdriven. Maybe it's squeezed for memory?
If you're seeing pegged CPU usage on the Collector or dT Server, then that is a good reason for messed-up purepaths. Dynatrace is trying to process all the data, but it eventually gets backed up while the data keeps coming. Perhaps you're on a shared VM environment and the CPU cores are oversubscribed. However, I wasn't sure which CPU you were saying was pegged.
As for being overinstrumented, if you've not added additional custom sensors, then I doubt you're overinstrumented. What is the depth/size of the purepaths? (Purepath dashlet)
Bottom line, it sounds like you're resource starved somewhere, not necessarily overinstrumented.
I misread your original post; I thought you were running 6.2. But there's no magic drop-off, even with 6.1. 25 .NET agents is a guideline, YMMV. But I would still explore the things mentioned.
Of course, another option is to simply put in another collector to share the load and see if that resolves your problem. But I doubt it. Please post what you do and the results.
If you plot the corrupted purepath events, do they line up with the CPU spikes on the collector? If so, then I think you've found your root cause. Maybe another core is sufficient for the Collector without having to install a whole second collector.
What is the easiest way to compare this? The heat field does not seem to show corrupted purepaths on the same dashlet as the collector monitoring, even though it lets me check the boxes. The closest I have been able to come is to filter the incidents to corrupted purepaths and then stretch out the history column to line up with the CPU usage chart below it, but that is not working very well.
You can create a dashboard containing charts within the Dynatrace Self-Monitoring Profile with the metrics you want.
The measures to chart are the CPU of the Collectors and the Skipped/Early discarded/unrecorded Purepaths; together they should show the pattern.
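If the charts are still hard to line up by eye, you could export both series and compare them numerically. Below is a rough Python sketch, not a Dynatrace API. It assumes you have already gotten per-minute collector CPU and corrupted purepath counts into two dicts keyed by the same time bucket. The function names, the 90% threshold, and the sample numbers are all made up for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences.

    Returns 0.0 if the input is empty or either series is constant.
    """
    n = len(xs)
    if n == 0:
        return 0.0
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def compare_series(cpu_by_bucket, corrupted_by_bucket, cpu_threshold=90.0):
    """Compare collector CPU with corrupted purepath counts.

    Both arguments are dicts keyed by time bucket (e.g. "HH:MM").
    Only buckets present in both dicts are used.
    Returns (correlation, share of corrupted purepaths seen while
    CPU was at or above cpu_threshold).
    """
    buckets = sorted(set(cpu_by_bucket) & set(corrupted_by_bucket))
    cpu = [cpu_by_bucket[b] for b in buckets]
    corrupted = [corrupted_by_bucket[b] for b in buckets]
    total = sum(corrupted)
    during_spikes = sum(c for u, c in zip(cpu, corrupted) if u >= cpu_threshold)
    share = during_spikes / total if total else 0.0
    return pearson(cpu, corrupted), share


# Hypothetical sample data, one value per minute.
cpu = {"10:00": 35.0, "10:01": 40.0, "10:02": 95.0, "10:03": 98.0, "10:04": 42.0}
corrupted = {"10:00": 0, "10:01": 1, "10:02": 12, "10:03": 15, "10:04": 2}

r, share = compare_series(cpu, corrupted)
print(f"correlation={r:.2f}, share during CPU spikes={share:.0%}")
```

A correlation close to 1 together with most corrupted purepaths falling into the high-CPU buckets would support the idea that the collector is CPU starved. A weak correlation suggests looking elsewhere, for example at the dT Server or memory.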