02 Nov 2015 01:03 PM
Right now we are using version 6.1.0.8105 of Dynatrace and are getting a lot of corrupted purepaths. We are not getting them in every profile, just in our heavier ones.
What is the best way to determine the root cause of corrupted purepaths?
Things I know:
02 Nov 2015 01:21 PM
I'm curious why you state that you should limit yourself to 20 agents per collector. A 'normal' system should be able to handle more than 20 agents per collector.
I would not focus on reducing instrumentation until you have reviewed the problem and found a root cause.
For example, from the Collector Sizing dashboard (Start Center -> Monitoring), is the Collector overdriven? CPU levels too high? Buffer Saturation levels high? If so, an additional collector would help, or additional cores, etc. Then dig into the dT Server dashboards and see whether the server seems overdriven. Maybe it's squeezed for memory?
If you're seeing pegged CPU usage on the Collector or dT Server, then that is a likely cause of messed up purepaths. Dynatrace is trying to process all the data, but it eventually starts getting backed up while the data keeps coming. Perhaps you're on a shared VM environment and the CPU cores are oversubscribed. However, I wasn't sure which CPU you were saying was pegged.
As for being overinstrumented, if you've not added additional custom sensors, then I doubt you're overinstrumented. What is the depth/size of the purepaths? (Purepath dashlet)
Bottom line, it sounds like you're resource starved somewhere, not necessarily overinstrumented.
02 Nov 2015 01:32 PM
Sorry, it turns out it was actually 25, not 20. We are a .NET shop, and this was a number I had pulled from one of Andi's webinar videos.
02 Nov 2015 01:38 PM
I misread your original post; I thought you were running 6.2. But there's no magic drop-off, even with 6.1. 25 .NET agents is a guideline, YMMV. But I would still explore the things mentioned.
Of course another option is to simply put in another collector, and share the load and see if that resolves your problem. But I doubt it. Please post what you do and the results.
02 Nov 2015 01:54 PM
If you plot the corrupted purepath events, do they line up with the CPU spikes on the collector? If so, then I think you've found your root cause. Maybe another core is sufficient for the Collector without having to install a whole second collector.
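If lining the two up by eye is awkward, the same check can be roughed out offline. The snippet below is only a sketch: it assumes you have somehow gotten the Collector CPU samples and the corrupted purepath event timestamps out as plain lists (nothing here is a Dynatrace API), and the function name, the 90% threshold, the one-minute window, and all the sample values are made up for illustration.

```python
from datetime import datetime, timedelta


def fraction_during_spikes(cpu_samples, event_times,
                           threshold=90.0, window=timedelta(minutes=1)):
    """Return the share of events that fall within `window` of a CPU spike.

    cpu_samples: list of (datetime, cpu_percent) tuples (assumed input format)
    event_times: list of datetime, one per corrupted purepath event
    """
    # A "spike" here is simply any sample at or above the threshold.
    spike_times = [t for t, cpu in cpu_samples if cpu >= threshold]
    if not event_times:
        return 0.0
    hits = sum(
        1 for e in event_times
        if any(abs(e - s) <= window for s in spike_times)
    )
    return hits / len(event_times)


# Hypothetical data, one CPU sample per minute.
cpu_samples = [
    (datetime(2015, 11, 2, 13, 0), 35.0),
    (datetime(2015, 11, 2, 13, 1), 95.0),
    (datetime(2015, 11, 2, 13, 2), 97.0),
    (datetime(2015, 11, 2, 13, 3), 40.0),
    (datetime(2015, 11, 2, 13, 4), 38.0),
    (datetime(2015, 11, 2, 13, 5), 92.0),
]
event_times = [
    datetime(2015, 11, 2, 13, 1, 30),
    datetime(2015, 11, 2, 13, 2, 10),
    datetime(2015, 11, 2, 13, 3, 40),
    datetime(2015, 11, 2, 13, 5, 20),
]

print(fraction_during_spikes(cpu_samples, event_times))
```

A value near 1.0 would suggest the corrupted purepaths cluster around the CPU spikes; a low value would point away from Collector CPU as the cause. It is a crude overlap count, not a statistical test, so treat it as a hint only.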
03 Nov 2015 06:23 AM
What is the easiest way to compare this? The heat field does not seem to show corrupted purepaths on the same dashlet as the collector monitoring, even though it lets me check the boxes. The closest I have been able to come is to filter the incidents to corrupted purepaths and then stretch out the history column to line up with the CPU usage chart below it, but that is not working very well.
03 Nov 2015 06:47 AM
Hey Jared,
You can create a dashboard containing charts within the Dynatrace Self-Monitoring profile with the metrics you want.
The measures to chart are CPU of the Collectors and Skipped/Early discarded/Unrecorded purepaths; together they should show the pattern.
Cheers,
Sanj
03 Nov 2015 07:08 AM
Ah, thanks a lot, I was not looking for those on the collector side.