Server-side load reduction – Safeguard the cluster
Imagine again a situation where an existing Dynatrace cluster starts monitoring a new environment, and that new environment sends an enormous amount of load. In such a scenario, the existing Dynatrace cluster might not have the necessary hardware to process all the incoming load.
In such a situation there are two options: add more hardware or reduce the incoming traffic. If neither option is chosen, the Dynatrace cluster will engage server-side load reduction to safeguard the overall health of the cluster.
Starting with Cluster version 138, server-side adaptive load reduction will be active on all clusters. The Dynatrace managed cluster will constantly monitor its resource situation by measuring the traffic that reaches the cluster on a per-environment basis.
If the resource situation reaches a critical level, the cluster will look at the active tenants. PurePaths coming in from the tenants with the most traffic in relation to their assigned host units (i.e. traffic/host unit) will be targeted for load reduction. In a Dynatrace managed cluster this will be fully transparent to the customer, as we will raise an event and display it in the cluster UI.
In either case, the reduction in processed data is accounted for transparently. The fact that not all data is being processed will have no negative impact on the customer's monitoring. Neither the AI nor alerting is affected. All service-based chart data will be transparently adjusted (no change will be visible), and all analysis views also account for the reduction. Unless you are looking at a single PurePath, you will not see a difference in chart or service call analysis data. One place where this will be visible is the PurePath list, where you will see a message saying “x more like this”.
Only those environments that have a high volume of traffic compared to their assigned host units will be targeted. All other environments remain unaffected.
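To make the traffic/host unit criterion concrete, here is a minimal sketch of how environments could be ranked by that ratio. It is not Dynatrace's actual implementation: the `Environment` class, the `rank_by_traffic_per_host_unit` function, the traffic unit, and all numbers are hypothetical and only illustrate the idea that the environment with the highest ratio is the first candidate for load reduction.

```python
from dataclasses import dataclass


@dataclass
class Environment:
    name: str
    traffic: float     # hypothetical unit, e.g. PurePaths per minute
    host_units: float  # host units assigned to the environment


def rank_by_traffic_per_host_unit(environments):
    """Return (name, ratio) pairs, highest traffic per host unit first."""
    ranked = []
    for env in environments:
        if env.host_units <= 0:
            # Assumption: traffic without any assigned host units ranks first.
            ratio = float("inf") if env.traffic > 0 else 0.0
        else:
            ratio = env.traffic / env.host_units
        ranked.append((env.name, ratio))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Made-up example data, not taken from a real cluster.
environments = [
    Environment("prod", 90000, 300),
    Environment("staging", 40000, 20),
    Environment("dev", 5000, 10),
]

ranking = rank_by_traffic_per_host_unit(environments)
for name, ratio in ranking:
    print(f"{name}: {ratio:.0f} per host unit")
```

Note that in this made-up data "prod" sends the most traffic in absolute terms but has the lowest ratio, so it would be the last to be targeted, while the small but chatty "staging" environment would be first.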
The customer now has a choice: add more hardware or reduce the traffic at the source.
Thanks for your answer. Recently, I started seeing this too on my Dynatrace managed installation. The problem is, I've got many monitored environments and have no idea how to identify the top contributor to this problem. I'm not sure I want to increase node capacity until I clearly understand what for. Maybe I just want to reduce monitoring capabilities for a particular environment instead. Any suggestions?