Solved: ALR and controlling server resources

AntonioSousa · ‎26 Apr 2020

When ALR is activated in Managed, CSC logs that occurrence. There is no indication, though, of what is triggering ALR. I imagine that several factors (e.g. CPU, memory, disk performance, disk space, environment settings, ...) might be responsible, and it would be very important to know why it is occurring.

In the case I'm dealing with at the moment, everything seems OK (CPU: 20%, memory: ~70%, IOPS relatively low), but ALR was happening. Support tweaked some settings, and CPU is now up to 30%. Now disk space seems to be the issue, but I have no clue.

Please give an indication of what factor(s) are triggering ALR.

Antonio Sousa

Radoslaw_Szulgo · ‎29 Apr 2020

We are aware of that. Our Dynatrace ONE team will help you to troubleshoot this and optimize your environment.

ALR kicks in based on the server capacity, which was derived from load tests and key metrics we take from production systems. It basically reflects the amount of “typical” load a cluster can handle on some hardware with a “typical” configuration. As soon as the server capacity is exceeded, ALR is applied.

It seems your load is rather atypical and, based on the memory and CPU, your cluster could handle more. That's why we have optimized the ALR algorithm; it will be released soon with version 192.

Subscribe to our product news newsletter so you don't miss my blog post about the new ALR. Cheers!

Senior Product Manager,
Dynatrace Managed expert

AntonioSousa · ‎30 Apr 2020

Looking forward to it. From what I see of the Managed release schedule, we can expect it in two weeks' time, right?

Antonio Sousa

Radoslaw_Szulgo · ‎30 Apr 2020

That is right!

Senior Product Manager,
Dynatrace Managed expert

AntonioSousa · ‎17 May 2020

Just read your blog post at https://www.dynatrace.com/news/blog/process-more-with-less-using-smarter-cluster-overload-prevention...

It seems that it was a very wise move. From the data I'm tracking at one of our customers, the server is clearly doing more work, which means more data is being gathered and processed. In my opinion, and based on what you've said, ALR based on GC metrics is a better way of controlling server resources, as both CPU and memory are factored in. We are still not seeing very high resource usage, but it might just be a matter of tweaking it a little bit more...

Thanks!!!

Antonio Sousa

Julius_Loman · ‎29 Apr 2020

Right now the ALR limits are determined by the CPU and memory of the Dynatrace Server node.
CPU (frequency and number of cores) determines the maximum number of service calls a server node will accept.
Memory determines the maximum amount of PurePath data in bytes.

You can see your limits in the Server.log after server node startup, and you can also see in the log why ALR was activated:

2019-11-04 01:11:47 UTC INFO    [<default,0x1>] [ServerSideSamplingThresholdCalculator] Adaptive Load Reduction: processor clock:  2593MHz, #processor cores: 8 ==> max. service calls: 133228
2019-11-04 01:11:47 UTC INFO    [<default,0x1>] [ServerSideSamplingThresholdCalculator] Adaptive Load Reduction: system memory: 64247 Mb ==> max. subpath traffic in bytes: 86645000

2020-01-08 19:30:29 UTC INFO    [<default,0x1>] [ClusterServiceCallsSamplingSource] Activating Adaptive Load Reduction: threshold=133228, num. servicecalls=202386
2020-01-08 20:29:29 UTC INFO    [<default,0x1>] [ClusterIncomingSubpathTrafficSamplingSource] Activating Adaptive Load Reduction: threshold=86645000, incoming subpath traffic in bytes=118942281
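
If you want to pull these entries out of Server.log automatically, here is a minimal Python sketch. It is not a Dynatrace tool: the regular expressions are modelled only on the sample lines above, the function name parse_alr_line is my own, and it assumes each log entry sits on its own line. Other versions may word these messages differently.

```python
import re

# Patterns modelled on the sample Server.log lines in this post (assumption:
# the message wording is the same in your version).
LIMIT_RE = re.compile(
    r"Adaptive Load Reduction: .*==> max\. (?P<name>[^:]+): (?P<value>\d+)"
)
ACTIVATION_RE = re.compile(
    r"\[(?P<source>\w+)\] Activating Adaptive Load Reduction: "
    r"threshold=(?P<threshold>\d+), (?P<metric>[^=]+)=(?P<observed>\d+)"
)


def parse_alr_line(line):
    """Return a dict describing an ALR limit or activation entry, else None."""
    m = ACTIVATION_RE.search(line)
    if m:
        threshold = int(m["threshold"])
        observed = int(m["observed"])
        return {
            "kind": "activation",
            "source": m["source"],          # logger name, e.g. the sampling source
            "metric": m["metric"].strip(),  # which value exceeded the limit
            "threshold": threshold,
            "observed": observed,
            "ratio": observed / threshold,  # how far above the limit the load was
        }
    m = LIMIT_RE.search(line)
    if m:
        return {"kind": "limit", "name": m["name"].strip(), "value": int(m["value"])}
    return None


if __name__ == "__main__":
    import sys

    # Usage: python alr_log.py /path/to/Server.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for entry in filter(None, map(parse_alr_line, log)):
            print(entry)
```

For the subpath traffic activation above, this reports an observed value of 118942281 against a threshold of 86645000, i.e. roughly 1.37 times the limit, which tells you it was the memory-derived limit and not the CPU-derived one that triggered ALR at that point.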

Dynatrace Ambassador | Alanata a.s., Slovakia, Dynatrace Master Partner