Recently I ran into an issue at a customer where data quality dropped. I figured out that this was caused by the adaptive capture control implemented for Dynatrace SaaS.
I had some fruitful conversations with Dynatrace Product Managers on this topic, and I think it's important to share the outcome with the community as well.
It was confirmed that SaaS adaptive capture control can limit the processing of service requests to the point where the data becomes heavily aggregated. This is not necessarily a bad thing, but it depends on your architecture, and it can have these impacts:
If your architecture (or the part of it you monitor with Dynatrace) has lots of services and a high volume of service calls (e.g. a few hundred thousand per minute), you might feel the impact.
Similar to what is described in this blog post for Dynatrace Managed, I recommend also implementing some health monitoring for Dynatrace SaaS, specifically for adaptive capture control.
Monitor Service Call Limit and Capture Rate
Some of the metrics described in the above blog post are also available and relevant in your SaaS tenant.
I'm using this dashboard to get an idea of the status of capture control in my SaaS tenant (dashboard is attached to this post):
What you can see in the above screenshots are the attempts to get back high fidelity data by improving the capture rate again.
What are we seeing there:
Out of these metrics we can calculate the capture rate and how much of the limit we are consuming.
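As a rough illustration of that calculation, here is a minimal Python sketch. The input names (`captured_calls`, `total_calls`, `call_limit`) are hypothetical placeholders for whatever values your tenant's metrics return, and I am assuming that limit consumption is measured as captured calls relative to the configured limit; adjust this to match the metrics you actually chart.

```python
def capture_rate(captured_calls: float, total_calls: float) -> float:
    """Fraction of observed service calls that were fully captured (0.0 to 1.0)."""
    if total_calls <= 0:
        # No traffic observed: nothing was dropped, so treat it as fully captured.
        return 1.0
    return captured_calls / total_calls


def limit_usage(captured_calls: float, call_limit: float) -> float:
    """Share of the service call limit consumed (assumed: captured / limit)."""
    if call_limit <= 0:
        raise ValueError("call_limit must be positive")
    return captured_calls / call_limit


if __name__ == "__main__":
    # Made-up numbers for one minute of traffic, not taken from a real tenant.
    rate = capture_rate(captured_calls=62_000, total_calls=100_000)
    usage = limit_usage(captured_calls=62_000, call_limit=65_000)
    print(f"capture rate: {rate:.0%}, limit usage: {usage:.0%}")
```

A capture rate well below 100% combined with limit usage close to 100% suggests that the limit, and not the traffic itself, is what is holding the capture rate down.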
As you can see in the chart, we have put some effort into increasing the capture rate, mainly by increasing the limit for this environment. We are not at 100% yet, but we are getting close (we moved from a 15% capture rate to 62% and will likely increase the limit a bit more).
But this is not the only strategy that you should follow! What else should you do?
Keep an eye on service call numbers!
It is very easy to just drop OneAgents everywhere, enable Istio/Envoy tracing, define custom services and so on. But this can lead to lots of service calls, and sometimes you start tracing requests that you do not actually need or that do not provide any benefit. So you might want to consider these recommendations before blindly increasing the service call limit (which still might be required):
If you have applied these recommendations and are still over the capture limit, you might have an architecture that is not well suited to the current implementation of capture control, and you might want to talk to your Dynatrace representative to make sure you can still get the best out of your data! (This only applies to SaaS; in Managed you can control these limits yourself.)
It is understandable that every SaaS solution has some "guardrails" to ensure healthy operation. Such limits are necessary to avoid drops in service quality due to "overloading". So it is a question of finding a sensible balance.
While addressing this situation for one of my customers together with Dynatrace, we found room for improvement and ideas for new features, as well as information that helps any Dynatrace user keep an eye on their own situation.
I hope this helps!