
Please Clarify: Self-Monitoring Metrics for Adaptive Capture Control - SaaS

r_weber
DynaMight Champion

@markus_pfleger (nice to meet virtually again after such a long time)
@christoph_hoyer 


I was looking at the new Self Monitoring Dashboards based on what can be found in the Hub and what is presented in this webinar and in the documentation.

 

There seems to be conflicting information on how the capture rate is defined. Even in the video, I think it is calculated in two different ways at two different points.

 

First there is the "OneAgent Capture Rate" as explained in the documentation:

[Screenshot: "OneAgent Capture Rate" definition from the documentation]

The OneAgent capture rate seems strange: in my case, the value of oneagent.service_calls.processed is much higher than server.service_calls.received (see below), so the resulting rate is really low, whereas the server-side Capture Rate is 100%.

 

The Server Side "Capture Rate" is calculated like this:

[Screenshot: server-side "Capture Rate" calculation]

This one seems logical: the server receives calls and persists them. If the two numbers are equal, we are all good.

 

Since SaaS also has server.service_calls.maximum_allowed_per_minute, does this mean that in SaaS the "processed" service calls are always limited beforehand to match the "persisted" ones, so that the capture rate is always 100%?

 

What if the "received" calls exceed the "service_calls.maximum_allowed_per_minute"?

 

Would you consider this environment healthy? I'm observing data loss and strange metric behavior on services. I have added these metric calculations:

 

"Persisting Rate" = (dsfm:server.service_calls.persisted)/(dsfm:server.service_calls.received)*(100)
"Capture Rate" = (dsfm:server.service_calls.received)/(dsfm:oneagent.service_calls.processed)*(100)
"Percentage of Capture Limit" = (dsfm:server.service_calls.received:splitBy():avg:auto:rate(1m)/dsfm:server.service_calls.maximum_allowed_per_minute:splitBy():avg:auto)*100
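To sanity-check the three ratios outside the metric browser, here is a minimal Python sketch that restates them on plain numbers. It assumes all four values have already been normalized to calls per minute (which is what `:rate(1m)` does in the "Percentage of Capture Limit" expression); it does not reproduce the time-series aggregation (`splitBy`, `avg`, `auto`) of the metric expressions. The `ServiceCallSample` class and helper names are hypothetical, and the sample numbers are made up, not taken from my environment.

```python
from dataclasses import dataclass


@dataclass
class ServiceCallSample:
    """One time slot of self-monitoring values, all in calls per minute (hypothetical container)."""
    oneagent_processed: float      # dsfm:oneagent.service_calls.processed
    server_received: float         # dsfm:server.service_calls.received
    server_persisted: float        # dsfm:server.service_calls.persisted
    max_allowed_per_minute: float  # dsfm:server.service_calls.maximum_allowed_per_minute


def percent(numerator: float, denominator: float) -> float:
    """Return numerator / denominator * 100, or NaN when the denominator is zero."""
    if denominator == 0:
        return float("nan")
    return numerator / denominator * 100


def persisting_rate(s: ServiceCallSample) -> float:
    # Share of received calls that the server actually stored.
    return percent(s.server_persisted, s.server_received)


def capture_rate(s: ServiceCallSample) -> float:
    # Share of agent-processed calls that arrived at the server.
    return percent(s.server_received, s.oneagent_processed)


def percentage_of_capture_limit(s: ServiceCallSample) -> float:
    # Received calls per minute relative to the per-minute limit.
    return percent(s.server_received, s.max_allowed_per_minute)


# Hypothetical numbers, only to show how the three ratios relate.
sample = ServiceCallSample(oneagent_processed=39600, server_received=9900,
                           server_persisted=9900, max_allowed_per_minute=3000)
print(round(persisting_rate(sample), 1),
      round(capture_rate(sample), 1),
      round(percentage_of_capture_limit(sample), 1))
# prints: 100.0 25.0 330.0
```

With these made-up inputs the pattern matches what I describe: a Persisting Rate of 100%, a low Capture Rate, and a received rate well above the limit.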

 

[Screenshot: chart of the metric calculations above]

 

For SaaS, I'm interpreting this as follows:
As the number of "oneagent.processed" service calls (yellow) is much higher than "server.received", either something is dropped somewhere, something is not considered at all, or the "oneagent.processed" metric includes calls that are not relevant.
Note that the Persisting Rate is a constant 100% although the Capture Rate fluctuates with the load. Hence my assumption that something gets dropped before the received metric is calculated (a server-side drop or rate limit).
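The reasoning above comes down to looking at the two hops separately. A hypothetical helper like the one below (plain Python, not a Dynatrace API) only flags which hop shows a gap larger than a chosen tolerance. It cannot tell whether an agent-to-server gap is a real drop or simply "processed" counting calls that are never meant to be sent; the 1% default tolerance is an arbitrary assumption.

```python
def locate_loss(processed: float, received: float, persisted: float,
                tolerance_pct: float = 1.0) -> list:
    """Hypothetical helper: name each hop whose relative gap exceeds tolerance_pct."""
    findings = []
    # Hop 1: calls processed by OneAgent vs. calls received by the server.
    if processed > 0 and (processed - received) / processed * 100 > tolerance_pct:
        findings.append("agent->server")
    # Hop 2: calls received by the server vs. calls persisted.
    if received > 0 and (received - persisted) / received * 100 > tolerance_pct:
        findings.append("server->storage")
    return findings


# Hypothetical numbers: a large first-hop gap, no second-hop gap.
print(locate_loss(39600, 9900, 9900))
# prints: ['agent->server']
```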

And then there is "server.service_calls.maximum_allowed_per_minute", which I understand should be compared to "server.service_calls.received" to see whether (in SaaS) you produce more service calls than your limit allows. In my case, the "Percentage of Capture Limit" calculation tells me that I'm at 330% of my limit (that is, 230% above it), which would explain the "data loss" or heavy aggregation I'm seeing on individual services.
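The arithmetic behind a percentage-of-limit value p, as a small Python sketch: the received rate is p - 100 percent above the limit, and roughly 1 - 100/p of the calls fall above it. Whether those calls are dropped or only kept at reduced detail is my assumption, not documented behavior; the function name `limit_overshoot` is hypothetical.

```python
def limit_overshoot(percentage_of_limit: float):
    """Return (percent above the limit, percent of calls above the limit).

    A negative first value means there is headroom left under the limit.
    """
    if percentage_of_limit <= 0:
        raise ValueError("percentage_of_limit must be positive")
    over = percentage_of_limit - 100
    # Fraction of the received rate that exceeds the limit (0 when under it).
    excess_share = max(0.0, 1 - 100 / percentage_of_limit) * 100
    return over, excess_share


over, share = limit_overshoot(330)
print(over, round(share, 1))
# prints: 230 69.7
```

So at 330% of the limit, roughly 70% of the received calls would be above it, assuming the limit is enforced per minute in the way the metric expression compares them.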


Can you confirm this or am I missing something?

 

Certified Dynatrace Master, Dynatrace Partner - 360Performance.net