07 Apr 2025 11:22 AM - edited 07 Apr 2025 11:25 AM
Hi Gosia,
Thank you for asking. Availability metrics are crucial in Synthetic Monitoring, and it's important to understand how they're calculated in Grail and what differences can be expected.
The new availability metrics in Grail are calculated by dividing the number of successful executions ('up') by the total number of executions. In contrast, the classic approach measures availability based on the duration that a monitor is considered 'up' (see the documentation).
A count-based ratio is only meaningful if executions happen at a uniform rate. In a real environment this is not always the case: a monitor's execution frequency may be changed, or additional executions may be triggered in on-demand mode. So, we added an interpolation mechanism that adjusts the number of executions to a fixed, minute-level resolution. In the diagram below, the orange dots show actual executions, while the blue dots are added by the interpolation mechanism.
In this new approach, availability is calculated as follows:
Availability = (Number of "Up" Executions / Total Number of Executions) × 100
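To make the idea concrete, here is a minimal Python sketch of a count-based calculation on a minute-level grid. It is not the actual Grail implementation: the forward-fill rule (each minute inherits the result of the most recent execution) and the names `interpolate_to_minutes` and `availability` are assumptions made for illustration only.

```python
from datetime import datetime, timedelta


def interpolate_to_minutes(executions, start, end):
    """Forward-fill execution results onto a one-minute grid over [start, end).

    executions: list of (timestamp, is_up) tuples.
    ASSUMPTION: each minute takes the result of the latest execution at or
    before it. The real interpolation rule is not described in this thread.
    """
    executions = sorted(executions)
    grid = []
    state = None
    i = 0
    tick = start
    while tick < end:
        # Consume every execution that happened up to this minute tick.
        while i < len(executions) and executions[i][0] <= tick:
            state = executions[i][1]
            i += 1
        if state is not None:  # skip minutes before the first execution
            grid.append((tick, state))
        tick += timedelta(minutes=1)
    return grid


def availability(grid):
    """Availability = (number of 'up' points / total points) * 100."""
    up = sum(1 for _, is_up in grid if is_up)
    return 100.0 * up / len(grid)


# Hypothetical data: scheduled executions every 5 minutes, one failure at
# 08:40, and one extra on-demand execution at 08:42 that succeeded.
day = datetime(2025, 4, 7)
executions = [
    (day.replace(hour=8, minute=m), ok)
    for m, ok in [(25, True), (30, True), (35, True), (40, False),
                  (42, True), (45, True), (50, True), (55, True)]
]
grid = interpolate_to_minutes(
    executions, day.replace(hour=8, minute=25), day.replace(hour=9, minute=0)
)
pct = availability(grid)
print(len(grid), round(pct, 2))
```

Without the minute-level grid, the extra on-demand execution would change the weight of every other execution in the ratio. On the grid, each minute counts once regardless of how many executions actually ran.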
Both the Classic and Grail approaches provide estimates of availability, and slight differences between them are expected. This is because Synthetic Monitoring operates at discrete intervals, meaning it does not capture the exact moment when downtime begins and ends. These differences simply stem from the way availability is measured, not from a change in accuracy or reliability.
See the example below:
The real downtime start and end times in this example are assumed for illustration. Synthetic downtime is detected with the first failed execution of the monitor, and similarly, uptime is detected with the first successful execution after an outage.
In the given example, the monitor was executed every 5 minutes between 08:25 AM and 09:00 AM, which gives a total time of 35 minutes.
Time | Real | Detected
Downtime started | 08:41:20 AM | 08:41:57 AM
Uptime resumed | 08:46:40 AM | 08:46:50 AM
Real outage calculation:
Availability = (Up time / total time) * 100% = ((35 min - 5 min 20 s) / 35 min) * 100% = (29 min 40 s / 35 min) * 100% = 84.76%
Classic-approach calculation:
Availability = (Up time / total time) * 100% = ((35 min - 4 min 53 s) / 35 min) * 100% = (30 min 7 s / 35 min) * 100% = 86.05%
Grail-approach calculation:
Availability = (Up executions / total executions) * 100% = ((35 - 5) / 35) * 100% = (30 / 35) * 100% = 85.71%
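The three figures above can be re-derived with a short Python sketch. The timestamps come from the table in this post. The assumption that the minute-level grid is aligned to whole minutes starting at 08:25:00 is mine, as are the helper names `t` and `time_based`; it is a way to check the arithmetic, not a description of the product's internals.

```python
from datetime import datetime, timedelta


def t(hms):
    """Parse an HH:MM:SS time of day (the date part is irrelevant here)."""
    return datetime.strptime(hms, "%H:%M:%S")


window_start, window_end = t("08:25:00"), t("09:00:00")
total_s = (window_end - window_start).total_seconds()  # 35 min = 2100 s


def time_based(down_start, down_end):
    """Duration-based availability: up time divided by total time."""
    down_s = (down_end - down_start).total_seconds()
    return 100.0 * (total_s - down_s) / total_s


# Real outage: 08:41:20 to 08:46:40 (5 min 20 s of downtime).
real = time_based(t("08:41:20"), t("08:46:40"))

# Classic approach: detected outage 08:41:57 to 08:46:50 (4 min 53 s).
detected_down, detected_up = t("08:41:57"), t("08:46:50")
classic = time_based(detected_down, detected_up)

# Grail approach: count minute-level points inside the detected outage.
# ASSUMPTION: one point per whole minute, starting at the window start.
ticks = [window_start + timedelta(minutes=i) for i in range(int(total_s // 60))]
down_points = sum(1 for tick in ticks if detected_down <= tick < detected_up)
grail = 100.0 * (len(ticks) - down_points) / len(ticks)

print(f"real={real:.2f}% classic={classic:.2f}% grail={grail:.2f}%")
print(f"down points: {down_points} of {len(ticks)}")
```

Under these assumptions, 5 of the 35 minute-level points fall inside the detected outage, so 30 points count as up. The classic value works out to 86.0476...%, which is 86.05% when rounded to two decimals.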
07 Apr 2025 12:01 PM - edited 07 Apr 2025 12:02 PM
One more aspect affects availability: whether maintenance windows (MW) are included in the calculation. We're about to deliver a mechanism for excluding the time during which a maintenance window was active from synthetic availability calculations on Grail.
@Cezary_Tomaszew will publish more details about that soon.
07 Apr 2025 05:44 PM
@Cezary_Tomaszew,
What is the meaning of 5, highlighted below, in Grail? Is it 5 minutes, or 5 counts?
Availability = (Up executions / total executions) * 100% = (5 / 35) * 100% = 85.71%
08 Apr 2025 06:59 AM
@AntonioSousa That would be Number of "Up" Executions (count)
08 Apr 2025 09:55 AM
Hope that's not the case, as there is potential for disaster here.
I remember vividly the discussions when transitioning from Keynote/Gomez to the Ruxit/Dynatrace synthetics...
08 Apr 2025 09:59 AM
Hi @AntonioSousa
Could you please explain it further? I would love to understand better the potential for disaster you're referring to here.
Best Regards,
Jacek
08 Apr 2025 10:05 AM
If availability is calculated based on counts, I can't even imagine how many scenarios are going to come out wrong. What will it mean for long time-series comparisons? Synthetic availability is a little science of its own, and now we are just going to do counts?
But it's going to be easy to check: just put the two values alongside each other. I don't have the time to dig into it now, but as I said, this feels like déjà vu from a long time ago...
08 Apr 2025 07:13 AM
@Cezary_Tomaszew On a similar note, I noticed a difference in the way Total duration is calculated for Browser monitors. The values seem to be fairly different when I compare what I see in the classic app with what's in the new app/Grail.
Could you please shed some light on this?
Example below -
08 Apr 2025 03:43 PM
Hi @p_devulapalli
Thanks for raising that question.
We have also noticed that we're reporting inaccurate metrics as performance values within the new Synthetic app. We will fix this soon.
We're about to release the next significant update of the Synthetic app, focusing mainly on Browser Monitors. One element of this initiative will be an update of performance metrics (yes, plural). Expect them to be described in the documentation, but I will also provide an update about those in the community, likely even earlier.
For now, let me suggest using values reported in the classic app as a source of truth for Browser monitors. Also, the mechanism that compares performance metrics vs. defined thresholds to decide whether to raise performance problems uses that value.
Further updates soon
Best Regards,
Jacek
08 Apr 2025 10:53 PM
Thanks for the update @Jacek_Janowicz