
Mapping SRE concepts to SLOs and Site Reliability Guardian

RayPB
Visitor

Hi all,

We're new to applying SRE to our way of working and are in the process of defining SLIs and SLOs for each critical user journey (CUJ) of the system whose reliability we need to monitor. We have defined some SLOs with their SLIs in DQL and came across the Site Reliability Guardian (SRG). The SRG seems able to act as an umbrella for multiple SLOs, but it is unclear to us which of these two mappings is intended:

1. A single SRG maps to a single critical user journey that has multiple SLOs.
2. A single SRG acts as an umbrella for all primary SLOs of all critical user journeys of a single system.

As we understand SRE, a single critical user journey can have only one primary SLO, on which the primary error budget is based and from which the burn rates and associated alerting are derived. We lean towards the second option (one SRG holding all primary SLOs of that system's CUJs), but we would like to understand what Dynatrace recommends and whether there are any pitfalls in doing it this way (or the other).
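For context, this is roughly how we understand the error budget and burn rate of one primary SLO. It is only a minimal Python sketch: the names (`Slo`, `error_budget`, `burn_rate`), the `checkout-availability` journey and the numbers are made up for illustration and are not taken from Dynatrace or the SRG.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slo:
    """Hypothetical primary SLO of one critical user journey."""
    name: str
    target: float  # e.g. 0.999 means 99.9% of events must be good


def error_budget(slo: Slo) -> float:
    """Fraction of events that may be bad over the SLO window."""
    if not 0.0 < slo.target < 1.0:
        raise ValueError("target must be strictly between 0 and 1")
    return 1.0 - slo.target


def burn_rate(slo: Slo, good: int, total: int) -> float:
    """Observed error rate divided by the error budget.

    A value of 1.0 means the budget is consumed exactly at the
    sustainable pace; higher values mean it runs out early.
    """
    if total == 0:
        return 0.0  # no traffic, nothing burned
    observed_error_rate = 1.0 - good / total
    return observed_error_rate / error_budget(slo)


# Illustrative numbers only: 50 bad events out of 10,000.
checkout = Slo(name="checkout-availability", target=0.999)
rate = burn_rate(checkout, good=9_950, total=10_000)
print(f"{checkout.name}: burn rate {rate:.1f}x")
```

In this picture each CUJ has exactly one such primary SLO, and our question is whether those per-journey SLOs belong in one SRG per system or in one SRG per journey.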

In addition, are there any plans to allow the SRG to reference existing SLOs instead of having to recreate them within the SRG?
