<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Percentage-based Thresholds for Site Reliability Guardian in Automations</title>
    <link>https://community.dynatrace.com/t5/Automations/Percentage-based-Thresholds-for-Site-Reliability-Guardian/m-p/266649#M1974</link>
    <description>&lt;P&gt;When we used Keptn for Quality Gates, we were able to set SLIs that were percentage-based, such as "pass if response times have not risen more than 5% from previous values" or "pass if failures have not risen more than 3% or by a flat 100".&amp;nbsp; These are very similar to how we can define anomaly detection today.&lt;/P&gt;&lt;P&gt;Our traffic varies significantly throughout the day and between days of the week, so static thresholds do not work effectively.&amp;nbsp; I cannot determine how to do this with SLOs.&amp;nbsp; I tried the auto-adaptive feature of the Site Reliability Guardian (SRG), but it still applies a static threshold as it learns.&lt;/P&gt;&lt;P&gt;This has created some issues. One is that it learned an error rate of 0%, and then the SRG ran, saw a single error in 10,000 requests (0.01%), and failed, which I would not want. The interface still showed 0%; I had to look at the query to see that it was rounding the value.&amp;nbsp; Another is that it learned Saturation from a couple of runs later in the day, then failed when run in the morning, when we are busier and the application consumes more resources.&lt;/P&gt;&lt;P&gt;A percentage comparison (how does the application compare to what was running just before the SRG ran?) is much more useful in our context.&amp;nbsp; How can I accomplish this?&lt;/P&gt;</description>
    <pubDate>Fri, 03 Jan 2025 17:53:04 GMT</pubDate>
    <dc:creator>brianrutherford</dc:creator>
    <dc:date>2025-01-03T17:53:04Z</dc:date>
    <item>
      <title>Percentage-based Thresholds for Site Reliability Guardian</title>
      <link>https://community.dynatrace.com/t5/Automations/Percentage-based-Thresholds-for-Site-Reliability-Guardian/m-p/266649#M1974</link>
      <description>&lt;P&gt;When we used Keptn for Quality Gates, we were able to set SLIs that were percentage-based, such as "pass if response times have not risen more than 5% from previous values" or "pass if failures have not risen more than 3% or by a flat 100".&amp;nbsp; These are very similar to how we can define anomaly detection today.&lt;/P&gt;&lt;P&gt;Our traffic varies significantly throughout the day and between days of the week, so static thresholds do not work effectively.&amp;nbsp; I cannot determine how to do this with SLOs.&amp;nbsp; I tried the auto-adaptive feature of the Site Reliability Guardian (SRG), but it still applies a static threshold as it learns.&lt;/P&gt;&lt;P&gt;This has created some issues. One is that it learned an error rate of 0%, and then the SRG ran, saw a single error in 10,000 requests (0.01%), and failed, which I would not want. The interface still showed 0%; I had to look at the query to see that it was rounding the value.&amp;nbsp; Another is that it learned Saturation from a couple of runs later in the day, then failed when run in the morning, when we are busier and the application consumes more resources.&lt;/P&gt;&lt;P&gt;A percentage comparison (how does the application compare to what was running just before the SRG ran?) is much more useful in our context.&amp;nbsp; How can I accomplish this?&lt;/P&gt;</description>
      <pubDate>Fri, 03 Jan 2025 17:53:04 GMT</pubDate>
      <guid>https://community.dynatrace.com/t5/Automations/Percentage-based-Thresholds-for-Site-Reliability-Guardian/m-p/266649#M1974</guid>
      <dc:creator>brianrutherford</dc:creator>
      <dc:date>2025-01-03T17:53:04Z</dc:date>
    </item>
    <item>
      <title>Re: Percentage-based Thresholds for Site Reliability Guardian</title>
      <link>https://community.dynatrace.com/t5/Automations/Percentage-based-Thresholds-for-Site-Reliability-Guardian/m-p/266667#M1975</link>
      <description>&lt;P&gt;Hey&amp;nbsp;&lt;a href="https://community.dynatrace.com/t5/user/viewprofilepage/user-id/38612"&gt;@brianrutherford&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Would something like the following be what you're looking for?&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;timeseries avg(dt.synthetic.browser.availability), avg(dt.synthetic.browser.duration), by:{dt.entity.synthetic_test}
| filter contains(entityName(dt.entity.synthetic_test), "nameFilter")
// Fetch timeseries for validation timeframe

| lookup [timeseries avg(dt.synthetic.browser.duration), by:{dt.entity.synthetic_test}, shift:-7d
  | filter contains(entityName(dt.entity.synthetic_test), "nameFilter")], sourceField:dt.entity.synthetic_test, lookupField:dt.entity.synthetic_test
// Fetch timeseries for comparison timeframe, in this example it's -7d

| fields
  `7d duration change` = ((arrayAvg(`avg(dt.synthetic.browser.duration)`) - arrayAvg(`lookup.avg(dt.synthetic.browser.duration)`)) / arrayAvg(`lookup.avg(dt.synthetic.browser.duration)`)) * 100
// Percentage change of the current average relative to the average from 7 days earlier (the baseline)

| fieldsAdd `7d duration change` = if(isNotNull(`7d duration change`), `7d duration change`, else:0)
// If there is no data when the comparison happens, the result is null; this line returns 0 instead of null

| fields `7d duration change`
// Return only the comparison field, as that is all we need.&lt;/LI-CODE&gt;</description>
      <pubDate>Mon, 06 Jan 2025 01:05:03 GMT</pubDate>
      <guid>https://community.dynatrace.com/t5/Automations/Percentage-based-Thresholds-for-Site-Reliability-Guardian/m-p/266667#M1975</guid>
      <dc:creator>Fin_Ubels</dc:creator>
      <dc:date>2025-01-06T01:05:03Z</dc:date>
    </item>
    <item>
      <title>Re: Percentage-based Thresholds for Site Reliability Guardian</title>
      <link>https://community.dynatrace.com/t5/Automations/Percentage-based-Thresholds-for-Site-Reliability-Guardian/m-p/266728#M1976</link>
      <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://community.dynatrace.com/t5/user/viewprofilepage/user-id/47376"&gt;@Fin_Ubels&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;Very interesting approach. Thanks for sharing it.&lt;/P&gt;
&lt;P&gt;Am I right that the accepted percentage (7d duration change) is then defined as a static threshold?&lt;BR /&gt;E.g.:&lt;BR /&gt;Warning if result: 3%&lt;BR /&gt;Fails if result: 5%&lt;/P&gt;</description>
      <pubDate>Tue, 07 Jan 2025 06:32:56 GMT</pubDate>
      <guid>https://community.dynatrace.com/t5/Automations/Percentage-based-Thresholds-for-Site-Reliability-Guardian/m-p/266728#M1976</guid>
      <dc:creator>JohannesBraeuer</dc:creator>
      <dc:date>2025-01-07T06:32:56Z</dc:date>
    </item>
    <item>
      <title>Re: Percentage-based Thresholds for Site Reliability Guardian</title>
      <link>https://community.dynatrace.com/t5/Automations/Percentage-based-Thresholds-for-Site-Reliability-Guardian/m-p/267326#M1993</link>
      <description>&lt;P&gt;Hey&amp;nbsp;&lt;a href="https://community.dynatrace.com/t5/user/viewprofilepage/user-id/33327"&gt;@JohannesBraeuer&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;If I understand the original post correctly, then yes, you'd define a static threshold. That static threshold would be the percentage change over time that is unacceptable. In my DQL above that would be over 7 days, but it could be over any timeframe.&lt;/P&gt;&lt;P&gt;The downside to the above approach is that if it only gets 2% worse every time, but does so consistently, then that performance degradation could fly under the radar and compound. So alongside the above approach, I would also recommend having a static threshold on the underlying timeseries, without a timeframe comparison, so that there is a hard limit.&lt;/P&gt;</description>
      <pubDate>Tue, 14 Jan 2025 03:43:09 GMT</pubDate>
      <guid>https://community.dynatrace.com/t5/Automations/Percentage-based-Thresholds-for-Site-Reliability-Guardian/m-p/267326#M1993</guid>
      <dc:creator>Fin_Ubels</dc:creator>
      <dc:date>2025-01-14T03:43:09Z</dc:date>
    </item>
  </channel>
</rss>

