
SLA availability reporting based on OneAgent data

kalle_lahtinen
Advisor

Hi,

Has anyone used OneAgent data for SLA (availability) reporting? I know synthetic monitoring is one way to do it, but since in my particular case the SLA time window is only during business hours, there's little benefit to be gained from additional 24/7 monitoring which requires extra licenses and setup work - reporting on the real user activity and transactions would be enough.
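To illustrate restricting the report to the SLA window: a minimal Python sketch that keeps only requests made during business hours. The 08:00-17:00, Monday-Friday window and the `(timestamp, failed)` record shape are assumptions for illustration, not something Dynatrace exports in this form.

```python
from datetime import datetime, time

# Assumed SLA window: Monday-Friday, 08:00 (inclusive) to 17:00 (exclusive).
BUSINESS_START = time(8, 0)
BUSINESS_END = time(17, 0)


def in_business_hours(ts: datetime) -> bool:
    """Return True if ts falls on a weekday inside the assumed SLA window."""
    is_weekday = ts.weekday() < 5  # Monday is 0, Sunday is 6
    return is_weekday and BUSINESS_START <= ts.time() < BUSINESS_END


def filter_to_sla_window(records):
    """Keep only (timestamp, failed) records inside the SLA window."""
    return [(ts, failed) for ts, failed in records if in_business_hours(ts)]


# Hypothetical request records: (timestamp, failed?)
records = [
    (datetime(2024, 1, 8, 9, 30), False),   # Monday morning, counted
    (datetime(2024, 1, 8, 22, 15), True),   # Monday night, ignored
    (datetime(2024, 1, 6, 10, 0), True),    # Saturday, ignored
]
print(len(filter_to_sla_window(records)))
```

Failures outside the window then never reach the availability calculation, which matches an SLA that only applies during business hours.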

Would you recommend using RUM data or service data? To me the service point of view seems more accurate somehow. The only challenge is that I can't really report on availability (like 99,97 %) with that data, I think. I can use the failure rate to e.g. report the number of failed key requests, but I'll still have to manually calculate the availability percentage.
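The manual calculation mentioned above is simple enough to script. A minimal sketch, assuming the total and failed request counts for the key requests have already been exported from Dynatrace by some means (the function name and the numbers are made up for illustration):

```python
def availability_percent(total_requests: int, failed_requests: int) -> float:
    """Availability as the percentage of requests that did not fail."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    if not 0 <= failed_requests <= total_requests:
        raise ValueError("failed_requests must be between 0 and total_requests")
    return 100.0 * (total_requests - failed_requests) / total_requests


# Example: 3 failed key requests out of 10,000 in the reporting month.
print(f"{availability_percent(10_000, 3):.2f} %")
```

This is the same as "100 - failure rate", just computed from raw counts so that rounding in the failure rate does not distort the result.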

I also tested USQL, e.g. to report on AVG(useraction.httpRequestsWithErrors), but that metric quite easily picks up 4xx errors, which I wouldn't necessarily want to count as affecting the SLA. It also seems that I can't do a calculation like "100 - Error Rate", because arithmetic like that is only allowed for dateTime fields.
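One way to keep 4xx errors out of the SLA figure is to do the classification outside USQL. A minimal Python sketch, assuming a list of HTTP status codes for the key requests is available; treating only 5xx as SLA-relevant is an assumption that would need to match the actual SLA definition:

```python
from typing import Iterable


def is_sla_failure(status_code: int) -> bool:
    """Assumed rule: only server-side errors (5xx) count against the SLA."""
    return 500 <= status_code <= 599


def availability_from_status_codes(status_codes: Iterable[int]) -> float:
    """Percentage of requests that did not end in an SLA-relevant failure."""
    codes = list(status_codes)
    if not codes:
        raise ValueError("no requests in the reporting window")
    failures = sum(1 for code in codes if is_sla_failure(code))
    return 100.0 * (len(codes) - failures) / len(codes)


# Hypothetical sample: the three 404s do not reduce availability.
sample = [200] * 95 + [404] * 3 + [500, 503]
print(availability_from_status_codes(sample))  # 98.0
```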

Am I missing anything obvious here? Does anyone have experience on setting up this sort of monthly availability reporting?

4 REPLIES 4

Domenico_Bressi
Advisor

Hi @Kalle L.

In terms of a "Business SLA", you can use several kinds of data from Dynatrace.


"Level 1"
  1. Synthetic monitoring for availability (but this only tells you that your site is up and running, not that customers are actually able to work)
  2. Apdex (Application Performance Index): to track overall application quality over time (JavaScript errors are part of the calculation)
"Level 2"
  1. User Experience score (based on several parameters) to track "User experience" (bounce rate, rage clicks, Apdex, etc.)
  2. Apdex for "Key user actions", i.e. the most critical business transactions (Login, Payment, Booking, etc.)
"Level 3"
  1. Visually complete (to track site performance) - Custom Chart
  2. Error 5xx / Error 4xx to identify the "worst case" where the site did not work as expected, vs. total transactions and/or errors by session (Custom Chart)
  3. Number of JavaScript errors during a user session (Custom Chart)

Enjoy 🙂

M..

kalle_lahtinen
Advisor

Hi Domenico,

Thanks for the response. In this case we're specifically looking for SLA reporting based on the application's availability, not so much regarding performance / user experience aspects. For this need, it would be enough to select a few key requests based on which we will report the availability of the app. Maybe to simplify how this relates to synthetic monitoring:

1. Synthetic monitor tests https://my.url.com/main and reports availability %

2. Service request data based on OneAgent reports all calls to https://my.url.com/main and reports error %

So basically it's not a big difference: the latter includes all "real" requests in addition to any synthetic tests, and reports the error rate instead of an availability percentage. I'm just not sure if it's possible to report that availability with the dashboards and custom graphs we have available today.
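One difference is worth noting: a synthetic monitor samples at fixed intervals, while a request-based error rate is weighted by traffic. To get a figure closer to what a synthetic monitor reports, the request data can be grouped into time slots. The sketch below is only one possible definition, not how Dynatrace calculates availability: the 5-minute slots, the 50 % error threshold and the `(timestamp, failed)` record shape are all assumptions.

```python
from collections import defaultdict
from datetime import datetime
from typing import Iterable, Tuple


def slot_availability(
    requests: Iterable[Tuple[datetime, bool]],
    slot_minutes: int = 5,         # assumed slot size; should divide 60 evenly
    max_error_rate: float = 0.5,   # assumed threshold for calling a slot "down"
) -> float:
    """Percentage of time slots with traffic whose error rate is acceptable.

    Slots without any requests are ignored, since nothing can be said about them.
    """
    totals = defaultdict(int)
    failures = defaultdict(int)
    for ts, failed in requests:
        # Round the timestamp down to the start of its slot.
        slot = ts.replace(
            minute=ts.minute - ts.minute % slot_minutes, second=0, microsecond=0
        )
        totals[slot] += 1
        if failed:
            failures[slot] += 1
    if not totals:
        raise ValueError("no requests in the reporting window")
    up_slots = sum(
        1 for slot in totals if failures[slot] / totals[slot] <= max_error_rate
    )
    return 100.0 * up_slots / len(totals)


# Hypothetical data: four slots with traffic, one of them fully failing.
data = [
    (datetime(2024, 1, 8, 10, 1), False), (datetime(2024, 1, 8, 10, 3), False),
    (datetime(2024, 1, 8, 10, 6), True), (datetime(2024, 1, 8, 10, 8), True),
    (datetime(2024, 1, 8, 10, 11), False), (datetime(2024, 1, 8, 10, 12), True),
    (datetime(2024, 1, 8, 10, 16), False),
]
print(slot_availability(data))  # 75.0
```

With this definition a short burst of failures during a busy minute costs one slot, rather than dominating the monthly figure the way it would in a pure per-request error rate.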


Regarding the way you split the metrics into levels 1-3: "Error 5xx / Error 4xx to identify worst case where site did not work as expected" is the closest to what I'm trying to achieve here, and as I mentioned, it's what I've been trying to pull via USQL with "useraction.httpRequestsWithErrors". This data is of course available without any synthetic monitoring, based on the RUM data. My challenge now is how to modify that metric so that it depicts the true app availability. That would probably entail editing the HTTP error rules per app to make it suitable for my KPI (e.g. so that my SLA doesn't break due to 4xx errors). I'm more familiar with editing the service layer's error detection rules, exception classification etc., but I suppose the basic idea is the same for RUM data.

Domenico_Bressi
Advisor

Another option is to build a dashboard with a business funnel and KPIs using USQL, in order to identify possible errors that are blocking your business process 🙂

https://www.dynatrace.com/news/blog/understand-and-optimize-user-journeys-with-funnel-charting/

M.