I've been researching MTTD (Mean Time to Detect) and MTTR (Mean Time to Repair). While a lot of pages and books talk about the concepts, very few give typical values.
With Dynatrace, we know that MTTD is usually very low, typically a few minutes at most. Without Dynatrace, we all know that it is expected to be much higher.
Regarding MTTR, given that MTTD is slashed, and that Davis pinpoints the root cause in a lot of cases, it is expected to be much less. With Keptn, this is expected to be slashed even more. On page 7 of Dynatrace's Game changing — From zero to Autonomous Cloud today, a survey conducted by Dynatrace in 2018 gives some impressive numbers.
I would like to know how the Community views this. What MTTD and MTTR values do you usually observe?
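To make the numbers people share comparable, here is a minimal sketch of how the two metrics could be computed from incident timestamps. It assumes MTTD is measured from the start of the incident to its detection, and MTTR from detection to resolution (some teams measure MTTR from the start of the incident instead). The records and the names `incidents`, `mttd`, and `mttr` are made up for illustration and are not taken from Dynatrace or from any real incident.

```python
from datetime import datetime, timedelta

# Hypothetical incident records: (started, detected, resolved).
# These timestamps are invented purely to illustrate the arithmetic.
incidents = [
    (datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 9, 4), datetime(2024, 1, 1, 9, 34)),
    (datetime(2024, 1, 2, 14, 0), datetime(2024, 1, 2, 14, 2), datetime(2024, 1, 2, 15, 2)),
]


def mean_delta(deltas):
    """Average a sequence of timedelta values."""
    deltas = list(deltas)
    return sum(deltas, timedelta()) / len(deltas)


def mttd(records):
    """Mean time from incident start to detection."""
    return mean_delta(detected - started for started, detected, _ in records)


def mttr(records):
    """Mean time from detection to resolution (assumed definition)."""
    return mean_delta(resolved - detected for _, detected, resolved in records)


print("MTTD:", mttd(incidents))
print("MTTR:", mttr(incidents))
```

If you report values, it helps to say which start point you use for MTTR, since that choice alone can shift the figure by the full detection time.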
We have a very cool blog post about our MTTR: https://www.dynatrace.com/news/blog/davis-diaries-mainframe-error-to-resolution-in-minutes/
Ultimately, Dynatrace, with its root cause detection, gives us the issue on a silver platter, and we take that information and run with it. Even when a root cause is not detected, Dynatrace is at the forefront of leading teams in the right direction to resolve the problem.
I'd say days, honestly. We had issues that staff had been working on for weeks. I deployed Dynatrace just before lunch and enjoyed my lunch while Davis kept a watchful eye and collected the information. I came back and found that the application kept crashing because it was out of memory. The staffer working on the issue said, "No way, I increased it the other day." They checked, and the setting had been reverted to its previous value.
We also had a use case where an associate made an unapproved firewall change. Dynatrace saw the failures caused by the change and alerted us to it. Staff all huddled around my screen looking into the problem. We called the head of Ecommerce to run a test to see if her third-party request would go out and come back with a value... and it did. So everyone was like, "Dynatrace is wrong." We then worked with the networking team, and they found the firewall change based on the metrics and ports Dynatrace was reporting. They implemented a change to correct the unapproved one, and we started to see data pass without error. Then DevOps came into play: why did Ecommerce get a good reply when the requests were failing? "Well, we don't give the user a correct value back, we give them a 'Default' value so they don't know it failed."
Ultimately, without Dynatrace we would have been blind, and we could have cut that MTTR from 1 hour to 30 minutes if the teams had trusted Dynatrace and the data it was reporting.