08 Jun 2022 05:39 AM - last edited on 19 Oct 2022 04:35 AM by MaciejNeumann
Our client on SaaS wishes to keep their systems always at N-1, where N is the current cluster version. Can this be implemented? A recent update caused monitoring issues, and at the time they wished they could roll back to the previous cluster version.
The issue was the Markdown tile update to remove the white space after the #, which caused Markdown tiles not to work. A minor build resolved this issue, but only after days without visibility of their dashboards. If it were possible on SaaS to roll back to a previous version, or rather to manage updates manually, that would be great.
On SaaS 😞 this seems doubtful. It would only be possible if the Dynatrace product team plans to launch dedicated SaaS services, with the underlying application instance spun up as a separate application. This would take a lot of R&D.
Nice idea though!
Unfortunately, that's one of the features of SaaS that can't be changed. If managing your cluster updates is of absolute importance, you may need to speak to your Dynatrace product team to see if migrating to a Managed cluster would be a good choice. Not exactly what you wanted to hear, I assume!
Exactly NOT what I wanted to hear, since we have deployed thousands of servers and multiple hybrid container and cloud environments with complex relationships. This is just the bitter pill we would need to swallow, considering the Markdown tiles issue. It was the result of a SaaS cluster update that caused tiles not to function, so navigation to dashboards was handicapped. Basically, we lost monitoring for days, and that caused an SLA breach. The mitigation is that DT needs to rigorously improve the testing of their updates before rolling them out. I am sure that if the update had been tested on an environment with Markdown tiles, the problem would have been caught and resolved before affecting customers' live production environments.
The point is, after that incident I feel there is a gap in the testing of updates before they are rolled out. DT should know that customers are monitoring live environments with production systems, and missing out on visibility for days should not be acceptable.
I completely understand. My customer ran into the exact same problem with the markdown tiles. We had to scramble to fix the essential dashboards that couldn't wait for a fix.
I will say that this was the first issue we have had with SaaS updates that I can recall, so I have hopes that something like this won't happen again. It was definitely expressed to the development team that this issue was not acceptable.
I do encourage speaking to your Dynatrace product team about other solutions. There may be something that they can do other than move to managed that might ease the update process.
I don't want to wait and "hope". The updates should be rigorously tested in their own test environment and scenarios so that something like that does not happen at all. It was quite an embarrassing situation for us as Dynatrace champions to our customers. So, in terms of positive feedback: refactor and refine wherever testing leaves such gaps, before an update is pushed to customers. I would even prefer monthly updates rather than weekly, to allow more rigorous testing. I feel that two-weekly updates may put a burden on developers to push code faster, hence missing out on quality. Basically, shifting more on Left (Delivery) and less on Right (Quality). I prefer less frequent updates with fewer or no bugs. Developers are sometimes put under pressure by two-weekly roll-outs and compromise on quality.
Sorry, correction: I meant shifting more to Right (Delivery) and less on Left (Quality). There needs to be a balance.