We have a team in Dynatrace that regularly upgrades the Kubernetes version on their production and staging clusters. The upgrade process spins up new hosts on the new Kubernetes version and destroys the hosts running the old one. They did this last night, and Dynatrace raised two problems because the hosts were destroyed:
"Host or monitoring unavailable due to connectivity issues or server outage"
Of course, Dynatrace isn't wrong; the hosts were shut down intentionally as part of the upgrade process. Another wrinkle is that this upgrade is sometimes triggered automatically by Google, so we can't always plan ahead.
Is there a recommended way to handle this circumstance?
Yeah, we had the same thought, but as you and I both pointed out, it would still be an issue when Google triggers the upgrade. I am trying to gather more information on how exactly a host is "destroyed", per se, in that environment. I know that Dynatrace should recognize the difference between a clean shutdown and a dirty one, alerting only on the dirty one. It seems like there should be some way to tell the difference in this situation as well.
It would still be a 4-hour blackout period, but at least it's a start.
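For the upgrades we trigger ourselves, one thing we've been experimenting with is creating that maintenance window programmatically right before draining the old nodes, so intentional shutdowns don't page anyone. Below is a minimal Python sketch using the Settings API v2 with the builtin:alerting.maintenance-window schema. The tenant URL and token are placeholders, and the exact payload field names are our best reading of the docs, so verify them against GET /api/v2/settings/schemas/builtin:alerting.maintenance-window in your own environment before relying on this.

```python
"""Sketch: open a one-off PLANNED maintenance window in Dynatrace right
before a node-pool upgrade, so the intentional host shutdowns don't alert.
Assumes the Settings API v2 maintenance-window schema; verify field names
against your environment's schema before using."""
from datetime import datetime, timedelta, timezone

import requests

DYNATRACE_URL = "https://abc12345.live.dynatrace.com"  # placeholder tenant
API_TOKEN = "dt0c01.EXAMPLE"  # placeholder; token needs settings.write scope


def create_upgrade_maintenance_window(duration_hours: int = 4) -> str:
    """Create a maintenance window starting now; returns the new object ID."""
    start = datetime.now(timezone.utc)
    end = start + timedelta(hours=duration_hours)
    payload = [{
        "schemaId": "builtin:alerting.maintenance-window",
        "scope": "environment",
        "value": {
            "enabled": True,
            "generalProperties": {
                "name": "Kubernetes node upgrade",
                "description": "Old nodes drained and destroyed during upgrade",
                "maintenanceType": "PLANNED",
                # Keep detecting problems, just don't send alerts for them:
                "suppression": "DETECT_PROBLEMS_DONT_ALERT",
                "disableSyntheticMonitorExecution": False,
            },
            "schedule": {
                "scheduleType": "ONCE",
                "onceRecurrence": {
                    # Timestamp format is an assumption; check the schema.
                    "startTime": start.strftime("%Y-%m-%dT%H:%M:%S"),
                    "endTime": end.strftime("%Y-%m-%dT%H:%M:%S"),
                    "timeZone": "UTC",
                },
            },
        },
    }]
    resp = requests.post(
        f"{DYNATRACE_URL}/api/v2/settings/objects",
        headers={"Authorization": f"Api-Token {API_TOKEN}"},
        json=payload,
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()[0]["objectId"]


if __name__ == "__main__":
    print("Created maintenance window:", create_upgrade_maintenance_window())
```

This still doesn't help when Google kicks off the upgrade on its own, but for planned runs it at least turns the blackout into a deliberate, scoped suppression instead of a surprise problem. You could also narrow the window to just the affected hosts with the schema's filters rather than suppressing environment-wide.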