We have a team in Dynatrace that regularly upgrades the Kubernetes version on their production and staging clusters. The upgrade process spins up new hosts on the new Kubernetes version and destroys the hosts running the old one. They did this last night, and Dynatrace raised two problems because the hosts were destroyed:
"Host or monitoring unavailable due to connectivity issues or server outage"
Of course, Dynatrace isn't wrong; the hosts were shut down intentionally as part of the upgrade process. Another wrinkle is that this upgrade is sometimes triggered automatically by Google, so we can't always plan ahead.
Is there a recommended way to handle this circumstance?
Yeah, we had the same thought, but as you and I both pointed out, it would still be an issue when Google triggers the upgrade. I am trying to gather more information on how exactly a host is "destroyed", per se, in that environment. I know that Dynatrace should recognize the difference between a clean shutdown and a dirty one, alerting only on the dirty one. It seems like there should be some way to tell the difference in this situation as well.
It would still be a 4-hour blackout period, but at least it's a start.
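For the upgrades we trigger ourselves, one thing we've been experimenting with is creating that maintenance window programmatically right before draining the old nodes, so intentional shutdowns don't page anyone. Below is a minimal Python sketch using the Settings API v2 with the builtin:alerting.maintenance-window schema. The tenant URL and token are placeholders, and the exact payload field names are our best reading of the docs, so verify them against GET /api/v2/settings/schemas/builtin:alerting.maintenance-window in your own environment before relying on this.

```python
"""Sketch: open a one-off PLANNED maintenance window in Dynatrace right
before a node-pool upgrade, so the intentional host shutdowns don't alert.
Assumes the Settings API v2 maintenance-window schema; verify field names
against your environment's schema before using."""
from datetime import datetime, timedelta, timezone

import requests

DYNATRACE_URL = "https://abc12345.live.dynatrace.com"  # placeholder tenant
API_TOKEN = "dt0c01.EXAMPLE"  # placeholder; token needs settings.write scope


def create_upgrade_maintenance_window(duration_hours: int = 4) -> str:
    """Create a maintenance window starting now; returns the new object ID."""
    start = datetime.now(timezone.utc)
    end = start + timedelta(hours=duration_hours)
    payload = [{
        "schemaId": "builtin:alerting.maintenance-window",
        "scope": "environment",
        "value": {
            "enabled": True,
            "generalProperties": {
                "name": "Kubernetes node upgrade",
                "description": "Old nodes drained and destroyed during upgrade",
                "maintenanceType": "PLANNED",
                # Keep detecting problems, just don't send alerts for them:
                "suppression": "DETECT_PROBLEMS_DONT_ALERT",
                "disableSyntheticMonitorExecution": False,
            },
            "schedule": {
                "scheduleType": "ONCE",
                "onceRecurrence": {
                    # Timestamp format is an assumption; check the schema.
                    "startTime": start.strftime("%Y-%m-%dT%H:%M:%S"),
                    "endTime": end.strftime("%Y-%m-%dT%H:%M:%S"),
                    "timeZone": "UTC",
                },
            },
        },
    }]
    resp = requests.post(
        f"{DYNATRACE_URL}/api/v2/settings/objects",
        headers={"Authorization": f"Api-Token {API_TOKEN}"},
        json=payload,
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()[0]["objectId"]


if __name__ == "__main__":
    print("Created maintenance window:", create_upgrade_maintenance_window())
```

This still doesn't help when Google kicks off the upgrade on its own, but for planned runs it at least turns the blackout into a deliberate, scoped suppression instead of a surprise problem. You could also narrow the window to just the affected hosts with the schema's filters rather than suppressing environment-wide.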