Container platforms
Questions about Kubernetes, OpenShift, Docker, and more.

Rollout restart OneAgent pods for worker nodes to appear on the Deployment status page

Theodore_x86
Advisor

Hello Community team!

For the last 2 months I have performed a lot of Dynatrace deployments on k8s environments (full Kubernetes observability). In all of these cases, after a successful DynaKube deployment, and even when all OneAgent pods, CSI driver pods, and the ActiveGate pod are up and running, the worker nodes do not become visible in the UI until I restart the OneAgent DaemonSet twice!
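For reference, the restart I perform is a plain rollout restart of the OneAgent DaemonSet. Just a sketch; the DaemonSet name assumes the default naming for a DynaKube called dynakube, so adjust it to your deployment:

# Restart the OneAgent DaemonSet (in my case this has to be done twice)
kubectl -n dynatrace rollout restart daemonset/dynakube-oneagent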

Has anyone else had the same experience? Why this is happening beats me. While the worker nodes do not appear on the Deployment status page, no significant errors show up in the pods.

Without the Nodes on the deployment page, we do not have injection status for the processes/pods.

Any comment on the above would be much appreciated.

Thank you!

Houston, we have a problem.
3 REPLIES

Julius_Loman
DynaMight Legend

What is in the OneAgent logs? I'd say you encountered the situation where your OneAgent has to connect through the ActiveGate deployed by the operator, but the ActiveGate has not started yet.

Dynatrace also introduced a feature where the Operator generates a self-signed certificate for the ActiveGate and pushes it to the OneAgents. If a OneAgent has the custom.pem supplied, it validates the certificates, so the OneAgent probably cannot connect to any ActiveGate outside the cluster, while the internal (operator-managed) ActiveGate is not yet known to it.
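You can check whether the operator supplied the custom.pem to your OneAgents. A sketch only; replace <oneagent-pod> with one of your OneAgent pod names, and find is used because the exact certificate path may differ between operator versions:

# Check whether an operator-supplied custom.pem is present in a OneAgent pod
kubectl -n dynatrace exec <oneagent-pod> -- sh -c 'find / -name custom.pem 2>/dev/null'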

Certified Dynatrace Master | Alanata a.s., Slovakia, Dynatrace Master Partner

Theodore_x86
Advisor

Hello. I need to return to this issue as I faced it one more time on a new AKS cluster.

Without a restart of the OneAgent pods, the worker nodes do not appear on the Deployment status page. Yesterday, after a restart, 4 out of 6 nodes appeared. I needed another restart for the other 2 to appear.

@Julius_Loman let me reply to your comments (not quite promptly, sorry for that 🙂 )

The ActiveGate pod is always up. I do not see how the new feature with self-signed certificates could affect communication; after all, when we restart the OneAgent pods, communication is established.
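For completeness, this is how I verify it (a simple name match, nothing assumed beyond the dynatrace namespace):

# Confirm the ActiveGate pod is running
kubectl -n dynatrace get pods | grep -i activegate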

Nothing meaningful has been discovered in the logs yet. Everything seems to run smoothly, except that we do not see the nodes without a rollout restart.

BR

Houston, we have a problem.

Still, a race condition may appear. Can you share what is in the logs of the OneAgent that did not connect? But check the ruxitagent_host log files, not the console output of the OneAgents (which basically provides the watchdog log). That means executing a shell in the OneAgent pod:

kubectl exec -n dynatrace -ti dynakube-oneagent-r2tvr -- /bin/sh

and check the log files starting with ruxitagent_host in /mnt/volume_storage_mount/var_log/os/.
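For example, to list those logs and scan them for connection issues (a sketch; the grep pattern is just a starting point):

# List the host agent logs and search them for connection/certificate errors
kubectl exec -n dynatrace dynakube-oneagent-r2tvr -- sh -c 'ls /mnt/volume_storage_mount/var_log/os/ruxitagent_host*'
kubectl exec -n dynatrace dynakube-oneagent-r2tvr -- sh -c 'grep -iE "error|warn|connect" /mnt/volume_storage_mount/var_log/os/ruxitagent_host*'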

What can happen is that the OneAgents start earlier than the ActiveGate successfully connects and registers itself in the cluster as a communication endpoint. If the OneAgents do have the custom.pem file provided by the operator, they will validate the remote side's certificate. If the ActiveGate is not yet connected to the cluster, the OneAgents try to connect to the rest of the known communication endpoints and fail due to the certificate. If you restart the OneAgent pods, they may already have the ActiveGate in the communication endpoint list on startup (the Operator updates the list). The list of communication endpoints does not get updated for a running OneAgent pod until it's connected.
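So before restarting, it is worth confirming that the operator-managed ActiveGate is up and registered. A sketch, assuming the default resource names for a DynaKube called dynakube (your names may differ):

# Check that the ActiveGate has started before restarting the OneAgent DaemonSet
kubectl -n dynatrace rollout status statefulset/dynakube-activegate
# The DynaKube status/events can also hint at connection problems
kubectl -n dynatrace describe dynakube dynakube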

Certified Dynatrace Master | Alanata a.s., Slovakia, Dynatrace Master Partner
