04 Feb 2022 03:23 PM
Once the pods have started, you can run the following:
kubectl exec -it {pod} -n {namespace} -- ls /opt
You will see the dynatrace directory. This is required for nginx monitoring, since a shared object in this directory is loaded into nginx via the main-snippet config.
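For context, the main-snippet wiring looks roughly like this in the ingress-nginx controller ConfigMap. This is a hedged sketch: the module filename and exact path under /opt/dynatrace below are placeholders, not the real Dynatrace artifact names.

```yaml
# Hypothetical sketch of the ingress-nginx controller ConfigMap.
# The load_module path is an assumption -- substitute the actual
# shared object that the OneAgent drops into /opt/dynatrace.
apiVersion: v1
kind: ConfigMap
metadata:
  name: ingress-nginx-controller
  namespace: ingress-nginx
data:
  main-snippet: |
    load_module /opt/dynatrace/example/agent_module.so;
```

If the directory is missing when nginx starts, this snippet fails to load, which is why the startup ordering below matters.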
However, when a node starts, all DaemonSets start at the same time; the scheduler applies no ordering or precedence between them. As a result, the OneAgent is not always up before other pods. If you delete the DaemonSet pod after the node has started, the replacement pod will have the /opt/dynatrace directory available for nginx (which also runs as a DaemonSet).

Unfortunately, we tested initContainers to delay the start of the nginx container without success. The reason is that the pod has already started: it is not the starting (or restarting) of containers within the pod that pulls in /opt/dynatrace, it must be the pod itself. The pod does not have a restart policy on failure.

Searching the Kubernetes KEPs, there is mention of adding the ability to order DaemonSets with taints and controllers, of having pods restarted on container failure, and of adding preStarts. My next step is to taint the nodes via the kubelet so that the critical DaemonSet (Dynatrace) starts before the others can.
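The taint approach described above could be sketched as follows. The taint key, value, and cleanup step are assumptions for illustration, not a tested recipe:

```yaml
# Hypothetical sketch: register the node with a startup taint so that only
# DaemonSets tolerating it (the Dynatrace OneAgent) are scheduled first.
#
# Kubelet flag (or registerWithTaints in the KubeletConfiguration):
#   --register-with-taints=example.com/oneagent-pending=true:NoSchedule
#
# Toleration added to the OneAgent DaemonSet pod spec:
tolerations:
  - key: example.com/oneagent-pending
    operator: Exists
    effect: NoSchedule
# Once the OneAgent is up, remove the taint so other workloads can schedule:
#   kubectl taint nodes <node> example.com/oneagent-pending:NoSchedule-
```

The design idea is simply to invert the problem: instead of delaying nginx, the node repels everything except the critical DaemonSet until it is ready.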
09 Jan 2023 08:49 PM
Very interesting read.
09 Jan 2023 08:56 PM
AFAIK this depends on the deployment option: it happens with classicFullStack and is resolved in cloudNativeFullStack, which, unfortunately, has other limitations at the moment.
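For reference, the deployment option is selected in the Dynatrace Operator's DynaKube custom resource. This is a minimal hedged sketch; the API version and available fields vary by operator release, and the apiUrl is a placeholder:

```yaml
# Hypothetical minimal DynaKube spec -- choosing cloudNativeFullStack
# instead of classicFullStack changes how code modules reach app pods.
apiVersion: dynatrace.com/v1beta1
kind: DynaKube
metadata:
  name: dynakube
  namespace: dynatrace
spec:
  apiUrl: https://ENVIRONMENT_ID.live.dynatrace.com/api
  oneAgent:
    cloudNativeFullStack: {}
```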