Is there any documentation available that describes the pros and cons of the Build Time vs Run time deployment options for Application-Only OpenShift container monitoring?
I recall coming across some older information indicating that one drawback of the Build time approach is that you have to keep track of agent versions and rebuild whenever a new version comes out (to maintain currency), whereas with the Run time approach currency is automatic. If I understand correctly, however, there is now a OneAgent Operator capability that automates agent updates and the roll-out of new versions. If I am reading the documentation correctly, it would therefore appear that this one "con" of the Build Time approach is removed by using the operator. Is that correct?
More generally, what are the current pros/cons of the two approaches?
We have been performing integrations using the Run time deployment, but one of our OpenShift operations team members has expressed concern that there is a "risk of container failure at initialisation when the DT_API_URL is unavailable to respond to oneAgent download requests." So, for example, if the DT server were down or not responsive enough, or if there were network issues and the agent could not be downloaded, the containers would fail to initialize. According to the OpenShift team, this would result in "either a restart or a shutdown of the application system." I am not sure whether there is any way to avoid this issue with the Run time approach. With the Build Time approach (and with use of the Dynatrace Operator), I am not sure whether there would be any similar operational impact on the containers if the agent could not communicate with the cluster.
In our case, one specific drawback of build-time integration was that it made quickly disabling the agent clumsier than with runtime deployment. This is because, with the official build-time integration method, the container entrypoint is overwritten with the Dynatrace-provided script under /opt/dynatrace/oneagent/dynatrace-agent64.sh, which sets all required environment variables, including LD_PRELOAD, in the process context only. This means you can't simply remove LD_PRELOAD from the deploymentconfig and start a new deployment to disable the agent. I'm sure there are ways around that, but it is something we hadn't thought about initially, and it caused an unnecessary delay in removing the agent from a production system that was impacted by it.
With the runtime integration we can simply remove the LD_PRELOAD from the DC and redeploy the pod.
PS: Due to the above drawback, we mostly don't use the officially suggested build-time integration based on dynatrace-agent64.sh. Instead, we simply install the OneAgent into the base image using the default path (/opt/dynatrace/oneagent), push the OneAgent-enabled image to an output image stream, and enable the agent by setting LD_PRELOAD in the deploymentconfig accordingly.
We run OneAgent-enabled builds for all standard base images after each cluster update tagging the output image streams accordingly.
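To illustrate the toggle described above, here is a minimal sketch of switching the agent on and off through the deploymentconfig. The helper names (`enable_agent`, `disable_agent`), the `OC` override, and the library path in `AGENT_LIB` are assumptions for illustration, not something taken from this thread; check the path against your own image.

```shell
# Assumed location of the preload library inside the OneAgent-enabled image.
AGENT_LIB="${AGENT_LIB:-/opt/dynatrace/oneagent/agent/lib64/liboneagentproc.so}"

# Set LD_PRELOAD on the deploymentconfig named in $1.
# OC can be overridden (e.g. OC=echo) to preview the command without running it.
enable_agent() {
    "${OC:-oc}" set env "dc/$1" "LD_PRELOAD=$AGENT_LIB"
}

# Remove LD_PRELOAD from the deploymentconfig named in $1.
# The trailing dash is how `oc set env` unsets a variable.
disable_agent() {
    "${OC:-oc}" set env "dc/$1" LD_PRELOAD-
}
```

For example, `OC=echo disable_agent my-app` only prints the command, while `disable_agent my-app` changes the DC (here `my-app` is a placeholder name). Whether a new deployment starts automatically depends on the DC having a config change trigger.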
Thank you for your response. The one point I'm not clear on is why you would need to disable the agent in the Runtime deployment. I.e., since there is no need to reach out to the DT server to obtain the agent at container startup, what situations would result in a need to disable it (other than discovering some sort of application issues caused by the agent)?
Hi there. I joined Dynatrace a few months ago, and this is my very first post in the community forum. Very exciting. First I will compare run time with build time, then talk about the operator, and finally end with a little tidbit.
Run time integration
Build time integration
You are right that the OneAgent Operator takes this burden away from you. It also does this in a more elegant way by installing the agent on the Kubernetes node, rather than in the container. This is one of the major advantages of Dynatrace Kubernetes monitoring - in that you can correlate node events / problems with application issues. By nature, this is a full-stack instrumentation, so if you are limited to application-only monitoring, you must stick with the docker container integration we discussed above 🙂
It is possible to change the runtime integration script so it exits gracefully in case of a network problem. This would have a similar result to a rolling update strategy, in that your app would stay running and healthy, but it has a major downside: after the deployment you would not have any monitoring, since the agents were never installed. I am also unclear whether we would support this model, but I am happy to look into the complexities there if you're interested.
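As a rough sketch of what such a graceful fallback could look like (this is not the official integration script, and as noted it may not be a supported model): try to fetch the agent, and if that fails, start the application without it. `fetch_agent`, `start_app`, `AGENT_DOWNLOAD_URL`, `AGENT_DIR`, `DOWNLOAD_TIMEOUT`, and the library path are all hypothetical names and values chosen for illustration.

```shell
# Assumed install directory and preload library path; adjust to your image.
AGENT_DIR="${AGENT_DIR:-/opt/dynatrace/oneagent}"
AGENT_LIB="${AGENT_LIB:-$AGENT_DIR/agent/lib64/liboneagentproc.so}"

# Hypothetical download step. AGENT_DOWNLOAD_URL stands in for whatever
# download URL your runtime integration builds from DT_API_URL.
# --fail makes curl return non-zero on HTTP errors; --max-time bounds the wait.
fetch_agent() {
    curl --fail --silent --show-error \
         --max-time "${DOWNLOAD_TIMEOUT:-30}" \
         --output /tmp/oneagent.zip "$AGENT_DOWNLOAD_URL" &&
    unzip -q -o /tmp/oneagent.zip -d "$AGENT_DIR"
}

# Start the application given as arguments, with the agent if it could be
# installed and without it otherwise, so a download failure never blocks startup.
start_app() {
    if fetch_agent && [ -f "$AGENT_LIB" ]; then
        echo "OneAgent installed; starting with monitoring" >&2
        LD_PRELOAD="$AGENT_LIB"
        export LD_PRELOAD
    else
        echo "WARNING: OneAgent unavailable; starting WITHOUT monitoring" >&2
    fi
    exec "$@"
}
```

In an entrypoint script the last line would be `start_app "$@"`. The warning on stderr matters: without it, the unmonitored state is silent until someone notices the missing data.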
My first post! Yay! Hope this answers your question 🙂
Thank you very much for your detailed response. I found it very helpful. There is one point that I am not totally clear on though. It seems clear from your response that the OneAgent Operator requires the agent to be installed on the cluster node... this would result in Full Stack monitoring would it not? We only want to deploy in Application Only mode (largely due to licensing issues related to cluster node memory size). So I am not totally clear whether we can use the OneAgent Operator to achieve Application Only monitoring of specific containers.
That's right. If you are constrained by licensing issues, the OneAgent operator is out of reach. I would go with the rolling update strategy if I were you. Best of both worlds. Small container image. Latest agent. And you're safe if there's an outage 🙂