For a long time we have been seeing intermittent instrumentation gaps on AWS ECS and EC2 instances at boot-up or auto scaling. In our case, we use Ubuntu with systemd, which allows faster boot by starting services in parallel.
This week we tested AMIs with 1.23, 1.25, and 1.29, and reverted to 1.25 because it seemed the most robust.
We run our workloads in Docker, and we observe that (typically) Nginx and some JVMs are up and fully running before OneAgent has finished initializing.
As a result, a random set of processes ends up uninstrumented.
We've verified that simply restarting the process gets it properly "hooked", but manual intervention is not an option for our process, and certainly not in production.
Has anybody experienced this and found a reliable workaround?
I am somewhat confused, but it seems you are just asking whether there is any way to avoid restarting a process so that it gets monitored, correct? Unfortunately, there isn't a way around this: if the process is already started and running when the agent is installed, it will not be monitored until it is restarted. This is because the agent must inject into the process to discover what is inside it.
P.S. agent version 129 has all of the same features as 125, as well as some added features. These can always be viewed on the blog. For example, here is the blog post for v129 of the OneAgent: https://www.dynatrace.com/blog/tag/oneagent-v1-129...
Thanks Hayden, but no.
Our issue is that OneAgent (seemingly at random) doesn't instrument processes at boot time.
We have a third-party team managing our AWS infrastructure. They don't have access to verify that all agents are fully functional after a reboot, maintenance, etc., so I am looking for a robust solution that avoids manual intervention and manual restarts.
It seems to me like a service startup dependency problem. Without going into further detail, the service that sets up the injection (oneagentproc) may be started after your application services. In version 131 we have removed the oneagentproc service on x86 architecture, and thus the problem shouldn't appear anymore. Please give version 131 a try once it is available for you and let me know if the issue was solved. In case further assistance is needed feel free to open a support ticket on https://www.dynatrace.com/support/
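For anyone who cannot upgrade right away, the ordering problem described above could in principle be worked around by holding back the application services until the agent's unit is active. Below is a minimal Python sketch of such a gate, not an official Dynatrace recommendation. It assumes the agent is exposed as a systemd unit named `oneagent.service` (an assumption, check the unit name on your AMI), and the helper names `unit_is_active` and `wait_for_unit` are hypothetical.

```python
import subprocess
import time
from typing import Callable


def unit_is_active(unit: str) -> bool:
    """Return True if `systemctl is-active` reports the unit as active."""
    # `systemctl is-active --quiet` exits with 0 only when the unit is active.
    result = subprocess.run(
        ["systemctl", "is-active", "--quiet", unit],
        check=False,
    )
    return result.returncode == 0


def wait_for_unit(
    unit: str,
    timeout_s: float = 120.0,
    poll_s: float = 2.0,
    is_active: Callable[[str], bool] = unit_is_active,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until `unit` is active; return False if the timeout expires first."""
    deadline = clock() + timeout_s
    while True:
        if is_active(unit):
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)
```

One way to use this would be to call `wait_for_unit("oneagent.service")` from a small script that runs as an `ExecStartPre=` step in a drop-in for `docker.service`, exiting non-zero if it returns `False`. Note that "active" only means systemd considers the unit started; whether the agent is fully ready to inject at that point is something you would need to verify on your own images. A purely declarative alternative is an `After=`/`Wants=` ordering in the same drop-in, again assuming the unit name above.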