How to avoid repeated application downtime?

shubham1413 · 27 Aug 2023

Hi Team,

As we observed during deployment, the application pods are already running before the OneAgent pod starts. To get deep monitoring, the application pods then have to be restarted, and the application team is not willing to do that.

Are there any recommendations for avoiding this repeated application downtime?

Mizső · 27 Aug 2023

Hi @shubham1413, we had (and still have) the same problem. See the support answer below; it may be helpful for you.

"We also see some log entries, right after the OneAgent was started, about failing process agent (PA) injection:

2023-06-20 16:24:14.048 UTC [00002133] info    [native] Detected PA Injection failure for PGI:10057839509552605518 (0x8b949fbec6c5a94e), name:/opt/dpdr-loader/bin/dpdr-loader.war dpdr-loader-* dpdr-loader, process PID: 15168, process executable: /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.282.b08-2.el8_3.x86_64/jre/bin/java, failure reason:Container injection failed
2023-06-20 16:24:14.049 UTC [00002133] info    [native] Detected PA Injection failure for PGI:15318189370463547518 (0xd4952412bca4f07e), name:SpringBoot bbsh-core-* bbsh-core-* bbsh-core-* bbsh-core, process PID: 15447, process executable: /usr/lib/jvm/java-11-openjdk-11.0.18.0.10-2.el8_7.x86_64/bin/java, failure reason:Container injection failed
2023-06-20 16:24:14.049 UTC [00002133] info    [native] Detected PA Injection failure for PGI:2511866119596929532 (0x22dbf0ed65e9f1fc), name:/opt/direct-sales-legacy-proxy/direct-sales-legacy-proxy.jar direct-sales-legacy-proxy-* direct-sales-legacy-proxy, process PID: 15445, process executable: /usr/lib/jvm/java-11-openjdk-11.0.10.0.9-4.el8_3.x86_64/bin/java, failure reason:Container injection failed
2023-06-20 16:24:14.049 UTC [00002133] info    [native] Detected PA Injection failure for PGI:3309060887739733547 (0x2dec255861dd962b), name:SpringBoot thdb-* thdb-* thdb-* thdb, process PID: 15931, process executable: /usr/lib/jvm/java-11-openjdk-11.0.16.1.1-1.el8_6.x86_64/bin/java, failure reason:Container injection failed
2023-06-20 16:24:14.049 UTC [00002133] info    [native] Detected PA Injection failure for PGI:5118426938130966318 (0x47084e2e8016272e), name:SpringBoot cegkozlony-* cegkozlony-* cegkozlony-* cegkozlony, process PID: 15448, process executable: /usr/lib/jvm/java-11-openjdk-11.0.16.1.1-1.el8_6.x86_64/bin/java, failure reason:Container injection failed
2023-06-20 16:24:14.049 UTC [00002133] info    [native] Detected PA Injection failure for PGI:5532858209396585958 (0x4cc8a93a23aff5e6), name:/opt/szsb/app/szsb.jar szsb-* szsb, process PID: 15436, process executable: /usr/lib/jvm/java-11-openjdk-11.0.16.0.8-1.el8_6.x86_64/bin/java, failure reason:Container injection failed
2023-06-20 16:24:14.049 UTC [00002133] info    [native] Detected PA Injection failure for PGI:8108368414830483863 (0x7086b61d01f7d997), name:/opt/opten-gateway/bin/opten-gateway.war opgw-* opgw, process PID: 15455, process executable: /usr/lib/jvm/java-11-openjdk-11.0.12.0.7-0.el8_4.x86_64/bin/java, failure reason:Container injection failed

Therefore, we think this could be a race condition in which the required binaries were not yet ready for injection, as all processes with higher PIDs (such as 18600) seem to be monitored correctly. This situation is described here -
https://www.dynatrace.com/support/help/setup-and-configuration/setup-on-container-platforms/kubernet...
Could you please schedule a deployment scale event or pod restart to confirm our idea? The injection should happen automatically and the container/process should be monitored correctly afterward.

Have you also considered using CloudNative deployment type of the operator?
https://www.dynatrace.com/support/help/shortlink/dto-deploy-options-k8s#cloud-native
It offers similar functionality as the classic full-stack injection, but uses mutating webhooks to inject code modules into application pods."

Best regards,

Mizső

Dynatrace Community RockStar 2024, Certified Dynatrace Professional
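
The injection-failure lines in the quoted answer all follow the same layout, so the affected processes can be listed with a small script. This is only a minimal sketch: the pattern is modeled on the excerpt above and is an assumption about the log format, not a documented OneAgent format, and `parse_pa_failures` is a hypothetical helper name.

```python
import re

# Pattern modeled on the log excerpt quoted above (assumption: other
# OneAgent versions may word or order these fields differently).
PA_FAILURE = re.compile(
    r"Detected PA Injection failure for PGI:(?P<pgi>\d+) "
    r"\((?P<pgi_hex>0x[0-9a-f]+)\), name:(?P<name>.*?), "
    r"process PID: (?P<pid>\d+), process executable: (?P<exe>\S+), "
    r"failure reason:(?P<reason>.*)$"
)


def parse_pa_failures(lines):
    """Return one dict per line that reports a PA injection failure."""
    failures = []
    for line in lines:
        match = PA_FAILURE.search(line)
        if match:
            record = match.groupdict()
            record["pid"] = int(record["pid"])  # PIDs are easier to compare as ints
            failures.append(record)
    return failures


sample = [
    # Copied from the excerpt above.
    "2023-06-20 16:24:14.049 UTC [00002133] info    [native] Detected PA Injection failure for PGI:5532858209396585958 (0x4cc8a93a23aff5e6), name:/opt/szsb/app/szsb.jar szsb-* szsb, process PID: 15436, process executable: /usr/lib/jvm/java-11-openjdk-11.0.16.0.8-1.el8_6.x86_64/bin/java, failure reason:Container injection failed",
    # Made-up line that should be ignored by the parser.
    "this line is not an injection failure",
]

for failure in parse_pa_failures(sample):
    print(failure["pid"], failure["name"], failure["reason"])
```

Comparing the extracted PIDs with the PIDs of processes that are monitored correctly is one way to check the "started before OneAgent was ready" explanation on a given host.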

shubham1413 · 27 Aug 2023

@Mizső

As you suggested, deep monitoring does work after the application pods are restarted.

But my question is about the deployment itself, when all the pods (application as well as OneAgent pods) are newly created. In this situation, the application pods become ready before the OneAgent pods are running, so the binaries are not yet available for injection into the application pods and the application processes get no deep monitoring.

Can we deploy the application pods only after the OneAgent pods are in the ready state, so that the application pods do not need to be restarted after the deployment is complete?
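
To make the ordering in this question concrete, here is a minimal sketch of a deployment step that waits for the OneAgent DaemonSet rollout before applying the application manifest. It uses the standard `kubectl rollout status` and `kubectl apply` commands. The DaemonSet name, namespace, and manifest path are placeholders, and `deploy_after_oneagent` is a hypothetical helper, not part of any Dynatrace tooling.

```python
import subprocess


def rollout_status_cmd(kind, name, namespace, timeout="300s"):
    """Build a `kubectl rollout status` command, which blocks until the rollout finishes."""
    return [
        "kubectl", "rollout", "status", f"{kind}/{name}",
        "--namespace", namespace, f"--timeout={timeout}",
    ]


def apply_cmd(manifest_path):
    """Build a `kubectl apply` command for the application manifest."""
    return ["kubectl", "apply", "-f", manifest_path]


def deploy_after_oneagent(daemonset, namespace, app_manifest, run=subprocess.run):
    """Apply the application manifest only once the OneAgent DaemonSet is rolled out.

    `run` is injectable so the sequence can be checked without a cluster.
    With subprocess.run, check=True raises if the wait fails or times out,
    in which case the application manifest is never applied.
    """
    run(rollout_status_cmd("daemonset", daemonset, namespace), check=True)
    run(apply_cmd(app_manifest), check=True)


if __name__ == "__main__":
    # Placeholder names: substitute the real DaemonSet, namespace and manifest.
    print(" ".join(rollout_status_cmd("daemonset", "oneagent", "dynatrace")))
    print(" ".join(apply_cmd("app.yaml")))
```

This only orders the initial rollout. It would not help for application pods scheduled later onto a node where OneAgent is not yet running, which is the case the webhook-based CloudNative mode mentioned in the support answer is meant to address.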