09 Jun 2026 07:20 PM
Hi,
I'm observing a reproducible behavior in an OpenShift cluster monitored with Dynatrace Cloud Native Full Stack and would like to understand whether this is expected or if others have seen something similar.
OpenShift 4.19
Dynatrace Operator 1.8.1
Cloud Native Full Stack enabled
Dynatrace CSI Driver running
Dynatrace Webhook running
ActiveGate running
After a full cluster shutdown/startup, the cluster comes back healthy and workloads start successfully.
However, multiple processes appear in Dynatrace as:
Restart required
Failed to enable
Examples include:
Spring Boot application workloads
kube-apiserver
kubelet
openshift-apiserver
etcd-related processes
For our Spring Boot application we verified the following:
Immediately after cluster startup:
Application pod is running
Service is reachable
Process appears in Dynatrace
Deep Monitoring is not fully active
Dynatrace reports Restart required
After performing a rollout restart of the deployment:
/opt/dynatrace/oneagent-paas is mounted
OneAgent libraries are loaded
Deep Monitoring becomes enabled
Services and process details appear correctly
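One way to check the first item in bulk is to scan the pod specs for that mount path. The following is a minimal sketch, not an official Dynatrace diagnostic: it assumes the input is the parsed output of `oc get pods -A -o json`, and the function names are hypothetical. It only shows whether the pod spec references the path reported above, not whether the volume was actually mounted or the agent loaded.

```python
# Path taken from the observation in this thread; adjust if your setup differs.
AGENT_MOUNT_PATH = "/opt/dynatrace/oneagent-paas"


def has_agent_mount(pod):
    """Return True if any container or init container mounts the OneAgent path."""
    spec = pod.get("spec", {})
    containers = spec.get("containers", []) + spec.get("initContainers", [])
    for container in containers:
        for mount in container.get("volumeMounts", []):
            if mount.get("mountPath") == AGENT_MOUNT_PATH:
                return True
    return False


def pods_without_agent_mount(pod_list):
    """List 'namespace/name' for every pod whose spec lacks the OneAgent mount."""
    missing = []
    for pod in pod_list.get("items", []):
        if not has_agent_mount(pod):
            meta = pod["metadata"]
            missing.append(f"{meta['namespace']}/{meta['name']}")
    return sorted(missing)
```

To use it, save the `oc get pods -A -o json` output to a file, load it with `json.load`, and pass the result to `pods_without_agent_mount`. Running it once right after cluster startup and once after the rollout restart should show whether the pod spec itself differs between the two states.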
This is not a one-time occurrence.
We can reproduce it after every full cluster reboot:
Shut down the entire cluster.
Start the cluster again.
Workloads start successfully.
Dynatrace reports multiple processes as Restart required.
Manual pod restart fixes application workloads.
Some processes occasionally disappear from Host → Processes view while remaining visible and active in Process Group view. Opening the host through the process relationship sometimes makes the process visible again.
Has anyone seen similar behavior with Cloud Native Full Stack after a complete OpenShift cluster restart?
Is it expected that workloads may start before the Dynatrace CSI driver/webhook are fully ready, requiring a restart to receive Deep Monitoring?
Are there any recommended practices to ensure workloads are instrumented automatically after cluster recovery without requiring manual rollout restarts?
Thanks!
Regards, Deni
09 Jun 2026 10:00 PM
This is not expected behavior and should not happen. Dynatrace assigns a priorityClass to its components so that they are started ahead of regular workloads.
I'd recommend either opening a support case or checking the logs of your Dynatrace components and the pod events for any Dynatrace startup issues. One possibility is that pulling the Dynatrace images takes too long, so the workload pods do not wait and start without Dynatrace. But this is just my assumption, and it needs to be diagnosed in your environment.
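A rough way to test the ordering assumption is to compare container start times with the time the Dynatrace pods became Ready. The sketch below is a heuristic under stated assumptions, not a Dynatrace tool: it expects the parsed output of `oc get pods -A -o json`, it assumes the Dynatrace components run in a namespace called `dynatrace`, and all function names are hypothetical. It relies only on standard pod status fields (`conditions[].lastTransitionTime` and `containerStatuses[].state.running.startedAt`).

```python
from datetime import datetime


def parse_ts(ts):
    """Parse a Kubernetes RFC 3339 UTC timestamp such as '2026-06-09T07:20:00Z'."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")


def ready_since(pod):
    """Return when the pod last became Ready, or None if it is not Ready."""
    for cond in pod.get("status", {}).get("conditions", []):
        if cond.get("type") == "Ready" and cond.get("status") == "True":
            return parse_ts(cond["lastTransitionTime"])
    return None


def earliest_container_start(pod):
    """Return the earliest start time among currently running containers."""
    starts = []
    for cs in pod.get("status", {}).get("containerStatuses", []):
        running = cs.get("state", {}).get("running")
        if running and running.get("startedAt"):
            starts.append(parse_ts(running["startedAt"]))
    return min(starts) if starts else None


def started_before_dynatrace(pod_list, dynatrace_namespace="dynatrace"):
    """List pods whose containers started before all Dynatrace pods were Ready."""
    items = pod_list.get("items", [])
    ready_times = [ready_since(p) for p in items
                   if p["metadata"]["namespace"] == dynatrace_namespace]
    if not ready_times or any(t is None for t in ready_times):
        raise ValueError("Dynatrace pods missing or not all Ready; cannot compare")
    cutoff = max(ready_times)  # moment the last Dynatrace pod became Ready
    early = []
    for pod in items:
        meta = pod["metadata"]
        if meta["namespace"] == dynatrace_namespace:
            continue
        started = earliest_container_start(pod)
        if started is not None and started < cutoff:
            early.append(f"{meta['namespace']}/{meta['name']}")
    return sorted(early)
```

If the workloads flagged as Restart required all show up in this list and the correctly instrumented ones do not, that supports a startup ordering issue. The Ready transition time can also change for unrelated reasons, such as a later probe failure, so treat the result as a hint rather than proof.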
09 Jun 2026 11:01 PM
A bit more context from my side:
This is my own lab Bare Metal OpenShift cluster which I use to learn and test Dynatrace features.
The applications are demo workloads and the traffic is synthetic/test traffic.
I built the entire environment from scratch, including the OpenShift setup, storage configuration, networking, Dynatrace deployment, application development and deployment, and supporting services. Because of that, it is entirely possible that I have introduced a configuration issue somewhere rather than encountering an actual Dynatrace product problem.
My goal is not only to make the monitoring work, but also to better understand:
how Cloud Native Full Stack injection works,
the startup dependencies between Operator, Webhook, CSI Driver and workloads,
where to look when instrumentation does not happen as expected,
which logs and components are most useful during troubleshooting.
Given the behavior I'm seeing after a full cluster reboot, could you suggest what evidence you would collect first?
For example:
Which Dynatrace component logs would you inspect first?
Are there specific webhook, CSI or Operator messages that indicate failed or missed injection?
Is there a way to verify whether a pod started before Dynatrace injection became available?
Are there any OpenShift events or Dynatrace diagnostics that would help prove or disprove a startup ordering issue?
I'm mainly trying to learn the correct troubleshooting approach and understand what "good" versus "bad" startup behavior should look like in a Cloud Native Full Stack environment.
Thanks!
Regards, Deni