Re: Cloud Native Full Stack: Workloads require manual restart after full OpenShift cluster reboot to get Deep Monitoring

deni · 09 Jun 2026

Hi,

I'm seeing reproducible behavior in an OpenShift cluster monitored with Dynatrace Cloud Native Full Stack and would like to understand whether it is expected and whether others have seen something similar.

Environment

OpenShift 4.19
Dynatrace Operator 1.8.1
Cloud Native Full Stack enabled
Dynatrace CSI Driver running
Dynatrace Webhook running
ActiveGate running

What happens

After a full cluster shutdown/startup, the cluster comes back healthy and workloads start successfully.

However, multiple processes appear in Dynatrace as:

Restart required
Failed to enable

Examples include:

Spring Boot application workloads
kube-apiserver
kubelet
openshift-apiserver
etcd-related processes

Application workload example

For our Spring Boot application we verified the following:

Immediately after cluster startup:

Application pod is running
Service is reachable
Process appears in Dynatrace
Deep Monitoring is not fully active
Dynatrace reports Restart required

After performing a rollout restart of the deployment:

/opt/dynatrace/oneagent-paas is mounted
OneAgent libraries are loaded
Deep Monitoring becomes enabled
Services and process details appear correctly

What makes this interesting

This is not a one-time occurrence.

We can reproduce it after every full cluster reboot:

Shut down the entire cluster.
Start the cluster again.
Workloads start successfully.
Dynatrace reports multiple processes as Restart required.
Manual pod restart fixes application workloads.

Additional observation

Some processes occasionally disappear from Host → Processes view while remaining visible and active in Process Group view. Opening the host through the process relationship sometimes makes the process visible again.

Question

Has anyone seen similar behavior with Cloud Native Full Stack after a complete OpenShift cluster restart?

Is it expected that workloads may start before the Dynatrace CSI driver/webhook are fully ready, requiring a restart to receive Deep Monitoring?

Are there any recommended practices to ensure workloads are instrumented automatically after cluster recovery without requiring manual rollout restarts?

Thanks!

Regards, Deni

Dynatrace Integration Engineer at CodeAttest

Julius_Loman · 09 Jun 2026

This is not a standard situation and should not happen. Dynatrace uses priorityClass to have its components started first.

I'd recommend either opening a support case or checking your Dynatrace component logs and the pod events for any Dynatrace startup issues. One possibility is that downloading the Dynatrace images takes too long, so the workload pods do not wait for it and start without Dynatrace. But this is just my assumption, and it needs to be diagnosed in your environment.

Dynatrace Ambassador | Alanata a.s., Slovakia, Dynatrace Master Partner

deni · 09 Jun 2026

@Julius_Loman

A bit more context from my side:

This is my own bare-metal OpenShift lab cluster, which I use to learn and test Dynatrace features.

The applications are demo workloads and the traffic is synthetic/test traffic.

I built the entire environment from scratch, including the OpenShift setup, storage configuration, networking, Dynatrace deployment, application development and deployment, and supporting services. Because of that, it is entirely possible that I have introduced a configuration issue somewhere rather than encountering an actual Dynatrace product problem.

My goal is not only to make the monitoring work, but also to better understand:

how Cloud Native Full Stack injection works,
the startup dependencies between Operator, Webhook, CSI Driver and workloads,
where to look when instrumentation does not happen as expected,
which logs and components are most useful during troubleshooting.

Given the behavior I'm seeing after a full cluster reboot, could you suggest what evidence you would collect first?

For example:

Which Dynatrace component logs would you inspect first?
Are there specific webhook, CSI or Operator messages that indicate failed or missed injection?
Is there a way to verify whether a pod started before Dynatrace injection became available?
Are there any OpenShift events or Dynatrace diagnostics that would help prove or disprove a startup ordering issue?
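For the third question, one rough check is to compare when each workload container started with when the Dynatrace pods last became Ready. The sketch below works on pod objects in the shape returned by `kubectl get pods -A -o json`. The `dynatrace` namespace, the helper names, and the sample pods are assumptions for illustration only, and the comparison is a heuristic, not an official Dynatrace diagnostic.

```python
from datetime import datetime, timezone


def parse_ts(ts):
    """Parse a Kubernetes timestamp such as 2026-06-10T07:01:30Z."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def ready_since(pod):
    """Return when the pod last became Ready, or None if it is not Ready."""
    for cond in pod.get("status", {}).get("conditions", []):
        if cond.get("type") == "Ready" and cond.get("status") == "True":
            return parse_ts(cond["lastTransitionTime"])
    return None


def earliest_container_start(pod):
    """Return the earliest startedAt among running containers, or None."""
    starts = []
    for cs in pod.get("status", {}).get("containerStatuses", []):
        running = cs.get("state", {}).get("running")
        if running and "startedAt" in running:
            starts.append(parse_ts(running["startedAt"]))
    return min(starts) if starts else None


def started_before_dynatrace(pods, dynatrace_namespace="dynatrace"):
    """List workload pods whose containers started before all Dynatrace pods were Ready.

    Returns None when the Dynatrace pods are missing or not all Ready,
    because there is no cutoff to compare against in that case.
    """
    dt_pods = [p for p in pods if p["metadata"]["namespace"] == dynatrace_namespace]
    ready_times = [ready_since(p) for p in dt_pods]
    if not ready_times or any(t is None for t in ready_times):
        return None
    cutoff = max(ready_times)  # moment the last Dynatrace pod became Ready
    early = []
    for pod in pods:
        meta = pod["metadata"]
        if meta["namespace"] == dynatrace_namespace:
            continue
        started = earliest_container_start(pod)
        if started is not None and started < cutoff:
            early.append(f'{meta["namespace"]}/{meta["name"]}')
    return sorted(early)


def make_pod(namespace, name, started_at, ready_at):
    """Build a minimal pod object for the demo (hypothetical data)."""
    return {
        "metadata": {"namespace": namespace, "name": name},
        "status": {
            "conditions": [
                {"type": "Ready", "status": "True", "lastTransitionTime": ready_at}
            ],
            "containerStatuses": [{"state": {"running": {"startedAt": started_at}}}],
        },
    }


SAMPLE_PODS = [
    make_pod("dynatrace", "csi-driver-x", "2026-06-10T07:02:00Z", "2026-06-10T07:03:10Z"),
    make_pod("dynatrace", "webhook-x", "2026-06-10T07:01:50Z", "2026-06-10T07:02:40Z"),
    make_pod("demo", "spring-app", "2026-06-10T07:01:30Z", "2026-06-10T07:02:05Z"),
    make_pod("demo", "late-app", "2026-06-10T07:05:00Z", "2026-06-10T07:05:20Z"),
]

print(started_before_dynatrace(SAMPLE_PODS))
```

Against a real cluster, the `items` array from `kubectl get pods -A -o json` could be passed in place of `SAMPLE_PODS`. Container start time is used rather than pod creation time because pod objects can survive a node reboot while their containers are restarted.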

I'm mainly trying to learn the correct troubleshooting approach and understand what "good" versus "bad" startup behavior should look like in a Cloud Native Full Stack environment.

Thanks!

Regards, Deni

Dynatrace Integration Engineer at CodeAttest

Julius_Loman · 10 Jun 2026

@deni I'd recommend looking at Troubleshooting posts for Kubernetes here in the community, for example at Pod injection troubleshooting.

Nowadays, top AI models will give you very good recommendations, but why not ask Dynatrace intelligence in the first place? Be sure to have agentic mode enabled; it will give you much better answers.

Dynatrace Ambassador | Alanata a.s., Slovakia, Dynatrace Master Partner

deni · 10 Jun 2026

@Julius_Loman Thanks for the recommendation, will check the articles.

I did try Dynatrace Assist (including Agentic mode) before opening this discussion, but I still couldn't fully explain the behavior. I've also been cross-checking findings with ChatGPT while digging through the Kubernetes events and Dynatrace injection details.

Today after powering the OpenShift cluster back on I found that one of the affected pods showed the following events during startup:

FailedMount: driver name csi.oneagent.dynatrace.com not found in the list of registered CSI drivers

FailedMount: dynatrace-bootstrapper-config not registered

NetworkPluginNotReady: no CNI configuration file in /etc/kubernetes/cni/net.d/
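To sort events like these when there are many pods, a small classifier over the event objects can help. This is only a sketch: it assumes events in the shape returned by `kubectl get events -o json` (with `reason` and `message` fields), matches on the message fragments quoted above, and the category labels are my own invention, not Dynatrace or Kubernetes terms.

```python
def classify_event(event):
    """Map a Kubernetes event to a coarse startup-problem category."""
    reason = event.get("reason", "")
    message = event.get("message", "")
    if reason == "FailedMount" and "csi.oneagent.dynatrace.com" in message:
        # The kubelet tried to mount the volume before the CSI driver registered.
        return "csi-driver-not-registered"
    if reason == "FailedMount" and "dynatrace-bootstrapper-config" in message:
        return "bootstrapper-config-not-available"
    if reason == "NetworkPluginNotReady":
        # Node-level condition, not specific to Dynatrace.
        return "node-network-not-ready"
    return "other"


# Sample events built from the messages quoted above.
SAMPLE_EVENTS = [
    {
        "reason": "FailedMount",
        "message": "driver name csi.oneagent.dynatrace.com not found "
                   "in the list of registered CSI drivers",
    },
    {"reason": "FailedMount", "message": "dynatrace-bootstrapper-config not registered"},
    {
        "reason": "NetworkPluginNotReady",
        "message": "no CNI configuration file in /etc/kubernetes/cni/net.d/",
    },
    {"reason": "Pulled", "message": "Container image already present on machine"},
]

CATEGORIES = [classify_event(e) for e in SAMPLE_EVENTS]
for category in CATEGORIES:
    print(category)
```

Seeing `node-network-not-ready` alongside the two Dynatrace categories would suggest the whole node was still initializing, not only the Dynatrace components.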

The pod eventually started successfully and was injected:

oneagent.dynatrace.com/injected: true
dynakube.dynatrace.com/injected: true
LD_PRELOAD is present
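These three markers can be checked in one pass over a pod object. The sketch below assumes the pod JSON comes from `kubectl get pod <name> -o json` and that `LD_PRELOAD` appears as a container environment variable in the pod spec; the function name and the sample pod are hypothetical.

```python
# Annotation keys as they appear on the injected pod described above.
INJECTION_ANNOTATIONS = (
    "oneagent.dynatrace.com/injected",
    "dynakube.dynatrace.com/injected",
)


def injection_markers(pod):
    """Report which injection markers are present on a pod object."""
    annotations = pod.get("metadata", {}).get("annotations") or {}
    result = {key: annotations.get(key) == "true" for key in INJECTION_ANNOTATIONS}
    containers = pod.get("spec", {}).get("containers", [])
    # True if any container declares an LD_PRELOAD environment variable.
    result["LD_PRELOAD"] = any(
        env.get("name") == "LD_PRELOAD"
        for container in containers
        for env in (container.get("env") or [])
    )
    return result


# Hypothetical pod object; the LD_PRELOAD value is a placeholder.
SAMPLE_POD = {
    "metadata": {
        "annotations": {
            "oneagent.dynatrace.com/injected": "true",
            "dynakube.dynatrace.com/injected": "true",
        }
    },
    "spec": {
        "containers": [
            {"name": "app", "env": [{"name": "LD_PRELOAD", "value": "placeholder"}]}
        ]
    },
}

MARKERS = injection_markers(SAMPLE_POD)
print(MARKERS)
```

All three markers being true only shows that the webhook mutated the pod. It says nothing about whether the agent files were actually delivered into the mounted volume.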

However, inside the container I can see inconsistent behavior.

One workload gets:

/opt/dynatrace/oneagent-paas
└── oneagent-paas

but the directory is otherwise empty and Dynatrace reports the process as "Restart required".
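To tell "mounted but empty" apart from "mounted and populated", the files below the mount point can be counted. This is a minimal sketch that assumes a Python interpreter is available inside the container, which many images will not have. The helper names are hypothetical, and the demo uses a temporary directory in place of /opt/dynatrace/oneagent-paas.

```python
import os
import tempfile


def count_files(root):
    """Count non-directory entries anywhere below root."""
    total = 0
    for _dirpath, _dirnames, filenames in os.walk(root):
        total += len(filenames)
    return total


def looks_unpopulated(root):
    """True if root exists as a directory but contains no files at any depth."""
    return os.path.isdir(root) and count_files(root) == 0


# Demo: reproduce the layout described above (a nested directory and nothing else).
with tempfile.TemporaryDirectory() as tmp:
    mount = os.path.join(tmp, "oneagent-paas")
    os.makedirs(os.path.join(mount, "oneagent-paas"))
    EMPTY_RESULT = looks_unpopulated(mount)

    # After a file appears, the mount no longer counts as unpopulated.
    with open(os.path.join(mount, "oneagent-paas", "placeholder.so"), "w") as fh:
        fh.write("x")
    POPULATED_RESULT = looks_unpopulated(mount)

print(EMPTY_RESULT, POPULATED_RESULT)
```

Inside an affected container, the call would be `looks_unpopulated("/opt/dynatrace/oneagent-paas")`.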

What makes this interesting is that after the cluster has been running for some time, many of the processes suddenly switch to Deep Monitoring without me restarting anything. Before powering the cluster on this morning, Dynatrace was actually showing most of them as fully monitored. After startup they reverted back to "Restart required" again.

This makes me wonder whether I'm looking at some startup race condition where workloads come up while CSI registration and Dynatrace initialization are still in progress.

Does the CSI driver registration failure shown above look significant to you, or would you expect Dynatrace to recover automatically from that situation once the CSI driver becomes available?

Thanks!
Regards, Deni

Dynatrace Integration Engineer at CodeAttest

Julius_Loman · 10 Jun 2026

Looks like a race condition related to the download of Dynatrace OneAgent images from the repository (yours, the public one, or the cluster's, depending on what you have configured). Be sure to check the logs for the Dynatrace pods (webhook, CSI driver).

Dynatrace Ambassador | Alanata a.s., Slovakia, Dynatrace Master Partner

deni · 15 Jun 2026

Update

I opened a Dynatrace Support case and spent some additional time investigating the behavior after several full OpenShift cluster power cycles.

I have not yet received a definitive technical explanation, but two interesting findings emerged.

Finding 1 - "Restart required" may not be the most appropriate status

According to the support response, OpenShift control-plane components such as:

kube-apiserver
etcd
kube-controller-manager
openshift-apiserver

are not primary targets for Deep Monitoring and may not fully satisfy the requirements for code-level monitoring.

What I found interesting is that support also acknowledged that in many such cases "Not applicable" would likely be a more appropriate state than "Restart required".

This leaves some ambiguity around the meaning of the displayed status for infrastructure processes.

Finding 2 - Historical process entities remain visible

The more interesting observation is related to entity lifecycle.

After a full cluster reboot:

OpenShift control-plane processes restart successfully.
The cluster becomes healthy.
Dynatrace continues showing process entities for kube-apiserver and similar components.

However, the visible process entities appear to be historical entities whose lifetime started months ago.

For example:

OpenShift reports container/process restarts.
The process entity lifetime in Dynatrace continues from the original creation date.
I cannot clearly identify a replacement process entity representing the currently running process.

At the moment I still do not understand whether this is expected entity lifecycle behavior for OpenShift static control-plane processes or whether there is some limitation in how Dynatrace represents them.

Support currently considers the overall behavior to be working as designed, but the entity lifecycle aspect remains unclear and is the part I am still trying to understand.

I'll post another update if I receive additional clarification from engineering or support.

Dynatrace Integration Engineer at CodeAttest