on 25 Nov 2024 12:37 PM - edited on 26 Nov 2024 07:21 AM by MaciejNeumann
Sometimes, due to a delay in the Kubernetes pod's network setup, the OneAgent code modules injected into the monitored process time out when trying to set up the initial connection:
2024-10-02 07:40:48.031 UTC [00000007] info [comm ] Initial connect: not successful within 6s - giving up
2024-10-02 07:40:48.031 UTC [00000007] info [comm ] Initial connect: connection to initial gateways failed (last error Failed to connect to dynakube-activegate.dynatrace port 443 after 0 ms: Couldn't connect to server) using any of
2024-10-02 07:40:48.031 UTC [00000007] info [comm ] ....
2024-10-02 07:40:48.031 UTC [00000007] info [comm ] .....
2024-10-02 07:40:48.031 UTC [00000007] warning [native] Unable to do initial setup because no server is reachable. Last error: Failed to connect to dynakube-activegate.dynatrace port 443 after 0 ms: Couldn't connect to server
However, once the K8s pod finishes the network setup, the OneAgent connects to the Dynatrace cluster:
2024-10-02 07:41:03.603 UTC [0000000e] info [comm ] Connected to https://dynakube-activegate.dynatrace:443/communication
Overall, monitoring is working, and the data is reaching Dynatrace, but some features are not enabled.
As a result, the OneAgent code modules can still report technology metrics for the monitored process (for example, JVM metrics). However, because of the initial connection timeout, the OneAgent doesn't receive the configuration and sensors from the cluster to instrument the process, so it can't detect any service or report PurePaths.
You can fix this problem by increasing the OneAgent initial connection timeout.
Increase the OneAgent initial connection timeout through the pod's environment variables:
env:
  - name: DT_INITIAL_CONNECT_RETRY_MS
    value: "30000"
and restart the pod to reload the config.
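A minimal sketch of where this lands in a Deployment's container spec (the my-app names and the namespace are placeholders):

spec:
  template:
    spec:
      containers:
      - name: my-app-container        # placeholder container
        image: my-app-image           # placeholder image
        env:
        # give the injected OneAgent code module up to 30 s for the initial connect
        - name: DT_INITIAL_CONNECT_RETRY_MS
          value: "30000"

If the variable is added to the Deployment itself, kubectl rollout restart deployment/my-app -n my-namespace is one way to trigger the pod restart.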
Alternatively, add the following feature flag to the DynaKube:
kubectl annotate dynakube <name-of-your-DynaKube> feature.dynatrace.com/oneagent-initial-connect-retry-ms=6000 -n dynatrace
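If you manage the DynaKube resource declaratively instead of annotating it with kubectl, the same flag can live in the manifest as a metadata annotation (a sketch; the apiVersion, resource name, and namespace are assumptions based on a typical dynatrace-operator setup):

apiVersion: dynatrace.com/v1beta1
kind: DynaKube
metadata:
  name: dynakube                      # placeholder DynaKube name
  namespace: dynatrace
  annotations:
    feature.dynatrace.com/oneagent-initial-connect-retry-ms: "6000"
spec:
  # ... rest of your existing DynaKube spec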
See OneAgent unable to connect when using Istio
Since the initial connection attempt blocks the monitored process during startup, increasing the timeout can delay how long the process takes to start loading the application.
"Since increasing the initial connection timeout waits on the monitored process to startup, it will delay the monitored process to load the application."
Do you mean the JVM injection for deep-dive monitoring, which would in turn delay the startup of the JVM that will be monitored by the OneAgent, or the OneAgent monitoring itself? We have this issue on our ActiveGates when installing the OneAgent for self-monitoring (infrastructure-only monitoring), as well as on most of the application servers we are monitoring.
Excellent tip, thanks.
We had exactly this scenario: up to a 9-second delay, and features like Log Trace/Span enrichment and Java Memory Profiling were missing. For example, in an application with 5 pods, 2 worked and 3 didn't. After a restart, 1 worked and 4 didn't, and so on. Sometimes 3 worked, randomly.
Example from our logs:
2025-08-21 03:05:10.792 UTC [00000007] info [comm] Initial connect: Connection to initial gateways failed (last error SSL connection timeout) using any of:
XXXX, XXXX, XXXX ...
2025-08-21 03:05:10.793 UTC [00000007] warning [native] Unable to do initial setup because no server is reachable. Last error: SSL connection timeout
2025-08-21 03:05:19.584 UTC [0000000e] info [comm ] Connected to XXXX...
This solved the issue. Great!
You can also work around the issue in Istio (1.7+) with the holdApplicationUntilProxyStarts setting.
Global (Mesh Level):
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  meshConfig:
    defaultConfig:
      holdApplicationUntilProxyStarts: true
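To roll this out mesh-wide, the IstioOperator spec above can be applied with istioctl (a sketch; the file name is just an example):

istioctl install -f istio-operator.yaml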
Per pod:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
      annotations:
        proxy.istio.io/config: '{ "holdApplicationUntilProxyStarts": true }'
    spec:
      containers:
      - name: my-app-container
        image: my-app-image
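To confirm a restarted pod actually picked up the setting (pod name and namespace are placeholders):

kubectl get pod my-app-pod -n my-namespace -o yaml | grep holdApplicationUntilProxyStarts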
As a last comment,
I find this issue normally happens where Istio is enabled. It can easily be addressed by adding the following annotation:
proxy.istio.io/config: '{ "holdApplicationUntilProxyStarts": true }'
Istio / Sidecar Injection Problems
Also please note that adding this Dynatrace annotation:
oneagent-initial-connect-retry-ms
can have a significant impact on some applications (like Hazelcast / Spring / Tomcat), as it can delay the start-up of application components and cause applications not to load components correctly due to timeouts, creating additional connectivity issues at the application layer on top of the one it was trying to resolve.