
Connectivity problems on several hosts after updating to OneAgent version 1.303

DanielS
DynaMight Guru

Hi Community, 

After upgrading to OneAgent (OA) v1.303, several of my hosts, and the processes running on them, show connectivity issues. Is anyone else seeing the same behavior?

The latest release includes several updates in the Infrastructure Observability | Hosts category, and some networking issues have been resolved:

Network module

  • Fixed calculation of the size of TCP data segment in case of TCP offloading. (OA-37005)
  • Fixed calculation of timeouts and time durations that could result in inflated retransmission metrics. (OA-35217)
  • Fixed a oneagentnetwork crash related to NIC state change from down to up. (OA-37724)

After reading the latest changes in the deployed OA version, I think the issues now appearing on some hosts, which did not occur before, could be related to this update.

The true delight is in the finding out rather than in the knowing.
18 replies

jcurbina
Guide

Hi, we had the same problem yesterday on Kubernetes platforms. It's a big problem.

We opened a ticket but have had no answer so far.

Juan

jcurbina
Guide

Hi,

The affected metric is TCP connection timeout, which is raising a problem that apparently does not really exist. On Windows platforms, however, we have seen it in the TCP connection refused metric. This has been happening since yesterday, after the update to 1.303.

Regards

Thanks @jcurbina, I have also created a ticket. I'll keep you posted here on any progress.

[Screenshot]


Thank you very much, Daniel. So far we have had no feedback on our ticket. In the meantime, we have disabled anomaly detection for TCP connectivity, since from the information we have, it is only a false positive.

We'll keep an eye out for any news. Thanks again. A couple of examples:

[Screenshot 1]

[Screenshot 2]

DanielS
DynaMight Guru

Support has informed us that this is happening to several customers and is under investigation.


rgarzon1
Pro

Hi,

After seeing this, I checked three clients, and this is what I found.

Client updated to 1.303:

[Screenshot]

Another client, also updated to 1.303:

[Screenshot]

Another client without the update (1.295):

[Screenshot]

In my case, however, none of them raised a problem, so we hadn't noticed until now.

fuelled by coffee and curiosity.

Hi Ruben,

The problem has been quite strange because it has not been widespread: it occurs on some hosts and at some clients, and even similar hosts have not behaved the same way. In some cases it appeared immediately after a restart, so the problem may be triggered when the process restarts.
In one Windows case, the affected metric was TCP connection refused, and it generated TCP connectivity problems for IIS app pools.

@Mohamed_Hamdy I think we faced the same problem.

Certified Dynatrace Professional | Certified Dynatrace Services - Observability | Dynatrace Partner yourcompass.ca

MaciejNeumann
Community Team
Community Team

Hello everyone,

We're in the process of fixing this issue. I'll post here when the changes go live.

If you have any questions about the Community, you can contact me at maciej.neumann@dynatrace.com

I've seen in the Jira ticket that the fix went live with the newest OneAgent update, and the issue was resolved 😊


Hello Maciej Neumann, could you tell us when the version with the fix will be released?

Hey jpinto! OneAgent version 1.303.50+ has the fix for this issue and should be rolling out now to environments. If you're still not seeing it in your environment, please reach out via in-product assistance (live chat) or make a support ticket to have our teams push that version to your tenant(s). Thanks! 
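To check quickly whether a given host is already on a build at or above the one mentioned above, a small version comparison is enough. The sketch below is hypothetical: `parse_oneagent_version` and `has_tcp_metric_fix` are illustrative names, not part of any Dynatrace API, and it assumes version strings shaped like the ones quoted in this thread (`1.295`, `1.303.42`, `1.303.50.20241118-133432`). It only tells you whether the fixed build is installed; as a later reply in this thread shows, at least one user still saw the problem on 1.303.50.

```python
from typing import Tuple

# Assumption: per this thread, OneAgent builds from 1.303.50 onward carry the fix.
FIXED_VERSION: Tuple[int, int, int] = (1, 303, 50)


def parse_oneagent_version(version: str) -> Tuple[int, int, int]:
    """Parse 'major.minor[.revision[.timestamp]]' into (major, minor, revision).

    Hypothetical helper. The trailing build timestamp (e.g. '20241118-133432')
    is ignored, and a missing revision (e.g. '1.295') is treated as 0.
    """
    parts = version.strip().split(".")
    if len(parts) < 2:
        raise ValueError(f"unexpected version format: {version!r}")
    numbers = [int(p) for p in parts[:3]]
    while len(numbers) < 3:
        numbers.append(0)
    return numbers[0], numbers[1], numbers[2]


def has_tcp_metric_fix(version: str) -> bool:
    """Return True if the version is at or above FIXED_VERSION."""
    # Tuples compare element by element, so (1, 303, 42) < (1, 303, 50).
    return parse_oneagent_version(version) >= FIXED_VERSION


if __name__ == "__main__":
    for v in ["1.295", "1.303.42", "1.303.50.20241118-133432"]:
        print(v, has_tcp_metric_fix(v))
```

Comparing integer tuples rather than raw strings avoids the usual trap where a plain string comparison would rank a revision like `1.303.9` above `1.303.50`.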

Hi Maciej,

At one of our customers, we have already verified that the problem is fixed on 15 updated Linux hosts.

[Screenshot]

rgarzon1
Pro

Hi Maciej

I can see a difference since 1.303.50.20241118-133432.

[Screenshot]

I noticed a change after the update, thanks.


Hi Ruben,

So far, we have not had the chance to update any customers or check a case. We hope to verify the fix as soon as possible, but this is good news. Thanks for the feedback.

matoma
Visitor

Good morning, everyone. In my case, despite updating to version 1.303.50, I still have the problem. Since the update to version 1.303.42, some of my machines have had their Availability metric drop from 100% to 83%.

Linux or Windows? Our validation was done on Linux platforms only.

In this case, both containers are Linux: Alpine Linux (kernel 5.10.226-214.880.amzn2.x86_64).
