Root cause of high server ACK RTT?

nicolas_vaillie
Dynatrace Pro

Hi all,

What could be the root cause of a high server ACK RTT?

At the load balancer level, Server ACK time accounts for 4.3 seconds
of the 4.37 seconds of total end-to-end ACK RTT.
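
To put those two figures in proportion, here is a minimal sketch of the arithmetic. The function name server_ack_share is made up for illustration; only the two numbers come from the post above.

```python
def server_ack_share(server_ack_s, total_ack_rtt_s):
    """Return (fraction of the ACK RTT spent on the server side, remainder in seconds)."""
    if total_ack_rtt_s <= 0:
        raise ValueError("total ACK RTT must be positive")
    return server_ack_s / total_ack_rtt_s, total_ack_rtt_s - server_ack_s


# Figures quoted in the post: 4.3 s Server ACK time out of 4.37 s total.
share, rest = server_ack_share(4.3, 4.37)
print(f"{share:.1%} of the ACK RTT is server side, {rest * 1000:.0f} ms is left over")
# prints: 98.4% of the ACK RTT is server side, 70 ms is left over
```

In other words, almost all of the measured delay sits on the server side of the measurement point.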

We also see 677 TCP errors, mainly Client not responding and
Server session termination errors.

This is over an hour-long period earlier today.

The web servers show a really high response time but no
Server ACK time or TCP errors.

Thanks,

Nic

8 REPLIES

nicolas_vaillie
Dynatrace Pro

The web server is IIS, and a restart didn't solve the issue right away. It resolved itself about an hour after the restart. It sounds like requests were blocked until they timed out and released resources... Any other ideas?

Adding application monitoring agents on the servers is being studied.

gabriel_casella
Dynatrace Pro

Just to be clear, you are analyzing packets from the load balancer (the server from this point of view) to the web server (the client from this point of view), right?

Also, of the 677 errors, which is the most common? Do you also see TCP Window Resize (especially resize to 0), Retransmissions or TCP Resets?
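
If a raw packet trace is available, the three symptoms asked about above can be spotted straight from the TCP header. This is only a rough sketch, not how the monitoring product detects them: tcp_events and the seen set are hypothetical names, flows are keyed on ports only (a real tool would include the IP addresses), and a retransmission is naively taken to be a data segment whose sequence number was already seen.

```python
import struct

RST, SYN = 0x04, 0x02  # TCP flag bits


def tcp_events(header, payload_len, seen):
    """Label one TCP segment.

    header      -- raw TCP header bytes (at least the fixed 20 bytes)
    payload_len -- number of data bytes carried by the segment
    seen        -- set of (src_port, dst_port, seq) for earlier data segments
    """
    if len(header) < 20:
        raise ValueError("a TCP header is at least 20 bytes")
    # Fixed header layout: ports, sequence, acknowledgment, offset+flags, window.
    src, dst, seq, _ack, off_flags, window = struct.unpack("!HHIIHH", header[:16])
    flags = off_flags & 0x3F
    events = []
    if flags & RST:
        events.append("reset")
    elif window == 0 and not flags & SYN:
        # The receiver advertises no buffer space at all.
        events.append("zero_window")
    if payload_len > 0:
        key = (src, dst, seq)
        if key in seen:
            events.append("retransmission")
        seen.add(key)
    return events


# Hypothetical segment: a plain ACK from port 443 advertising a window of 0.
ack_zero_window = struct.pack("!HHIIHHHH", 443, 50000, 1000, 0, (5 << 12) | 0x10, 0, 0, 0)
print(tcp_events(ack_zero_window, 0, set()))  # prints: ['zero_window']
```

A burst of zero_window labels would point at a receiver that cannot drain its buffers, while resets and retransmissions point more towards dropped connections or packet loss.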

My two cents, from what you said, is that the client (web server), or even the OS, was too heavily loaded to answer TCP requests and dropped the connection at some point.

Packets are captured before the Load Balancer, the LB defined in the SWS configuration.

No TCP window resize to 0 events. Most TCP errors are Client not responding.

The team thinks the IIS server was not properly ending responses from the re-captcha script, so connections were not closed and the web servers ended up overloaded...

But I still don't understand why we are seeing those errors only before the LB, and not at the web server level as well...

Thanks

Nic

Question:

  • "the LB defined in the SWS configuration". What is SWS? Is it a typo for AWS?

Also, to share what Nicolas and I discussed, here is a picture to better explain what he means:

The re-captcha is an external Google service.

My thoughts are that IIS is waiting for something and hangs (high server time from LB to IIS) until something happens (a timeout or an answer). Meanwhile, the LB is also waiting to answer the browser, and may close the connection if a timeout is reached.
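
The waiting chain described above can be written down as a toy model. Everything here is an assumption for illustration: the function name first_to_give_up, the idea that each hop has a single fixed timeout, and the example values, none of which come from the actual environment.

```python
def first_to_give_up(external_wait_s, iis_timeout_s, lb_timeout_s):
    """Toy model of browser -> LB -> IIS -> external service.

    Returns (outcome, seconds until the browser-facing request is settled).
    """
    if external_wait_s < min(iis_timeout_s, lb_timeout_s):
        # The external call answers before anyone loses patience.
        return "response delivered", external_wait_s
    if lb_timeout_s <= iis_timeout_s:
        # The LB stops waiting while IIS is still blocked on the external call.
        return "LB closes the browser connection", lb_timeout_s
    return "IIS gives up on the external call", iis_timeout_s


# Hypothetical values: the external call hangs for 120 s, IIS would wait 90 s,
# but the LB only waits 30 s.
print(first_to_give_up(120, 90, 30))
# prints: ('LB closes the browser connection', 30)
```

Under these assumed values the connection is torn down on the browser side of the LB first, which would be one way for errors to show up in front of the LB while the web servers only report a long response time.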

Do you have the (TTA) packet trace from this?

Hi Gabriel,

Fairly sure "SWS" is an abbreviation for Software Service.

Yes, this is the abbreviation, well done Anton 🙂 And I could eventually find a packet trace, but it would not be allowed to leave my account's internal network.

I have enabled my ADS details on the LB software service, I'm ready if it happens again.

Did it happen again, or have you fixed/understood it?