What's the best way to troubleshoot high Lost Packets rate in the AMD?


I'd like to ask you the best way to troubleshoot and diagnose a high rate of Lost Packets in the AMD.

Lost Packets = "lost on the network before they reached the AMD. AMD is able to notice lost packets by sequence numbers in the packets that reached it."
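
The sequence-number idea in that definition can be illustrated roughly as follows. This is only a sketch of the general technique, not the AMD's actual implementation. The function name and the input shape (a list of `(seq, length)` pairs for one direction of one TCP session, in capture order) are assumptions made for illustration. It ignores 32-bit sequence wraparound, and it treats every forward jump as a possible loss even though plain reordering can cause one too.

```python
def find_forward_gaps(segments):
    """Return byte ranges the sniffer skipped over, in capture order.

    segments: iterable of (seq, length) pairs for ONE direction of ONE
    TCP session, in the order the capture point saw them (hypothetical
    input format, not an AMD export).
    """
    expected = None  # next sequence number we expect to see
    gaps = []
    for seq, length in segments:
        if length == 0:
            continue  # pure ACKs carry no payload, skip them
        if expected is not None and seq > expected:
            # A segment arrived beyond the expected position: the bytes
            # in between were not seen at the capture point (yet).
            gaps.append((expected, seq))
        end = seq + length
        if expected is None or end > expected:
            expected = end
    return gaps


if __name__ == "__main__":
    capture = [(1000, 100), (1100, 100), (1300, 100), (1200, 100), (1400, 100)]
    print(find_forward_gaps(capture))
```

A gap that is filled later by a retransmission suggests the original packet never reached the capture point, which matches the definition above.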

I'm not entirely sure, but from my experience:

"Server Loss Rate" rising means the server has to keep resending packets to the client. This may be because the client disconnected or lost its internet connection (especially if the client is a mobile device).

"Client Loss Rate" rising means the client has to continue to resend packets to the server. This is probably because the server is unreachable. Maybe a switch went down before the packet arrived, maybe the destination server's NIC is dropping packets, maybe that one server went down and a failover hasn't happened yet to another one in a cluster.

My question was more about how to address a lost packets issue on the AMD from a hardware perspective. What are the suggested steps to isolate the problem?

Up to version 12.3.x you can use the "Verify Quality of Monitored Traffic" (Devices and Connections menu) on RUM Console. You can either use automatic or manual recording to diagnose which servers are suffering the issues by clicking the "blue links" on the "Sessions" tab.

Starting in 12.4, most of this functionality has moved to the CAS under "Traffic Diagnostics".

What I'd like to know is how to troubleshoot a high rate of Lost Packets once you have the data shown in the Sniffing Point Diagnostics (or in Traffic Diagnostics in 12.4). In other words, what is the best next step to solve the traffic issue?




If packets are being lost before they reach the AMD, you need to check the traffic mirroring on your network devices. Using the affected IPs from the Diagnostics report, you can isolate the network segments that should carry that traffic.

I suggest you check the capacity of the mirroring devices and of the network interfaces involved. If you see unidirectional traffic, you might be missing some network segments or directions (TX, RX) that the AMD needs in order to see whole conversations.
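
As a rough illustration of the unidirectional-traffic check, here is a small sketch that flags conversations seen in only one direction. It assumes you already have a list of `(source, destination)` endpoint pairs taken from a capture at the sniffing point. The function name and the `"ip:port"` string format are made up for this example and are not part of any Dynatrace tool.

```python
from collections import defaultdict


def find_unidirectional(packets):
    """Return the (src, dst) pairs whose reverse direction was never seen.

    packets: iterable of (src, dst) endpoint strings such as
    ("10.0.0.5:51000", "10.0.1.9:443"). Hypothetical input format.
    """
    directions = defaultdict(set)
    for src, dst in packets:
        # Use an order-independent key so both directions of the same
        # conversation end up in the same bucket.
        key = tuple(sorted((src, dst)))
        directions[key].add((src, dst))
    # Only one direction recorded means the mirror probably misses TX or RX.
    return sorted(next(iter(seen)) for seen in directions.values() if len(seen) == 1)


if __name__ == "__main__":
    sample = [
        ("10.0.0.5:51000", "10.0.1.9:443"),
        ("10.0.1.9:443", "10.0.0.5:51000"),
        ("10.0.0.6:51001", "10.0.1.9:443"),
    ]
    for src, dst in find_unidirectional(sample):
        print(f"only seen {src} -> {dst}")
```

If most of the flagged conversations share a switch port or VLAN, that segment's mirroring configuration is a reasonable place to start looking.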

The first question when you see a loss rate is whether your traffic is clean. If it is not clean, check that you have chosen the right deduplication algorithm.

Once you are sure about this, you have several network indicators for measuring network problems:

- Realized bandwidth: this measures whether you are receiving the bandwidth you expect. Low values here indicate that something else is consuming the expected bandwidth.

- Zero window events: these indicate (in server-to-client communication) that a device or network cannot keep up with the sending speed because it is too high and its buffers are starting to fill. This means that the client (or a network device before the client) is not processing the arriving packets fast enough.

- RTT and ACK RTT: these are the main metrics for network latency. A value that is too high indicates that you are passing through too many devices before reaching the client/customers, or that the client is too far away.

- Loss rate: this is the main metric for network reliability. Possible causes:

- A network card that is not working properly (bad device configuration or hardware problems)

- A bad network configuration/architecture

- Poor load balancing that causes the same packets to be sent several times

- A network device that is not working properly or is badly configured

- A bad virtual network configuration

- A badly configured link between the virtual and physical networks

What to do next:

- Is it client loss rate? Look at the network devices between the AMD capture points and the server. Check whether all the servers in the same virtual infrastructure have the same problem, or all the servers behind the same network device. If the servers are behind a load balancer, check whether the load is well balanced (by looking at the distribution of operations).

- Is it server loss rate? Does it come from a single client site, or always from the same client sites? Look at the site infrastructure or the devices that these sites have in common. Does it come from the same users? Check the network device configurations.
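
One way to run the comparison described in the two points above is to aggregate the loss rate by whatever the affected servers or sites might have in common. The sketch below assumes you have exported per-server (or per-site) counters into plain dictionaries. The field names (`server`, `switch`, `lost`, `packets`) and the function name are hypothetical and do not correspond to any actual CAS report column.

```python
from collections import defaultdict


def loss_rate_by_group(records, key):
    """Aggregate lost/total packet counters by a shared attribute.

    records: list of dicts with 'lost' and 'packets' counters plus
    descriptive fields (hypothetical names, e.g. 'server', 'switch').
    key: the field to group by.
    """
    totals = defaultdict(lambda: [0, 0])  # group -> [lost, packets]
    for rec in records:
        bucket = totals[rec[key]]
        bucket[0] += rec["lost"]
        bucket[1] += rec["packets"]
    # Skip groups with no packets to avoid dividing by zero.
    return {group: lost / pkts for group, (lost, pkts) in totals.items() if pkts}


if __name__ == "__main__":
    data = [
        {"server": "app01", "switch": "sw1", "lost": 50, "packets": 1000},
        {"server": "app02", "switch": "sw1", "lost": 30, "packets": 1000},
        {"server": "app03", "switch": "sw2", "lost": 1, "packets": 1000},
    ]
    rates = loss_rate_by_group(data, "switch")
    for group, rate in sorted(rates.items(), key=lambda item: -item[1]):
        print(f"{group}: {rate:.2%}")
```

If one group stands out clearly, as `sw1` does in this made-up data, the element that its members share is the first candidate to inspect.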

I hope it helps.

