This product reached the end of support date on March 31, 2021.

Help in Diagnosing Sequence Number Gap Rate issue

luke_boyling
Dynatrace Pro

At my customer, we've set up a virtual 12.4.5 High Speed AMD on ESXi with auto-discovered traffic turned on. The AMD diagnostics are reporting a consistent sequence number gap rate, with 5-minute averages ranging between 5% and 17% during business hours. The AMD is monitoring two physical links, with each Tx line being tapped separately, resulting in 4 capturing interfaces at the AMD. The reported traffic usage at peak is 50 Mbps. The CPU usage is no more than 2%, and there are no dropped packets on the AMD.

I exported a 5-minute packet capture from the AMD into an old version of Wireshark. About 11% of packets have some TCP error or issue, and there's a range of servers and clients involved. Each of the 4 interface captures had a similar rate of TCP errors, suggesting that it's not a single tap at fault.
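For anyone wanting to repeat the per-interface comparison, here is a minimal sketch, assuming the capture has already been reduced to one record per packet holding the capturing interface and a flag for whether Wireshark marked it with any TCP issue. The function names and the 3-point tolerance are illustrative assumptions, not anything taken from the AMD or from Wireshark.

```python
from collections import Counter


def error_rate_by_interface(packets):
    """Return {interface: fraction of packets flagged with a TCP issue}.

    packets: iterable of (interface_name, has_tcp_issue) pairs.
    This record layout is an assumption made for the sketch.
    """
    totals = Counter()
    flagged = Counter()
    for interface, has_issue in packets:
        totals[interface] += 1
        if has_issue:
            flagged[interface] += 1
    return {name: flagged[name] / totals[name] for name in totals}


def rates_look_similar(rates, tolerance=0.03):
    """True when the highest and lowest rates differ by at most `tolerance`.

    The tolerance is an arbitrary illustrative threshold (absolute difference).
    """
    values = list(rates.values())
    return max(values) - min(values) <= tolerance
```

If one interface stands well apart from the others, that tap or vNIC is the first place to look. If all four sit close together, as they did here, a single faulty tap becomes less likely.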

What are some of the things that we can do to determine the likely cause of the gaps - whether there's an issue with the tapping, an issue with the ESXi setup, or a wider network problem that we're observing?

2 REPLIES

ulf_thornander3
Inactive

Juicy!

Where are those links going?

I'd take a look at the other end too. Just like you, I have a hunch that something might not be quite right in the virtual world.

Did you do a Stevens (time-sequence) graph in Wireshark?

It'll tell you if there is some sort of regular pattern in the issue.
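As a rough complement to eyeballing the graph, here is a small sketch that buckets the timestamps of flagged packets and reports the spacing between busy buckets. It assumes the timestamps (in seconds) of the problem packets have been exported from the capture. The function names, the bucket size and the burst threshold are all illustrative assumptions.

```python
from collections import Counter


def events_per_bucket(timestamps, bucket_seconds=1.0):
    """Count flagged packets per fixed-width time bucket.

    timestamps: capture times in seconds of packets flagged with a gap or
    TCP issue. Returns a list of counts, one per bucket, starting at the
    earliest timestamp.
    """
    if not timestamps:
        return []
    start = min(timestamps)
    counts = Counter(int((t - start) // bucket_seconds) for t in timestamps)
    return [counts.get(i, 0) for i in range(max(counts) + 1)]


def burst_spacing(series, threshold):
    """Distances (in buckets) between consecutive buckets at or above threshold.

    A run of near-identical distances hints at a periodic cause, while
    scattered distances point to something more random.
    """
    burst_buckets = [i for i, count in enumerate(series) if count >= threshold]
    return [later - earlier for earlier, later in zip(burst_buckets, burst_buckets[1:])]
```

A steady spacing would suggest something scheduled or buffer-related, possibly on the virtual side. Irregular spacing that follows traffic volume looks more like genuine loss or a capture path that cannot keep up.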

Do the sequence gaps also result in degraded performance/RTT/latency?

john_leight
Dynatrace Pro

To help determine if there is a tap issue or a real issue - I'd try to get an independent capture. Can you get a capture off one of the worst servers that DCRUM is reporting a high loss rate for?

If the server capture comes back clean, then there may be a problem with un-matched duplicates getting through to the AMD - or not getting de-duplicated properly.

If the server capture is dirty, then there may be a more pervasive network problem.
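For the duplicate scenario, a quick way to check the AMD-side capture is to look for packets that repeat an earlier packet exactly within a very short window. The sketch below is a heuristic and not how the AMD de-duplicates. It assumes packets have been exported as tuples sorted by time, and it relies on the common (but not universal) behaviour that a genuine TCP retransmission is sent with a new IP ID, whereas a copy picked up twice by the capture path keeps the same one. The field layout, function name and 1 ms window are illustrative assumptions.

```python
def find_duplicates(packets, window_seconds=0.001):
    """Return indices of packets that repeat an earlier packet within the window.

    packets: iterable of (timestamp, src, dst, ip_id, tcp_seq, payload_len)
    tuples sorted by timestamp. The layout is an assumption for this sketch.
    """
    last_seen = {}
    duplicates = []
    for index, (timestamp, *key_fields) in enumerate(packets):
        key = tuple(key_fields)
        previous = last_seen.get(key)
        # Same addresses, IP ID, sequence number and length seen very
        # recently: likely a capture-path duplicate, not a retransmission.
        if previous is not None and timestamp - previous <= window_seconds:
            duplicates.append(index)
        last_seen[key] = timestamp
    return duplicates
```

If a noticeable share of the packets come back as duplicates, de-duplication is worth chasing before blaming the network. If almost none do and the server capture is also dirty, the loss is probably real.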

When new SPANs/AMDs are set up, there always seems to be some verification of "bad things" that has to happen before we can pin it on an actual problem.