We have a customer's AMD that shows a high rate of 'TCP sessions with reordered packets'.
The AMD sits at the entry point of a main datacenter, behind the firewall (firewall-switch), and is fed by SPAN ports on both of the redundant links. (The customer is international with a WAN, but AFAIK the DC is not directly connected to the WAN.)
From a one-minute automatic capture today (outside HQ office hours), it reads
TCP sessions with reordered packets: 43.06 k (43.34%)
On a manual capture of two client IPs, that percentage is even higher.
According to the documentation:
TCP sessions with reordered packets
Reordered packets are typically found when
there is a WAN link enabled. Devices transferring WAN packets may affect the packet order. The
existence of reordered packets is not a problem in itself, because the
can restore original packet order, but an excessive number of such packets may cause performance degradation.
That performance degradation is, I think, very much present. The AMD's CPU has been at an all-time high of ~80% for some time already.
When zooming in (clicking) on the reordered packets figure, the division by 'responsible' server IPs can be seen.
In this overview both the percentage and the absolute count are indicators. I assume a combination of high count and high percentage is what has to be looked at first.
The question is what the nature of the reordered packets can be, and what we can do to avoid them.
TCP ignore software services, packet brokers and VLAN filtering are countermeasures that spring to mind.
Is this behaviour expected in networking environments? Could it indicate something not configured optimally in the servers, network, AMD, etc.?
My first guess (please correct me if I guess wrong) is that your source is a SPAN setup of some kind. Usually there is only a very small amount of reordered packets when traffic flows normally, and it "usually" happens when you pass a router. When you then start looking at a Switch and want to get to the traffic inside, things get a little different.
The first thing you can try is to scrutinize the SPAN. Often Switch people don't consider the state of the data they provide you: "You get a trace file - right?" They only think about the packets getting somewhere, not what's in them (think about how the postal system works). What I see a lot of is a SPAN that is set up against a VLAN or several VLANs. Normally this will give you the packets twice (or up to 4 times and in varying order), as you get both the ingress and the egress copy and it's a toss-up which copy of the packet you get first. So normally I try to avoid using a VLAN as source, and if I have to, then I use only ingress packets.
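To make the ingress/egress point concrete, here is a toy Python sketch. It is not how any particular switch or the AMD works: the function name `span_copies`, the 50% swap probability and the "adjacent copies may swap" model are all assumptions made purely for illustration.

```python
import random


def span_copies(seqs, direction="both", seed=0):
    """Return packet numbers in the order a monitor port might deliver them.

    Toy model (assumption, not real switch behaviour):
    - direction="rx":   only the ingress copy of each packet is mirrored.
    - direction="both": an ingress and an egress copy are mirrored, and
      neighbouring copies may swap places on the way to the monitor port.
    """
    rng = random.Random(seed)
    mirrored = []
    for seq in seqs:
        mirrored.append(seq)          # ingress copy
        if direction == "both":
            mirrored.append(seq)      # egress copy of the same packet
    if direction == "both":
        # The "toss-up": randomly swap adjacent copies.
        for i in range(len(mirrored) - 1):
            if rng.random() < 0.5:
                mirrored[i], mirrored[i + 1] = mirrored[i + 1], mirrored[i]
    return mirrored
```

With `direction="rx"` the monitor port sees every packet once and in the original order. With `direction="both"` it sees every packet twice, and the copies can interleave with their neighbours, which is the pattern an analyzer may then have to untangle.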
Second thing would be to not use the VLAN at all and only focus on the interface(s) even though you might need to do a more advanced and static config.
There is also a chance that you have a multilink host. Depending on what OS it runs and how the multilink is set up, this might be the source of the problem. Sometimes people use an onboard NIC plus an extra NIC (worst case even another brand) and they process packets at different speeds. For the normal operation of the server this isn't a problem, but for network analytics tools it might be, if the host does round-robin load balancing across the interfaces.
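A small Python sketch of that round-robin effect, under made-up numbers: the function name `arrival_order`, the per-NIC latencies and the send gap are hypothetical values chosen only to show the mechanism.

```python
def arrival_order(n_packets, latencies_us=(10, 35), gap_us=20):
    """Send packets round-robin over NICs and return them in arrival order.

    latencies_us: per-packet delay of each NIC (hypothetical values).
    gap_us:       time between two consecutive sends.
    """
    arrivals = []
    for seq in range(n_packets):
        nic = seq % len(latencies_us)        # round-robin NIC selection
        sent_at = seq * gap_us
        arrivals.append((sent_at + latencies_us[nic], seq))
    arrivals.sort()                          # order seen on the wire
    return [seq for _, seq in arrivals]


print(arrival_order(6))                      # [0, 2, 1, 4, 3, 5]
print(arrival_order(6, latencies_us=(10, 10)))  # [0, 1, 2, 3, 4, 5]
```

As soon as the latency difference between the two NICs is larger than the gap between sends, every second packet overtakes its predecessor. The receiving TCP stack copes with that, but a monitoring tool would count these as reordered.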
Last but not least, the absolutely best way to capture packets or traffic to feed the AMD is to use a network TAP. It will take away so many of these small annoying things and remove the extra hurdle of explaining to the Switch admins why you need a change of their SPAN.
Hi Ulf, TAPs are also my preferred solution. But it's a challenge to convince the powers that be, at customers' side.
Ulf, you mention using only ingress packets in certain cases.
Doesn't using only ingress mean you will lose the details of the operations reported by CAS?
This issue still exists, but time constraints have kept me from following up here.
I did have some talks with the responsible 3rd party, and got more useful information about the architecture in use. Indeed, as you guessed, it is a SPAN setup.
Two Nexus (HA) switches, with two shared port channels (e.g. A1+B1, A2+B2), go to a firewall and then back into the switch, then further into the datacenter.
Since both port-channels are SPANned to our AMD, we should actually receive the data in duplicate. The AMD is reporting a low number of duplicates (<10%) but ~50% reordered. I wonder if the AMD is in fact confused, and reports the duplicate traffic as reordered.
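One way to test that hypothesis offline would be to count duplicates and true reorders separately on a captured flow. The Python sketch below is only a toy classifier over a list of sequence numbers for one flow direction. It says nothing about how the AMD actually classifies packets, and the function name `classify` and the "naive" counter are assumptions for illustration.

```python
def classify(seqs):
    """Classify each segment of one flow direction.

    Returns (counts, naive_reordered):
    - counts: segments that are in order, duplicates (sequence number
      already seen), or genuinely reordered (new, but behind the highest
      sequence number seen so far).
    - naive_reordered: what a checker that does not de-duplicate first
      would call reordered (anything behind the highest seen so far).
    """
    seen = set()
    highest = None
    counts = {"in_order": 0, "duplicate": 0, "reordered": 0}
    naive_reordered = 0
    for seq in seqs:
        behind = highest is not None and seq < highest
        if behind:
            naive_reordered += 1
        if seq in seen:
            counts["duplicate"] += 1
        elif behind:
            counts["reordered"] += 1
        else:
            counts["in_order"] += 1
        seen.add(seq)
        highest = seq if highest is None else max(highest, seq)
    return counts, naive_reordered


# A flow mirrored twice, with the second copy lagging by one packet:
print(classify([1, 2, 1, 3, 2, 4, 3]))
# ({'in_order': 4, 'duplicate': 3, 'reordered': 0}, 3)
```

In this example nothing is genuinely reordered, yet a checker that does not de-duplicate first would flag three segments as reordered. If a real capture from the SPAN shows the same pattern, that would support the idea that lagging duplicates are being counted as reordered packets.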
A request packet can be INGRESS to a VLAN once and EGRESS once (and then it can pass through several VLANs), the same goes for the response. It's a bit different if the SPAN is set against a HOST PORT as this is a more physical setup. If you do a UPLINK PORT, you can potentially run into the same problems as with the generic VLAN capture as a packet can go in and out in the same uplink several times, just with different headers and you would (ideally) only like to capture the packets once and in the right order.
The HOST PORT SPAN is the closest you can get to a TAP if you only use software. The only other thing I can think of is a capture on the actual host (OS level), which is not desirable.
Happy hunting 🙂
Here's 2 good links you can ponder upon:
Ulf is definitely an expert and everything he suggested might be the case. Also the last one about using taps to collect traffic for analysis is very important. However, sometimes it's practically impossible due to cost and/or the necessity to "inject" the tap to the network. To overcome some limitations of SPAN, consider using VACL Capture ports. You will find a lot of material on using VACLs on the Internet.
Good point there @Zbyszek C. I've kind of given up on VACL. 😛
In the last 10 years I've only managed to find 1 single account where they could actually use VACL, as most haven't got a clue and are not inclined to test, even though it's superior and much preferred, as the copy happens at chip level instead of at OS level.
But @Frans S. - please let us know how it goes?
Agreed completely. A lot of my time is spent guiding customers' network operators in properly configuring SPANs (and most of the time they're very hostile about that). Setting up SPANs is much more difficult than the online resources would have you believe. Sure, the commands are simple enough, but the planning and design work needed to do so successfully is the difficult part.