This is an oversimplified diagram of one application on our network. We have hundreds of applications that use this concept. The blue arrows are web traffic (80/443) from the client to the web servers. The green arrows are application traffic (various protocols, including HTTPS on TCP ports other than 443). The orange arrows are database traffic (traditional and Exadata).
Each F5 is using SNAT. There is also a portal (not shown and used for some applications) in between the client and the MDF.
1) How do I configure rules for each tier, given that the F5 changes the source address with SNAT?
Our capture point is a port channel on the MDF that includes the VLANs (subnets) for the web, app, and db tiers. It is forwarded to an NPB and then to the AMD. However, because Cisco allows only 2 SPANs per context, this SPAN also contains L2/L3 traffic for other applications and services. As there are numerous (over 100) VLANs on this port channel, and the application traffic can cross several of them, there are bound to be duplicate packets.
2) How do I handle these duplicate packets?
Dedup is implemented on the NPB. However, we are seeing very high sequence number gap rates (<30%), as well as high unidirectional traffic and retransmission rates.
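To make questions 2 and 3 concrete, here is a minimal sketch of how windowed deduplication generally works. It is not Gigamon's implementation; the Frame fields, the 50 ms window, and the choice of hashed fields are assumptions for illustration. The idea is that copies of the same packet taken from different VLANs differ only in fields such as the VLAN tag and TTL, so the key ignores those and hashes the rest.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    """Hypothetical, simplified view of one frame as delivered by the SPAN."""
    timestamp: float  # capture time in seconds
    vlan: int         # 802.1Q tag; differs when the same packet is copied from another VLAN
    ttl: int          # IP TTL; decremented when the packet is routed between VLANs
    ip_src: str
    ip_dst: str
    ip_id: int        # IP identification field, unchanged across routed hops
    payload: bytes    # L4 header plus data


def dedup_key(f: Frame) -> str:
    """Hash only the fields that are identical in every copy of the same packet."""
    h = hashlib.sha256()
    h.update(f"{f.ip_src}|{f.ip_dst}|{f.ip_id}|".encode())
    h.update(f.payload)
    return h.hexdigest()


def deduplicate(frames, window: float = 0.05):
    """Keep the first copy of each packet; drop copies seen within `window` seconds of it.

    Assumes `frames` is sorted by timestamp. The table grows without bound,
    which is acceptable for a sketch but not for a real NPB.
    """
    first_seen = {}
    kept = []
    for f in frames:
        key = dedup_key(f)
        t0 = first_seen.get(key)
        if t0 is not None and f.timestamp - t0 <= window:
            continue  # copy of an already-kept packet from another VLAN/hop: drop it
        first_seen[key] = f.timestamp
        kept.append(f)
    return kept
```

Because the key ignores VLAN and TTL, it does not matter which copy survives. The setting that matters is the window: if it is shorter than the delay between two SPAN copies, duplicates leak through; if it is too long, a byte-identical retransmission could be removed as a "duplicate".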
3) How would I know if the "correct" duplicates are being removed?
4) How do I keep each tier associated with the original client, so that I see one session end to end?
I read an article on the forum about using monitor clone pools to forward traffic from the F5 to the AMD. This would be used instead of the SPAN. However, the "owners" of the F5 do not want to use a million-dollar piece of hardware to perform SPANs.
If you need any other information, don't hesitate to ask.
Thanks and God bless,
From the diagram, it looks like you have two possible approaches here: simplified, and complete end to end.
In the simplified approach, you may want to monitor a VIP in the 10.10.50 network as your front end, and then the servers in each of the 172.16 networks as the respective tiers (web, app, db). When communication is HTTP/HTTPS, using XFF headers on the F5s would help in finding out the real client IP (e.g., the desktop IP when sniffing at the 172.16.1 web servers, or the web server IP when sniffing at the app servers; it won't work at the DB servers). This simplified approach doesn't look at F5 performance, except for the front F5.
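A small sketch of why XFF helps behind SNAT, and why it only reaches as far as the headers are forwarded. The helper names and all IP addresses are hypothetical; the convention assumed is the common one where each proxy appends the address of the peer it received the connection from, so the leftmost entry is the original client. That entry is only trustworthy if clients cannot inject their own X-Forwarded-For value past the front F5.

```python
def append_xff(headers: dict, peer_ip: str) -> dict:
    """Return a copy of `headers` with the connecting peer appended to X-Forwarded-For,
    as a proxy (for example an F5 with XFF insertion) would do."""
    out = dict(headers)
    existing = out.get("X-Forwarded-For")
    out["X-Forwarded-For"] = f"{existing}, {peer_ip}" if existing else peer_ip
    return out


def original_client(headers: dict, fallback_ip: str) -> str:
    """Leftmost XFF entry is the original client. Without XFF, all a sniffer has is
    the TCP source address, which behind SNAT is the F5's address."""
    xff = headers.get("X-Forwarded-For")
    if not xff:
        return fallback_ip
    return xff.split(",")[0].strip()


# Hypothetical path: desktop 10.1.2.3 -> front F5 (SNAT) -> web 172.16.1.21 -> app F5 (SNAT) -> app
at_web = append_xff({}, "10.1.2.3")                  # front F5 inserts the desktop IP
at_app_forwarded = append_xff(at_web, "172.16.1.21")  # web server forwarded the header it received
at_app_not_forwarded = append_xff({}, "172.16.1.21")  # web server built a fresh request
```

With these values, the web tier resolves to 10.1.2.3, the app tier resolves to 10.1.2.3 only if the web servers forward the header (otherwise to the web server IP), and database traffic carries no HTTP headers at all, so it falls back to the SNAT address.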
In the complete approach, you will have both 10.x and 172.x servers, so each side of the F5 will be monitored. The advantage is that you will know the load-balancing efficiency at the app tier, but the configuration gets more complicated. Software services have to be defined for both the front and the back of each F5.
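One way to picture "software services for both the front and back of each F5" is a table keyed on the server side of each conversation, since SNAT rewrites the client side but leaves the destination of each request meaningful. This is a hypothetical sketch, not DC RUM configuration: only the 10.10.50.x and 172.16.x ranges come from the diagram; the app VIP subnet, the exact prefixes, and the ports are assumptions.

```python
import ipaddress
from typing import NamedTuple, Optional


class Service(NamedTuple):
    tier: str
    side: str      # "front" = VIP side of the F5, "back" = pool member side
    network: str   # CIDR containing the server addresses
    port: int


# Hypothetical service definitions; subnets and ports are illustrative only.
SERVICES = [
    Service("web", "front", "10.10.50.0/24", 443),
    Service("web", "back", "172.16.1.0/24", 443),
    Service("app", "front", "10.10.51.0/24", 8443),
    Service("app", "back", "172.16.2.0/24", 8443),
    Service("db", "back", "172.16.3.0/24", 1521),
]


def classify(dst_ip: str, dst_port: int) -> Optional[Service]:
    """Assign a request to a tier by its server endpoint, ignoring the (SNATed) client."""
    addr = ipaddress.ip_address(dst_ip)
    for svc in SERVICES:
        if dst_port == svc.port and addr in ipaddress.ip_network(svc.network):
            return svc
    return None
```

Matching on the server endpoint means the same client, hidden behind different SNAT addresses on each hop, still lands in the right tier; correlating those hops back to one client is a separate problem.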
Duplicates need to be removed as much as possible at the NPB level. It shouldn't matter which is the "right one" to remove, as from the IP-level perspective, e.g., the green or yellow arrows at the 10.x tier carry the same data.
It's not clear what switch is at the MDF level, or what traffic levels the SPANs have to carry. Please note that, according to the picture, at least 2x the traffic is visible on the fronts of the F5s. Traffic levels above 5 Gbps may easily be reached in such cases, and at such levels the SPAN ports may already be on the edge of dropping packets. If any packets are dropped there because of SPAN limitations, that may be the reason for the unclean traffic. No NPB can fix that.
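A quick back-of-the-envelope check for SPAN oversubscription. The rule it encodes is that a SPAN copies both directions of every source to one destination port, so the destination must carry the sum of rx and tx of all sources. All rates below, and the 10 Gbps destination, are hypothetical numbers chosen to show the arithmetic.

```python
def span_load_gbps(source_ports):
    """Total rate the SPAN destination must carry.

    source_ports: iterable of (rx_gbps, tx_gbps) per SPAN source; both
    directions of every source are copied to the one destination port.
    """
    return sum(rx + tx for rx, tx in source_ports)


def span_headroom_gbps(source_ports, dest_capacity_gbps):
    """Positive: spare capacity. Negative: roughly how much the destination must drop."""
    return dest_capacity_gbps - span_load_gbps(source_ports)


sources = [
    (1.5, 1.5),  # client side of the front F5 (hypothetical)
    (1.5, 1.5),  # server side of the same F5: the same transactions seen again
    (2.5, 2.5),  # unrelated applications sharing the port channel
]
```

Here the sources add up to 11 Gbps against a 10 Gbps destination, so about 1 Gbps would have to be dropped before the NPB ever sees it, which would show up downstream as gaps no deduplication can repair.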
So overall, it may be sensible to start by monitoring the servers in the 172.x networks, if this traffic doesn't go through the same MDF (as the picture suggests).
Regarding one session end to end: if I understand correctly, you'd like to see the desktop client IP address at the app server? This would only be possible if the web servers forwarded the XFF headers they received from the 10.10 F5. Tracking the desktop client IP down to the database wouldn't be possible because of the app tier in between. In general, this kind of transaction tracing may be better addressed with the Dynatrace OneAgent on the app and web servers.
I want to quickly thank you for your comments. I need to read this over a few times to understand.
The web/app/db traffic does go through the same MDF, just different VDCs and modules*. Bad drawing on my part.
I'm going to read up about the Dynatrace OneAgent.
*We are working with Cisco, and they believe the unidirectional traffic might be caused by the SPAN source being mixed across different modules (M1 and M2 cards). There is an issue with this mix if the SPANs are configured in extended mode and there are more than 2 SPANs on the entire switch. This is our case. We need to test.
Will keep you apprised of the situation.
Thanks and God bless,
Just an update: after removing "mode extended" from the monitor sessions on the Cisco MDFs, there are no more duplicates, unidirectional traffic, or drops.
However, the sequence number gap rate and retransmissions are still very high.
Sequence number gap rate
A gap means that a certain amount of traffic (sequence numbers) that is part of a monitored session has not been received by the AMD. It may be caused by an overloaded SPAN.
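To see what a "gap" means in practice, here is a sketch that measures, for one direction of one TCP connection, the share of the sequence space between the first and last byte seen that never reached the monitor. This is an assumed definition for illustration, not the AMD's actual formula, and it ignores 32-bit sequence wraparound for brevity.

```python
def sequence_gap_rate(segments):
    """Fraction of the observed sequence range whose bytes were never captured.

    segments: iterable of (seq, length) for one direction of one TCP connection.
    Overlapping or repeated segments are merged, so duplicates do not hide gaps.
    Ignores 32-bit sequence wraparound (assumption for brevity).
    """
    intervals = sorted((s, s + n) for s, n in segments if n > 0)
    if not intervals:
        return 0.0
    covered = 0
    run_start, run_end = intervals[0]
    for start, end in intervals[1:]:
        if start <= run_end:               # overlapping or adjacent: extend the current run
            run_end = max(run_end, end)
        else:                              # hole in the sequence space
            covered += run_end - run_start
            run_start, run_end = start, end
    covered += run_end - run_start
    span = run_end - intervals[0][0]       # last run ends at the highest byte seen
    return 1 - covered / span
```

For segments (1000, 100), (1100, 100), (1300, 100), bytes 1200 to 1299 were never seen, giving a gap rate of 0.25. If the endpoints did receive those bytes (no retransmission follows), the loss happened on the monitoring path, which is the overloaded-SPAN case.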
Top retransmission-affected servers (last hour)
…of servers with a high two-way loss rate (retransmissions), in conjunction with a high level of packet duplicates, indicates poor quality of the monitored traffic stream. While duplicates mainly affect AMD performance, a retransmission rate that is too high (>10%) is a clear indicator that the monitored traffic stream contains too many out-of-order packets that cannot be processed and, in turn, produce irrelevant performance measurements.
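The link between out-of-order packets and retransmissions can be illustrated with a simple heuristic: a data segment whose bytes lie entirely below the highest byte already seen is counted as a retransmission. This is a sketch under that assumption, not the AMD's algorithm. Its weakness is exactly the point above: a segment that the SPAN merely delivered late looks the same as a real retransmission, so a reordered monitored stream inflates the apparent rate. The 10% threshold is taken from the text above.

```python
HIGH_RETRANSMISSION_RATE = 0.10  # threshold quoted above (>10%)


def retransmission_rate(segments):
    """Share of data segments whose bytes were all below the highest byte already seen.

    segments: list of (seq, length) in capture order, one direction of one connection.
    A late out-of-order arrival is indistinguishable from a retransmission here.
    Ignores 32-bit sequence wraparound (assumption for brevity).
    """
    highest_end = None
    data_segments = 0
    retransmitted = 0
    for seq, length in segments:
        if length == 0:
            continue  # pure ACKs carry no data
        data_segments += 1
        end = seq + length
        if highest_end is not None and end <= highest_end:
            retransmitted += 1  # bytes already covered: retransmission or reordered copy
        else:
            highest_end = end   # new data extends the highest byte seen
    return retransmitted / data_segments if data_segments else 0.0
```

A true resend such as (0, 100), (100, 100), (100, 100), (200, 100) gives 0.25, but so does a stream with no loss at all that the capture path reordered, e.g. (100, 100), (0, 100), (200, 100) gives 1/3. Both would exceed the 10% threshold.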
We are using a Gigamon GigaVUE HC2 as our NPB.
Thanks and God bless,
Should the F5 VIP (or other IP addresses) be used as the Main server IP address or the NLB NAT masking IP address because of SNAT?
Optional: Enter the Main server IP address.
If the monitored application runs on several servers that are linked together in a farm, you can monitor the farm as one virtual server. In this case, type the IP address that you want to use as your main server.
Optional: Enter the NLB NAT masking IP address.
This is the IP address of the server masking the addresses of monitored servers. If the servers you intend to monitor reside behind an appliance that masks and replaces the addresses of the target servers, you need to set the NLB NAT masking IP address to the IP address of the masking server. Without doing so, the AMD will see two unidirectional conversations instead of one bidirectional conversation between the servers and …
Unless you account for this, CAS reports will return ambiguously granulated data. Using the NLB NAT masking IP address option will ensure that the AMD monitors contiguous conversations.
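A hypothetical illustration of what the masking setting achieves, not how DC RUM implements it. Suppose the capture sees the request addressed to the masking IP but the response sourced from a real farm member. Grouping by raw addresses yields two one-way conversations; mapping the masking IP and the farm members to one logical server (similar in spirit to the Main server IP option) merges them into one two-way conversation. All addresses below are assumptions.

```python
from collections import defaultdict

# Hypothetical addresses: a masking appliance (e.g. an F5 VIP) in front of a farm.
MASKING_IP = "10.10.50.10"
FARM_MEMBERS = {"172.16.1.21", "172.16.1.22"}


def logical(ip: str) -> str:
    """Map the masking address and every farm member to one logical server name."""
    return "farm" if ip == MASKING_IP or ip in FARM_MEMBERS else ip


def conversations(packets, use_masking: bool):
    """Group (src, dst) packets into direction-independent conversations.

    Returns {frozenset of endpoints: set of directions seen}. A conversation
    with only one direction is what shows up as "unidirectional".
    """
    convs = defaultdict(set)
    for src, dst in packets:
        if use_masking:
            src, dst = logical(src), logical(dst)
        convs[frozenset((src, dst))].add((src, dst))
    return dict(convs)


packets = [
    ("10.1.2.3", "10.10.50.10"),   # client request to the masking IP
    ("172.16.1.21", "10.1.2.3"),   # response seen from the real server
]
```

Without the mapping, both entries are one-way conversations; with it, they collapse into a single conversation between 10.1.2.3 and "farm" that has traffic in both directions.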
Thanks and God bless,