There is a strange behavior in the CAS application health dashboard: it shows two-way network packet loss that is quite high (25%), but when we captured a trace using the RUM Console and imported the trace file into Transaction Trace (DNA), no packet retransmissions were seen.
So we can't point to the network as the bad channel. In fact, other applications traversing the same LAN don't show any network loss rate at all.
We are using AMD 12.2 SP1 and suspect that the AMD or the SPAN port is dropping the packets.
If any of you have seen similar behavior or have any advice, you are most welcome!
When I have seen this in the past, it has been either a traffic quality problem or a deduplication problem.
Does the affected application have a load balancer? What can happen is that LBs strip out the IP ID, and by default the AMD uses this for deduplication. This is an adjustable setting that can be changed to suit different environments. However, I have been advised before that it shouldn't be changed unless support requests you to do so.
You could have a look in your capture to see the IP ID values.
Yes, the application uses an F5 load balancer. Could you explain more about the IP ID behavior and where I can find this setting to change (if required)?
In fact, there are other applications, like Oracle eBusiness, that are not reporting this problem at all. Might this issue be related to a specific type of Java/.NET application?
I have two applications that are Java apps and use the same LB.
Open your trace file in Wireshark, add a column with field type Custom and field name ip.id, and check whether the values are maintained for the server that is showing loss rate in the CAS.
The advice is not to change this setting without contacting customer support first. This is only one possible cause and would need to be verified before changes are made.
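If scrolling through the ip.id column by eye is impractical for a large trace, the same check can be roughly scripted. The sketch below is only an illustration: it assumes the trace was exported with tshark to two comma-separated fields (ip.src, ip.id), and the file name trace.pcap and the function names are hypothetical. It counts, per source address, how many packets carry a zero IP ID and how many distinct IP ID values appear.

```python
import csv
from collections import defaultdict

# Assumed export step (trace.pcap is a placeholder file name):
#   tshark -r trace.pcap -T fields -e ip.src -e ip.id -E separator=, > ipid.csv


def parse_ip_id(text):
    """Parse an IP ID printed either as hex (0x1a2b) or as decimal."""
    text = text.strip()
    if text.lower().startswith("0x"):
        return int(text, 16)
    return int(text, 10)


def summarize_ip_ids(lines):
    """Return per-source packet count, zero-ID count and distinct-ID count."""
    stats = defaultdict(lambda: {"packets": 0, "zero": 0, "ids": set()})
    for row in csv.reader(lines):
        # Skip blank lines and non-IP packets (empty fields).
        if len(row) < 2 or not row[0] or not row[1]:
            continue
        src, ip_id = row[0], parse_ip_id(row[1])
        entry = stats[src]
        entry["packets"] += 1
        entry["ids"].add(ip_id)
        if ip_id == 0:
            entry["zero"] += 1
    return {
        src: {
            "packets": entry["packets"],
            "zero": entry["zero"],
            "distinct": len(entry["ids"]),
        }
        for src, entry in stats.items()
    }


if __name__ == "__main__":
    # Made-up sample rows, not taken from the real trace.
    sample = ["10.41.15.104,0x0000", "10.41.15.104,0x0000", "10.41.110.81,0x1a2b"]
    for src, summary in summarize_ip_ids(sample).items():
        print(src, summary)
```

A source whose packets all show a zero IP ID, or far fewer distinct values than packets, would be worth a closer look before raising it with support.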
I also want to give you some more background:
When I said deduplication, I was referring to a situation where the application has a load balancer (Big IP virtual IP) and web servers. In this scenario the AMD is listening to traffic from two SPANs: SPAN1 receives mirrored traffic from the virtual IP (10.41.110.81) and SPAN2 receives the traffic from the web server (10.41.15.104). So the AMD sees the same traffic at both interfaces, with some time lag I believe. We are almost sure that the packet loss rate displayed in the CAS report is caused by this duplication, and that we should stop listening at SPAN2, i.e. from the web server (10.41.15.104), because the AMD can already see this traffic through the LB virtual IP. This could be the (1st) solution to this issue.
And I see in the same trace file that the IP ID is not constant, so this means deduplication is happening for 10.41.15.104 but the AMD is unable to handle it and CAS is displaying packet loss, right? Refer to the screenshot IP ID - Deduplication.png. You can see that with the lost.segment filter, the trace shows a large number of "unseen segment" lost messages.
The AMD is set to the default deduplication method, "TCP checksum and IP ID", and I can confirm that no other application shows this packet loss behavior. So I am not in favor of changing this method to any of the other three methods, but could that be a (2nd) solution to this issue? Or is it better to remove the unnecessary port spanning?
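To make the concern concrete, here is a minimal sketch of what a deduplication key built from "TCP checksum and IP ID" implies. It is not the AMD's actual implementation; the Packet type, field names and values are all hypothetical. It only shows that two copies of a segment collapse into one when both fields match, and both survive when a device between the two SPANs rewrites them.

```python
from typing import NamedTuple


class Packet(NamedTuple):
    span: str          # which SPAN the copy arrived on (hypothetical label)
    ip_id: int         # IP identification field
    tcp_checksum: int  # TCP checksum field
    seq: int           # TCP sequence number of the segment


def deduplicate(packets):
    """Keep the first packet seen for each (IP ID, TCP checksum) pair."""
    seen = set()
    unique = []
    for pkt in packets:
        key = (pkt.ip_id, pkt.tcp_checksum)
        if key in seen:
            continue  # treated as a duplicate copy and dropped
        seen.add(key)
        unique.append(pkt)
    return unique


if __name__ == "__main__":
    # Same segment mirrored on both SPANs with identical header fields.
    passthrough = [
        Packet("SPAN1", 100, 0xBEEF, 1),
        Packet("SPAN2", 100, 0xBEEF, 1),
    ]
    # Same segment, but the second copy carries a new IP ID and checksum.
    rewritten = [
        Packet("SPAN1", 100, 0xBEEF, 1),
        Packet("SPAN2", 7, 0x1234, 1),
    ]
    print(len(deduplicate(passthrough)))  # one copy kept
    print(len(deduplicate(rewritten)))    # both copies kept
```

In the second case the same sequence number reaches the analysis twice, which is the kind of situation where removing the redundant SPAN or changing the method (with support's guidance) would matter.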
Is this application the only one that uses the F5?
If the IP ID isn't constant, then your load balancer might be terminating the session and recreating it on the other side with a new IP ID. Do you know if this is the case?
Have you checked whether this applies? See "Gather Facts on Application Topology".
Depending on the configuration of the F5, you cannot always define a single Software Service but must define at least two: one in front of the F5 and one behind it.