Network Tiers failures

cosmin_gherghel
Dynatrace Pro

I recently on-boarded an application and I am seeing some confusing data. In the Data Explorer and Applications reports I see thousands of TCP failures for the Client network and Network tiers. The problem is that I cannot see what type of errors those are. I can drill down to the Metric Charts report for each tier and select the Availability tab, but I do not see any failures there. Why are the failures not showing up anywhere else, such as the AHS report, and is there a way to find out whether this data is correct?

network-tiers.png

metric-charts-availability.pdf

19 REPLIES

john_leight
Dynatrace Pro

What do you get if you click on the failures number itself (the number 3.54K or 15K)?

ulf_thornander3
Inactive

What does the mouse-over say?

If you have 99.1%, I think the problem is negligible.

When I mouse over, they show as TCP failures, but the metric TCP errors shows 0. The other tier has an availability of 94%, which would indicate a problem. This seems to be happening for a lot of apps.

report-1.pdf

Good report - add more TCP Failures metrics to see the exact ones. You may also want to add sites to see if it's all sites or some in particular.

I am in the Application, transactions and tier data view. I cannot add more specific TCP errors or specific sites. The Operation data view will not show me the Network and Client network tiers. If these failures are real, I should see them on the front-end tier of the application, but that doesn't seem to be the case.

Well - you have some kind of problem there, as it's not consistent throughout all SS/Applications.

It looks like you would need to do a trace to get an understanding of WHAT it really is.

Clearly you have packet loss on the FLM app.

I will check the feed with regard to the packet loss, but I see this issue with other apps that do not have a high loss rate. TEL, for example, has 211k TCP failures and 0 TCP errors but less than 1% loss rate.

ulf_thornander3
Inactive

Your loss rate is ....not good 🙂

As John says - split it to sites to see if there is something there - What protocol is FLM using?

With that high of a loss rate, I'd consider getting a tcpdump to see the actual packet loss and review the span/data feed to ensure it's configured properly.

I cannot split by sites in the Tier view. FLM is using SOAP over HTTP and some Generic with trans. I will take a tcpdump and check the apcon for correct configuration.

If these errors are real, shouldn't I see the same number of errors on other tiers of the application, like the F5 or Web tier? Why only on the network tiers?

john_leight
Dynatrace Pro

I understand where you are coming from now. I have the same issue as you. When using different data views, the number of errors is different.

ulf_thornander3
Inactive

The first view looks outwards ("client network") and the second one looks inwards ("site"); that's why the 8 errors are "server not responding". Just my 2 cents 🙂

john_leight
Dynatrace Pro

Here is an explanation of the differences from support/product management:

Failures (TCP) has a different meaning depending on the data view for network tiers. On the Software Services view it is the sum of ("Connection refused errors" + "Connection establishment timeout errors"), and on the Tier data view its meaning is basically the number of lost packets.

In conclusion, you should not compare these values for network tiers, as their meaning is different.
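
To make the difference concrete, here is a minimal sketch of the two definitions quoted above. The class, field, and view names are hypothetical and chosen only for illustration (they are not product identifiers), and the sample counts are made up rather than taken from the attached reports.

```python
from dataclasses import dataclass


@dataclass
class TcpCounters:
    # Hypothetical field names, for illustration only.
    connection_refused: int
    connection_establishment_timeouts: int
    lost_packets: int


def failures_tcp(counters: TcpCounters, data_view: str) -> int:
    """Return "Failures (TCP)" for a network tier as each data view defines it."""
    if data_view == "software_services":
        # Software Services view: refused connections plus establishment timeouts.
        return counters.connection_refused + counters.connection_establishment_timeouts
    if data_view == "tiers":
        # Tier data view: basically the number of lost packets.
        return counters.lost_packets
    raise ValueError(f"unknown data view: {data_view}")


# Made-up counts: the same traffic yields two very different "Failures (TCP)" values.
sample = TcpCounters(connection_refused=5,
                     connection_establishment_timeouts=3,
                     lost_packets=3540)
print(failures_tcp(sample, "software_services"))
print(failures_tcp(sample, "tiers"))
```

With the same underlying traffic, the two views report numbers that are not comparable, which matches the mismatch described earlier in the thread.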

I think this is a little confusing when you look at the reports and tooltips. One would expect a metric with an identical name to mean the same thing. I'm going to open an RFE to see if this difference in data view meaning can be made a little clearer.

Krzysztof_Ziemi
Dynatrace Pro

Agreed, the terminology is a bit confusing. The key to understanding the meaning of these metrics is to always filter by the Tier Type when using the Tiers data view. With the filter on, either "Network tier" or "Datacenter tier", and two tables (one per filter), it starts making sense (although we've made it a puzzle - fully agree):

  • When looking at Datacenter tier - Failures (TCP), and consequently Failures, represent the number of client or server violations of the TCP session state: resets, not responding, etc. In other words, the communicating nodes fail. BTW, the Software Services data view uses the same algorithm.
  • When looking at Network tier - Failures (TCP), and consequently Failures, represent the number of network link failures that show up as lost packets. With single-sided network link sniffing (as done by the AMD and any other network probe), lost packets can be detected only indirectly, by observing the TCP session flow and counting the TCP retransmissions. Therefore Failures (TCP) should be understood as the number of observed failures that manifested themselves as TCP retransmissions.
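
As an illustration of the indirect detection described in the Network tier bullet, below is a minimal sketch of counting retransmissions from a single-sided capture. It is not the AMD's actual algorithm; the function name and the `(flow_id, seq, payload_len)` tuple layout are assumptions for this example. It simply treats any data segment that ends at or below the highest byte already seen on the same flow as a retransmission, and it ignores sequence-number wraparound and SACK. Note that reordered (out-of-order) segments look the same to this heuristic.

```python
from collections import defaultdict


def count_retransmissions(segments):
    """Count data segments that re-send bytes already seen on the same flow.

    segments: iterable of (flow_id, seq, payload_len) tuples in capture order.
    Returns a dict mapping flow_id to its retransmission count.
    """
    highest_end = {}                    # flow_id -> highest (seq + len) seen so far
    retransmissions = defaultdict(int)
    for flow_id, seq, payload_len in segments:
        if payload_len == 0:
            continue                    # pure ACKs carry no data, skip them
        end = seq + payload_len
        if flow_id in highest_end and end <= highest_end[flow_id]:
            # All of these bytes were already observed: count as a retransmission.
            retransmissions[flow_id] += 1
        else:
            highest_end[flow_id] = max(end, highest_end.get(flow_id, end))
    return dict(retransmissions)


# Made-up capture: flow "a" re-sends the segment starting at seq 1000.
capture = [
    ("a", 1000, 100),
    ("a", 1100, 100),
    ("a", 1000, 100),   # same bytes again
    ("a", 1200, 100),
    ("b", 1, 50),
    ("b", 1, 0),        # pure ACK, ignored
]
print(count_retransmissions(capture))
```

A counter of this kind says that something was re-sent, not why, which is consistent with the thread's observation that these failures are hard to tie to a specific error type.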

So again - the key is to look separately at network and datacenter tiers and interpret metric values in the context of the specific tier. Regarding an RFE - perhaps just resetting Failures (TCP) to "-" when looking at the network tier?

sid_govindu
Dynatrace Organizer

A couple of problems with this: the Failures metric for the Client Network tier is used in most of the OOB reports in 12.4.10, causing panic among customers who are seeing A LOT of failures after upgrade.

And there is no way to isolate these failures to a client site, as the client site metric is not available in the Applications data view.

Either the metric needs to be named appropriately or not used in OOB reports in the next SP.

Hi Sid,

Thanks for your feedback. Our Dev and Product Managers are looking into this.

Keep calm and build Community!

tarjei
Organizer

Is it possible to remove this metric? Since you cannot connect it to anything in a software service, it is not of great value.

Bumping on this topic, because I have the exact same problem/question. What is the verdict?

It is hard to correlate the failures to a root cause. And the figures can be alarming.

On 12.4.13, in Data Explorer for a certain application (Applications>Overview>[Application]>Data Explorer - Tiers), the Tiers>Client network is red. At some moments the Client network graph shows only Failed operations; with a mouse click I can see the amounts. Clicking on the 'Client network' tier, I do not see failures in the details, nor with 'Filter on Selected Data' in the graph. I do see a lot of Control lines with no Avail figure. Failure = 0.
The errors are in the RUM: when clicking the (green) Software service transaction (all_other), the failures are shown.

The application consists of a few Synthetic Transactions and a RUM software service (SMB).

To remove the Network and Client network tiers (or more accurately, to stop them from showing up in the reports), go to the business unit and remove the rules from those tiers.