<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>article Why OneAgent's retransmissions metric can diverge when comparing to other tools in Troubleshooting</title>
    <link>https://community.dynatrace.com/t5/Troubleshooting/Why-OneAgent-s-retransmissions-metric-can-diverge-when-comparing/ta-p/273711</link>
    <description>&lt;DIV class="lia-message-template-content-zone"&gt;
&lt;H1 id="toc-hId-1846786847"&gt;Abstract&lt;/H1&gt;
&lt;P&gt;In many production environments, OneAgent retransmission metrics may diverge from those reported by other tools, such as Wireshark's "tcp.analysis.retransmission" filter or "netstat -s" output. There are several reasons for this. To interpret the difference, it helps to understand TCP retransmission, the types it comes in, and how these relate to the retransmission metrics calculated by OneAgent's network module.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H1 id="toc-hId--705370114"&gt;Problem&lt;/H1&gt;
&lt;P&gt;This issue concerns the network retransmission metrics. For clarity, this article uses only the Grail metric identifiers and ignores the older Metrics API identifiers. The relevant Grail metrics are:&lt;BR /&gt;&lt;STRONG&gt;&lt;EM&gt;dt.process.network.packets.re_tx_aggr&lt;/EM&gt;&lt;/STRONG&gt; - number of packets transmitted as retransmissions&lt;BR /&gt;&lt;STRONG&gt;&lt;EM&gt;dt.process.network.packets.re_rx_aggr&lt;/EM&gt;&lt;/STRONG&gt; - number of packets received as retransmissions&lt;/P&gt;
&lt;P&gt;It is also important to take into account which packets can be subject to retransmission:&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;&lt;STRONG&gt;dt.process.network.packets.base_re_tx_aggr&lt;/STRONG&gt; -&amp;nbsp;&lt;/EM&gt;number of sent retransmission base packets&lt;BR /&gt;&lt;EM&gt;&lt;STRONG&gt;dt.process.network.packets.base_re_rx_aggr&lt;/STRONG&gt; -&amp;nbsp;&lt;/EM&gt;number of received retransmission base packets&lt;/P&gt;
&lt;P&gt;The four metrics above are absolute packet counts, which can be awkward to compare. When comparing retransmissions between two points, it is better to analyze the percentage of retransmitted packets per process, PGI, host, or network interface. For the outgoing direction, the percentage of retransmitted packets is given by the following expression (for the incoming direction, use the corresponding rx metrics):&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;retransmissions% = dt.process.network.packets.re_tx_aggr / dt.process.network.packets.base_re_tx_aggr * 100%&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;The full DQL expression to compute this, for example for the process group instance &lt;EM&gt;PROCESS_GROUP_INSTANCE-54FF3ADD0B5EDB11&lt;/EM&gt;, is fairly complex:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;timeseries Tx=avg(dt.process.network.packets.re_tx_aggr), nonempty:true, timeframe:"00:30/04:30",
    filter: { matchesValue(dt.entity.process_group_instance, "PROCESS_GROUP_INSTANCE-54FF3ADD0B5EDB11") }
| join [
      timeseries TxBase=avg(dt.process.network.packets.base_re_tx_aggr), nonempty:true, union:true,
          filter: { matchesValue(dt.entity.process_group_instance, "PROCESS_GROUP_INSTANCE-54FF3ADD0B5EDB11") }
    ], kind:leftOuter, on:{timeframe}
| fieldsAdd {Txperc = 100 * (Tx[]/right.TxBase[])}
| fields Txperc, timeframe, interval
| fieldsAdd metricName = "Nginx retransmissions sent out"&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
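&lt;P&gt;The percentage calculation can also be checked by hand on exported values. The following is a minimal Python sketch, assuming you already have per-interval values of the retransmitted and base packet counters. The function name and the sample numbers are hypothetical and are not real measurements.&lt;/P&gt;

```python
def retransmission_percent(retransmitted, base):
    """Return retransmitted packets as a percentage of base packets."""
    if base == 0:
        # No base packets in the interval: the ratio is undefined.
        return None
    return 100.0 * retransmitted / base


# Hypothetical per-interval samples for one process (illustration only).
tx_retransmitted = [12, 0, 45, 3]
tx_base = [4000, 3500, 0, 1500]

percentages = [
    retransmission_percent(r, b) for r, b in zip(tx_retransmitted, tx_base)
]
print(percentages)
```

&lt;P&gt;Intervals without any base packets are reported as None instead of raising a division error, which is one possible way to handle empty intervals.&lt;/P&gt;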
&lt;H1 id="toc-hId--1514716740"&gt;Resolution&lt;/H1&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H3&gt;Which packets OneAgent recognizes as retransmissions&lt;/H3&gt;
&lt;UL&gt;
&lt;LI&gt;A retransmitted TCP segment is a segment whose sequence number is duplicated within a defined period of time (the maximal retransmission timeout), i.e. it has been sent more than once in a given direction. Duplicated packets are fully visible in the outgoing direction. In the incoming direction they are usually not visible, because the original packet was lost before it reached the destination host.&lt;/LI&gt;
&lt;LI&gt;TCP segments without data (data length == 0) are not considered retransmissions. This excludes duplicate ACKs (SACK) and TCP fast retransmission. From a performance point of view these events are not interesting, because the application does not have to stop sending for a longer time and switch to a waiting state. OneAgent focuses only on timeout retransmissions, which reduce throughput across the network and usually slow down the application.&lt;/LI&gt;
&lt;LI&gt;Duplicated TCP SYN and FIN segments are considered retransmissions.&lt;/LI&gt;
&lt;LI&gt;TCP keep-alive segments (data size of 1 byte and a duplicated sequence number) are not considered retransmissions.&lt;/LI&gt;
&lt;LI&gt;In the incoming direction, out-of-sequence packets can be considered retransmissions.&lt;/LI&gt;
&lt;LI&gt;Retransmissions are calculated for both incoming and outgoing communication channels.&lt;/LI&gt;
&lt;LI&gt;Retransmissions are not calculated for local or forwarded TCP sessions.&lt;/LI&gt;
&lt;/UL&gt;
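&lt;P&gt;To make the rules above easier to reason about, here is a rough Python sketch of a counter that applies them to one direction of one TCP session. It is only an illustration of the listed rules, not OneAgent's implementation. The Segment type and its fields are hypothetical, and the sketch deliberately ignores the retransmission timeout window and the out-of-sequence rule for incoming packets.&lt;/P&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """Hypothetical, simplified view of a captured TCP segment."""
    seq: int                          # TCP sequence number
    payload_len: int                  # bytes of TCP payload
    syn: bool = False
    fin: bool = False
    local_or_forwarded: bool = False  # session is local or forwarded


def count_retransmissions(segments):
    """Count retransmissions in one direction of one TCP session."""
    seen = set()
    count = 0
    for seg in segments:
        if seg.local_or_forwarded:
            continue  # local and forwarded sessions are not counted
        control = seg.syn or seg.fin
        if seg.payload_len == 0 and not control:
            continue  # pure ACKs carry no data and are ignored
        duplicate = seg.seq in seen
        seen.add(seg.seq)
        if not duplicate:
            continue  # first time this sequence number is seen
        if control:
            count += 1  # duplicated SYN or FIN
        elif seg.payload_len == 1:
            continue  # treated as keep-alive: 1 byte, duplicated sequence
        else:
            count += 1  # duplicated data segment
    return count


capture = [
    Segment(seq=0, payload_len=0, syn=True),
    Segment(seq=0, payload_len=0, syn=True),  # duplicated SYN
    Segment(seq=1, payload_len=0),            # pure ACK
    Segment(seq=1, payload_len=500),
    Segment(seq=1, payload_len=500),          # duplicated data
    Segment(seq=500, payload_len=1),
    Segment(seq=500, payload_len=1),          # keep-alive pattern
]
print(count_retransmissions(capture))
```

&lt;P&gt;A side effect of the keep-alive rule as modeled here is that a genuinely retransmitted 1-byte data segment is also skipped. A tool that classifies such packets differently will report a different count for the same traffic.&lt;/P&gt;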
&lt;H3&gt;Reference to other tools&lt;/H3&gt;
&lt;P&gt;Wireshark with the filter &lt;EM&gt;tcp.analysis.retransmission&lt;/EM&gt; is certainly a more sophisticated tool and offers more options, e.g. out-of-order or spurious retransmission classification. The &lt;A href="https://www.wireshark.org/docs/wsug_html_chunked/ChAdvTCPAnalysis.html" target="_self"&gt;Wireshark documentation&lt;/A&gt; gives more details. It is worth emphasizing that when the OneAgent network module and Wireshark run in parallel, each of them may see a slightly different set of packets. This happens because BPF (libpcap) does not guarantee that 100% of packets will be captured, due to the limited size of the buffers used.&lt;/P&gt;
&lt;P&gt;The netstat -s tool prints many TCP counters, which are global per TCP/IP stack. Among these counters is the number of retransmitted segments:&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;root@kpi-server:/var/log/dynatrace/oneagent/os# netstat -s |grep retransmitted
102383510128 segments received
169869495375 segments sent out
425003509 segments retransmitted
5579 bad segments received&lt;/LI-CODE&gt;
&lt;P&gt;As shown, neither netstat nor Wireshark aggregates the retransmission metric per process. With these tools the metric is usually available only per host or per network adapter.&lt;/P&gt;
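&lt;P&gt;For a rough host-wide comparison, the netstat counters can be turned into a percentage in the same way as the OneAgent metrics. The Python sketch below parses the sample output shown above. The parsing helper is hypothetical and assumes the "number, then label" line layout of that sample.&lt;/P&gt;

```python
import re

# Sample counters copied from the netstat output above.
NETSTAT_OUTPUT = """\
102383510128 segments received
169869495375 segments sent out
425003509 segments retransmitted
5579 bad segments received
"""


def parse_counters(text):
    """Map each counter label to its integer value."""
    counters = {}
    for line in text.splitlines():
        match = re.match(r"\s*(\d+)\s+(.+)", line)
        if match:
            counters[match.group(2).strip()] = int(match.group(1))
    return counters


counters = parse_counters(NETSTAT_OUTPUT)
host_percent = (
    100.0 * counters["segments retransmitted"] / counters["segments sent out"]
)
print(f"{host_percent:.3f}%")
```

&lt;P&gt;For the sample above this gives roughly 0.25% of sent segments. Keep in mind that these counters are cumulative for the whole TCP/IP stack, so the result is not directly comparable with a per-process percentage over a short timeframe.&lt;/P&gt;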
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H1&gt;Troubleshooting steps&lt;/H1&gt;
&lt;P&gt;If the retransmissions reported by OneAgent are clearly higher than those reported by another tool, you can disable incoming retransmissions and check again. Incoming retransmissions can be disabled per host by setting the runtime flag &lt;STRONG&gt;debugNetAgentDisableIncomingRetransmissionsNative&lt;/STRONG&gt; to true, either through the support team or with the environment variable:&lt;/P&gt;
&lt;PRE&gt;&lt;STRONG&gt;DT_DEBUGFLAGS=debugNetAgentDisableIncomingRetransmissionsNative=true&lt;/STRONG&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;/DIV&gt;</description>
    <pubDate>Thu, 03 Apr 2025 10:28:52 GMT</pubDate>
    <dc:creator>pawel_stenka</dc:creator>
    <dc:date>2025-04-03T10:28:52Z</dc:date>
    <item>
      <title>Why OneAgent's retransmissions metric can diverge when comparing to other tools</title>
      <link>https://community.dynatrace.com/t5/Troubleshooting/Why-OneAgent-s-retransmissions-metric-can-diverge-when-comparing/ta-p/273711</link>
      <description>&lt;DIV class="lia-message-template-content-zone"&gt;
&lt;H1 id="toc-hId-1846786847"&gt;Abstract&lt;/H1&gt;
&lt;P&gt;It's possible that in many production environments, OneAgent retransmission metrics may diverge from those of other tools such as "wireshark tcp.analysis.retransmission" or "netstat -s" output. There are a couple of reasons for that. The basic thing is to have a good comprehension of TCP retransmission, what types it has, and eventually how it relates to OneAgent's network module calculated metrics retransmission.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H1 id="toc-hId--705370114"&gt;Problem&lt;/H1&gt;
&lt;P&gt;This issue might concern network retransmission metrics. Focusing on metrics identifiers for Grail only and ignoring older Metrics API identifiers for clarity.&amp;nbsp; Those metrics refer to the following Grail metrics identifiers.&amp;nbsp;&amp;nbsp;&lt;BR /&gt;&lt;STRONG&gt;&lt;EM&gt;dt.process.network.packets.re_tx_aggr&lt;/EM&gt;&lt;/STRONG&gt;&amp;nbsp; - number packets transmitted as a retransmission&lt;BR /&gt;&lt;STRONG&gt;&lt;EM&gt;dt.process.network.packets.re_rx_aggr&lt;/EM&gt;&amp;nbsp;&lt;/STRONG&gt; - number packets received as retransmissions&lt;/P&gt;
&lt;P&gt;Also, it is important to take into account what&amp;nbsp;packets might be subject to retransmission:&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;&lt;STRONG&gt;dt.process.network.packets.base_re_tx_aggr&lt;/STRONG&gt; -&amp;nbsp;&lt;/EM&gt;number of sent retransmission base packets&lt;BR /&gt;&lt;EM&gt;&lt;STRONG&gt;dt.process.network.packets.base_re_rx_aggr&lt;/STRONG&gt; -&amp;nbsp;&lt;/EM&gt;number of received retransmission base packets&lt;/P&gt;
&lt;P&gt;The above 4 metrics operate on absolute packets sum which may be clumsy. When&amp;nbsp;comparing retransmission between two points, it is better to analyze the percentage of retransmitted packets per process, PGI, host, or network interface. From percent definition percentage of retransmitted packets is given by above abstract expression.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;retransmissions% = dt.process.network.packets.re_tx_aggr / dt.process.network.packets.base_re_rx_aggr * 100%&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;Full DQL expression to get this for e.g. given process group -&amp;nbsp;&lt;EM&gt;ROCESS_GROUP_INSTANCE-54FF3ADD0B5EDB11 -&lt;/EM&gt;&amp;nbsp;is quite complex:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;timeseries Tx=avg(dt.process.network.packets.re_tx_aggr), nonempty:true, timeframe:"00:30/04:30",
    filter: { matchesValue(dt.entity.process_group_instance, "PROCESS_GROUP_INSTANCE-54FF3ADD0B5EDB11") }
| join [
      timeseries TxBase=avg(dt.process.network.packets.base_re_tx_aggr), nonempty:true, union:true,
          filter: { matchesValue(dt.entity.process_group_instance, "PROCESS_GROUP_INSTANCE-54FF3ADD0B5EDB11") }
    ], kind:leftOuter, on:{timeframe}
| fieldsAdd {Txperc = 100 * (Tx[]/right.TxBase[])}
| fields Txperc, timeframe, interval
| fieldsAdd metricName = "Nginx retranmissions sent out"&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H1 id="toc-hId--1514716740"&gt;Resolution&lt;/H1&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H3&gt;Clarifications which packets are recognized as a retransmission by OneAgent&lt;/H3&gt;
&lt;UL&gt;
&lt;LI&gt;Retransmitted TCP segment is a segment with a duplicated sequence number within a defined period of time (maximal retransmission timeout) i.e. it has been sent more than once in a defined direction. Duplicated packets are fully visible for the outgoing direction, for the incoming direction duplicated packets usually are not visible because the primary packet didn’t reach the destination host and got lost somewhere before.&amp;nbsp;&lt;/LI&gt;
&lt;LI&gt;TCP segments without data (data length == 0) are not considered as a retransmission. That lets us exclude duplicate ACK (SACK) and TCP fast retransmission. From a performance point of view these events are not interesting because the application doesn't need to stop sending by longer time and change its state to waiting.&amp;nbsp; The OneAgent focuses only on timeout retransmission which acts adversely on transmission throughput performance through network and usually slows down application.&lt;/LI&gt;
&lt;LI&gt;Duplicated TCP SYN and FIN are considered as a retransmission&lt;/LI&gt;
&lt;LI&gt;TCP keep-alive segment (with data size of 1 byte and duplicated sequence number) are not considered as a retransmission&lt;/LI&gt;
&lt;LI&gt;For incoming direction out-of-sequence packets can be considered as a retransmission&lt;/LI&gt;
&lt;LI&gt;Retransmissions are calculated for incoming or outgoing communication channels.&lt;/LI&gt;
&lt;LI&gt;Retransmissions are not calculated for local or forwarded TCP sessions.&lt;/LI&gt;
&lt;/UL&gt;
&lt;H3&gt;&amp;nbsp;&lt;/H3&gt;
&lt;H3&gt;Reference to other tools&lt;/H3&gt;
&lt;P&gt;Wireshark with filter &lt;EM&gt;tcp.analysis.retransmission&lt;/EM&gt;&amp;nbsp; is certainly a more sophisticated tool and offers more options e.g out-of-order or spurious retransmission classification.&amp;nbsp; &lt;A href="https://www.wireshark.org/docs/wsug_html_chunked/ChAdvTCPAnalysis.html" target="_self"&gt;Wireshark documentation&lt;/A&gt;&amp;nbsp;gives more details.&amp;nbsp; It's worth emphasizing that when the OneAgent network module and wireshark run in parallel each of them may see a bit&lt;SPAN&gt;&amp;nbsp;different set of packets. This occurs due to bpf (libpcap) doesn't guarantee that&amp;nbsp;&lt;/SPAN&gt;100% of packets will be captured due to limited size of used buffers.&lt;/P&gt;
&lt;P&gt;Regarding netstat -s tool. This tool prints out a lot tcp counters which are global per TCP/IP stack.&amp;nbsp; Among these counters is number of packets retransmitted&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;root@kpi-server:/var/log/dynatrace/oneagent/os# netstat -s |grep retransmitted
102383510128 segments received
169869495375 segments sent out
425003509 segments retransmitted
5579 bad segments received&lt;/LI-CODE&gt;
&lt;P&gt;As you can see netstat as well as wireshark don't aggregate the retransmission metric per process. These metrics usually are accessible only per host or per network adapter.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H1&gt;Troubleshooting steps&lt;/H1&gt;
&lt;P&gt;If retransmissions reported by the OneAgent are &lt;SPAN class="HwtZe"&gt;&lt;SPAN class="jCAhz ChMk0b"&gt;&lt;SPAN class="ryNqvb"&gt;definitely&amp;nbsp;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;higher that reported by another tool.&amp;nbsp; You can disable incoming retransmissions and check again. Incoming retransmissions can be disabled per host by setting runtime flag &lt;STRONG&gt;debugNetAgentDisableIncomingRetransmissionsNative&lt;/STRONG&gt; to true by the support team or with the environment variable:&lt;/P&gt;
&lt;PRE&gt;&lt;STRONG&gt;DT_DEBUGFLAGS=debugNetAgentDisableIncomingRetransmissionsNative=true&lt;/STRONG&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;/DIV&gt;</description>
      <pubDate>Thu, 03 Apr 2025 10:28:52 GMT</pubDate>
      <guid>https://community.dynatrace.com/t5/Troubleshooting/Why-OneAgent-s-retransmissions-metric-can-diverge-when-comparing/ta-p/273711</guid>
      <dc:creator>pawel_stenka</dc:creator>
      <dc:date>2025-04-03T10:28:52Z</dc:date>
    </item>
    <item>
      <title>Re: Why OneAgent's retransmissions metric can diverge when comparing to other tools</title>
      <link>https://community.dynatrace.com/t5/Troubleshooting/Why-OneAgent-s-retransmissions-metric-can-diverge-when-comparing/tac-p/287618#M997</link>
      <description>&lt;P&gt;This is a great write-up,&amp;nbsp;&lt;a href="https://community.dynatrace.com/t5/user/viewprofilepage/user-id/25212"&gt;@pawel_stenka&lt;/a&gt;. I see these 'discrepancy' segments from time to time. It's also important to ensure that the tools and data you are comparing all use the same collection period and sampling rate. For example, we had an issue with a particular application where the spikes were visible in Dynatrace but not in the native monitoring. The reason was that the native tool was collecting an average over 10 minutes, whereas Dynatrace was pulling data every minute.&lt;/P&gt;</description>
      <pubDate>Fri, 10 Oct 2025 11:16:45 GMT</pubDate>
      <guid>https://community.dynatrace.com/t5/Troubleshooting/Why-OneAgent-s-retransmissions-metric-can-diverge-when-comparing/tac-p/287618#M997</guid>
      <dc:creator>ChadTurner</dc:creator>
      <dc:date>2025-10-10T11:16:45Z</dc:date>
    </item>
  </channel>
</rss>

