
Understanding AMD sampling

cosmin_gherghel
Dynatrace Pro

I am trying to understand why one of our AMDs is entering sampling mode and dropping packets. From what I can see, the amount of traffic has not changed. It's as if the AMD suddenly decides to drop packets because of sampling mode.

I found the following in the logs, but it doesn't tell me much other than that it is dropping. Any ideas would be appreciated.

L3 2016-05-12 13:47:03.279 0:[Dispatcher]@common/sampling_algorithms.cpp:203 State: [0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][*0*][44] Receive history:[pkts:1778027853 dropped:80949 passed_pkts:1773461562][pkts:1777759714 dropped:0 passed_pkts:1773228655][pkts:1777568508 dropped:0 passed_pkts:1773001533][pkts:1777315509 dropped:0 passed_pkts:1772797919][pkts:1777142825 dropped:0 passed_pkts:1772575987][pkts:1776881958 dropped:0 passed_pkts:1772356023]
L3 2016-05-12 13:47:03.279 0:[Dispatcher]@common/sampling_algorithms.cpp:204 Current pps: 44.28 kpps [measure from last 20 secs]
L3 2016-05-12 13:47:03.279 0:[Dispatcher]@common/sampling_algorithms.cpp:205 PEAK pps: 71.14 kpps [measure from last 300 secs]
L3 2016-05-12 13:47:03.279 0:[Dispatcher]@common/samplingmanager.cpp:84 Sampling level changed to: 19/20 [DOWN]
L3 2016-05-12 13:47:03.279 0:[Dispatcher]@rabase/driverfiltersconfig.cpp:195 SamplingFiltering: Using the default session acceptance filter distribution table for IPv4
L3 2016-05-12 13:47:03.279 0:[Dispatcher]@rabase/driverfiltersconfig.cpp:203 SamplingFiltering: Using the default session acceptance filter distribution table for IPv6
L3 2016-05-12 13:47:03.279 0:[Dispatcher]@rabase/customdriverapi.cpp:489 The sampling filtering was configured in the RTM driver. numerator:19, denominator:20
L3 2016-05-12 13:47:08.287 0:[Dispatcher]@common/sampling_algorithms.cpp:203 State: [0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][*0*][45][44] Receive history:[pkts:1778229982 dropped:262112 passed_pkts:1773701536][pkts:1778027853 dropped:80949 passed_pkts:1773461562][pkts:1777759714 dropped:0 passed_pkts:1773228655][pkts:1777568508 dropped:0 passed_pkts:1773001533][pkts:1777315509 dropped:0 passed_pkts:1772797919][pkts:1777142825 dropped:0 passed_pkts:1772575987]
L3 2016-05-12 13:47:08.287 0:[Dispatcher]@common/sampling_algorithms.cpp:204 Current pps: 45.18 kpps [measure from last 20 secs]
L3 2016-05-12 13:47:08.287 0:[Dispatcher]@common/sampling_algorithms.cpp:205 PEAK pps: 62.67 kpps [measure from last 300 secs]
L3 2016-05-12 13:47:08.287 0:[Dispatcher]@common/samplingmanager.cpp:84 Sampling level changed to: 18/20 [DOWN]
L3 2016-05-12 13:47:08.287 0:[Dispatcher]@rabase/driverfiltersconfig.cpp:195 SamplingFiltering: Using the default session acceptance filter distribution table for IPv4
L3 2016-05-12 13:47:08.287 0:[Dispatcher]@rabase/driverfiltersconfig.cpp:203 SamplingFiltering: Using the default session acceptance filter distribution table for IPv6
L3 2016-05-12 13:47:08.287 0:[Dispatcher]@rabase/customdriverapi.cpp:489 The sampling filtering was configured in the RTM driver. numerator:18, denominator:20
L3 2016-05-12 13:47:10.937 0:[schedulerData]@diag/diagmgr.cpp:48 Finished generating diagnostic data from 1 providers
L3 2016-05-12 13:47:13.287 0:[Dispatcher]@common/sampling_algorithms.cpp:203 State: [0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][*0*][46][45][44] Receive history:[pkts:1778499965 dropped:384083 passed_pkts:1773931288][pkts:1778229982 dropped:262112 passed_pkts:1773701536][pkts:1778027853 dropped:80949 passed_pkts:1773461562][pkts:1777759714 dropped:0 passed_pkts:1773228655][pkts:1777568508 dropped:0 passed_pkts:1773001533][pkts:1777315509 dropped:0 passed_pkts:1772797919]
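
For anyone reading these entries by hand, a small script can make the "Receive history" easier to compare between intervals. This is only a sketch, not a Dynatrace tool: the function names are made up for illustration, and it assumes (the log does not say so) that `pkts` and `passed_pkts` are running counters listed newest first. The `dropped` value is reported exactly as logged, without interpreting it.

```python
import re

# "Receive history" of the 13:47:08.287 State entry above, with line breaks
# removed and shortened to its three newest entries.
LOG_LINE = (
    "Receive history:"
    "[pkts:1778229982 dropped:262112 passed_pkts:1773701536]"
    "[pkts:1778027853 dropped:80949 passed_pkts:1773461562]"
    "[pkts:1777759714 dropped:0 passed_pkts:1773228655]"
)

ENTRY = re.compile(r"\[pkts:(\d+) dropped:(\d+) passed_pkts:(\d+)\]")


def parse_history(text):
    """Return (pkts, dropped, passed_pkts) tuples in the order they were logged."""
    return [tuple(int(g) for g in m.groups()) for m in ENTRY.finditer(text)]


def interval_report(history):
    """Compare each entry with the next-older one.

    Assumption: pkts and passed_pkts are cumulative counters, so their
    difference is the number of packets seen in that interval.
    """
    rows = []
    for newer, older in zip(history, history[1:]):
        rows.append({
            "new_pkts": newer[0] - older[0],
            "new_passed": newer[2] - older[2],
            "dropped": newer[1],  # copied as logged, not interpreted
        })
    return rows


history = parse_history(LOG_LINE)
for row in interval_report(history):
    print(row)
```

If the counter assumption holds, the per-interval packet counts can be compared against the "Current pps" lines to check whether traffic actually rose when the drops began.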
10 REPLIES

bcox2
Organizer

Have you recently added more software services? What analyzer types are they using?

We have not added anything new. This AMD is used to monitor VPN traffic; auto-discovery is enabled so that we can view the performance and counts of users connected to the VPN.

chris_v
Dynatrace Pro

Packet drop on the AMD is usually a sign of CPU exhaustion. I wouldn't normally expect to see that at such a low level of throughput.

Can you confirm the hardware specs of the AMD?

In particular, are you using a supported NIC? If so, is the AMD configured to use the 'customized' drivers rather than the 'native' drivers?

The CAS has AMD Diagnostics reports. Check the CPU reports there and make sure the CPUs aren't taxed, especially Core 0: it has to do all the filtering, deduplication, and reordering, and usually handles the NIC interrupts too, so it can become a bottleneck well before the rest of the AMD if things aren't configured right.

I looked at the CPU stats and there are a few spikes that reach 100%, but nothing constant. Core 0 looks fine at around 4%. The spikes are on cores 3, 5, and 6. Since the customer is only interested in the network-level data, we removed the autodiscovery rules and are now monitoring the traffic using the generic decode. We haven't seen any drops yet.

Sampling occurs when the AMD starts dropping packets, and it is typically the result of CPU exhaustion, as Chris V. suggested.

Without detailed information (logs, configuration, samples), it is not possible to find the actual reason.

Here are a few hints:

  • Based on the screenshot, the AMD activated the first level of sampling (95% of the traffic is still analyzed), so it is possible that the AMD is running at its capacity limit and occasionally gets overloaded, even by a small margin, which activates sampling. Some tuning might help, e.g., increasing the shared memory size or defining more user-defined software services instead of relying on auto-discovery.
  • Even if no new software services were added, configuring more complex analysis (additional user-defined URLs, for example) might increase CPU utilization and result in sampling.
  • Even if the traffic volume is the same, the traffic profile might change so that the AMD gets more traffic for software services that require "heavy" analysis; this in turn increases CPU utilization and might result in sampling.
  • What type of driver is used? The native driver offers less efficient traffic filtering. Hence, if the traffic to be analyzed stays the same but the AMD receives more traffic on its NICs that has to be filtered out, the load increases and might result in sampling.
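
To relate the "95% is still analyzed" figure to the log in the original post, the numerator and denominator from the driver message can be turned into a percentage. This is a rough sketch: the helper name is hypothetical, and it assumes that numerator/denominator is the share of sessions accepted for analysis, which matches the 19/20 = 95% reading above but is not stated in the log itself.

```python
import re

# Driver messages copied from the log in the original post.
LINES = [
    "The sampling filtering was configured in the RTM driver. numerator:19, denominator:20",
    "The sampling filtering was configured in the RTM driver. numerator:18, denominator:20",
]

RATIO = re.compile(r"numerator:(\d+),\s*denominator:(\d+)")


def analyzed_percent(line):
    """Return the analyzed share in percent, or None if the line has no ratio.

    Assumption: numerator/denominator is the fraction of sessions that the
    driver still passes on for analysis.
    """
    match = RATIO.search(line)
    if match is None:
        return None
    numerator, denominator = int(match.group(1)), int(match.group(2))
    return 100.0 * numerator / denominator


for line in LINES:
    print(analyzed_percent(line))
```

Under that assumption, each step down in the sampling level (19/20, then 18/20) removes another 5% of sessions from analysis.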

I suggest opening a support case if none of the above hints helps resolve the problem.

Hi Sebastian,

Thank you for the suggestions. I did see some spikes in CPU utilization on cores 3, 5, and 6, but nothing constant, and we are using the custom driver. The customer is more interested in the network-level statistics, so we decided to remove the autodiscovery rules and monitor all traffic using the generic decode. We haven't seen any drops yet.

jaroslaw_orlows
Dynatrace Pro

We've got a high-level insight into the way AMD sampling works that might be helpful: AMD Sampling

matt_evanson1
Organizer

Getting more detail on the root cause of sampling would be very helpful. Relying on support to determine it is not efficient.
Also, is there a way to shorten how long sampling lasts, or the interval after which the AMD rechecks whether sampling is still needed? It seems that 1 hour is the default setting.


Hey Matt,


From Jaroslaw's link, you can find:


You can set the time for the volume analysis recovery by modifying the interval at which the sampling mode increases the analysis volume. This setting is located on the AMD in the rtm.config file.


Change the default value of 3600 seconds (which is 1 hour) to the number of seconds you want between sampling mode increases.

sampling.config.noDropTimePeriod=3600


Note: We recommend that you do not set this value lower than 900 seconds (which is 15 minutes).
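
To get a feel for what a given `sampling.config.noDropTimePeriod` value means in practice, the arithmetic can be sketched as follows. The function names are hypothetical, and the estimate assumes that the AMD moves up exactly one sampling level per drop-free period and that no new drops occur in the meantime; the quoted documentation only says this is the interval between sampling mode increases.

```python
DEFAULT_PERIOD_SECONDS = 3600         # default quoted above (1 hour)
RECOMMENDED_MIN_PERIOD_SECONDS = 900  # lower bound from the note above (15 minutes)


def recovery_seconds(level, full_level=20, period=DEFAULT_PERIOD_SECONDS):
    """Estimate the time to return from level/full_level to full analysis.

    Assumption: one level is regained per drop-free period.
    """
    if not 0 <= level <= full_level:
        raise ValueError("level must be between 0 and full_level")
    return (full_level - level) * period


def is_recommended(period):
    """Check a candidate period against the recommended 900-second minimum."""
    return period >= RECOMMENDED_MIN_PERIOD_SECONDS


# Example: the log in the original post shows the level falling to 18/20.
print(recovery_seconds(18))               # with the default period
print(recovery_seconds(18, period=900))   # with the recommended minimum
print(is_recommended(600))
```

Under these assumptions, recovering from 18/20 takes two periods, so the default setting implies roughly two hours and the 900-second minimum roughly half an hour.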

Very good to know this! Thank you very much.