I'll have limited access to email and phone until December 9th; I'll reply as soon as possible.
Without being able to determine this conclusively from the screenshot alone, let me list some potential causes (i.e., what the dt server sends over the network):
- Scheduled reports sent via e-mail (those would have to be some gigantic reports, though).
- Responses to requests from clients, such as charting and PurePath data. This shouldn't amount to much in general, but if 30 clients request 10 million pps at once and the server doesn't run out of memory, it could mean considerable traffic.
- Responses to requests from collectors, mainly agent resources (binaries for different agent types/versions), plus a much smaller amount from configuration (system profile sync). This could cause considerable traffic if you have a lot of collectors with a lot of agents and have, e.g., done an update recently. I'm not sure it can cause _that_ much, though.
- Session exports from the server: probably the most likely (dt-internal) cause; any amount of data can be generated if you export a big session.
- Support archive: A lot of (log and other) files can be transferred if you have a lot of agents, together with a large self-monitoring session.
- Data from some custom dynatrace plugin (can generate any sort of traffic)
- Data sent to the pwh; this shouldn't come in bursts, though, but rather as a fairly constant stream.
- Data sent via the various "export to splunk" etc. features
- Some external cause, like file transfer from the dt machine.
It would probably be easier to narrow down the source if we knew the other endpoints of these requests or had the server log.
There are also some "... Bytes Transported" self-monitoring measures that you can chart to get some hints (frontend and backend servers).
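If you do manage to capture per-connection traffic on the dt server with an external tool, summing bytes by remote endpoint usually points at the culprit quickly. Here is a minimal sketch of that aggregation; the endpoints and byte counts below are invented sample data, not taken from your environment:

```python
# Sketch: aggregate transferred bytes by remote endpoint to find the top
# talker. The flow records below are invented sample data; in practice you
# would feed in the per-connection output of whatever capture tool you use.

from collections import defaultdict

# (remote_endpoint, bytes_sent) pairs -- hypothetical sample
flows = [
    ("10.0.0.5:443", 120_000_000),
    ("10.0.0.9:8020", 900_000),
    ("10.0.0.5:443", 250_000_000),
    ("10.0.1.17:9911", 40_000_000),
]

# Sum bytes per endpoint
totals = defaultdict(int)
for endpoint, nbytes in flows:
    totals[endpoint] += nbytes

# Print endpoints sorted by total bytes, largest first
for endpoint, nbytes in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{endpoint:>16}  {nbytes / 1e6:8.1f} MB")
```

Whichever endpoint dominates the total tells you which of the causes above to investigate first (client, collector, pwh, export target, or something external).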
We have at most 7 to 8 clients, but only 2 to 3 are actively used, and currently we don't have any scheduled reports apart from a few email notifications.
I made the chart below for the specific time when the situation occurred, for further understanding.
In the communication-bytes-transferred chart, the metrics are min, avg, and max, and the dark brown line is a sum aggregation.
The first screenshot (from the Windows tool) suggests a rate of 600 Mbit/s for 10 minutes, which amounts to roughly 36 GB (if we assume 10 Mbit/s to be roughly 1 MB/s). The Dynatrace screenshot suggests roughly 40 MB per minute transferred by the backend server, with 4 spikes of 60 MB/min each, which would make a sum of roughly 500 MB for the 10 minutes.
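To make the discrepancy between the two screenshots explicit, here is the same back-of-the-envelope arithmetic written out, using the 10 Mbit/s ≈ 1 MB/s rule of thumb from above:

```python
# Rough cross-check of the two screenshots' numbers, using the same
# 10 Mbit/s ~= 1 MB/s rule of thumb as above.

# Windows tool: ~600 Mbit/s sustained over 10 minutes
rate_mb_per_s = 600 / 10                    # ~60 MB/s
windows_total_mb = rate_mb_per_s * 10 * 60  # seconds in 10 minutes
print(f"Windows tool:   ~{windows_total_mb / 1000:.0f} GB in 10 min")  # ~36 GB

# Dynatrace chart: ~40 MB/min baseline, with 4 spikes of ~60 MB/min
dt_total_mb = 40 * 10 + 4 * (60 - 40)
print(f"Dynatrace view: ~{dt_total_mb} MB in 10 min")  # ~480 MB
```

So the backend server accounts for well under 2% of what the Windows tool reports, which is why an external traffic source looks plausible.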
I suggest going ahead and measuring this with a third-party tool to see what's what. Also, inspect the server log; maybe you can see something for 17:29, 17:33, 17:34, and 17:39, but the spikes aren't really that high from the backend server's point of view (60 MB/min vs. 40 MB/min). Best regards,
The problem occurred again at the same time, and this time I correlated it with network utilization: session-storage disk reads are too high, and memory page faults are also too high.
You can also find attached the logs of the APM server and two other external plugins.
We need your expert analysis to understand and overcome this situation, thanks.