Can someone from Dynatrace be a bit more precise, like here: https://www.dynatrace.com/resources/ebooks/javaboo...?
How much is the overhead for CPU, RAM, heap, network, and response time? Customers and angry operators always ask this.
I'll paste the responses, with information from dev, directly from a support case where this was asked:
The overhead of the Agent (regarding response time, CPU, memory & network traffic) depends on so many factors and their correlations that it's just impossible to estimate it upfront.
Such factors include, but are not limited to:
- from the application perspective:
  - used technologies (Java, .NET, PHP, Apache, node.js, ...)
  - involved frameworks
  - requests per second (or some other load indicator)
  - general CPU consumption
  - number of concurrently executed threads
  - amount of remote communication and IO interaction
  - error handling, e.g. number of thrown exceptions
  - complexity of the code in general
- from the environment perspective:
  - operating system and version
  - hardware, e.g. CPU
  - any potential virtualization in place
  - disk & network throughput
  - inter-process mutexes and other dependencies
- from the Agent & configuration perspective:
  - type of Agent (OS Agent, Java Agent, Network Agent, Log Agent, ...)
  - number of placed sensors & instrumented methods
  - level of detail of captured information
Measuring the overhead is also non-trivial: the Agent is itself part of the whole system and therefore cannot easily measure itself. The only really reliable way to measure the overhead is, and always will be, to run reasonable load tests with an external tool, using reproducible load and warmup/cooldown phases, executed multiple times with and without the Agents.
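To illustrate how the results of such a with/without comparison could be evaluated, here is a minimal Python sketch. It is not a Dynatrace tool. The function names, the choice of the median as the statistic, and the way warmup/cooldown samples are trimmed are all assumptions made for illustration. It expects response times that your external load tool recorded for each run.

```python
import statistics

def steady_state(samples, warmup, cooldown):
    """Drop the first `warmup` and last `cooldown` samples of one run."""
    end = len(samples) - cooldown
    return samples[warmup:end]

def relative_overhead(baseline_runs, agent_runs, warmup=0, cooldown=0):
    """Relative change in median response time, e.g. 0.05 means +5%.

    Each argument is a list of runs; each run is a list of response
    times (same unit everywhere) recorded by the external load tool.
    """
    def median_of_run_medians(runs):
        # Take the median per run first, then the median across runs,
        # so that a single noisy run does not dominate the result.
        return statistics.median(
            statistics.median(steady_state(run, warmup, cooldown))
            for run in runs
        )

    baseline = median_of_run_medians(baseline_runs)
    with_agent = median_of_run_medians(agent_runs)
    return (with_agent - baseline) / baseline
```

Feed it several runs per variant and the same load profile for both; a result that changes sign between repetitions usually means the difference is within measurement noise.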
However, over the years we have investigated multiple approaches to estimating and limiting the Agent overhead. Some of those turned out to be effective (e.g. around Auto-Sensor overhead limitations) and are implemented in the Agent code itself. We are also always working to make the Agents more robust in terms of overhead.
In addition, we always choose our default configurations to be as low-overhead and as production-ready as possible while still providing valuable information.
To ensure this, we run automated performance tests on a daily basis, which show us immediately whether something got better or worse.
So that is what we do in terms of overhead measurement and reduction.
Results may vary depending on the environment, so it's best to take a free trial to be 100% sure. That said, I took a look at one of our internal tenants monitoring our production systems and found the following data:
Working set size: OneAgent 80MB AVG, 204MB MAX, plus optional components: log (18MB AVG, 23MB MAX) and network (56MB AVG, 59MB MAX).
CPU usage: OneAgent 0.51% AVG, 2.73% MAX, plus optional components: log (0.05% AVG, 0.16% MAX) and network (0.08% AVG, 1.8% MAX).
Network: OneAgent 619Bps AVG, 730Bps MAX (this includes the network component), plus optional component: log 298Bps AVG, 348Bps MAX.
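For anyone who wants the combined footprint with all optional components enabled, the figures above can be added up as in the short Python sketch below. It assumes that each pair of numbers reads as (average, maximum). Note that summing the maxima gives only an upper bound, since the components do not necessarily peak at the same moment.

```python
# Figures quoted above, as (average, maximum) per component.
working_set_mb = {"oneagent": (80, 204), "log": (18, 23), "network": (56, 59)}
cpu_percent = {"oneagent": (0.51, 2.73), "log": (0.05, 0.16), "network": (0.08, 1.8)}
# The OneAgent traffic figure already includes the network component.
network_bps = {"oneagent_incl_network": (619, 730), "log": (298, 348)}

def totals(components):
    """Sum averages and maxima; the summed maximum is an upper bound."""
    average = sum(avg for avg, _ in components.values())
    maximum = sum(peak for _, peak in components.values())
    return round(average, 2), round(maximum, 2)

for name, data in [("working set (MB)", working_set_mb),
                   ("CPU (%)", cpu_percent),
                   ("network (Bps)", network_bps)]:
    print(name, totals(data))
```

With everything enabled this comes to about 154MB average and at most 286MB working set, about 0.64% average and at most 4.69% CPU, and 917Bps average and at most 1078Bps of network traffic on that particular tenant.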
I have also run some tests. As the previous posters said, the impact on CPU, memory, and network (threads for PHP) may vary according to your application, but it is relatively low.
What I can say is that the most important thing to monitor is the response time of your application; that is where the real impact may be.