Can anyone give me an in-depth answer on when Dynatrace injects code for .NET processes, and at what points along a PurePath information is logged and then sent to the Collector? I have a dev who wants more definitive answers on where the tracing takes place along the path and where we could see overhead. I know overhead will depend on the app as well as the level of instrumentation on our side, but at least an idea would be great.
Is data logged once a given method, etc., has finished executing at each step, or only when the PurePath is completed?
There are two sorts of method sensors: static sensors and auto-sensors.
When you refer to injecting (byte) code (into the byte code of a method), that's a static sensor. The actual injection of the byte code is done as the application methods are being loaded into the running process at the beginning of process execution. There will be some impact on the time taken for the application to start. If classes are loaded dynamically, there will also be a small impact on the application in order to inject byte codes into those methods. In both cases, a measure is available within Dynatrace which will report the impact of byte code injection on the application.
Once the application is up and running and processing requests, if there is a static sensor placed on a method, Dynatrace will get control – within the application thread – both at the beginning and the end of method execution, in order to measure the elapsed time, CPU time, GC time, etc. If that's all the static sensor does, the overhead is very small. Some static sensors also have special functions: for example, the ADO.NET static sensors collect the SQL statement that is being executed, and possibly the bind values; the sensor that catches exceptions collects the exception details and may also collect a stack trace. So the overhead due to static sensors is very dependent on various options, such as whether database aggregation or exception aggregation is turned on, the maximum length of strings captured, and which header fields are collected from web requests.
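To make the entry/exit idea concrete, here is a minimal sketch in Python. It is only an analogy, not how the Dynatrace .NET agent is implemented: the decorator `static_sensor`, the `collected` list and the `handle_request` function are all hypothetical names made up for illustration, and process-wide CPU time stands in for per-thread CPU time.

```python
import functools
import time

# Hypothetical in-memory store for measurements (illustration only).
collected = []

def static_sensor(func):
    """Take measurements at method entry and again at method exit."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        wall_start = time.perf_counter()   # entry: wall-clock timestamp
        cpu_start = time.process_time()    # entry: CPU time (process-wide stand-in)
        try:
            return func(*args, **kwargs)
        finally:
            # exit: record the deltas, even if the method raised an exception
            collected.append({
                "method": func.__name__,
                "elapsed_s": time.perf_counter() - wall_start,
                "cpu_s": time.process_time() - cpu_start,
            })
    return wrapper

@static_sensor
def handle_request(n):
    # Stand-in for an instrumented application method.
    return sum(range(n))

handle_request(1000)
```

The work added to the application thread is two clock reads on entry, two on exit, and one append, which is why a sensor that only measures timing stays cheap. Capturing strings such as SQL text or stack traces would add to that cost.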
In production, you'd normally configure static sensors so that 100-200 method calls are collected for every transaction. But of course that depends on the kind of application and what you're trying to achieve.
Auto-sensors monitor the threads which are executing transactions, and collect a stack trace every so often. The goal is to identify which methods are transaction "hot spots" for response time, CPU time and so on. The auto-sensors measure how much time they take in the application thread and adjust their execution rate to stay within a specific overhead percentage (which is configurable).
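The self-throttling behaviour can be pictured with a small calculation. This is a sketch under my own assumptions, not the agent's actual algorithm: if taking one stack trace costs c seconds in the application thread and the overhead budget is a fraction f, then sampling no more often than once every c / f seconds keeps the overhead at or below f. The function name, parameters and the minimum interval are hypothetical.

```python
def next_sampling_interval(sample_cost_s, overhead_budget, min_interval_s=0.01):
    """Return the shortest sampling period that keeps overhead within budget.

    sample_cost_s   -- measured time one stack-trace sample took (seconds)
    overhead_budget -- allowed overhead as a fraction, e.g. 0.01 for 1%
    min_interval_s  -- hypothetical floor so sampling never runs faster than this
    """
    if not 0 < overhead_budget < 1:
        raise ValueError("overhead_budget must be a fraction between 0 and 1")
    if sample_cost_s < 0:
        raise ValueError("sample_cost_s must not be negative")
    # overhead = cost / period  =>  period = cost / overhead
    return max(min_interval_s, sample_cost_s / overhead_budget)

# A sample that cost 0.5 ms with a 1% budget gives a period of about 50 ms.
interval = next_sampling_interval(0.0005, 0.01)
```

If samples get more expensive (deeper stacks, more threads), the computed period grows and the sampling rate drops, which matches the behaviour described above.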
The PurePath is just the result of this data: an end-to-end view of every transaction, through all tiers, with automatic hot spot detection and diagnostic details down to the method level. To put it another way, the sensors are the mechanism and the PurePath is the result.
It is straightforward to configure Dynatrace so that the impact on response time within the application threads is less than one percent.
All this data is passed to a separate background thread within the application process, to be transmitted from the agent (i.e., application process) to the Collector and thence to the Dynatrace Server. The agent does a small amount of buffering (e.g., collecting a packet's worth of data) but basically sends data as soon as it is collected.
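The hand-off to a background thread with light buffering looks roughly like the following sketch. It illustrates the general pattern only and is not the agent's wire protocol: `BATCH_SIZE`, the event strings and the `sent_batches` list (which stands in for the network send to the Collector) are hypothetical.

```python
import queue
import threading

BATCH_SIZE = 3  # hypothetical "packet's worth" of events

def sender(events, sent_batches):
    """Background thread: drain the queue and 'send' small batches."""
    batch = []
    while True:
        item = events.get()
        if item is None:            # sentinel: stop after flushing
            break
        batch.append(item)
        if len(batch) >= BATCH_SIZE:
            sent_batches.append(batch)   # stand-in for a network send
            batch = []
    if batch:
        sent_batches.append(batch)       # flush the partial final batch

events = queue.Queue()
sent_batches = []
worker = threading.Thread(target=sender, args=(events, sent_batches), daemon=True)
worker.start()

# The application thread only enqueues; it never waits on the network.
for i in range(7):
    events.put(f"method-event-{i}")
events.put(None)
worker.join()
```

The point of the design is that the application thread pays only for the enqueue, while serialization and transmission happen off the request path.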
Another background thread collects system information (such as CPU, memory, etc.).
So the total impact on the system will fall somewhere in the 2-3% range, again depending on configuration options.
Thanks a lot email@example.com for this in-depth explanation!
One customer is worried about the amount of network traffic between the application and the collector and also between the collector and the server (in startup time to do the instrumentation and in normal operation to send the collected information), do you have any metric or any detailed explanation about this?
You can get an estimate of network traffic during normal operation from the Deployment Guide.
I don't think it's possible to make a useful estimate of traffic during start-up because of the number of different factors involved. If it's really a concern, I would drop a host agent on the system(s) running the Collector(s) and actually collect the data. Without a network sniffer of some sort, you'll only get aggregate data, but if you don't see a spike of any sort, that should be enough of an answer.
... and as a note on start up traffic and overhead - keep in mind that the work done/generated by the .NET agent during start up is going to decrease perhaps significantly with v6.2, which is due for release on Monday June 29. Estimates (as stated in public webinars and the release notes) indicate that the number of supported .NET agents per collector will be increasing from approx. 25-50 to approx. 200-250.
I'm busy building a Dynatrace Monitoring (Agent) Overhead dashboard to satisfy a requirement by a prospective client (we're in POC phase currently). I found some measurements that hint at overhead caused by monitoring like "Communication Bytes Transported", "Bytes Read From agent" and "Bytes Read From Agent (Auto Sensor)" that presumably highlight network overhead by data being received from agents/sensors. The problem is that these don't give any insight into the overhead caused on the hosts where agents are deployed.
In your comment above you mention "In both cases, a measure is available within Dynatrace which will report the impact of byte code injection on the application." Which measure(s) do you refer to? I tried plotting "Auto Sensor CPU Time" and it plots measures for all deployed agents, but they all flatline at 0 (zero) nanoseconds.
Please advise on feasible measurements I can use.
The measures are "Agent Wait Time During Class Transformation" and "SUD Wait Time During Class Transformation" in the Dynatrace Self-Monitoring profile. In your case, the SUD (system under diagnosis) time would be most relevant.
However, the measures are designed more as a diagnostic tool for a particular problem than a general-purpose monitoring tool. Unless the application is creating dynamic classes, the measures will be zero once the application is up and running.
The only way to measure the impact of Dynatrace on the application process is to do a CPU dump, which will give you a number for Dynatrace overhead – and there's no measure for that.
We did performance testing of a majority of our systems and found (as Graeme has said) that it depends on the number of PurePaths and the number of nodes (sensors) in the PurePaths. Each application is different, so for an accurate measurement I'd recommend a load test, with and without instrumentation.
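Comparing the two load-test runs comes down to a relative difference. A minimal sketch follows; the function name is hypothetical and the response times in the example are made-up placeholders, not measurements from any real system.

```python
def overhead_percent(baseline, instrumented):
    """Relative overhead of the instrumented run versus the baseline, in percent.

    Both arguments must use the same metric and unit, e.g. mean response
    time in milliseconds or average CPU utilisation under identical load.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (instrumented - baseline) / baseline * 100.0

# Placeholder numbers only: 200 ms without the agent, 214 ms with it.
result = overhead_percent(200.0, 214.0)
```

Run both tests with the same load profile and duration, and compare CPU as well as response time, since the two can move differently.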
We had a couple of Java systems where CPU went up about 6% - 8% because there were literally thousands of EJBs and database calls measured per transaction. This system saw some peak times of 100 requests per second, so as you could expect our CPU load went up because there were tens of thousands more instructions executed per request.
One of our most heavily used .NET applications also saw something like this because we had added too many static sensors; imagine adding a sensor for all properties in a namespace by mistake. The overhead was tremendous because, like the EJB counters, there were tens of thousands of extra executions per request.
As for memory, we saw about 100 - 150MB of overhead per instrumented application. This is something to be aware of when you have 20 applications running on one server, as it will consume roughly 2 - 3GB of memory.
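For capacity planning, the memory figure scales linearly with the number of instrumented processes. A quick sketch of the arithmetic, using only the per-application range quoted above (the function name is hypothetical):

```python
def agent_memory_range_mb(app_count, per_app_low_mb=100, per_app_high_mb=150):
    """Estimate total agent memory overhead (MB) for app_count instrumented apps."""
    if app_count < 0:
        raise ValueError("app_count must not be negative")
    return app_count * per_app_low_mb, app_count * per_app_high_mb

low_mb, high_mb = agent_memory_range_mb(20)  # 20 applications on one server
```

With 20 applications this gives 2000 to 3000 MB, so it is worth checking the server's free memory before instrumenting everything on a shared host.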
Also, one thing to keep in mind is turning on the deep leak analysis in .NET can be a killer if your system does a lot of allocations. I don't recommend turning it on for a memory heavy (cache heavy) system.