Questions about Collector cache

yohnishi
Organizer

Hi all!

Our customer has an application with about 170,000 loaded classes. Startup takes about 20 minutes without the agent. With the agent, the first startup takes about 70 minutes. (Test environment: the Agent and Collector run on the same host, which has about 48 logical processors at 2.7 GHz and about 500 GB of memory; the maximum Java heap is 32 GB.)

After I changed the sensor settings, sensor pack settings, capture settings and so on, the startup time dropped to about 40 minutes. I think the time depends on class size, dynamic classes, and the collector cache.

So I have some questions about the collector cache, with the goal of reducing agent startup time.

Question 1: The collector cache holds both the original bytecode and the instrumented bytecode. Is this correct?

Question 2: On the first instrumentation, the agent retrieves the sensor pack information, matches classes against the sensor pack rules, instruments the bytecode, and sends both the original and the instrumented bytecode to the collector. Is this correct?

Question 3: Also on the first instrumentation, rule matching against the sensor packs covers all loaded classes. If 170,000 classes are loaded and the sensor packs have 200 rules, rule matching runs 34,000,000 times (170,000 × 200). Is this correct?

Question 4: On the second instrumentation, if the sensor settings and classes have not changed and the collector has a cache, rule matching happens only once per class. Is this correct?

Question 5: If the application has dynamic classes, each of them has to be checked against the sensor pack rules. If the sensor packs have 200 rules and a class matches none of them, rule matching runs 200 times for that class. If the application has 50,000 dynamic classes, rule matching runs 10,000,000 times. Is this correct?

Question 6: If a dynamic class matches only at the 200th rule, its instrumentation time is the total of matching all 200 rules plus sending the original dynamic class bytecode and the instrumented bytecode. Is this correct?

Question 7: Is it possible to configure instrumentation only, without the cache, through an agent option or a collector option?

Best Regards,

Yasuo Ohnishi.

Joe_Hoffman
Dynatrace Champion

Yasuo,

A few corrections:

2) Instrumentation is done by the collector, not the agent. The sensor rules are sent from the AppMon Server to the collector. The collector transforms the bytecode and sends the instrumented bytecode back to the agent. All of this happens during the class-loading event, which is why it's important that agents and collectors be close to each other in terms of latency and bandwidth.
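To make the round trip in point 2 concrete, here is a minimal Python model of it. Everything in it is hypothetical: `Rule`, `collector_transform`, and `agent_load_class` are illustrative names, rules match by class-name prefix only, and "instrumentation" just tags the bytes. It is not the AppMon wire protocol; it only shows that the agent blocks inside the class-loading event until the collector replies with either instrumented bytecode or a "load unchanged" answer.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Rule:
    """Hypothetical sensor rule: matches classes by internal-name prefix."""
    class_prefix: str

    def matches(self, class_name: str) -> bool:
        return class_name.startswith(self.class_prefix)


def collector_transform(class_name: str, original: bytes,
                        rules: List[Rule]) -> Optional[bytes]:
    """Collector side: return instrumented bytecode, or None for 'no change required'."""
    if not any(rule.matches(class_name) for rule in rules):
        return None
    # Stand-in for real bytecode weaving: tag the bytes so the change is visible.
    return b"INSTRUMENTED:" + original


def agent_load_class(class_name: str, original: bytes,
                     send_to_collector: Callable[[str, bytes], Optional[bytes]]) -> bytes:
    """Agent side: runs during the class-loading event and waits for the collector."""
    reply = send_to_collector(class_name, original)
    return original if reply is None else reply


rules = [Rule("com/example/web/")]
collector = lambda name, code: collector_transform(name, code, rules)
servlet_bytes = agent_load_class("com/example/web/OrderServlet", b"\xca\xfe", collector)  # instrumented
util_bytes = agent_load_class("com/example/util/Strings", b"\xca\xfe", collector)         # unchanged
```

Because the agent waits on `send_to_collector` for every class it does not exclude locally, any latency between agent and collector is paid once per loaded class.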

3) Your math does not make sense. Each class is transferred to the collector, where it is compared against a set of rules to determine whether it needs to be transformed. If no transformation is required, a code is sent back to the agent telling it to finish loading the original, unchanged class. There are also certain classes that are marked never to be instrumented; that exclusion rule is sent to the agent, so these 'known' classes don't even need to be sent to the collector, which saves a lot of time. Classes are not sent to the collector for transformation all at once, but one at a time at the moment each class is loaded, which can happen at any point during the execution of the JRE.
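The per-class flow in point 3 can be sketched as a simple counter over a sequence of load events. This is a hypothetical model: the prefixes used for the agent-side exclusion list and for the collector's rules are made up for illustration, and real matching is more elaborate. It shows that excluded classes cost no round trip, every other class costs one trip to the collector, and only matching classes come back instrumented.

```python
from typing import Iterable, List, Tuple


def simulate_class_loads(class_names: Iterable[str], agent_excludes: List[str],
                         rule_prefixes: List[str]) -> Tuple[int, int, int]:
    """Hypothetical model: returns (skipped, sent, instrumented) counts.

    agent_excludes: prefixes of 'known' classes the agent never sends (assumed values).
    rule_prefixes:  sensor rules evaluated only on the collector (assumed values).
    """
    skipped = sent = instrumented = 0
    for name in class_names:  # one event per class, in load order, not a batch
        if any(name.startswith(p) for p in agent_excludes):
            skipped += 1  # handled locally by the agent: no network round trip
            continue
        sent += 1  # original bytecode travels to the collector
        if any(name.startswith(p) for p in rule_prefixes):
            instrumented += 1  # instrumented bytecode comes back to the agent
    return skipped, sent, instrumented


loads = ["java/lang/String", "java/util/HashMap",
         "com/example/web/OrderServlet", "com/example/util/Strings"]
counts = simulate_class_loads(loads, agent_excludes=["java/"],
                              rule_prefixes=["com/example/web/"])
```

Here `counts` is `(2, 2, 1)`: two classes never leave the agent, two make the round trip, and one of those is instrumented.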

If you watch the AgentOverview dashlet you'll see one of the metrics is ClassLoadCount. This could be interesting to watch in your situation.

4) I'm not sure I would call it '1 rule match'. There's a comparison to see whether the class has changed.
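The "has the class changed?" comparison in point 4 could look roughly like the cache below. This is only a sketch under stated assumptions: the thread does not say how the collector cache is keyed, so `CollectorCache`, the SHA-256 digest of the original bytecode, and the `config_version` number standing in for the sensor configuration are all hypothetical. An unchanged class with unchanged settings reuses the cached result without running the rule set at all.

```python
import hashlib
from typing import Callable, Dict, Optional, Tuple


class CollectorCache:
    """Hypothetical instrumentation cache keyed by class name.

    An entry is reused only if the original bytecode has the same SHA-256 digest
    and the (assumed) sensor configuration version is unchanged.
    """

    def __init__(self) -> None:
        self._entries: Dict[str, Tuple[str, int, Optional[bytes]]] = {}
        self.rule_evaluations = 0

    def transform(self, name: str, original: bytes, config_version: int,
                  instrument: Callable[[str, bytes], Optional[bytes]]) -> Optional[bytes]:
        digest = hashlib.sha256(original).hexdigest()
        cached = self._entries.get(name)
        if cached is not None and cached[0] == digest and cached[1] == config_version:
            return cached[2]  # unchanged class: skip rule matching entirely
        self.rule_evaluations += 1  # new or changed class: run the full rule set
        result = instrument(name, original)
        self._entries[name] = (digest, config_version, result)
        return result


cache = CollectorCache()
instrument = lambda name, code: b"I:" + code if name.startswith("com/example/") else None
first = cache.transform("com/example/A", b"v1", 1, instrument)   # cache miss
again = cache.transform("com/example/A", b"v1", 1, instrument)   # cache hit
changed = cache.transform("com/example/A", b"v2", 1, instrument) # bytecode changed
```

After these three calls `cache.rule_evaluations` is 2: the repeated load of identical bytecode is answered from the cache.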

5) Again, I'm not sure I like your math. Perhaps you're right that there is a total of x rule matches, but they don't happen at the same time, so this total isn't really relevant to anything other than showing how many times a given compare operation was performed over the life of an agent.

6) A given class is not limited to matching one rule; it can match any number of rules, so it still has to be compared against every rule. This is a bit oversimplified, as there are optimizations that help speed this up.
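Point 6 explains why matching cannot stop at the first hit. The sketch below is hypothetical (`SensorRule` and `matching_sensors` are illustrative names, and rules are plain substring checks); it collects every matching sensor rather than returning after the first, and it does not model the optimizations mentioned above.

```python
from typing import List, NamedTuple


class SensorRule(NamedTuple):
    """Hypothetical rule: a sensor name plus a substring the class name must contain."""
    sensor: str
    pattern: str


def matching_sensors(class_name: str, rules: List[SensorRule]) -> List[str]:
    """Return every sensor whose rule matches.

    There is no early exit on the first hit, because one class may be
    instrumented by several sensors at once.
    """
    return [rule.sensor for rule in rules if rule.pattern in class_name]


rules = [SensorRule("Servlet", "Servlet"),
         SensorRule("EJB", "Bean"),
         SensorRule("Custom", "com/example/")]
hits = matching_sensors("com/example/web/OrderServlet", rules)
```

`hits` is `["Servlet", "Custom"]`: two sensors apply to the same class, which is only discovered by checking all three rules.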

7) I'm not sure I understand what you're asking.

In summary: reducing the number of rules can certainly have an effect on instrumentation time; however, the bigger impact is the time it takes to transfer the bytecode between the agent and the collector. Perhaps the collector process is undersized for the amount of work it's performing. Is it CPU-bound? Are there available cores when it's very busy? Could more memory for the collector be explored? Perhaps the collector can't get cores because the application is taking them all during startup, a critical time for both processes. In that situation, try moving the collector to another machine so that it has dedicated resources and isn't competing with the application. Also consider the disk; this may not be a CPU/memory problem. Is the collector cache on a fast disk? This is a good use case for SSD storage. Finally, consider collector groups to distribute the instrumentation load across multiple machines.

Hope that helps.

joe

Hi Joseph-san,

Thank you for your answer! I now understand the collector cache feature.

The Agent and Collector environment is not busy; the host has about 48 logical processors at 2.7 GHz and about 500 GB of memory. I think the cause is related to the number of classes and the time spent sending them to the collector and back, even on the same host.

So I have some questions about your answer.

Question 1: Regarding answers 2 and 3: does the agent send every class to the collector when it is loaded, except for the "certain classes"?

Question 2: If Question 1 is correct, does that mean that sending classes that don't need instrumentation to the collector and back also takes time?

Question 3: If Question 2 is correct and the application has 170,000 classes, is the total send-and-return time the sum over all 170,000 classes?

Question 4: If Question 2 is correct, how can we configure the "certain classes" from answer 3 to reduce the sending time? Is it the "Global Exclude" setting in a custom sensor?

Best Regards,

Yasuo Ohnishi.

Joe_Hoffman
Dynatrace Champion

Yasuo-san,

Classes are sent to the collector one at a time, when they are loaded by the classloader.

170k classes is a lot, but perhaps something is causing additional class-loading events. That is why I would watch the "Loaded Classes" counter in Agents Overview. Does that value grow beyond ~170k?

Bytecode is only sent back to the agent if the class needed to be instrumented. Uninstrumented classes (Global Exclude matches?) are not sent back.
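A rough way to reason about Question 3 using this answer: upstream traffic scales with every class that is not excluded on the agent, while downstream traffic scales only with instrumented classes. The estimator below is a hypothetical sketch; the class sizes are made-up inputs, and the small "no change" replies mentioned earlier in the thread are ignored.

```python
from typing import Iterable, Optional, Tuple


def transfer_bytes(classes: Iterable[Tuple[int, Optional[int]]]) -> Tuple[int, int]:
    """Hypothetical estimate of agent<->collector bytecode traffic.

    Each item is (original_size, instrumented_size or None if not instrumented).
    Every non-excluded class goes agent -> collector; only instrumented ones come
    back. Small 'no change' replies are ignored.
    Returns (bytes_to_collector, bytes_to_agent).
    """
    up = down = 0
    for original_size, instrumented_size in classes:
        up += original_size
        if instrumented_size is not None:
            down += instrumented_size
    return up, down


# Three hypothetical classes: only the second one is instrumented.
traffic = transfer_bytes([(4000, None), (6000, 7500), (3000, None)])
```

`traffic` is `(13000, 7500)`: every class pays the upload, but only the instrumented one pays the download.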

The "Global Exclude" setting does not stop the class from being sent to the collector. That switch is evaluated by the collector logic AFTER the class has already been sent. To prevent classes from being sent to the collector, use the 'exclude' parameter on the agent's -agentpath argument.

Syntax:

"exclude" "=" "\"" Exclude { ";" Exclude } "\""

Exclude = ("starts" | "ends" | "contains") ":" className

Examples:
-agentpath:....,exclude="starts:com/dynatrace/diagnostics/foobar;contains:Proxy"

This would exclude classes whose names start with com.dynatrace.diagnostics.foobar and classes whose names contain the string "Proxy".

Class names need to be delimited by '/' instead of '.'.
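The exclude grammar above is small enough to model directly. The sketch below is not the agent's actual parser; `parse_exclude`, `is_excluded`, and `to_internal_name` are hypothetical helpers that follow the documented semantics (`starts`, `ends`, `contains`, entries separated by `;`, names delimited by `/`). The rejection of `.` in patterns is an extra sanity check added for this sketch. Quoting of the value on a real command line (for example in a WebLogic start script) is shell-dependent and is not modeled here.

```python
from typing import List, Tuple

VALID_KINDS = {"starts", "ends", "contains"}


def parse_exclude(value: str) -> List[Tuple[str, str]]:
    """Parse the body of exclude="..." (without the surrounding quotes)."""
    rules = []
    for entry in value.split(";"):
        kind, sep, class_name = entry.partition(":")
        if not sep or kind not in VALID_KINDS or not class_name:
            raise ValueError(f"invalid exclude entry: {entry!r}")
        if "." in class_name:  # sketch-only check: patterns must use '/' separators
            raise ValueError(f"use '/' instead of '.' in class names: {class_name!r}")
        rules.append((kind, class_name))
    return rules


def to_internal_name(dotted_name: str) -> str:
    """Convert 'com.acme.Foo' to the '/'-delimited form 'com/acme/Foo'."""
    return dotted_name.replace(".", "/")


def is_excluded(internal_name: str, rules: List[Tuple[str, str]]) -> bool:
    """Check a '/'-delimited class name against parsed exclude rules."""
    for kind, pattern in rules:
        if kind == "starts" and internal_name.startswith(pattern):
            return True
        if kind == "ends" and internal_name.endswith(pattern):
            return True
        if kind == "contains" and pattern in internal_name:
            return True
    return False


rules = parse_exclude("starts:com/dynatrace/diagnostics/foobar;contains:Proxy")
```

With these rules, `com/dynatrace/diagnostics/foobar/Baz` and `com/sun/proxy/$Proxy12` are excluded, while `com/example/Order` is still sent to the collector.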

Check the agent log to see the effects of these rules.

If you can exclude large numbers of classes, you might reduce the load time. Let us know if it helps.

Hi Joseph-san,

Thank you for your information.

> Does that value get larger than ~ 170k?

Yes, the counter is about 170k.

So we added the exclude parameter to -agentpath and excluded 140k classes, but the startup time did not change (still about 40 minutes).

Now I notice the double quotation marks (") in the exclude parameter.

Are the double quotation marks definitely required for this option? (Our environment is WebLogic.)

If the double quotation marks are required, we will retry. If they are not required, I think some sensor pack is affecting the startup time, and we will check the sensor packs one by one.

Best regards,

Yasuo Ohnishi

yohnishi
Organizer

Hi Joseph-san and all,

We found the cause of the slow start up!

The user's Java application took 40 minutes to start up with the "JMX MBean Server" sensor and the ".NET Disable Inlining" sensor. (The ".NET Disable Inlining" sensor cannot be disabled from the sensor placement settings.)

We will report this to support. If anyone knows why this happens, please let us know.

Best regards,
Yasuo Ohnishi

yohnishi
Organizer

Hi all,

I found that the default setting of WebLogic 8's JMX measures was the cause of this problem. The measures are configured with "*" as the domain.

This "*" causes all JMX information to be checked by string comparison each time an MBean is added to the MBean Server.

The user's environment has many servlet and EJB classes, and therefore a very large number of JMX MBeans. As a result, the checking consumed a lot of CPU time and the startup time increased further.

Please be careful with large applications in WebLogic environments.
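Why a "*" domain hurts can be shown with a small cost model. This is a hypothetical sketch, not AppMon's implementation: `fnmatchcase` stands in for whatever string comparison the sensor actually performs, the ObjectName strings and the `mydomain` domain are made up, and `attrs_per_mbean` is an assumed number of follow-up string checks for each MBean whose domain matches. A narrow domain pattern rejects most MBeans after one comparison, while "*" sends every registered MBean through the full check.

```python
from fnmatch import fnmatchcase
from typing import List, Tuple


def registration_cost(object_names: List[str], measure_domains: List[str],
                      attrs_per_mbean: int) -> Tuple[int, int]:
    """Hypothetical model of MBean registration-time matching.

    object_names:    JMX ObjectName strings, 'domain:key=value,...'.
    measure_domains: domain patterns from measure definitions ('*' matches any domain).
    attrs_per_mbean: assumed number of extra string checks per matching pattern.
    Returns (pattern_matches, string_comparisons).
    """
    matches = comparisons = 0
    for object_name in object_names:
        domain = object_name.split(":", 1)[0]
        for pattern in measure_domains:
            comparisons += 1  # the domain check itself
            if fnmatchcase(domain, pattern):
                matches += 1
                comparisons += attrs_per_mbean  # deeper checks only after a match
    return matches, comparisons


# 1,000 hypothetical servlet runtime MBeans in a made-up WebLogic domain.
mbeans = [f"mydomain:Name=Servlet{i},Type=ServletRuntime" for i in range(1000)]
wildcard = registration_cost(mbeans, ["*"], attrs_per_mbean=10)
narrow = registration_cost(mbeans, ["com.example"], attrs_per_mbean=10)
```

Under these assumptions `wildcard` is `(1000, 11000)` and `narrow` is `(0, 1000)`, and the gap grows linearly with the number of registered MBeans, which matches the observation that many servlet and EJB MBeans made startup slower.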

Best Regards,
Yasuo Ohnishi