Information:

Environment

Affected Versions: all 

Description

Introduction

The collector is a crucial component of an AppMon installation and can, under certain circumstances, become unhealthy.

If you can see the following messages in the collector log (since dT 5.5)

INFO [ClassCacheWritingThread] Class cache size on disk: 35,151,172,526 bytes (33522.77M) 
INFO [ClassCacheWritingThread] Inheritance map files size on disk: 36,990,428 bytes (35.28M) 

you should consider taking action, since these messages are only written when a specific threshold is exceeded. Whether the reported size is a problem depends on how much byte code the applications connected to this collector contain. The absolute size may therefore not be a concern; what matters more is how fast the cache keeps growing after everything has been instrumented once.
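To judge the growth trend rather than a single value, the reported sizes can be extracted from the collector log and compared over time. The following is a minimal sketch that assumes the log line format shown above; the function names and the second size report in the sample are invented for illustration and are not part of AppMon.

```python
import re

# Matches the "Class cache size on disk" log line shown above.
# The byte count is written with comma grouping.
SIZE_LINE = re.compile(r"Class cache size on disk: ([\d,]+) bytes")


def class_cache_sizes(log_lines):
    """Return the class cache sizes (in bytes) found in the log lines, in order."""
    sizes = []
    for line in log_lines:
        match = SIZE_LINE.search(line)
        if match:
            sizes.append(int(match.group(1).replace(",", "")))
    return sizes


def growth_between_reports(sizes):
    """Return the difference in bytes between each pair of consecutive reports."""
    return [later - earlier for earlier, later in zip(sizes, sizes[1:])]


sample = [
    "INFO [ClassCacheWritingThread] Class cache size on disk: 35,151,172,526 bytes (33522.77M)",
    "INFO [ClassCacheWritingThread] Inheritance map files size on disk: 36,990,428 bytes (35.28M)",
    # Hypothetical later report, invented for this example only.
    "INFO [ClassCacheWritingThread] Class cache size on disk: 35,675,460,526 bytes (34022.77M)",
]
sizes = class_cache_sizes(sample)
growth = growth_between_reports(sizes)
print(growth)
```

A steadily positive growth between reports, long after the applications were first instrumented, is the pattern that suggests dynamically created classes.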

The class cache can grow constantly because more and more popular software frameworks used in the monitored applications create classes dynamically. Those classes are often created and used for a single request and dropped afterwards, but the collector still stores them in the class cache.

AppMon 5.5 already contains a cleanup mechanism that drops classes from the class cache that have not been in use for at least a week, but collector health can also be affected by a huge inheritance map (IMAP cleanup is targeted for dT 5.6).

Affects

Only Java applications, since AppMon is currently not able to instrument dynamic .NET assemblies.

Solution

The only solution we currently have is to exclude those dynamically created classes on the agent side so they are no longer sent to the collector. After implementing those exclusions, the class cache needs to be dropped to get rid of the old occurrences.

Determine classes

With IMAP
pro: small set of files to copy
con: identically named classes from the same package or namespace only show up once

  1. Copy the IMAP files (*.imap) from <DT_HOME>/collector/cache to a different location
  2. Download and execute:

  3. Find similarly named classes such as

    com.foobar.package.DynamicallyCreatedClass$$EnhancerByCGLIB$$c5477de5 
    com.foobar.package.DynamicallyCreatedClass$$EnhancerByCGLIB$$a3275ce2 

    To do this efficiently, it is recommended to use a tool that can sort the lines (e.g. Notepad++ with the TextFX plugin). Alternatively, you can use the attached "ClassCacheInstanceUtility.jar" tool, which aggregates dynamic class instance entries based on the most common dynamic class patterns and sorts them by their frequency in the class cache. The usage for the tool is:

    java -jar ClassCacheInstanceUtility.jar <IMAP/classes dump output file> <OPTIONAL aggregated entry threshold>

    The optional aggregated entry threshold argument sets the minimum number of aggregated instances a pattern must have to be shown in the output; the default is 50. The output of the tool has the format: "<dynamic class pattern root> : <number of occurrences>"
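The kind of aggregation described above can be sketched in a few lines. This is not the implementation of ClassCacheInstanceUtility.jar; it is an illustration that assumes one class name per input line and that a generated class name ends in "$$" followed by a hex-like hash, as in the CGLIB examples above. Other frameworks may use different naming schemes.

```python
import re
from collections import Counter

# Assumption: generated names end in "$$" plus a hex-like hash, e.g.
# "...$$EnhancerByCGLIB$$c5477de5". Group 1 is the pattern root.
GENERATED_SUFFIX = re.compile(r"^(.*\$\$)[0-9a-fA-F]+$")


def aggregate_patterns(class_names, threshold=50):
    """Count names per pattern root; keep roots seen at least `threshold` times."""
    counts = Counter()
    for name in class_names:
        match = GENERATED_SUFFIX.match(name.strip())
        if match:
            counts[match.group(1)] += 1
    # most_common() sorts by frequency, highest first.
    return [(root, n) for root, n in counts.most_common() if n >= threshold]


names = [
    "com.foobar.package.DynamicallyCreatedClass$$EnhancerByCGLIB$$c5477de5",
    "com.foobar.package.DynamicallyCreatedClass$$EnhancerByCGLIB$$a3275ce2",
    "com.foobar.package.OrdinaryClass",
]
result = aggregate_patterns(names, threshold=2)
for root, n in result:
    print(f"{root} : {n}")
```

The threshold is lowered to 2 here only because the sample has two entries; on a real dump the default of 50 filters out incidental matches.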

With class cache
Alternatively, you can run the dump tool on the classes subfolder, following the same procedure as for the IMAP files. The folder is usually huge at that point, so copying it may not be easy. This approach is required if the frameworks used by the monitored applications create dynamically generated classes with identical names, because only the class cache shows which ones those are.
pro: all class cache content visible
con: large amount of data to copy

NB: on version 6.3 and higher, classes are stored in a folder called "set1" and "set2" instead of the "classes" folder. The dump tool does not recognize the "set1" and "set2" folders when specified on the command line. To proceed, the files from "set1" or "set2" must be moved into a folder called "classes".

Exclude classes

Based on the similarly named classes found, create exclusion rules:

  • Restrictive

    -agentpath:....,exclude=starts:com/foobar/package/DynamicallyCreatedClass$$EnhancerByCGLIB$$

    All package '.' separators need to be replaced by '/'.

  • More comprehensive

    -agentpath:....,exclude=contains:$$EnhancerByCGLIB$$

  • Use ';' to combine two or more rules, e.g.

    -agentpath:....,exclude=starts:com/dynatrace/diagnostics/foobar;contains:Proxy

  • Allowed match types are "starts", "ends" and "contains".

As the examples above show, the exclude option needs to be appended to the -agentpath command line parameter of the application JVMs.
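The rule syntax above (match type, ':' separator, '/' instead of '.', rules joined by ';') can be assembled programmatically to avoid typing mistakes. The helper names below are hypothetical and exist only in this sketch; the output still has to be appended by hand to the existing -agentpath argument.

```python
# The three match types named in the list above.
ALLOWED_MATCH_TYPES = ("starts", "ends", "contains")


def exclusion_rule(match_type, pattern):
    """Build one rule, replacing package '.' separators with '/'."""
    if match_type not in ALLOWED_MATCH_TYPES:
        raise ValueError(f"unsupported match type: {match_type}")
    return f"{match_type}:{pattern.replace('.', '/')}"


def exclude_option(rules):
    """Join several rules with ';' into the exclude=... part of -agentpath."""
    return "exclude=" + ";".join(rules)


option = exclude_option([
    exclusion_rule("starts", "com.dynatrace.diagnostics.foobar"),
    exclusion_rule("contains", "Proxy"),
])
print(option)  # exclude=starts:com/dynatrace/diagnostics/foobar;contains:Proxy
```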

For details on excluding classes, please see the other KB articles.

Automatic clean-up

Automatic clean-up may be a sufficient remedy for growing class caches and an alternative to exclusions as described above. This depends on the rate of dynamically generated classes which get written to the class cache. For very fast growing class caches (caused by applications that generate a high number of classes dynamically), exclusions may be required. For moderately to slow growing class caches the clean-up mechanism should be sufficient.

AppMon 5.5 collectors (and higher versions) automatically clean up the class cache (DT_HOME/collector/cache classes). With the out-of-the-box settings, this task runs at most once a day and cleans up classes older than 7 days. How often the clean-up actually runs depends on multiple conditions. All of them need to be met, otherwise the cleanup will not start.

Condition | Collector setting | Default value
Hour of day (check is done per minute, "cleanup start" is possible for the whole hour). | com.dynatrace.diagnostics.classcache.cleanupTime | 3 (=3am, possible 0-23)
Minimum wait time after last cleanup. | com.dynatrace.diagnostics.classcache.cleanupMinimumWaitTime | 12 (h)
Oldest item in the class cache needs to be older than the configured age setting. Additionally, a check is performed during clean-up if the class is still loaded by a connected agent - otherwise it is kept in the cache. | com.dynatrace.diagnostics.classcache.cleanupItemAge | 168 (h, =7 days)
Growth rate since last cleanup. | com.dynatrace.diagnostics.classcache.minGrowthBeforeCleanup | 500 (MB)

To adapt these settings, please see Setting AppMon Collector Debug Flags.

From AppMon 6.3 on, the partition where the class cache is located needs to have at least as much free space as the class cache currently consumes. Otherwise the cleanup will not start.
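Because every condition has to hold at the same time, it can help to model the start decision explicitly when tuning the settings. The sketch below is an illustrative model built from the conditions described above, not the collector's actual implementation. The class and function names are invented, and the exact comparison operators (for example "at least" versus "more than") are assumptions.

```python
from dataclasses import dataclass


@dataclass
class CleanupSettings:
    # Defaults taken from the settings described above.
    cleanup_time: int = 3          # classcache.cleanupTime, hour of day (0-23)
    minimum_wait_hours: int = 12   # classcache.cleanupMinimumWaitTime
    item_age_hours: int = 168      # classcache.cleanupItemAge
    min_growth_mb: int = 500       # classcache.minGrowthBeforeCleanup


def cleanup_may_start(settings, hour_of_day, hours_since_last_cleanup,
                      oldest_item_age_hours, growth_since_last_cleanup_mb,
                      free_disk_mb, class_cache_mb):
    """Return True only if all start conditions are met (assumed semantics)."""
    conditions = [
        hour_of_day == settings.cleanup_time,
        hours_since_last_cleanup >= settings.minimum_wait_hours,
        oldest_item_age_hours > settings.item_age_hours,
        growth_since_last_cleanup_mb >= settings.min_growth_mb,
        # AppMon 6.3 and later: free space must cover the current cache size.
        free_disk_mb >= class_cache_mb,
    ]
    return all(conditions)


defaults = CleanupSettings()
print(cleanup_may_start(defaults, hour_of_day=3, hours_since_last_cleanup=24,
                        oldest_item_age_hours=200,
                        growth_since_last_cleanup_mb=600,
                        free_disk_mb=50000, class_cache_mb=34000))
```

Changing a single argument, such as the hour of day or the free disk space, is enough to make the function return False, which mirrors why a cleanup can be skipped for days on a busy collector.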

In addition, dynaTrace 5.6 collectors (and higher versions) automatically clean up the inheritance maps (DT_HOME/collector/cache/*.imap). The clean-up job runs every 7 days. Please note that this feature is disabled by default in AppMon 5.6 and can be enabled by adding the setting

-Dcom.dynatrace.diagnostics.imap.cleanupItemAge=604800000 

to dtcollector.ini (followed by a collector restart). 604800000 is measured in milliseconds and is the maximum age of items in the class cache that will be kept during the clean-up job. A value of 604800000 equals 7 days and is the recommended setting. This will be the default with AppMon 6 and higher.
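Since the value is given in milliseconds, it is easy to drop or add a zero when choosing a different age. A small sketch of the conversion, with a hypothetical helper name, prints the line for dtcollector.ini:

```python
from datetime import timedelta


def imap_cleanup_item_age_ms(days):
    """Convert an age in days to whole milliseconds."""
    return timedelta(days=days) // timedelta(milliseconds=1)


age_ms = imap_cleanup_item_age_ms(7)
print(f"-Dcom.dynatrace.diagnostics.imap.cleanupItemAge={age_ms}")
```

For 7 days this yields 604800000, the recommended value quoted above.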

  1. Anonymous

    Hi

    Will the ClassCacheImapDump55.jar library work fine on dT 5.6?

    Regards.

    1. Anonymous

      A new version of the dump utility has been uploaded which is compatible with DT 5.6 and the upcoming 6.0 release.

  2. Anonymous

    Is this "exclude" parameter the same as or different from adding a sensor designed to exclude?

    That sensor would have Place=exclude (for all methods) and Capture=inactive (for all methods).

    The above is for "all methods"...in addition to that, at the Class level there is a check mark next to 'Place'.

    1. Anonymous

      Hi Erik,

      this additional "exclude" parameter for the -agentpath JVM argument is different because the classes that match that setting will not be sent to the collector anymore.

      An "exclude" or "global exclude" in the system profile (what you are talking of with the rules) still can lead to a class cache explosion because the dynamically created classes are sent to the collector and stored in the class cache, but simply not instrumented.

      Regards,
      Klaus 

  3. Anonymous

    Excellent, thanks Klaus.

  4. Anonymous

    Is there a limit to how frequently the automatic clean-up can be configured to run?  

    1. Anonymous

      Hi Courtney,

      yes and no. The algorithm is a bit complicated and several conditions need to be met. The conditions can be changed through the properties listed below and by following this KB Article. Except for the IMAP cleanup age mentioned in the article, which is only available since 5.6, all properties can be changed live without a collector restart since 5.5.

      • The growth rate needs to be exceeded since startup with a fresh class cache or the last cleanup - default is 500 MB

        com.dynatrace.diagnostics.classcache.minGrowthBeforeCleanup=500
      • The hour of the day needs to match - default is 3am

        com.dynatrace.diagnostics.classcache.cleanupTime=3

        It is highly recommended to change this "hour of the day" setting in high load environments with multiple collectors on one host so the instances are not doing the cleanup at the same time. You might run into disk IO bottlenecks that can lead to watchdog restarts or out-of-diskspace situations in case some or all collectors are doing the cleanup at the same time.

      • The minimum wait time since the last cleanup needs to be exceeded - default is 12 hours

        com.dynatrace.diagnostics.classcache.cleanupMinimumWaitTime=12

      During the cleanup, all classes are "dropped" (not copied to the new class cache that is created during the cleanup) that are currently not loaded by an agent and exceed the following item age in the cache (default 168 hours = 7 days)

      com.dynatrace.diagnostics.classcache.cleanupItemAge=168

      This maximum item age can be reduced, but please be aware that this increases the risk - in case of an agent disconnect during the time of the running cleanup - of an erroneous "used class" removal, because the "loaded" state cannot be checked. This could lead to PurePaths with "<unknown>" nodes instead of the called method names. This is not a problem with 6.0 anymore because of changed metadata handling that came with the collector failover functionality.

      There is a bug that appears in environments with a high class cache growth rate where unused classes are not dropped by the cleanup task. This problem can be worked around by setting cleanupMinimumWaitTime to a higher value than cleanupItemAge.

       

      HTH,
      Klaus

  5. Anonymous

    Klaus,

     

    These changes are made to dtcollector.ini? If so, how do you have the start time change between collectors when running multiple collectors on a server?

    1. Anonymous

      Oh, sorry, the link in my comment behind "KB Article" should direct you to Setting AppMon Collector Debug Flags - this was replaced by the Wiki - on an edit it was wrongly changed again.

  6. Anonymous

    Even more comprehensive is:

    -agentpath:....,exclude="contains:CGLIB$$"

    1. Anonymous

      It's worth pointing out that this is an agent configuration string. Not a collector string.

      In addition, I've seen issues with certain WebLogic configurations whereby the $'s were misinterpreted. Therefore

      -agentpath:....,exclude="contains:CGLIB" might be even better.

  7. Anonymous

    So, how do we know, under v6.0, if the collector is indeed running the cleanup every 7 days, and where would we find what the current *.cleanupItemAge value is?  We upgraded one of our lower environments from v5.6 to v6.0 over a week ago, and we ran into this exploded cache on one of the collectors.  I had expected, then, that after returning in 7+ days, we'd find that it took up much less of the 23gb that it used to take up, but it is still the same size.

    Granted, I could just clear out the cache and start afresh, but I was hoping that this new cleanup job would've taken care of things.

    Later:  Just checked the v5.6 logs, and the cleanup job wasn't able to complete successfully then, either.  Java "BUFFER UNDERFLOW" errors.  I'm guessing that there isn't enough memory on the server for the operation.  If so, is my only option to clear the cache?  Hope not...

  8. Anonymous

    Hi, this page advises adding a parameter to dtcollector.ini of cleanupItemAge=604800000. The above recommends setting it via the client.

    Which is correct? If both are present, which takes precedence?

  9. Anonymous

    Hi,

    we just analysed the output of the download of ClassCacheImapDump60.

    Result: about 1.720.000 classes are found in the IMAP, of which only about 2000 are dynamically created classes. Among the others I find IBM and Eclipse API classes, our own framework classes, application classes... none of them dynamically created. So I think excluding dynamic classes will not make a noticeable difference.

    Cleanup could create some decrease, but then again I wonder how fast the list will be filled again, as all of these classes are in the list because of being used in our applications. I can imagine that within 7 days, we come in same situation when having normal load on applications.

    My question: is 1.720.000 classes in the cache an unreasonably high amount? And if so, what is the solution to avoid the ClassCacheWritingThread messages and unstable collectors? We were told this is causing missing PurePaths, which we want to avoid at all times.

    -Monique-

  10. Anonymous

    Hi,

    I just upgraded to 6.1, and we suspect that a class explosion makes our memory usage too large (around 90%), and agent instrumentation is disabled. Support staff wants me to analyze the imap file. Is there a utility jar file to analyze a dT 6.1 imap file?

    I was trying to analyze it using ClassCacheImapDump60.jar, but it won't work because of the version mismatch.

    Thanks,

    Shirley

  11. Anonymous

    We upgraded to version 6.2 and I do see something that looks like class explosion. Is this issue solved in 6.2, or is there a 6.2 version of this tool?

    1. Anonymous

      The 6.0 dump tool can be used for a 6.2 class cache. Please note that 6.2.0 has an issue with high collector memory consumption in some cases. I suggest installing 6.2.2 (expected today), which has a fix for this issue. If the issue persists with 6.2.2, proceed with the steps of this KB article.

      1. Anonymous

        OK

        Looking more closely, we are currently running 6.2.1.1027. I will also open a support case on this to track it there.

        Regards
        Rolf Gunnar

  12. Anonymous

    Hi Klaus, is the class cache threshold determined from the expected "run-time data", so the folder size should be 25 (single collector)-50 GB tops (2 collectors, still 25 each?)?

  13. Anonymous

    Exclusion parameter must not be provided in double quotes per Excluding classes as an agent parameter (to be corrected later in the article body)

  14. Anonymous

    Hi,

    My customer currently runs Dynatrace 6.3.2.1101 and the class cache explosion is still present (despite fixes released in previous versions). We are first going to upgrade the customer to 6.3.6 (should come out in around 2 weeks from today), then clean out the class caches, then restart all application instances. This should recreate the class caches. If size is still a problem, I'll definitely make use of the cache cleanup process/tools mentioned in this article.

    Just a quick question... Would the "ClassCacheImapDump60.jar" tool work on 6.3.6? If not, where can I find the latest tool?

    1. Anonymous

      Hi Francois,

      The "ClassCacheImapDump60.jar" works for 6.3.

      Regards,

      Vince Benkert

      1. Anonymous

        Thank you Vince. Much appreciated.

      2. Anonymous

        By the way, the 6.0 dump tool also works for 6.5 but files inside the "set1" or "set2" folders must be moved to a folder named "classes" in order to be recognized by the tool.

  15. Anonymous

    Hi,

    two thoughts for improvement:

    1. The jar files that scan the IMAP files should be more advanced and sort the output directly, or try to identify suspicious classes directly, so that there is no need for external tools like Notepad++ (just to make the experience more sophisticated)
    2. A debug flag / option directly on the collector would indeed be nice. This would avoid the overhead of configuring the respective agents. Such an option could of course not prevent agents from still sending all classes, but it could, for example, keep the classes configured on the collector (via debug flag) out of the class cache and therefore avoid class cache explosion. This would make it easier to maintain such settings.

    Best regards

    Dennis

    1. Anonymous

      UPDATE: Point 2 is actually implemented since DT 6.3. If you want the debug flag please contact support.

      1. Anonymous

        This is not correct. We have the collector property since DT 6.2 and it is mentioned in both linked KB articles:
        How to excluding classes as an agent parameter and
        Excluding classes as an agent parameter 

        1. Anonymous

          Thank you Klaus for the correction!