Solved: Re: OneAgent Fail-Safe Mechanism When Running Out of Resources

Ardhi · 03 May 2023

Hi,

Is there any documentation on how OneAgent behaves when it doesn't get the resources it needs, specifically CPU and memory?

From what I understand, Dynatrace only consumes minimal resources, depending on the usage (Full Stack or Infra only, number of monitored processes, etc.).

I also read a post on monitoring overhead, which says we need to test before and after the infrastructure is instrumented.

https://www.dynatrace.com/resources/ebooks/javabook/controlling-measurement-overhead/

These are contributing factors, but is there (maybe from the developers) any fail-safe mechanism in OneAgent?

Something like: when it only gets 80% of the resources it needs, Dynatrace would reduce monitoring to infra-only mode, then at 50% it would disable network monitoring, etc.

The reason is that we need a precise measurement of resource allocation for the application in the infrastructure, along with the expected behavior of the components.

Thank you,

Ardhi

Tom_Eaton · 03 May 2023

Hi @Ardhi,

It's not publicly documented, IIRC. Each OneAgent component (oneagentos, oneagentnetwork, etc.) has thresholds for both memory and CPU coded in, and when a threshold is reached or breached the component tries to throttle itself. The watchdog component also steps in as a last resort: when memory usage is very high, the watchdog restarts the component in question.
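To make the idea concrete, here is a minimal Python sketch of that kind of decision logic. It is not OneAgent's actual code: the `Limits` values, the `Action` names and the `decide` function are all hypothetical placeholders, since the real thresholds are not public.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    NONE = "none"          # usage is within limits
    THROTTLE = "throttle"  # component slows itself down
    RESTART = "restart"    # watchdog restarts the component (last resort)


@dataclass(frozen=True)
class Limits:
    # Placeholder numbers for illustration only; the real values are not documented.
    cpu_throttle_pct: float = 5.0
    mem_restart_pct: float = 5.0


def decide(cpu_pct: float, mem_pct: float, limits: Limits = Limits()) -> Action:
    """Pick the action a watchdog-style supervisor might take for one component."""
    # Memory is checked first because a restart is the last-resort action.
    if mem_pct >= limits.mem_restart_pct:
        return Action.RESTART
    # "Reached or breached" is modelled as >=.
    if cpu_pct >= limits.cpu_throttle_pct:
        return Action.THROTTLE
    return Action.NONE
```

For example, `decide(cpu_pct=6.0, mem_pct=1.0)` returns `Action.THROTTLE` with these placeholder limits, while `decide(cpu_pct=6.0, mem_pct=6.0)` returns `Action.RESTART`.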

I would raise an RFA (Request for Assistance) via Dynatrace One, to get further details on the thresholds and detailed throttling mechanisms.

y_buccellato · 15 May 2023

OneAgent has a self-monitoring system and a watchdog in place that will shut down, restart, or throttle the process itself.

If you take a look at the documentation for the network module, it states that if OneAgent overhead rises above 5% of CPU, the module throttles itself by pausing for longer and longer periods, up to a maximum of 45 minutes (here).
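As a rough illustration of that kind of growing pause, here is a small Python sketch. Only the 5% CPU limit and the 45-minute cap come from the description above; the 1-minute starting pause, the doubling factor, the reset-to-zero behavior and the `next_pause` function itself are my assumptions, not how OneAgent is actually implemented.

```python
MAX_PAUSE_MIN = 45.0  # cap mentioned above
CPU_LIMIT_PCT = 5.0   # CPU overhead limit mentioned above


def next_pause(current_pause_min: float, cpu_pct: float,
               initial_pause_min: float = 1.0, factor: float = 2.0) -> float:
    """Return the next pause length in minutes.

    Assumed policy (not documented): while overhead stays above the limit,
    the pause grows geometrically up to the cap; once overhead is back at
    or under the limit, the pause resets to zero.
    """
    if cpu_pct <= CPU_LIMIT_PCT:
        return 0.0
    if current_pause_min <= 0:
        return initial_pause_min
    return min(current_pause_min * factor, MAX_PAUSE_MIN)


def pause_schedule(cpu_samples: list[float]) -> list[float]:
    """Apply next_pause to a series of CPU overhead samples."""
    pauses, pause = [], 0.0
    for cpu in cpu_samples:
        pause = next_pause(pause, cpu)
        pauses.append(pause)
    return pauses
```

With these assumed defaults, eight consecutive samples above the limit give pauses of 1, 2, 4, 8, 16, 32, 45 and 45 minutes, and the first sample back under the limit resets the pause to 0.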

And for the infrastructure and APM parts, the agent will restart itself if it uses more than 5% of memory (which happened to me once in 4 years, because an AIX system had an issue with memory allocation).

Regards,

Yann