AIX OneAgent vs. Other OneAgents

Babar_Qayyum · 03 May 2021

Dear All,

Is there any functional difference between the AIX OneAgent and the OneAgents for other platforms when the number of processes is very high?

The background is that a customer installed OneAgent on an AIX server to monitor IBM Integration Bus (IIB), which runs more than 100 processes. After that, the application started responding slowly.

Once OneAgent monitoring was stopped, the application responded as it used to.

Regards,

Babar

pahofmann · 03 May 2021

Hey Babar,

I have customers running 200+ Apache/WebSphere processes on an AIX box without issues.

Probably best to open a support case for your issue.

Dynatrace Certified Master - Dynatrace Partner - 360Performance.net

Babar_Qayyum · 03 May 2021

Hello @pahofmann

Thank you for sharing your experience.
I remember another customer running IBM Integration Bus (more than 100 processes) on an AIX server without any issues, but this customer is complaining about performance, and it somehow seems to be related to OneAgent.
I will ask Dynatrace support if they have any theory for this.

Regards,

Babar

kalle_lahtinen · 03 May 2021

Hi,

We have a somewhat similar experience. We're running the newer version of IIB, i.e. ACE, on AIX with about 60 DataFlowEngine processes. The first thing we observed was that broker startup took twice as long with the agent installed: 2 hours instead of 1. We've also been investigating the CPU and memory consumption, which is quite high on those servers. Overall, this is hard to analyze because three things are changing at once with the ongoing migration: (1) from Linux to AIX, (2) from IIB to ACE, and (3) from no Dynatrace to OneAgent. So we're not always sure what is causing what 🙂

Anyway, IBM recently provided a fix that improved both the broker startup time and the memory consumption per DataFlowEngine, so things look better now. Even so, the agent clearly had a very noticeable impact on startup time.

We're using AIX auto-injection, and support suggested testing manual injection next. We haven't done that yet. Would that make sense for you as well?

Deep monitoring itself doesn't seem to affect CPU/memory usage. I have a process group (PG) monitoring rule that disables deep monitoring for exactly half of the processes, and there's no difference in per-process CPU and memory consumption between having it on and off.

Another AIX-related observation: in this environment, the AIX osagent takes about 8-9% CPU on most servers. This isn't limited to IIB/ACE; it affects all AIX hosts. We had support enable a debugUI flag that changes the OS data polling interval from 10 seconds to once per minute. This brought CPU usage down to about 4%, which is acceptable.

Babar_Qayyum · 03 May 2021

Hello @kalle_lahtinen

Thank you for sharing your experience.

Regarding the recent IBM fix that improved both the broker startup time and the memory consumption per DataFlowEngine: which version of IIB/ACE was that for?

Regarding manual injection: doing it for more than 100 processes, plus the ongoing maintenance, would be a really tedious job.

An AIX osagent taking about 8-9% CPU on most servers is not acceptable in any case.

Regarding the debugUI flag that changes the OS data polling interval from 10 seconds to once per minute and brought CPU usage down to about 4%: can we enable this debug flag ourselves?

Regards,

Babar

kalle_lahtinen · 03 May 2021

Hi,

The ACE version we're running is 11.0.0.12.

About the tediousness of manual injection: I very much agree, but if it resolves these IIB/ACE performance issues, I'd personally consider it an acceptable workaround.

Those debugUI settings can only be configured by support, even though it's "our" Managed cluster, so to speak 🙂 So if you're having similar CPU consumption issues on AIX, you need to open a support ticket.

Babar_Qayyum · 03 May 2021

Hello @kalle_lahtinen

I got the debug flag in the form of an environment variable, so we can set it ourselves.

Regards,

Babar

kalle_lahtinen · 03 May 2021

Cool! That seems like a nicer way to manage this, instead of always having to ask support to change the config...

Babar_Qayyum · 03 May 2021

Hello @kalle_lahtinen

Correct! Here is the debug environment variable. It changes the OS data polling interval from 10 seconds to 1 minute.

DT_DEBUGFLAGS=decreaseUpdateFrequency=true

Regards,

Babar