Hi dynaTrace community,
I'm currently working on diagnosing a file handle issue on AIX.
Objective: Monitor how Java uses file descriptors for a specific Java process.
Issue: The application crashes when the number of open files reaches the maximum file descriptor limit (ulimit=2000 for IBM WAS).
The symptom is a clear indication of the file handle leak issue that has been discussed in the forum. I was able to configure the "handle count" measure on a dashboard chart, but it produces no data, so I can't observe the handle count at either the process level or the host level.
Question: Does this measure require root access to capture the count for a specific process? I ask because I can see a file report when I run the lsof command myself, but only the AIX admin is able to produce the report for a specific process by running lsof -p [PID] -r [interval in seconds, e.g. 1800 for 30 minutes].
Has anyone come across this issue? Any suggestions or recommendations are greatly appreciated.
Thank you in advance.
Follow-up: I engaged the dynaTrace support team, and it doesn't seem like there is an out-of-the-box measure that can be enabled for AIX.
However, below is the workaround recommended by a support team member (@Jason Yi):
Thank you for contacting the support team. So far, I have the following suggestions for your question:
1) Did you try contacting IBM support to find out which JMX measure WAS exposes on the AIX machine for monitoring the number of files opened by WAS? Our Java agent doesn't create JMX measures; we just pull the data from JMX.
2) Do you think a plugin would work as a workaround?
AppMon allows users to create custom plugins. I think we can use a plugin to run a shell script on the AIX machine to get the number of open files at the OS level.
However, creating custom plugins is outside the scope of support. If you need help creating the plugin, you can involve our engagement team.
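
For anyone who lands on this thread later, here is a rough sketch of the kind of shell script such a plugin might call. It is not something support provided, and it rests on a few assumptions: that /proc/[PID]/fd is available on the AIX host and readable by the user running the script (normally the process owner or root), and that counting the entries in that directory is an acceptable stand-in for the lsof report. The function names count_fds and check_fds and the 80% warning level are made up for illustration; the 2000 default is the ulimit mentioned above.

```shell
#!/bin/sh
# Sketch only: count open file descriptors for one process via /proc/<pid>/fd.
# Assumption: /proc/<pid>/fd exists and is readable by the invoking user
# (normally the process owner or root).

# count_fds PID: print the number of open descriptors (hypothetical helper).
count_fds() {
    pid="$1"
    fd_dir="/proc/$pid/fd"
    if [ ! -d "$fd_dir" ] || [ ! -r "$fd_dir" ]; then
        echo "cannot read $fd_dir" >&2
        return 1
    fi
    # One directory entry per open descriptor; tr strips wc's padding.
    ls "$fd_dir" | wc -l | tr -d ' '
}

# check_fds PID [LIMIT]: print OK or WARN (hypothetical helper).
# Warns once the count reaches 80% of LIMIT (an arbitrary level);
# LIMIT defaults to 2000, the ulimit from the original post.
check_fds() {
    pid="$1"
    limit="${2:-2000}"
    count=$(count_fds "$pid") || return 2
    warn_at=$(( limit * 80 / 100 ))
    if [ "$count" -ge "$warn_at" ]; then
        echo "WARN pid=$pid open_fds=$count limit=$limit"
        return 1
    fi
    echo "OK pid=$pid open_fds=$count limit=$limit"
}

# Example usage (replace 12345 with the WAS java PID):
#   check_fds 12345 2000
```

If the script runs as the same user that owns the WAS process, it may not need root at all, which would sidestep the lsof permission problem described above. How the output is fed back into a custom plugin depends on the plugin itself and is not covered here.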