Can you advise me how to determine how much AMD disk capacity I need for a DC RUM deployment? Is there, for example, a calculator or something similar?
We have a DC RUM deployment with 800 GB of disk on each AMD, but the /var/spool/adlex partition keeps growing and I think 800 GB will not be enough. What data is saved to /var/spool/adlex?
Have you checked whether your AMDs are generating core dumps?
How much traffic are you analyzing on this probe?
We have about 300 GB hard drives on our probes for the whole machine, and they are far from full!
Jakub - don't worry.
There are a couple of things to keep in mind, and as Sandrine says, you should have enough.
The reason is that there is a support process that keeps a minimum amount of space free: once free space drops below that minimum, it starts cleaning up the oldest data. I was looking for an older post about the process (can't find it), but I "think" it aims to keep ~20% of the disk free.
The datacleaner.config file Adam mentioned contains this value; 20% is correct (at least for the AMD I just checked).
It tries to keep 8 days of data (192 hours in the config file), but it will purge older data sooner if needed to keep 20% of the disk free.
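To make the two rules concrete, here is a minimal sketch of the cleanup policy as described above: delete anything older than the retention window, then keep deleting the oldest remaining files until at least 20% of the disk is free. This is not the actual datacleaner code; the `DataFile` type, `files_to_purge` function and their parameters are hypothetical, and only the 192-hour and 20% values come from the thread.

```python
from dataclasses import dataclass

# Values quoted in the thread; the real datacleaner.config key names are not shown here.
RETENTION_HOURS = 192      # ~8 days
MIN_FREE_FRACTION = 0.20   # keep at least 20% of the disk free


@dataclass
class DataFile:
    # Hypothetical model of one data file in /var/spool/adlex/rtm.
    name: str
    age_hours: float
    size_gb: float


def files_to_purge(files, disk_total_gb, disk_free_gb,
                   retention_hours=RETENTION_HOURS,
                   min_free_fraction=MIN_FREE_FRACTION):
    """Return names of files a datacleaner-like policy would delete (a sketch)."""
    purge = []
    free = disk_free_gb
    # Oldest first, so both passes remove the oldest data before newer data.
    ordered = sorted(files, key=lambda f: f.age_hours, reverse=True)

    # Pass 1: everything outside the retention window goes.
    kept = []
    for f in ordered:
        if f.age_hours > retention_hours:
            purge.append(f.name)
            free += f.size_gb
        else:
            kept.append(f)

    # Pass 2: if still under the free-space threshold, drop the oldest kept files.
    target_free = disk_total_gb * min_free_fraction
    for f in kept:
        if free >= target_free:
            break
        purge.append(f.name)
        free += f.size_gb
    return purge
```

With an 800 GB disk at 10% free, the policy would go past the 192-hour rule and eat into files that are still inside the retention window, which is why a disk that stays below 20% free is a sign the cleaner is not running.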
Answering your questions:
As Sandrine mentioned, it's a good idea to check the /usr/adlex/rtm/bin and /usr/adlex/cba/bin folders for core.xxx files. Make sure /usr/adlex/config/rtm.config is set up to keep only one core file; otherwise core files may flood the HD and start eating your data files. That happens because of the process Ulf mentioned, datacleaner, configured in /usr/adlex/config/datacleaner.config: by default it makes sure the HD has 20% free space by deleting the oldest data files, and it deletes them only from /var/spool/adlex/rtm. Core files are never cleaned up, so the space they take comes out of your data retention.
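A quick way to do that check and see how much space the core files take is sketched below. It assumes core files are named `core.*` in the two directories mentioned above; `find_core_files` is a hypothetical helper, not part of DC RUM.

```python
from pathlib import Path

# Directories named in the thread; adjust if your AMD layout differs.
CORE_DIRS = ["/usr/adlex/rtm/bin", "/usr/adlex/cba/bin"]


def find_core_files(dirs=CORE_DIRS):
    """Return (path, size_bytes) for every core.* file in the given directories."""
    found = []
    for d in dirs:
        base = Path(d)
        if not base.is_dir():
            continue  # directory not present on this box
        for p in sorted(base.glob("core.*")):
            if p.is_file():
                found.append((str(p), p.stat().st_size))
    return found


if __name__ == "__main__":
    cores = find_core_files()
    for path, size in cores:
        print(f"{size / 1024**2:10.1f} MiB  {path}")
    total = sum(size for _, size in cores)
    print(f"total: {total / 1024**3:.2f} GiB in {len(cores)} core file(s)")
```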
If your AMD configuration produces big data files, make sure you actually need this configuration and/or the default 8 days of data files stored on the AMD HD. Since the files are meant to be read continuously by the report servers, you only need to keep them for as long as the AMD <-> CAS/ADS link might be down.
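For the original sizing question, a rough back-of-the-envelope estimate (not an official DC RUM sizing calculator) follows from the two datacleaner rules: the disk must hold the daily data volume for the retention period while still leaving 20% free. The daily volume is an input you would measure yourself, e.g. from the growth of /var/spool/adlex/rtm; the function names are hypothetical, and core files, logs and the OS are not included.

```python
def required_disk_gb(daily_data_gb, retention_days=8, min_free_fraction=0.20):
    """Disk needed to hold `retention_days` of data and still leave
    `min_free_fraction` of the disk free (rough estimate)."""
    if not 0 <= min_free_fraction < 1:
        raise ValueError("min_free_fraction must be in [0, 1)")
    return daily_data_gb * retention_days / (1 - min_free_fraction)


def achievable_retention_days(disk_gb, daily_data_gb, min_free_fraction=0.20):
    """How many days of data fit on a given disk under the same threshold."""
    return disk_gb * (1 - min_free_fraction) / daily_data_gb
```

For example, an AMD writing 80 GB of data files per day needs about 80 × 8 / 0.8 = 800 GB for the default 8 days, and one writing 100 GB per day would only keep about 6.4 days on an 800 GB disk, with the datacleaner trimming the rest.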
Recently the same issue started on one of our AMDs, and it started eating HD space rapidly.
I went through all the steps you mentioned to find the root cause of this behavior, and I found that each primary-data-source AMD has 3 core files in /usr/adlex/cba/bin, and every AMD, primary and secondary data sources alike, has one core file in /usr/adlex/rtm/bin.
Our current setup is 4 AMDs: two are primary data sources and two are secondary data sources for our CAS/ADS server.
One AMD pair is working fine, but the other pair is going to run out of HD space sooner, even though we are feeding almost the same volume of traffic to both.
Please help me resolve this issue before it turns into big trouble.
Will set to retain only the most recent RTM core
The <maxCoreDumps> parameter controls the number of CBA cores.
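If core files have already piled up before the limit was set, they can be trimmed by hand. The sketch below keeps only the newest N `core.*` files in a directory, which mirrors the idea of a core-count limit; it is not how the AMD itself enforces <maxCoreDumps>, and `prune_core_files` is a hypothetical helper. It defaults to a dry run so you can review the list first.

```python
from pathlib import Path


def prune_core_files(directory, keep=1, dry_run=True):
    """Keep only the `keep` most recent core.* files in `directory`.

    Returns the paths that were removed (or, with dry_run=True, would be).
    """
    if keep < 0:
        raise ValueError("keep must be >= 0")
    cores = [p for p in Path(directory).glob("core.*") if p.is_file()]
    # Newest first by modification time, so the slice drops the oldest ones.
    cores.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    doomed = cores[keep:]
    if not dry_run:
        for p in doomed:
            p.unlink()
    return [str(p) for p in doomed]
```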
Today I checked both folders Adam mentioned for core.xxx files, and I do not see any core.xxx files in them. A screenshot of the contents of the /usr/adlex/rtm/bin folder is attached.
Now we have only 10% free space out of 800 GB total.
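That number is itself a clue: 10% of 800 GB is 80 GB free, while a 20% threshold means the datacleaner should be holding at least 160 GB free, so roughly 80 GB of old data should already have been purged. A small check like the one below (a sketch; the helper names are hypothetical, and 0.20 is the value quoted in the thread) flags a disk that sits under the threshold, which suggests the cleaner is not running at all.

```python
import shutil


def below_free_threshold(total_bytes, free_bytes, min_free_fraction=0.20):
    """True if free space is under the datacleaner-style threshold."""
    return free_bytes < total_bytes * min_free_fraction


def check_path(path="/var/spool/adlex", min_free_fraction=0.20):
    # shutil.disk_usage returns a named tuple (total, used, free) in bytes.
    usage = shutil.disk_usage(path)
    return below_free_threshold(usage.total, usage.free, min_free_fraction)
```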
Please share the output of the following two commands:
du -h --max-depth=1 /var/spool/adlex/ | sort -h -r
ls -halt /var/spool/adlex/rtm/ | head -n20
It looks like you may just have big data files.
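If the account running the check cannot read every subdirectory (as with the "Permission denied" on avagt below), a script can still report the readable part instead of stopping. The sketch below approximates `du --max-depth=1 ... | sort -h -r` by summing apparent file sizes per top-level directory; it skips unreadable entries, so those totals are lower bounds, and apparent sizes can differ from du's block counts. The helper names are hypothetical.

```python
import os


def dir_size(path):
    """Total apparent size in bytes of files under `path`; unreadable parts are skipped."""
    total = 0
    # onerror swallows directory-listing failures such as Permission denied.
    for root, _dirs, files in os.walk(path, onerror=lambda err: None):
        for name in files:
            try:
                total += os.lstat(os.path.join(root, name)).st_size
            except OSError:
                continue
    return total


def top_level_usage(base="/var/spool/adlex"):
    """Roughly `du --max-depth=1 base | sort -h -r`: (bytes, path), largest first."""
    sizes = []
    for entry in os.scandir(base):
        if entry.is_dir(follow_symlinks=False):
            sizes.append((dir_size(entry.path), entry.path))
    return sorted(sizes, reverse=True)
```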
The outputs are here:
pprumamd@prumas02:~$ du -h --max-depth=1 /var/spool/adlex/ | sort -n -r
du: cannot read directory `/var/spool/adlex/avagt': Permission denied
pprumamd@prumas02:~$ ls -halt /var/spool/adlex/rtm/ | head -n20
-rw-r----- 1 pprumamd pprumamd 4.7M Dec 4 11:17 hpdata_
-rw-r----- 1 pprumamd pprumamd 5.5K Dec 4 11:17 cbastatsdata_5661681c_1_t
-rw-r----- 1 pprumamd pprumamd 12M Dec 4 11:17 vdata_
-rw-r----- 1 pprumamd pprumamd 1.9M Dec 4 11:17 dtdata-102977743_
-rw-r----- 1 pprumamd pprumamd 792 Dec 4 11:17 systemstatsdata_5661681c_1_t
drwxr-x--- 2 pprumamd pprumamd 45M Dec 4 11:17 .
-rw-r----- 1 pprumamd pprumamd 25K Dec 4 11:17 amdstatsdata_566167e0_1_t
-rw-r----- 1 pprumamd pprumamd 49K Dec 4 11:17 gatestatsdata_5661681c_1_t
-rw-r----- 1 pprumamd pprumamd 573 Dec 4 11:16 transdata_566167a4_5_t
-rw-r----- 1 pprumamd pprumamd 210 Dec 4 11:16 page2transmap_566167a4_5_t
-rw-r----- 1 pprumamd pprumamd 4.3K Dec 4 11:16 cpdata_566167a4_5_t
-rw-r--r-- 1 root root 1.3K Dec 4 11:16 cbaagentstatsdata_566167e0_1_t
-rw-r----- 1 pprumamd pprumamd 161K Dec 4 11:16 rtmstatsdata_566167e0_1_t
-rw-r----- 1 pprumamd pprumamd 5.5K Dec 4 11:16 cbastatsdata_566167e0_1_t
-rw-r----- 1 pprumamd pprumamd 792 Dec 4 11:16 systemstatsdata_566167e0_1_t
-rw-r----- 1 pprumamd pprumamd 25K Dec 4 11:16 amdstatsdata_566167a4_1_t
-rw-r----- 1 pprumamd pprumamd 49K Dec 4 11:16 gatestatsdata_566167e0_1_t
drwx------ 9 pprumamd pprumamd 4.0K Dec 4 11:15 ..
-rw-r----- 1 pprumamd pprumamd 161K Dec 4 11:15 rtmstatsdata_566167a4_1_t
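The listing only shows the 20 newest files, so it is hard to tell which data type dominates. A sketch that totals file count and size per file-name prefix (vdata, hpdata, dtdata, and so on) makes the big consumers stand out. Treating the leading alphanumeric part of the name as the data type is an assumption based on the names above, and `usage_by_type` is a hypothetical helper.

```python
import os
import re
from collections import defaultdict

# Assumed grouping rule: the data type is the leading alphanumeric run of
# the file name, e.g. "vdata_..." -> "vdata", "dtdata-1029..." -> "dtdata".
PREFIX = re.compile(r"[A-Za-z0-9]+")


def usage_by_type(directory="/var/spool/adlex/rtm"):
    """Return (type, file_count, total_bytes) tuples, largest total first."""
    totals = defaultdict(lambda: [0, 0])  # type -> [file count, bytes]
    for entry in os.scandir(directory):
        if not entry.is_file(follow_symlinks=False):
            continue
        m = PREFIX.match(entry.name)
        key = m.group(0) if m else entry.name
        totals[key][0] += 1
        totals[key][1] += entry.stat(follow_symlinks=False).st_size
    return sorted(((k, c, b) for k, (c, b) in totals.items()),
                  key=lambda t: t[2], reverse=True)
```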
The issue is resolved!
During the DC RUM installation, the customer had a specific requirement to install and run DC RUM under the application user "pprumamd", who has sudo privileges for only some commands. We also cannot run cron jobs as root, only as our app user. However, we did not know that adlexcron_run15min uses the "nice" command and that the app user has no privileges for "nice".
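As a result, the datacleaner started from that script never ran, and old data was never purged. Our resolution was to remove the "nice" command so the datacleaner runs without it.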
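An alternative to dropping the priority change entirely is to try it and fall back to normal priority when the account is not allowed to change it. The sketch below does that for an arbitrary command; the command you would pass (the datacleaner invocation inside adlexcron_run15min) is not shown in the thread, so `run_maybe_niced` and its arguments are hypothetical. It relies on `preexec_fn`, so it is POSIX-only.

```python
import os
import subprocess


def _lower_priority(increment):
    def preexec():
        try:
            os.nice(increment)
        except OSError:
            # Not permitted for this user: keep running at normal priority.
            pass
    return preexec


def run_maybe_niced(cmd, increment=10):
    """Run `cmd` with adjusted CPU priority if allowed, otherwise at normal priority."""
    return subprocess.run(cmd, preexec_fn=_lower_priority(increment),
                          capture_output=True, text=True)
```

The point of the fallback is that a missing privilege degrades to "runs at normal priority" instead of "never runs", which is exactly the failure that let the disk fill up here.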