09 May 2024 08:36 PM
We are seeing this in a production environment. It seems very rare, but it has produced some interesting observations, so I am posting it here in case someone has had a similar experience.
It started simply enough: in one environment, EF1 extensions stopped working. An initial check showed that the extensions directory, /opt/dynatrace/remotepluginmodule, no longer existed.
I know from experience that when an EF1 extension has dependencies and its deployment directory is deleted, the extension stops working immediately. So it made sense that the EF1 extensions would stop working if the whole directory disappeared.
Since the ActiveGate was working normally, I initially failed to notice that the whole /opt/dynatrace/gateway directory was also missing!
So this means that an ActiveGate, or at least its routing functionality, seems to keep working when its underlying files are no longer in the filesystem! This is expected behavior on Linux, where a running process keeps the files it already has open or loaded even after they are deleted, but it is still quite an achievement: in several cases I have seen before, applications did not survive for so long. This one had been going on for several days...
There are several interesting forensic questions around this case, and everything suggests that the disappearance of the directories has nothing to do with Dynatrace. But has anyone experienced anything like this before?
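For anyone who wants to see the Linux behavior for themselves, here is a minimal sketch in Python. It is not anything Dynatrace ships. The function names are made up for illustration, and it assumes a Linux host with /proc mounted. The first function shows that an open file stays readable after its name is deleted. The second lists processes whose executable has been unlinked, which the kernel marks by appending " (deleted)" to the /proc/<pid>/exe link target.

```python
import os
import tempfile


def read_after_unlink(payload: bytes) -> bytes:
    """Write payload to a temp file, delete the name, then read it back via the open fd."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, payload)
        os.unlink(path)               # the directory entry is gone...
        os.lseek(fd, 0, os.SEEK_SET)
        return os.read(fd, len(payload))  # ...but the open descriptor still works
    finally:
        os.close(fd)


def deleted_executables(proc_root="/proc"):
    """Return (pid, link_target) for processes whose executable was unlinked."""
    found = []
    if not os.path.isdir(proc_root):
        return found
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue                  # only numeric entries are processes
        try:
            target = os.readlink(os.path.join(proc_root, entry, "exe"))
        except OSError:
            continue                  # process exited, kernel thread, or no permission
        if target.endswith(" (deleted)"):
            found.append((int(entry), target))
    return sorted(found)


if __name__ == "__main__":
    print(read_after_unlink(b"still readable"))
    for pid, target in deleted_executables():
        print(pid, target)
```

Run as root to see every process; as a normal user, processes owned by other users are skipped silently.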
12 May 2024 01:33 PM
Hi @AntonioSousa,
Did you find anything in the logs, or was the directory also removed? Also, did you check the command history for anything suspicious?
12 May 2024 08:09 PM
Since the logs are in a separate directory, we were able to determine, to within 20 seconds, when the directory /opt/dynatrace/remotepluginmodule disappeared. The strange thing is that the attributes of the /opt/dynatrace directory don't show any change in that period. So it might have been something low-level, such as filesystem corruption. But since this happened on several AGs, and there are some other security solutions in place, I wouldn't exclude other options.
Our main objective was to bring the AGs back to a stable state. It was impressive that they kept going, but the first controlled restart revealed that they would not even stop by themselves, so they had to be stopped the hard way. Reinstallation was quick as usual, so everything seems to be resolved. Forensics will continue...
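To make the attribute check concrete: a normal delete (rm, rmdir) of a subdirectory updates the modification time of its parent, so a parent whose mtime falls outside the window points away from an ordinary delete, or towards timestamps being reset afterwards. Here is a minimal sketch of that check in Python. The function name and the epoch-seconds window are my own choices for illustration, not part of any Dynatrace tooling.

```python
import os


def parent_modified_in_window(deleted_path, start, end):
    """True if the parent directory's mtime lies within [start, end] (epoch seconds).

    deleted_path is the path that disappeared; only its parent is inspected,
    so the path itself does not need to exist any more.
    """
    parent = os.path.dirname(deleted_path.rstrip("/")) or "/"
    st = os.stat(parent)
    return start <= st.st_mtime <= end
```

For example, with the disappearance time taken from the logs as `t`, `parent_modified_in_window("/opt/dynatrace/remotepluginmodule", t - 20, t + 20)` would be expected to return True after an ordinary delete.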
13 May 2024 11:33 AM
What about Puppet or similar tools "removing" undesired software?
13 May 2024 11:43 AM
In this case there is no Puppet, but there are other tools. We haven't been able to pinpoint anything yet...
13 May 2024 03:59 PM
Usual Suspects: AV, orchestration tools, configuration management tools
13 May 2024 08:20 AM
Most likely something or someone accidentally deleted it. The main ActiveGate process will actually keep running, mostly without any issues, as it is already loaded and does not need to read anything from disk. The AG is in fact "stateless", and the few things it does read come from the config directory, which is located in another path.