Looking for feedback on how everyone handles process group low-instance-count problems for hosts that go into a maintenance window. We use BigFix as our patch deployment tool, and it reboots our hosts after patches install. We also have BigFix put the host into a maintenance window before the reboot, using a simple tag with the key hostname and the server name as the value to capture that host. This works well for stopping process-unavailable alerts because we have an auto-tag rule that pushes the hostname key down to the process level, so the process group instance is captured in that maintenance window.
The problem we are facing is with the process group low-instance-count problems. Our process groups do not have a hostname tag because a process group can span any number of hosts, so these problems still alert when a server is rebooted, when ideally they should not.
How is everyone handling this? We need a way to do this at scale. Using the host group value would not work for us because we have over a thousand hosts in a generic host group (hosts where app teams don't care to use Dynatrace for extra monitoring, but we still run the agent on them for basic infrastructure monitoring). A host group tag would be problematic because we wouldn't want to put every host in that generic host group into a maintenance window when only a subset is rebooted at a time.
I know we could do something fancy, such as first running API calls to get the process groups on that host that are set to 'open a new problem if minimum threshold is not met' and then putting those process group IDs into the maintenance window, but I'm also looking for other input here.
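For reference, a rough sketch of what that API approach might look like, in Python. Everything here is an assumption to check against your own environment: the entity selector relationship name (`runsOn`), the Settings 2.0 schema ID and field names for maintenance windows, and the example entity IDs are all illustrative. The helper names are hypothetical, and the sketch does not filter on the minimum-threshold setting; it only builds the selector and the request body and makes no API calls.

```python
from datetime import datetime, timedelta

# Assumed schema ID for maintenance windows in the Settings 2.0 API.
SCHEMA_ID = "builtin:alerting.maintenance-window"


def process_group_selector(host_id):
    """Entity selector for process groups on one host (relationship name assumed)."""
    return f'type("PROCESS_GROUP"),fromRelationships.runsOn(entityId("{host_id}"))'


def maintenance_window_payload(name, process_group_ids, start, minutes):
    """Build a one-off window that targets each process group by entity ID."""
    end = start + timedelta(minutes=minutes)
    fmt = "%Y-%m-%dT%H:%M:%S"
    return [{
        "schemaId": SCHEMA_ID,
        "scope": "environment",
        "value": {
            "enabled": True,
            # Field names and enum values below are assumptions; verify them
            # against the schema in your environment before using this.
            "generalProperties": {
                "name": name,
                "maintenanceType": "PLANNED",
                "suppression": "DONT_DETECT_PROBLEMS",
                "disableSyntheticMonitorExecution": False,
            },
            "schedule": {
                "scheduleType": "ONCE",
                "onceRecurrence": {
                    "startTime": start.strftime(fmt),
                    "endTime": end.strftime(fmt),
                    "timeZone": "UTC",
                },
            },
            # One filter per process group so only those entities are covered.
            "filters": [{"entityId": pg_id} for pg_id in process_group_ids],
        },
    }]


# Hypothetical IDs, for illustration only.
selector = process_group_selector("HOST-0000000000000001")
payload = maintenance_window_payload(
    "Patch reboot", ["PROCESS_GROUP-00000000000000AA"], datetime(2024, 1, 1, 2, 0), 30
)
```

The selector would go to the entities endpoint of the Environment API v2 to list the process groups on the host being patched, and the payload would be posted as JSON to the settings objects endpoint. BigFix could run both steps just before the reboot, with the process group list narrowed first to the groups that have the minimum-threshold rule enabled.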
Can you share the config for the maintenance window? By default, maintenance windows only impact availability monitoring for the entities you target. If you apply the maintenance window to hosts, for instance, it will not affect monitoring on the dependent process groups. You would need to add the process groups to the maintenance window to make that happen. Let's take a look at this first to see if it solves the issue. You're welcome to share a screenshot with blurred PII here or reach out for a coffee talk.