Re: Windows cluster service monitoring

Deep · ‎27 Feb 2026

I currently manage a Windows Active-Passive Failover Cluster and require a monitoring solution for our cluster services. Specifically, I need to trigger alerts only when a service fails on the active node.

Currently, we use Dynatrace OneAgent’s default OS service monitoring; however, this is generating false positives because it flags services on the passive node, which are intended to remain in a stopped state. How can I configure Dynatrace to ensure alerts are only generated for the active cluster node? I would appreciate your guidance on best practices for this setup.

mark_bley · ‎01 Mar 2026

You can define the OS service rule at the host level.

However, to manage the configuration more easily with configuration as code, you could instead add a filter to your current rule that matches a property only your active nodes have. You can set such a property as follows:

https://docs.dynatrace.com/docs/shortlink/linux-custom-installation#custom-host-metadata

Deep · ‎02 Mar 2026

Thanks for the clarification, Mark. However, I believe that approach applies more to static tagging. In a cluster environment, active and passive node statuses change dynamically; that is the fundamental nature of how a cluster operates. Consequently, the cluster itself is the only source that maintains real-time information on which node is currently active.

Currently, we get the information about which node is active from the Windows event log, so I am wondering whether something can be done with logs and the service status metric, or by some other method.

mark_bley · ‎02 Mar 2026

If you are on SaaS:

Create a log event that is sent when the quorum elects a new active node (the log entry you are talking about).
The event triggers a workflow that removes the OS service detection rule from the old node and applies it to the new one.

If you are on Managed, the same logic applies, but since you do not have workflows, a custom webhook/automation or extension would do the trick.
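
As a rough illustration of the decision the workflow or automation has to make on each failover, here is a minimal Python sketch. It only computes which nodes should have the OS service rule enabled; it does not call any Dynatrace API. The names `RuleChange` and `plan_rule_changes` and the node names are hypothetical, chosen for this example.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class RuleChange:
    """Desired alerting state for one cluster node (hypothetical type)."""
    node: str
    enable_alerting: bool


def plan_rule_changes(cluster_nodes: Sequence[str], new_active: str) -> List[RuleChange]:
    """Return the per-node rule state that should hold after a failover.

    Only the newly active node keeps OS service alerting; every other
    node in the cluster has it switched off.
    """
    if new_active not in cluster_nodes:
        raise ValueError(f"{new_active!r} is not a member of the cluster")
    return [RuleChange(node, node == new_active) for node in cluster_nodes]


if __name__ == "__main__":
    # Example: the cluster fails over from node-a to node-b.
    for change in plan_rule_changes(["node-a", "node-b"], "node-b"):
        print(change)
```

Applying each `RuleChange` (removing or creating the rule on the host) is left to whatever mechanism you use, such as a workflow action, webhook, or extension.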

Deep · ‎02 Mar 2026

That is what we are doing, but I am wondering whether there is anything we can do within Dynatrace using DQL, since we have all the required data. What I was thinking: disable alerting for the cluster's OS service monitoring (to avoid problem creation), then create a custom anomaly detection rule that checks the dt.osservice.availability metric on the active node (which can be identified from logs) and creates an event when the service is in a stopped state. To avoid log query costs, we can convert the logs into a metric, so that looking up the active node from the metric data (logs converted into a metric) is free of cost.
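
To make the idea concrete, here is a minimal Python sketch of the correlation logic only, not DQL and not a Dynatrace API. It assumes the failover log entries have already been reduced to (timestamp, active node) pairs and the service metric to per-node samples. The types `FailoverEvent` and `ServiceSample` and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class FailoverEvent:
    timestamp: int      # epoch seconds of the "new active node" log entry
    active_node: str


@dataclass(frozen=True)
class ServiceSample:
    timestamp: int      # epoch seconds of the service status sample
    node: str
    service: str
    running: bool


def active_node_at(events: Iterable[FailoverEvent], timestamp: int) -> Optional[str]:
    """Return the node that was active at `timestamp`, or None if unknown."""
    latest = None
    for event in events:
        if event.timestamp <= timestamp and (
            latest is None or event.timestamp >= latest.timestamp
        ):
            latest = event
    return latest.active_node if latest else None


def stopped_on_active(
    events: Iterable[FailoverEvent], samples: Iterable[ServiceSample]
) -> List[ServiceSample]:
    """Keep only samples where the service is stopped on the then-active node."""
    events = list(events)  # allow multiple passes over the events
    return [
        s for s in samples
        if not s.running and active_node_at(events, s.timestamp) == s.node
    ]
```

Stopped services on the passive node are dropped, and samples taken before any failover entry is known produce no alert. Whether that last case should alert instead is a design choice that depends on how far back the log-derived metric reaches.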