Open Q&A

Windows cluster service monitoring

Deep
Newcomer
I currently manage a Windows Active-Passive Failover Cluster and require a monitoring solution for our cluster services. Specifically, I need to trigger alerts only when a service fails on the active node.
Currently, we use Dynatrace OneAgent’s default OS service monitoring; however, this is generating false positives because it flags services on the passive node, which are intended to remain in a stopped state. How can I configure Dynatrace to ensure alerts are only generated for the active cluster node? I would appreciate your guidance on best practices for this setup.

mark_bley
Dynatrace Champion

You can define the OS service monitoring rule at the host level.

However, to manage the configuration better as code, you could modify your current rule to use a specific filter that matches a property only your active nodes have. You can set such a property as follows:

https://docs.dynatrace.com/docs/shortlink/linux-custom-installation#custom-host-metadata
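Because the active node changes on failover, such a property would have to be kept in sync rather than set once. A minimal sketch, assuming a script scheduled on each node: it asks the FailoverClusters PowerShell module (`Get-ClusterGroup`) which node owns a given clustered role, then sets the property on that node and removes it elsewhere with `oneagentctl`. The property `ClusterRole=active`, the group name, and the install path are hypothetical or assumed; check the linked page for the exact `oneagentctl` options and whether a OneAgent restart is needed before the change takes effect.

```python
import socket
import subprocess

# Assumed default OneAgent install path on Windows; adjust for your hosts.
ONEAGENTCTL = r"C:\Program Files\dynatrace\oneagent\agent\tools\oneagentctl.exe"
# Hypothetical property that the OS service rule's host filter would match on.
PROPERTY = "ClusterRole=active"


def oneagentctl_args(local_node: str, owner_node: str) -> list[str]:
    """Set the property on the node that owns the cluster role, remove it elsewhere."""
    is_active = local_node.lower() == owner_node.lower()
    flag = "--set-host-property" if is_active else "--remove-host-property"
    return [ONEAGENTCTL, f"{flag}={PROPERTY}"]


def cluster_group_owner(group: str) -> str:
    """Ask the FailoverClusters PowerShell module which node currently owns `group`."""
    command = f"(Get-ClusterGroup -Name '{group}').OwnerNode.Name"
    result = subprocess.run(
        ["powershell", "-NoProfile", "-Command", command],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def sync_role(group: str) -> None:
    """Run on each node (e.g. from Task Scheduler) to keep the property current."""
    owner = cluster_group_owner(group)
    subprocess.run(oneagentctl_args(socket.gethostname(), owner), check=True)
```

A polling script like this lags a failover by up to its schedule interval, so it only narrows the window for false positives rather than closing it.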


Deep
Newcomer

Thanks for the clarification, Mark. However, I believe that approach applies more to static tagging. In a cluster environment, active and passive node statuses change dynamically; that is the fundamental nature of how a cluster operates. Consequently, the cluster itself is the only source that maintains real-time information on which node is currently active.  

Currently, we identify the active node from the Windows event log, so I'm wondering whether something can be done by combining the logs with the service status metric, or with some other method.

mark_bley
Dynatrace Champion

If you are on SaaS:

  1. Create a log event that is sent out when the quorum elects a new active node (the log entry you are talking about).
  2. The event triggers a workflow that removes the OS service detection rule from the old node and applies it to the new one.

If you are on Managed, the same logic applies, but since Workflows are not available there, a custom webhook, automation script, or extension would do the trick.
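The two-step failover handling in this reply can be sketched outside of Workflows too, which is roughly what the Managed route would need. A minimal Python sketch, assuming the Settings 2.0 REST API (`/api/v2/settings/objects`), an API token with settings read and write access, and that OS service rules use the schema ID `builtin:os-services-monitoring` (verify this against your environment's schema list). The rule `value` is copied from the existing object, so the sketch does not need to know the schema's fields. It keeps the order described: delete on the old node, then create on the new one.

```python
import json
import urllib.parse
import urllib.request

# Assumed schema ID for OS service monitoring rules; check your environment.
SCHEMA_ID = "builtin:os-services-monitoring"


def plan_failover(old_object_id: str, new_host_id: str, rule_value: dict) -> list:
    """Return (method, path, body) calls that move a host-scoped rule to the new active node."""
    return [
        # 1. Remove the rule from the node that is no longer active.
        ("DELETE", f"/api/v2/settings/objects/{urllib.parse.quote(old_object_id, safe='')}", None),
        # 2. Recreate the same rule scoped to the newly active host (e.g. "HOST-...").
        ("POST", "/api/v2/settings/objects",
         [{"schemaId": SCHEMA_ID, "scope": new_host_id, "value": rule_value}]),
    ]


def execute(base_url: str, token: str, plan: list) -> None:
    """Send each planned call; base_url is the environment URL without a trailing slash."""
    for method, path, body in plan:
        data = json.dumps(body).encode("utf-8") if body is not None else None
        request = urllib.request.Request(
            base_url + path,
            data=data,
            method=method,
            headers={"Authorization": f"Api-Token {token}", "Content-Type": "application/json"},
        )
        # urlopen raises HTTPError for error status codes, stopping the sequence.
        with urllib.request.urlopen(request) as response:
            response.read()
```

On SaaS the same two calls would come from a Workflow task triggered by the failover log event; on Managed, `execute` could be called by whatever receives the webhook. The old object's ID and `value` can be read from the existing rule first with a GET on the same objects endpoint.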

Deep
Newcomer

That is what we are doing, but I'm wondering whether there is anything we can do within Dynatrace using DQL, since we already have all the required data. What I was thinking: disable alerting for the cluster service monitoring (to avoid problem creation), then create a custom anomaly detection rule that checks the dt.osservice.availability metric on the active node (which can be identified from the logs) and creates an event when the service is in a stopped state. To avoid log costs, we can convert the logs into a metric, so that looking up the active node from that metric data is free of cost.
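The correlation proposed here (treat a stopped service as a problem only when it is on the node the logs report as active) can be sketched independently of how the data is queried. A minimal Python sketch, assuming failover events have already been extracted from the cluster event log (or from the log-derived metric) as timestamp and active-node pairs, and that availability samples are available per host and service. The class and function names, node names, and service name are hypothetical; in Dynatrace itself this logic would be expressed in DQL over `dt.osservice.availability` and the log data.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class FailoverEvent:
    timestamp: float   # when the cluster reported the change (epoch seconds)
    active_node: str   # node that became active, as parsed from the event log


@dataclass(frozen=True)
class ServiceSample:
    timestamp: float
    host: str
    service: str
    available: bool    # False when the service was observed as stopped


def active_node_at(events: Iterable[FailoverEvent], ts: float) -> Optional[str]:
    """Return the node made active by the latest failover event at or before ts."""
    current = None
    for event in sorted(events, key=lambda e: e.timestamp):
        if event.timestamp > ts:
            break
        current = event.active_node
    return current


def alerts(events: List[FailoverEvent], samples: Iterable[ServiceSample],
           watched: set) -> List[ServiceSample]:
    """Keep only 'stopped' samples for watched services on the node active at that moment."""
    return [
        s for s in samples
        if s.service in watched
        and not s.available
        and s.host == active_node_at(events, s.timestamp)
    ]
```

For example, with a failover from NODE-A to NODE-B at t=100, a stopped MSSQLSERVER sample from NODE-A at t=60 is reported, while one from NODE-A at t=150 is ignored because NODE-B is active by then.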
