I have a Windows service running, and Dynatrace is able to capture its PurePath activity. When an error occurs in the service, it stops working, but its status still shows "Running" in services.msc. Every time this happens, I have to restart the service to get it working again.
I'm trying to set up an incident rule in Dynatrace so that it notifies me by email when this happens. How should I configure the incident rule for this scenario?
As shown in the attached screenshot, the "Size" shows "1" after the error happens.
You could create an incident on PurePath size in the context of your transaction so that it alerts when the size drops to only 1 node, but it would be interesting to see what that one node is in those PurePaths. Can you share a screenshot?
You can read our documentation on setting up incidents here.
Without knowing more about your app, service, or transaction, I can only recommend an incident that uses 2 measures: the invocation count of the CheckStatus method and the PurePath Node Count. Set the upper severe threshold for the invocation measure to 1, and set the lower severe threshold for the node count measure to 1 as well (so you have at least 1 invocation of CheckStatus and the PurePath is only 1 node long).
This should alert you if your PurePaths aren't progressing due to service unavailability.
Alternatively, if you have a web-based check for your service (e.g. a URL endpoint you can poll for status), you can set up a URL Monitor. Also, if you have a script that can run a command to get the status of the service, you can use the Generic Execution Plugin to run that command periodically and report on the status (you can then use that in your incident).
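A minimal sketch of such a status check, assuming the service exposes an HTTP status endpoint (the URL passed on the command line is hypothetical, as are the function names). Because a hung service can still show "Running" in services.msc, the check asks the service itself to answer within a timeout instead of querying the Windows service state. It prints OK or FAILED and exits with 0 or 1; how the Generic Execution Plugin consumes that exit code or output is an assumption to confirm in the plugin's own settings.

```python
import sys
import urllib.request


def is_healthy(url, timeout=5.0, opener=urllib.request.urlopen):
    """Return True if the (hypothetical) status endpoint answers HTTP 200 within `timeout` seconds."""
    try:
        with opener(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError (non-2xx), refused connections and socket timeouts
        # are all OSError subclasses: a hung service ends up here via the timeout.
        return False


def main(url):
    healthy = is_healthy(url)
    print("OK" if healthy else "FAILED")
    return 0 if healthy else 1


# Only run the check when a URL is given, e.g.:
#   python check_status.py http://localhost:8080/status
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1]))
```

The `opener` parameter exists only so the check can be exercised without a live service; in normal use the default `urllib.request.urlopen` is used.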
Apologies, I realised there is a flaw in the setup I suggested: these measures are evaluated independently. The invocation of CheckStatus and the PurePath size are each checked on their own, not on the same PurePath.
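To make the flaw concrete, here is a toy sketch contrasting the two evaluations. The `PurePath` record (a list of captured method names) and the method names other than CheckStatus are hypothetical, and the threshold comparisons only approximate the settings described earlier (at least 1 CheckStatus invocation, node count of 1). A healthy CheckStatus PurePath plus an unrelated 1-node PurePath is enough to fire the independent version.

```python
from dataclasses import dataclass


@dataclass
class PurePath:
    # Hypothetical model: method names captured as nodes of one PurePath.
    nodes: list[str]


def independent_alert(paths):
    """Original idea: each measure is aggregated over ALL PurePaths separately."""
    check_status_calls = sum(p.nodes.count("CheckStatus") for p in paths)
    smallest_path = min((len(p.nodes) for p in paths), default=0)
    return check_status_calls >= 1 and smallest_path <= 1


def same_purepath_alert(paths):
    """Corrected idea: both conditions must hold on the SAME PurePath."""
    return any("CheckStatus" in p.nodes and len(p.nodes) == 1 for p in paths)


healthy = PurePath(["CheckStatus", "LoadConfig", "QueryDb", "WriteResult"])
ping = PurePath(["Ping"])  # unrelated 1-node PurePath
print(independent_alert([healthy, ping]))    # True: false alarm
print(same_purepath_alert([healthy, ping]))  # False
```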
What you need to do is create a Business Transaction that has these 2 measures in its filter section. That way you focus on just the CheckStatus PurePaths with 1 node. Then, in your incident, use the PurePath Response Time measure created from this Business Transaction with the Count aggregation and the upper severe threshold set to 1 (so if the count of these 1-node PurePaths goes over 1, it will trigger). I would recommend an incident timeframe of no longer than 1 minute, since you want this incident to be fairly sensitive and to alert you as soon as this happens.
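The Business Transaction approach can be sketched as a filter plus a windowed count. This is a toy model, not Dynatrace's evaluation engine: the `PurePath` record, its `start` and `nodes` fields, and the helper names are hypothetical. It follows the wording above literally, triggering when the count within the last 60 seconds goes over the upper severe threshold of 1; whether reaching the threshold exactly counts as a violation is worth confirming in the product.

```python
from dataclasses import dataclass


@dataclass
class PurePath:
    start: float       # hypothetical: start time in seconds
    nodes: list[str]   # hypothetical: method names captured as PurePath nodes


def in_business_transaction(p):
    # BT filter: the PurePath invokes CheckStatus AND is exactly 1 node long.
    return "CheckStatus" in p.nodes and len(p.nodes) == 1


def incident_triggered(paths, now, window_s=60.0, upper_severe=1):
    """Count aggregation of the BT's PurePaths in the last window vs. the upper severe threshold."""
    count = sum(
        1
        for p in paths
        if now - window_s < p.start <= now and in_business_transaction(p)
    )
    return count > upper_severe
```

A short timeframe keeps the count tied to recent PurePaths, which is why a window of about a minute makes the incident sensitive.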
Let me know how this works.