I have a monitored service doing some batch jobs. Each job is a separate purepath, and sometimes (maybe once per week), a job would fail. We collect the status of the jobs using request attributes (success or failed).
I've set up a new metric representing the request count of transactions whose status request attribute has the value "failed", plus a custom event for alerting. The problem is that with only one failure every few days, the static threshold settings can't pick it up.
Is there any alternative approach I might take to get an alert when a job fails?
If you find the request, you can mark it as a key request; that gives you more granularity to alert on this specific transaction/request.
Same issue here; that's why I raised an RFE for count-based problems instead of rate-based ones. Most code-level services (which aren't actual URIs) don't have enough request rate to satisfy the current alerting mechanism. link
Thanks, voted for that one. I'll promote it to get more votes.
The solution for me was to set the custom event's static threshold to count > 0, with the threshold settings at their minimum values (a violation for 1 minute within a 3-minute sliding window).
A single failure then produces a value of 0.33 (> 0) and triggers the custom alert.
What confused me yesterday is that the alert preview didn't return any results while I was setting it up.
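To make the 0.33 figure concrete, here is a minimal sketch of the arithmetic I assume is behind it: the metric is evaluated per minute over a 3-minute sliding window, so a single failed job averages out to 1/3 per minute, which still clears a "count > 0" threshold. The function name and window layout are illustrative, not Dynatrace internals.

```python
# Hypothetical sketch: per-minute average of a failure-count metric
# over a 3-minute sliding window, as assumed from the 0.33 value above.

def per_minute_average(counts_per_minute):
    """Average failure count per minute across the evaluation window."""
    return sum(counts_per_minute) / len(counts_per_minute)

window = [0, 1, 0]  # one failed job somewhere in the 3-minute window
value = per_minute_average(window)
print(round(value, 2))  # 0.33

# A "count > 0" threshold means any failure at all should raise the event.
threshold = 0
print(value > threshold)  # True -> custom alert fires
```

This is why a rate-based threshold of anything higher than 0 would miss a once-a-week failure: the averaged value never climbs much above 1/window.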
I believe this could also be applied to your use case @Aamir K.?
Thanks everyone for your input on this!