I've got a log file that periodically records how long a script took to run:
[Friday, May 7, 2021 5:31:07 PM] Run started...
[Friday, May 7, 2021 6:32:28 PM] Run completed
[Friday, May 7, 2021 6:32:28 PM] Total runtime: 61.21 Minutes
[Friday, May 7, 2021 6:36:07 PM] Run started...
My metric captures the 61.21 as a number called RunMinutes, and that works well for charting.
This log exists for many process groups.
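For reference, the extraction I'm relying on is roughly equivalent to the Python sketch below. This is not how the metric is actually configured; the regexes, the function name parse_line, and the timestamp format string are my own assumptions based on the sample lines above, and the weekday and month names assume an English locale.

```python
import re
from datetime import datetime

# Assumed layout: "[<timestamp>] <message>", as in the sample lines.
LINE_RE = re.compile(r"^\[(?P<ts>[^\]]+)\]\s+(?P<msg>.*)$")
# Assumed runtime message: "Total runtime: <number> Minutes".
RUNTIME_RE = re.compile(r"Total runtime:\s*(?P<minutes>\d+(?:\.\d+)?)\s*Minutes")
# Matches e.g. "Friday, May 7, 2021 6:32:28 PM" (English locale assumed).
TS_FORMAT = "%A, %B %d, %Y %I:%M:%S %p"


def parse_line(line):
    """Return (timestamp, run_minutes); run_minutes is None when the
    line carries no runtime. Returns None for lines that do not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    ts = datetime.strptime(m.group("ts"), TS_FORMAT)
    rt = RUNTIME_RE.search(m.group("msg"))
    minutes = float(rt.group("minutes")) if rt else None
    return ts, minutes
```

So only the "Total runtime" lines produce a RunMinutes value; the "Run started" and "Run completed" lines only tell me the log is still being written.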
What I want to do is monitor for RunMinutes greater than 45 and trigger a Slowdown alert, and, if the log hasn't been written to in an hour, trigger a second alert (Error? Availability?).
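To make the intent concrete, here is a rough Python sketch of the two conditions. It only illustrates the logic I'm after and is not Dynatrace configuration; the names is_slowdown, is_silent, SLOWDOWN_MINUTES, and MAX_SILENCE are made up for this example.

```python
from datetime import datetime, timedelta

SLOWDOWN_MINUTES = 45.0            # threshold on the RunMinutes value
MAX_SILENCE = timedelta(hours=1)   # longest acceptable gap between writes


def is_slowdown(run_minutes):
    """True when a single run took longer than the threshold."""
    return run_minutes > SLOWDOWN_MINUTES


def is_silent(last_write, now):
    """True when the log has not been written to for more than an hour."""
    return now - last_write > MAX_SILENCE
```

With the sample above, the 61.21 minute run should raise the Slowdown alert, and the second alert should fire only once more than an hour has passed since the last line was written.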
In Create Custom Event for Alerting, when I select my RunMinutes metric, the dropdown next to "Alert anomalies with a static threshold of" only offers "Count". If I enter 45 there, am I telling it to alert on 45 new instances of RunMinutes rather than on a RunMinutes value greater than 45?
And on the next bit, if I set the observation period to 60 minutes (where we can alert if data is missing for 60 minutes): if one of the logs that gives me RunMinutes is hung up but the others are still reporting RunMinutes, will I not see an alert? Do I need to create a separate custom event for each log to get an alert when one is hung up?
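The difference I'm worried about can be sketched like this in Python. Again, this is only my mental model of the problem, not how Dynatrace evaluates the event; the function names and the log names "group-a" and "group-b" are hypothetical.

```python
from datetime import datetime, timedelta

MAX_SILENCE = timedelta(hours=1)


def any_recent_data(last_write_by_log, now):
    """Combined view: True if at least one log wrote within the last hour."""
    return any(now - ts <= MAX_SILENCE for ts in last_write_by_log.values())


def silent_logs(last_write_by_log, now):
    """Per-log view: names of the logs that have been silent for over an hour."""
    return sorted(
        name for name, ts in last_write_by_log.items()
        if now - ts > MAX_SILENCE
    )


now = datetime(2021, 5, 7, 20, 0, 0)
last_write_by_log = {
    "group-a": datetime(2021, 5, 7, 19, 50, 0),  # still reporting
    "group-b": datetime(2021, 5, 7, 18, 36, 7),  # hung
}
```

Here any_recent_data(last_write_by_log, now) is still true because "group-a" keeps reporting, so a missing-data check on the combined metric would stay quiet, while silent_logs(last_write_by_log, now) singles out "group-b". The per-log behaviour is what I want.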