07 Dec 2021 07:21 PM - last edited on 30 Mar 2022 08:35 AM by MaciejNeumann
I would like to create an alert for when a process has crashed.
Has anyone done something similar?
07 Dec 2021 07:55 PM
Does Process group availability monitoring and alerting work?
08 Dec 2021 01:47 AM
Yes, it does, provided you have the following set:
The associated Alerting Profile will then trigger the alert integration and send out an alert as you have set it up.
13 Dec 2022 08:23 AM
In our use case we have processes that are not running all the time, so it's normal that they become "unavailable" by design.
On the Processes->RandomProcess.exe page there is this beautiful "Events" graph, and it would be nice if we could use it for custom metrics. The same goes for the graph in the "Application & Microservices" - "Profiling and optimization" - "Crashes" section.
29 Mar 2022 05:13 PM
This doesn't help in a scenario where there are multiple worker processes and we want to alert when any one of the workers crashes. We can see the crash in DT, but availability does not help here. It would be nice to separate availability events caused by crashes anyway.
30 Mar 2022 10:04 AM - edited 30 Mar 2022 10:06 AM
Hello,
This can be achieved with the Events API: a script calls the API periodically and limits the query to crash events only. In my environment I monitor a large fleet of servers with more than 600 applications, and most of the crash events actually come from unimportant processes that only run on an ad hoc basis. To filter out this noise, I go one step further and check the crashed process's uptime using its time-series CPU usage metric, which lets me discard the false alarms.
Please refer to the attached picture below to see how it works. Hope it helps.
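For anyone who wants to try this, here is a rough Python sketch of the two steps: building the periodic Events API query and filtering out short-lived processes. It is only an illustration under assumptions, not my actual script. The event type name `PROCESS_CRASHED`, the `now-5m` lookback, the simplified event dictionaries with a `process_id` key, and the threshold of 30 active CPU samples are all assumptions, so check them against your own environment. The helper names are hypothetical as well.

```python
from urllib.parse import urlencode


def build_events_url(base_url, event_type="PROCESS_CRASHED", lookback="now-5m"):
    """Build an Events API v2 query limited to one event type.

    The event type name and the lookback window are assumptions;
    verify the type name against the events in your environment.
    """
    query = urlencode({
        "from": lookback,
        "eventSelector": f'eventType("{event_type}")',
    })
    return f"{base_url.rstrip('/')}/api/v2/events?{query}"


def count_active_samples(cpu_samples):
    """Count data points where the process actually used CPU."""
    return sum(1 for value in cpu_samples if value is not None and value > 0)


def filter_real_crashes(events, cpu_history, min_active_samples=30):
    """Keep crash events only for processes that were running long enough.

    `events` is a list of simplified dicts with a hypothetical
    "process_id" key. `cpu_history` maps a process id to its recent
    CPU usage samples, fetched separately from the metrics API.
    """
    kept = []
    for event in events:
        samples = cpu_history.get(event["process_id"], [])
        if count_active_samples(samples) >= min_active_samples:
            kept.append(event)
    return kept
```

The script would call the URL from `build_events_url` on a schedule with an API token, fetch the CPU usage series for each crashed process, and forward only the events returned by `filter_real_crashes` to the alerting channel.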
29 Feb 2024 12:46 PM
It would be nice if you could alert on the crash events, as someone here posted. Has anyone been able to solve this?
29 Feb 2024 02:28 PM
Can you share the event as it shows up for your crash? You can create a custom event that triggers an alert when a crash is reported.
We did something similar for Pods that went into a 'Killing' State.