I am getting the error below five times a day, about 5 minutes apart.
Can I ignore the alert, since it is triggered only 4 to 5 times around one particular time, or do I have to act on it? I have attached an incident screenshot from self-monitoring.
|The Host has had insufficient memory available or considerable amount of page faults for a sustained period of time.|
|Alert Name:||Self Monitoring|
|Duration:||2017-09-26 09:36:02 - 2017-09-26 09:44:02 [8min]|
|The memory health of host 'xxx' is not ok. During the last 15 minutes this host reported 20 (or more) hard page faults per second.|
Every environment is different, but are you seeing any correlation with performance degradation when these alerts fire? Or is everything operating normally during this time? If there is no correlation or other impact when these alerts fire (no increase in response time, for example), you can raise the threshold slightly on this host so you're not getting false flags from all these alerts. You can edit the thresholds in the infrastructure settings. Hope this helps.
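If you can export response-time samples for the affected service, a rough way to check for that correlation is to compare the average response time inside the alert window with the average outside it. This is only a sketch: the `response_time_ratio` helper and the sample values are made up for illustration, and only the alert window is taken from the incident above.

```python
from datetime import datetime
from statistics import mean


def response_time_ratio(samples, alert_windows):
    """Return mean response time during alerts divided by the mean outside them.

    samples: list of (timestamp, response_ms) tuples (hypothetical export format).
    alert_windows: list of (start, end) datetime tuples.
    Returns None when either group has no samples.
    """
    def in_alert(ts):
        return any(start <= ts <= end for start, end in alert_windows)

    during = [ms for ts, ms in samples if in_alert(ts)]
    outside = [ms for ts, ms in samples if not in_alert(ts)]
    if not during or not outside:
        return None
    return mean(during) / mean(outside)


# Alert window copied from the incident; the samples are invented.
windows = [(datetime(2017, 9, 26, 9, 36, 2), datetime(2017, 9, 26, 9, 44, 2))]
samples = [
    (datetime(2017, 9, 26, 9, 30), 100),
    (datetime(2017, 9, 26, 9, 40), 200),
    (datetime(2017, 9, 26, 9, 50), 100),
]
ratio = response_time_ratio(samples, windows)
```

A ratio close to 1 would suggest the page faults are not hurting response time; a ratio well above 1 would suggest the alert deserves attention rather than a higher threshold.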
I would like to add a couple more checks to what @Nathan M. already explained: correlate network and system drive utilization with the page fault incidents. Some other task may be running at that time, or some kind of backup (incremental/differential) may be taking place at that specific time.
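One way to test the scheduled-task idea is to list the jobs you know run on that host and see which ones overlap the alert window. The sketch below assumes you can write down each job's start and end time by hand; the job names and times are hypothetical, and only the alert window comes from the incident above.

```python
from datetime import datetime


def overlapping_jobs(alert_start, alert_end, jobs):
    """Return the names of jobs whose run window overlaps the alert window.

    jobs: list of (name, start, end) tuples (hypothetical schedule data).
    """
    return [
        name
        for name, start, end in jobs
        if start <= alert_end and end >= alert_start
    ]


# Alert window copied from the incident; the job schedule is invented.
alert_start = datetime(2017, 9, 26, 9, 36, 2)
alert_end = datetime(2017, 9, 26, 9, 44, 2)
jobs = [
    ("incremental_backup", datetime(2017, 9, 26, 9, 30), datetime(2017, 9, 26, 9, 45)),
    ("report_export", datetime(2017, 9, 26, 11, 0), datetime(2017, 9, 26, 11, 30)),
]
suspects = overlapping_jobs(alert_start, alert_end, jobs)
```

If the same job shows up for every occurrence of the alert, that job is a likely cause of the page faults.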