Page Faults as shown on the AppMon Host health dashboard are always tricky to eyeball. I've found that for some servers little to none is normal, whereas for other servers a higher number is perfectly acceptable and the right thing to do is to adjust the threshold for the OOTB memory incident.
However, I'm trying to find a 'proper' way of diagnosing whether page faults indicate a genuine memory issue (swapping, i.e. using the disk to compensate for a lack of physical memory, which produces hard page faults) rather than just 'soft' page faults, which are resolved in memory and are far less concerning.
IHAC with two misbehaving servers that run into a pattern of page faults throughout the whole day. Even though memory usage shows a clear seesaw pattern, the page fault count seems to grow steadily. Peak traffic times for this app are 7AM - 10AM and 5PM - 8PM. Both servers are Windows boxes hosted on VMware, so there might be a chance that some hidden resource consumption at the ESX host level is playing a part in this (would hidden over-subscription of memory show up as a need for swapping even when available memory looks sufficient in AppMon?).
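One thing I was considering for the over-subscription angle: with VMware Tools installed, the guest exposes a "VM Memory" performance object, and a non-zero "Memory Ballooned in MB" counter would mean ESX is reclaiming memory from the guest even though Windows reports memory as available. A rough sketch of how I'd pull the samples out of `typeperf` CSV output (counter name and output shape are assumptions based on the VMware Tools perf object, not anything from the AppMon docs):

```python
# Sketch: parse the CSV that `typeperf "\VM Memory\Memory Ballooned in MB" -sc 5`
# would emit on the guest. Counter availability depends on VMware Tools being
# installed -- an assumption to verify on the actual servers.
import csv
import io

def ballooned_mb(typeperf_csv: str) -> list[float]:
    """Extract ballooned-memory samples (MB) from typeperf CSV text."""
    rows = list(csv.reader(io.StringIO(typeperf_csv)))
    # First row is the header; each data row is [timestamp, value].
    return [float(r[1]) for r in rows[1:] if len(r) > 1 and r[1].strip()]

# Made-up sample output for illustration only.
sample = (
    '"(PDH-CSV 4.0)","\\\\HOST\\VM Memory\\Memory Ballooned in MB"\n'
    '"04/01/2017 07:00:01","0.000000"\n'
    '"04/01/2017 07:00:02","512.000000"\n'
)
print(ballooned_mb(sample))  # any non-zero sample => ESX reclaimed guest memory
```

If the counter stays at zero during the problem windows, I'd rule ballooning out and look elsewhere.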
What would be a recommended course of investigation to get some concrete evidence on memory issues?
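For context, here's the kind of comparison I had in mind, using the standard Windows counters \Memory\Page Faults/sec (all faults, soft and hard) and \Memory\Pages Input/sec (pages actually read from disk, i.e. the hard ones). The sample numbers and the 10% threshold are made up for illustration:

```python
# Sketch: estimate what share of page faults actually hit the disk.
# \Memory\Page Faults/sec counts soft + hard faults; \Memory\Pages Input/sec
# counts pages read from disk, so their ratio approximates the hard-fault share.

def hard_fault_share(page_faults_per_sec: float, pages_input_per_sec: float) -> float:
    """Fraction of faults served from disk (0.0 = all soft)."""
    if page_faults_per_sec <= 0:
        return 0.0
    return min(pages_input_per_sec / page_faults_per_sec, 1.0)

# Hypothetical samples: (Page Faults/sec, Pages Input/sec)
samples = [(12000, 30), (9500, 4200)]
for total, hard in samples:
    share = hard_fault_share(total, hard)
    verdict = "mostly soft - likely benign" if share < 0.1 else "heavy disk paging - investigate"
    print(f"{share:.1%} hard: {verdict}")
```

A high absolute Page Faults/sec with near-zero Pages Input/sec would tell me the dashboard number is dominated by soft faults and the threshold just needs raising.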
I'm attaching some snaps over time. Any discussion is much appreciated.
As per our official docs, the number shown under Page Faults is indeed only hard page faults. However, reading more on the topic, it seems that any program which doesn't load entire files into memory but instead memory-maps them and reads them bit by bit will generate hard page faults, which is not necessarily indicative of a memory issue.
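A quick way to convince myself of that behaviour: the first touch of each page of a memory-mapped file is served from disk (a hard fault) even when there's plenty of free RAM. The file name and sizes below are purely illustrative:

```python
# Demonstration: touching each page of a freshly memory-mapped file faults it
# in from disk on first access, regardless of memory pressure.
import mmap
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.bin")
with open(path, "wb") as f:
    f.write(b"x" * 1024 * 1024)  # 1 MiB file

with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    # Reading one byte per 4 KiB page pulls that page in from disk the first
    # time -- exactly the kind of hard fault the dashboard counts.
    touched = sum(mm[i] for i in range(0, len(mm), 4096))
    mm.close()
print(touched)  # 256 pages touched, each byte is ord('x')
```

So a steady hard-fault rate from a process that serves files this way could be perfectly healthy, which matches the seesaw-memory-but-growing-faults pattern we're seeing.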