I am trying to create an incident based on the memory pool utilization measure (Java Virtual Machine) that fires when utilization is greater than 75% but less than 90%. This uses the Java Virtual Machine measure-specific attributes of Sun/Oracle and Perm Gen. I created two measures, one with Upper Severe = 75 and one with Lower Severe = 90, and put them in a custom incident using avg aggregation and the AND logic operator, but I am getting inconsistent results. Attached are the involved measures from one successfully generated incident; I am not sure why four are displayed when I am using two. I appreciate any feedback. Thank you.
What's the evaluation timeframe? Is it 10s? Try increasing it to the next possible timeframe (I think it's 1m) and try again.
Also, try charting these metrics to see which values are really coming back.
I have tried a number of evaluation timeframes; the latest, from last week, was 1 minute. Below left is the chart generated from the incident itself using the custom measures, and on the right is the OOB measure with no threshold set. I received 2 incidents and expected 5. In my test environment, I configured the incident to trigger on anything greater than 18 (the >75% measure) and less than 29 (the <90% measure), matching the values seen in testing; see the measures image. I received the yellow items back in an alert but expected the others in green as well. Your help is much appreciated.
The Average that you see in the table view of the chart is the average over the timeframe of the dashboard, e.g. the average of 30 minutes. I'm not saying this couldn't be a bug on our side, but I wanted to clarify that the value you see in the table at the bottom reflects a different timeframe than the one you have configured in your incident.
So if the memory is fluctuating a lot, it is possible that the 1 min average is not within 18 and 29 but the 30 min average is. Looking at the screenshot, though, it looks like these numbers are pretty consistent and not fluctuating that much. If that is the case, you may want to open a support ticket to have an engineer look at this.
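To make the averaging point concrete, here is a small Python sketch of the arithmetic only. It is not how the product evaluates incidents internally; the sample values, the 10 s sampling interval, and the helper names (in_band, window_averages) are made up for illustration. The 18 and 29 bounds are the test-environment thresholds mentioned above.

```python
from statistics import mean

# Test-environment band from this thread: greater than 18 AND less than 29.
LOWER, UPPER = 18.0, 29.0

def in_band(value, lower=LOWER, upper=UPPER):
    """Both threshold conditions combined with AND."""
    return lower < value < upper

def window_averages(samples, window):
    """Average consecutive, non-overlapping windows of `window` samples."""
    return [mean(samples[i:i + window]) for i in range(0, len(samples), window)]

# Hypothetical utilization samples (%), one every 10 s, two minutes in total.
samples = [10.0, 12.0, 14.0, 16.0, 15.0, 17.0,   # minute 1
           30.0, 32.0, 34.0, 31.0, 33.0, 32.0]   # minute 2

per_minute = window_averages(samples, 6)   # what a 1 min evaluation would average
overall = mean(samples)                    # what a longer dashboard timeframe would average

print("1 min averages:", per_minute, [in_band(v) for v in per_minute])
print("overall average:", overall, in_band(overall))
```

With these made-up numbers, neither 1 min average falls inside the band, yet the average over the whole period does, so a table showing the longer average can look like it should have triggered an incident when the per-minute evaluation never matched.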
Thank you, Andreas. My understanding is that the averages are based on the chart resolution time. The chart on the left was generated directly from the incidents and shows the thresholds automatically. It appears to show from 0 to 29%. Please let me know if you have anything additional I should try. I will open a support ticket as you suggest.
You can see the actual resolution when you move your mouse to the top right of the chart dashlet. There is a toolbar where the second option is the resolution. Click on it and you will see which aggregation is currently used.