Hi. We have an incident configured for hung threads on an agent group. There are 4 agents in this group. The hung thread measure has "1" in the severe threshold. The incident is configured for average over the last 5 minutes.
The expected behavior is that the incident wouldn't end, since the actual Hung Threads count for each JVM doesn't go below one until the JVM is restarted. However, the incident is ending regularly, which seems impossible for the same reason.
Any ideas as to why this is happening? Thanks.
Are you using the WebSphere measure? A thread being declared hung isn't necessarily an indicator of a deadlock, so the thread may actually resume eventually, causing the counter to decrement again. Can you add a thread dump to the actions of the incident to confirm?
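To illustrate why a resumed thread could end the incident, here is a rough sketch of a windowed-average check. It is not the product's actual evaluation logic: the function name `incident_active`, the per-minute sample values, and the `>=` comparison against the threshold are all assumptions made for illustration.

```python
from statistics import mean

SEVERE_THRESHOLD = 1  # severe threshold from the incident configuration in the question


def incident_active(samples, threshold=SEVERE_THRESHOLD):
    """Return True while the mean of the windowed samples meets the threshold.

    Assumption: the incident compares the window average with '>='.
    """
    return mean(samples) >= threshold


# Hypothetical per-minute hung-thread counts for one JVM over a 5-minute window.
still_hung = [1, 1, 1, 1, 1]  # the thread stays hung for the whole window
resumed = [1, 1, 0, 0, 0]     # the thread resumes after two minutes

print(incident_active(still_hung))  # average is 1, so the incident stays open
print(incident_active(resumed))     # average is 0.4, so the incident would end
```

If the counter really never dropped, the average could not fall below 1, so an incident that ends regularly suggests the counter is decrementing without a restart.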
In your case, get advice from your application server experts on threshold values for hung threads. We currently have the warning threshold set to 100 and the severe threshold set to 200 (with max active threads = 700).