Information:

Environment

  • Dynatrace Portal
  • Alerting

 

Symptoms

I would like to know how the test alert is triggered.

Solution

Each time a Backbone test runs on a node, the result is compared to the alert settings. If the result exceeds the configured threshold (a failure or a high response time), that node goes into an alert status. Whenever a node's status changes in this way, the node threshold is checked. If, at that time, the number of nodes in alert status for that test meets the node threshold, the alert is triggered.

If the node was already in an alert status when the test runs and the returned data no longer exceeds the threshold, the node's status changes back to 'good'. Again, when the status changes, the node threshold is checked. If the node threshold is no longer met, the Condition Improved alert is sent.
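
The check described above can be sketched in a few lines of Python. This is only an illustrative model, not Dynatrace code: the class name NodeThresholdTracker, the report method, and the returned strings are hypothetical. It also assumes that an alert is sent once when the node threshold is first met, and that Condition Improved is sent once when the count drops back below it.

```python
from typing import Dict, Optional


class NodeThresholdTracker:
    """Hypothetical model of the per-test alert state; not Dynatrace code."""

    def __init__(self, node_threshold: int) -> None:
        self.node_threshold = node_threshold
        self.in_alert: Dict[str, bool] = {}  # node name -> currently in alert status?
        self.alert_active = False            # assumption: one alert per threshold crossing

    def report(self, node: str, exceeded: bool) -> Optional[str]:
        """Record one test run; return "ALERT", "IMPROVED", or None."""
        previous = self.in_alert.get(node, False)
        self.in_alert[node] = exceeded
        if previous == exceeded:
            return None  # no status change, so the node threshold is not checked
        count = sum(self.in_alert.values())  # nodes currently in alert status
        if exceeded and count >= self.node_threshold and not self.alert_active:
            self.alert_active = True
            return "ALERT"
        if not exceeded and count < self.node_threshold and self.alert_active:
            self.alert_active = False
            return "IMPROVED"
        return None
```

The key point of the sketch is that the node threshold is evaluated only when a node's status changes, and that it counts the nodes in alert status at that moment rather than the total number of failures over time.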

For example, a test running from 5 nodes has been configured with Transaction Failure alerting and a node threshold of 3. When the test starts running, it is presumably not failing, so the status of each node is GOOD.

Node 1: GOOD
Node 2: GOOD
Node 3: GOOD
Node 4: GOOD
Node 5: GOOD

As each test run completes, its status is reported back to the main data center. If the test begins to fail, the status of the affected node is updated and compared to the alert thresholds.

Node 1: GOOD
Node 2: FAILED
Node 3: GOOD
Node 4: GOOD
Node 5: FAILED

In the example above, 2 nodes are in a FAILED status. When the status of Nodes 2 and 5 changed from GOOD to FAILED, the number of nodes currently in FAILED status was checked against the node threshold. Because the threshold of 3 nodes has not yet been met, no alert is sent. If a third node fails while Nodes 2 and 5 are still FAILED (as in the example below), an alert is sent.

Node 1: GOOD
Node 2: FAILED
Node 3: FAILED
Node 4: GOOD
Node 5: FAILED

If all five nodes are in a FAILED status at the same time, an 'All Sites' alert is triggered as well. Note, however, that the overall alert status is determined by the total number of nodes in a failure status at one time. If two nodes fail and then return GOOD data before the next node fails, no alert is sent.
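
The 5-node example can be replayed with a small Python sketch to make the timing point concrete. The replay function and the "ALERT" / "ALL SITES" strings are hypothetical names chosen for illustration, and the rule that an alert fires when the FAILED count rises to the node threshold is an assumption based on the description above, not a statement of how the product is implemented.

```python
from typing import Iterable, List, Tuple


def replay(nodes: List[str], node_threshold: int,
           runs: Iterable[Tuple[str, str]]) -> List[str]:
    """Replay (node, status) test results and list the alerts that would fire.

    Hypothetical helper for illustration; statuses are "GOOD" or "FAILED".
    """
    status = {name: "GOOD" for name in nodes}
    alerts: List[str] = []
    for node, result in runs:
        if status[node] == result:
            continue  # status unchanged, nothing to check
        status[node] = result
        failed = sum(1 for s in status.values() if s == "FAILED")
        if result == "FAILED":
            if failed == node_threshold:
                alerts.append("ALERT")      # node threshold met (assumed rule)
            if failed == len(nodes):
                alerts.append("ALL SITES")  # every node is failing
    return alerts


nodes = ["Node 1", "Node 2", "Node 3", "Node 4", "Node 5"]

# Three nodes in FAILED status at the same time: the threshold of 3 is met.
concurrent = [("Node 2", "FAILED"), ("Node 5", "FAILED"), ("Node 3", "FAILED")]
print(replay(nodes, 3, concurrent))  # ['ALERT']

# Nodes 2 and 5 recover before Node 3 fails: never more than 2 failing at once.
recovered = [("Node 2", "FAILED"), ("Node 5", "FAILED"),
             ("Node 2", "GOOD"), ("Node 5", "GOOD"), ("Node 3", "FAILED")]
print(replay(nodes, 3, recovered))   # []
```

Both sequences contain three failures in total, but only the first has three nodes in a FAILED status at one time, which is why only the first produces an alert.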

Root Cause

 
