
METRIC_ALARM baseline threshold alarm

According to the documentation...

Baseline threshold
A baseline threshold can be defined only if the value BREAK is specified for one of the following combinations of aggregation flags:


  • Aggregate reporting group,
  • Aggregate reporting group and Aggregate URL,
  • Aggregate URL,
  • Aggregate reporting group and Aggregate server IP address,
  • Aggregate server IP address,
  • Aggregate site.

Baseline threshold is specified in one of the following ways:
operator baseline_multiplier
operator baseline_multiplier + threshold

For the single triggering mode, the current value of the metric being monitored is compared with the baseline value multiplied by the baseline multiplier.

If the absolute threshold value is also specified, the value of the metric is also compared against the specified absolute value.


For the absolute or relative triggering modes, the calculated absolute or relative value of metric increment is compared to the specified percentage value.

Replace operator with one of the following operators:

  • “<”: To raise the alarm if the metric value is below the baseline threshold
  • “>”: To raise the alarm if the metric value exceeds the baseline threshold
  • “<=”: To raise the alarm if the metric value is less than or equal to the baseline threshold
  • “>=”: To raise the alarm if the metric value is greater than or equal to the baseline threshold
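
As a reading aid, the single triggering mode described above can be sketched in a few lines of Python. This is only an illustration of the quoted documentation, not the product's actual implementation: the function names `parse_baseline_threshold` and `alarm_raised` are hypothetical, and the "+ threshold" form is left out because the quoted text does not say how the two comparisons are combined.

```python
import operator

# The four operators listed in the quoted documentation.
OPERATORS = {
    "<": operator.lt,
    ">": operator.gt,
    "<=": operator.le,
    ">=": operator.ge,
}


def parse_baseline_threshold(spec):
    """Split a spec such as '<0.5' into (operator symbol, multiplier).

    Hypothetical helper: only the 'operator baseline_multiplier' form
    is handled here.
    """
    spec = spec.strip()
    # Try two-character operators first so '<=' is not read as '<'.
    for symbol in ("<=", ">=", "<", ">"):
        if spec.startswith(symbol):
            return symbol, float(spec[len(symbol):])
    raise ValueError(f"no supported operator in {spec!r}")


def alarm_raised(value, baseline, spec):
    """Compare the metric value with baseline * multiplier (single mode)."""
    symbol, multiplier = parse_baseline_threshold(spec)
    return OPERATORS[symbol](value, baseline * multiplier)
```

With a baseline of 1000 and a spec of "<0.5", this sketch raises the alarm for a value of 400 and stays quiet for a value of 600.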

So, I have a metric alarm with Aggregate URL set to BREAK, so a Baseline Threshold should be possible. The URL field contains a set of specific URLs I am interested in, and the SS field contains the SS these URLs exist in.

Metric is TRANS

Baseline threshold is set to <0.5 which suggests to me that this should trigger for each URL when the number of operations recorded for that URL is less than 50% of the baseline number of operations for that URL.

What I'm actually getting are alerts being triggered at inappropriate times (appears to be every period)

An example alert from log file is (had to redact some info here):

Wed Sep 23 13:15:00 BST 2015|PGS_URL_ACT_0001_MA|Alarm started|Anomalous number of pages for <Service> operations. (< 50% baseline) [*, <URL>, *, *, -, 578, <664.125, 578, <Service>, latest sample/baseline, -, -, -, , , , ]|1443010500000|*|<URL>|*|*|-|578|<664.125|578|<Service>|latest sample/baseline|-|-|-||||

So the number of operations for this URL at this time is shown as 578 and the threshold it was compared against was 664.125
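
For anyone wanting to pull these two numbers out of many log lines, here is a rough Python sketch. The field positions are an assumption read off this single redacted example (value in the 11th pipe-delimited field, threshold in the 12th, epoch milliseconds in the 5th); `parse_alarm_line` is a hypothetical helper, not part of any product tooling, and other alarm types may lay out their fields differently.

```python
from datetime import datetime, timezone

# The redacted example line from the post, unchanged.
LOG_LINE = (
    "Wed Sep 23 13:15:00 BST 2015|PGS_URL_ACT_0001_MA|Alarm started|"
    "Anomalous number of pages for <Service> operations. (< 50% baseline) "
    "[*, <URL>, *, *, -, 578, <664.125, 578, <Service>, "
    "latest sample/baseline, -, -, -, , , , ]"
    "|1443010500000|*|<URL>|*|*|-|578|<664.125|578|<Service>|"
    "latest sample/baseline|-|-|-||||"
)


def parse_alarm_line(line):
    """Extract a few fields from one pipe-delimited alarm log line.

    Assumed layout (inferred from the single example above):
    0 = local timestamp, 1 = alarm name, 2 = event, 3 = message,
    4 = epoch milliseconds, 10 = metric value, 11 = threshold as logged.
    """
    fields = line.rstrip("\n").split("|")
    return {
        "alarm": fields[1],
        "event": fields[2],
        "time_utc": datetime.fromtimestamp(int(fields[4]) / 1000, tz=timezone.utc),
        "value": float(fields[10]),
        "threshold": fields[11],
    }


record = parse_alarm_line(LOG_LINE)
print(record["value"], record["threshold"], record["time_utc"].isoformat())
```

The epoch field decodes to 12:15:00 UTC on 23 September 2015, which matches the 13:15:00 BST timestamp at the start of the line.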

If I look at the data for this URL, the number of operations is 578 and the baseline value is 662. The alarm should therefore be comparing 578 and 331, not 664.125 (where does 664.125 come from?)
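
The arithmetic behind that expectation can be written out as below. The numbers are the ones quoted in this post; the variable names are only for illustration, and the last two figures are plain arithmetic on the logged threshold, not a diagnosis of where 664.125 comes from.

```python
value = 578                 # TRANS for the URL, as shown in the alert
report_baseline = 662       # baseline for the URL, as seen in the report data
multiplier = 0.5            # from the configured baseline threshold "<0.5"
logged_threshold = 664.125  # threshold printed in the alert log

# What the documentation suggests the alarm should compare against.
expected_threshold = report_baseline * multiplier   # 331.0
should_fire = value < expected_threshold            # False

# What the log line says was actually compared.
fired_as_logged = value < logged_threshold          # True

# Two ways to read the logged threshold (neither is confirmed):
# as baseline * 0.5, which would imply a baseline of 1328.25, or
# as a value close to the report baseline itself (ratio about 1.003).
implied_baseline = logged_threshold / multiplier
ratio_to_report_baseline = logged_threshold / report_baseline

print(expected_threshold, should_fire, fired_as_logged)
print(implied_baseline, round(ratio_to_report_baseline, 4))
```

Either way, 578 is below 664.125 but well above 331, which is consistent with the alarm firing when the documented rule says it should not.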

Other URLs also trigger with the correct value for TRANS but an incorrect (and different) value for the baseline*0.5 threshold.

Have I misunderstood how this should work, or is this a known issue?

Yes, Version 11.5, I know, unsupported.

Thanks

2 REPLIES

adam_litwin
Dynatrace Participant

These types of problems require support team investigation. I'm afraid you will not find any help for v11.5 (unsupported).

Erik_Soderquist
Dynatrace Pro

Based on the information presented, I would say the most likely scenario is that some difference exists between what the alert is looking at and what the report is looking at. I find it most accurate to create a new DMI report that has *only* the information that the alert is looking at, as additional dimensions/metrics can change the results of the report, and therefore give an inaccurate representation when trying to troubleshoot the alert.

-- Erik