New Baselines Changes

emily
Newcomer

Hello – I’m trying to better understand the new baselining feature that was described in this blog post: https://www.dynatrace.com/news/blog/dynatrace-innovates-again-with-the-release-of-topology-driven-au...

I was very excited when I saw this; however, I’m having difficulty understanding the details now that I’m looking at it in the tool. Some questions:

  • Are the thresholds adjusted automatically on a daily basis? Hourly? That is, for how long does a specific threshold stay constant?
  • Why do the baseline lines in the preview appear at different levels on the 12-hour, 1-day, and 7-day tabs?
  • What is the right way to understand the aggregation? For instance, if I have 5 containers and I’m monitoring CPU% across them with aggregation set to “average”, what is really happening? Is the threshold set so that a Problem is triggered if one container breaches the average CPU% of the 5 containers?
  • It seems that the baseline shown in the preview is not the same as the baseline indicated in Problems triggered from the alert – is there a reason this would be the case?


Thanks for any info.

7 REPLIES

wolfgang_beer
Dynatrace Champion

Hi Emily,

I think some of your questions are answered on my help page here:

https://www.dynatrace.com/support/help/how-to-use-dynatrace/problem-detection-and-analysis/problem-d...

In detail:

- Auto-adaptive baselines are updated once a day.

- No, they appear at the same level, as the preview shows the current state of the baseline, calculated from the last 7 days of historic data. So it is a momentary snapshot of the calculated baseline.

- In the current state we do not aggregate over entities. The baseline is calculated once a day for every entity within the entity filter. So if you select the CPU usage metric for 5 hosts, you get 5 baselines updated every day, and possibly 5 alerts if a baseline is breached.

Our plan is to introduce baselines on aggregates by the end of this year as well, so that you have both options.

- Yes, the alert always uses the last updated baseline value for each entity. So again, for 5 hosts you get 5 baselines, one for each host, depending on the metric level. You can easily verify that by creating a baseline for a single host or service alone; the alert value should be exactly the value shown in the config screen.
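To make the per-entity model above concrete, here is a minimal sketch in Python. This is purely illustrative and not Dynatrace's actual algorithm; the mean-plus-standard-deviations rule, the function names, and the sample numbers are all my own assumptions. The point it demonstrates is the one stated above: each entity gets its own baseline, recomputed once a day from its own last 7 days of data, so 5 hosts mean 5 independent baselines and up to 5 alerts.

```python
from statistics import mean, stdev

# Illustrative sketch only -- NOT Dynatrace's real baselining algorithm.
# Assumption: an upper baseline of mean + n_sigma * stddev over the
# entity's own 7-day history.

def daily_baseline(history_7d, n_sigma=3):
    """Compute a simple upper baseline from one entity's 7-day history."""
    return mean(history_7d) + n_sigma * stdev(history_7d)

def check_entities(metric_history):
    """metric_history: {entity: (last_7_days, current_value)}.
    Each entity is baselined on its own data -- no cross-entity
    aggregation -- so each can alert independently."""
    alerts = []
    for entity, (history, current) in metric_history.items():
        if current > daily_baseline(history):
            alerts.append(entity)
    return alerts

# Two hosts -> two independent baselines, possibly two alerts.
hosts = {
    "host-1": ([10, 12, 11, 13, 12, 11, 12], 40.0),  # spikes far above its norm
    "host-2": ([50, 52, 51, 49, 50, 51, 50], 51.0),  # within its normal band
}
print(check_entities(hosts))  # -> ['host-1']
```

Note that host-2 runs at a much higher absolute CPU level than host-1 yet does not alert, because each threshold is relative to that entity's own history.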

Best greetings,

Wolfgang

Hi Wolfgang,


Thank you for your response!


Can you please confirm my new understanding:

- "Auto-adaptive baselines are updated once a day." So this means the baseline is at a single value (e.g. 1 ms) for 24 hours, and could then shift to a new value (e.g. 9 ms) for the following 24 hours based on subsequent data? Also, when during the day does this change happen (e.g. midnight)?


- "In current state we do not aggregate over entities." So the current selection in the aggregation picklist has no impact in the current release?


Thanks Again!


Emily

wolfgang_beer
Dynatrace Champion

- Yes, exactly: the current model adapts once a day, based on 7 days of historic data.

- The current aggregation selection refers to the 'subminute' aggregate used for a single line. For example, the host CPU metric is collected 6 times a minute for each host; with this selection you control which line you would like to baseline: the average of those 6 measurements per minute, the max, the min, or the count (which makes little sense here).
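A tiny sketch of the sub-minute aggregation described above, using made-up sample values (the 6-samples-per-minute figure is from Wolfgang's example; everything else is illustrative). The picklist choice decides which of these per-minute lines is the one the baseline is learned on:

```python
# Illustrative only: 6 CPU% samples collected within one minute for one host.
samples = [41.0, 43.5, 40.2, 58.9, 42.1, 41.7]

# Each aggregation collapses the minute into a single point on one line;
# the baseline is then learned on that chosen line.
per_minute_lines = {
    "avg": sum(samples) / len(samples),
    "min": min(samples),
    "max": max(samples),
    "count": len(samples),  # rarely useful for a baseline, as noted above
}

print(per_minute_lines["avg"])  # ~44.57 -- smooths out the spike
print(per_minute_lines["max"])  # 58.9   -- keeps the worst spike visible
```

So choosing "max" baselines the worst spike within each minute, while "avg" smooths short spikes away; the same raw data produces noticeably different lines to baseline.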


emily
Newcomer

Hi Wolfgang,


One last question, promise!


Does the same apply to the baselines that existed before, like errors, load, and response time? Are those also adjusted once every 24 hours?


Thanks again!


Emily

wolfgang_beer
Dynatrace Champion

No problem at all, happy to answer all your questions.

Yes, the same applies to all built-in service and application baselines that were already in place.

dantepaoloni
Helper

Hi @Wolfgang B.


Just a quick question: does this new baselining work OOTB for infrastructure metrics, or is a custom alert required? Is, for example, the CPU "Automatic" anomaly still hardcoded at 95%, or will it now work with a baseline? Is it the same for all other resource events?


Thanks!

wolfgang_beer
Dynatrace Champion

Existing settings are unchanged. By the way, CPU as well as memory are saturation events; I would not set those to baselining, as it would not deliver the same semantics. For CPU it might be a bit harsh to alert if a learned baseline of 30% is breached with, e.g., 35%. Here I would really leave the threshold-based alerting in place.
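The distinction above can be sketched in a few lines. This is an illustrative toy, not Dynatrace code; the 95% limit comes from the question above, and the learned 30% baseline is the example from the answer. It shows why a baseline alert on a saturation metric fires long before the resource is actually scarce:

```python
# Illustrative sketch: saturation metrics suit fixed thresholds,
# not learned baselines.

def static_threshold_alert(cpu_pct, limit=95.0):
    """Alert only when the resource is actually near saturation."""
    return cpu_pct > limit

def baseline_alert(cpu_pct, learned_baseline):
    """Alert whenever usage exceeds what was 'normal' historically."""
    return cpu_pct > learned_baseline

cpu_now = 35.0   # plenty of headroom left
learned = 30.0   # baseline learned from a normally idle host

print(static_threshold_alert(cpu_now))   # False -- no saturation risk
print(baseline_alert(cpu_now, learned))  # True  -- the "harsh" alert
```

The baseline alert is technically correct (35% is abnormal for this host) but semantically wrong for saturation: the host still has 65% headroom, which is why the fixed threshold is the better fit here.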