Why are infrastructure baselines defined only by static thresholds?

ksaito
Organizer

Hi,

In Dynatrace, the response time, error rate, and load of services are monitored using dynamic, automatic baselines.

Infrastructure monitoring, however, is based not on those baselines but on pre-defined or user-defined static thresholds.

I wonder why Dynatrace doesn't use automatic baselines for infrastructure monitoring.

Are there any reasons for this?

Kohei


11 REPLIES

In my experience, infrastructure metrics use both baselines and static thresholds by default. We've received reports from Dynatrace not only when a static threshold was exceeded but also when the deviation from the usual pattern was too big. Why do you assume that there are only static thresholds?

Sebastian


ksaito
Organizer

Hi @sebastian k.

Oh, is that so?

The reason I think infrastructure monitoring uses only static thresholds is the following page.

https://www.dynatrace.com/support/help/shortlink/p...

This page seems to say that automated baselines can be used only for application or service monitoring, and that infrastructure metrics aren't monitored with baselines.

Or am I misunderstanding this description?


Hmm, I'm not sure, but generally a situation like high CPU or low disk space is quite static. 95% CPU usage isn't good, and neither is 2% free storage, so such thresholds may well be static. Maybe I got notifications that matched the configured static thresholds and mistook them for baseline violations. It's interesting.


Hi Kohei,

Your understanding is exactly my understanding as well:

For service and application:

- 'Automatic' means the baseline is used; we can configure how far a value must deviate from the baseline before an alert is raised.

- 'Static' means a static threshold/SLA is used, even though a baseline has been generated.

For infrastructure:

- No baseline is ever generated. Everything is a static threshold; 'automatic' means the tool's default threshold is used.

- 'Static' means you use your own threshold.

So it seems the word 'automatic' carries a very different meaning in the context of services and applications than in the context of infrastructure.

But again, this is my observation and I might be wrong. Let's wait for some Dynatrace staff to chime in.
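
To make the two meanings concrete, here is a minimal Python sketch of the difference between a static threshold and a baseline-style check. This is only an illustration and not how Dynatrace actually computes baselines; the function names, the sample CPU values, and the "mean plus three standard deviations" rule are all assumptions made for the example.

```python
from statistics import mean, stdev

def static_alert(value, threshold):
    """Static mode: alert when the value exceeds a fixed threshold."""
    return value > threshold

def baseline_alert(history, value, tolerance=3.0):
    """Baseline-style mode (illustrative): alert when the value exceeds
    the mean of past observations by more than `tolerance` standard deviations."""
    mu = mean(history)
    sigma = stdev(history)
    return value > mu + tolerance * sigma

# Hypothetical CPU usage samples (percent) for a host that normally idles near 21%.
history = [20.0, 22.0, 19.0, 21.0, 23.0, 20.0, 22.0]
current = 60.0

print(static_alert(current, 95.0))       # 60% is below a fixed 95% threshold
print(baseline_alert(history, current))  # 60% is far above this host's usual level
```

With these sample values the static check stays quiet while the baseline-style check fires, which is the kind of behavioral difference discussed above.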


Hi Wai,

thanks for your additional explanation!

Yes, that matches my understanding as well.

'Automatic' in infrastructure thresholds just means 'the default thresholds Dynatrace suggests'.

By the way, that reminds me that I posted a question about the meaning of 'Automatic' from the point of view of infrastructure monitoring.

The post is the following:

https://answers.dynatrace.com/questions/208354/what-does-automatically-in-the-view-of-anomaly-det.html


wolfgang_beer
Dynatrace Champion

Within infrastructure metrics you can use the 'automatic' mode as well, but there it means that Dynatrace decides on a good default threshold. You can override the automatic mode by setting your own static thresholds.

You are right that we automatically baseline all key performance metrics, error rate, and traffic in a dimensional baseline cube (time to first byte, speed index, response time, visually complete, DOM interactive), as well as service response time and error rates.

Within infrastructure metrics, the automatic mode in many cases means using the best-practice standards that the individual vendors propose, such as thresholds introduced by AWS, VMware, etc.
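
As a rough illustration of the infrastructure behavior described above (a default threshold in 'automatic' mode, replaced by a user-defined static threshold when one is set), the selection logic could be sketched in Python as follows. The metric names, the default values, and the `effective_threshold` function are hypothetical placeholders; the actual defaults Dynatrace uses are not stated in this thread.

```python
from typing import Optional

# Hypothetical built-in defaults; placeholders only, not real Dynatrace values.
DEFAULT_THRESHOLDS = {
    "cpu_usage_percent": 95.0,
    "disk_free_percent": 3.0,
}

def effective_threshold(metric: str, user_threshold: Optional[float] = None) -> float:
    """Return the user's static threshold if one is set ('static' mode),
    otherwise fall back to the built-in default ('automatic' mode)."""
    if user_threshold is not None:
        return user_threshold
    return DEFAULT_THRESHOLDS[metric]

print(effective_threshold("cpu_usage_percent"))        # automatic: built-in default
print(effective_threshold("cpu_usage_percent", 80.0))  # static: user override
```

In both modes the result is a fixed number; no baseline is learned from the metric's history.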


thx for explanation 🙂


@Wolfgang B. Can we export the automatically configured threshold values (the ones proposed by the individual vendors) that Dynatrace uses for infrastructure?


dave_mauney
Dynatrace Champion

The static threshold technique is evolving with the new AI 2.0:

https://www.dynatrace.com/news/blog/enhanced-ai-ro...


Hi Dave,

thanks for the information.

Indeed, the 2nd generation AI provides better analysis!

Does this enable Dynatrace to use automatically generated baselines for infrastructure monitoring?


dave_mauney
Dynatrace Champion

From the blog above, it appears that it looks at deviation from the norm. This is similar to a baseline, but I believe a bit more flexible. This quote is key: "The example below shows how the new AI detects root causes without triggering false-positive alerts. The root-cause analysis states that the unhealthy host shows a 75% CPU usage increase as well as an increased number of Tomcat busy threads."