Solved: maintenance windows by metrics

DavidMtz · ‎21 Oct 2024

Hi

I have a question: is it possible to apply maintenance windows by metric? For example, if I have one server and need the maintenance window to cover only one process or one service on that server, or only the server's CPU or memory, is that possible?

Thanks.

Best regards.

MostafaHussein · ‎21 Oct 2024

A maintenance window's scope targets entities such as hosts, devices, processes, etc. What you're looking for is not an entity but a component within an entity, and that is not possible. Check the following link for the configuration API of maintenance windows and look for the `The Filter object` section; it will make clearer what I mean.

https://docs.dynatrace.com/docs/shortlink/api-v2-settings-get-schemas-builtin-alerting-maintenance-w...
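
To make the point concrete, here is a minimal Python sketch of how an entity-scoped filter behaves. The names (`Entity`, `WindowFilter`, `matches`), the example IDs, and the matching semantics are made up for illustration and are not the Dynatrace API; the only thing taken from the discussion is that a filter narrows by entity, tags, and management zones. Notice there is nowhere to put a metric.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entity:
    """A monitored entity (hypothetical model, not a Dynatrace type)."""
    entity_id: str
    entity_type: str
    tags: frozenset = frozenset()
    management_zones: frozenset = frozenset()


@dataclass(frozen=True)
class WindowFilter:
    """Entity-level filter: every criterion describes an entity, never a metric."""
    entity_type: Optional[str] = None
    entity_id: Optional[str] = None
    tags: frozenset = frozenset()
    management_zones: frozenset = frozenset()

    def matches(self, entity: Entity) -> bool:
        # Assumed semantics: each criterion that is set must hold,
        # and unset criteria are ignored.
        if self.entity_type is not None and entity.entity_type != self.entity_type:
            return False
        if self.entity_id is not None and entity.entity_id != self.entity_id:
            return False
        if not self.tags <= entity.tags:
            return False
        if self.management_zones and not (self.management_zones & entity.management_zones):
            return False
        return True


host = Entity("HOST-1", "HOST", frozenset({"env:prod"}), frozenset({"mz-a"}))
host_filter = WindowFilter(entity_type="HOST", tags=frozenset({"env:prod"}))

# The whole host is in or out of the window; CPU and memory cannot be split apart.
print(host_filter.matches(host))
```

Whatever criteria you combine, the result is a yes/no answer per entity, which is why a single metric of that entity cannot be singled out.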

Certified Dynatrace Professional | Certified Dynatrace Services - Observability | Certified Dynatrace Services - App Developer
Dynatrace Partner yourcompass.ca

Peter_Youssef · ‎22 Oct 2024

Hi @DavidMtz

Simply base it on the management zone configuration: whatever is configured as part of the management zone will be correlated with the alerting profiles and with the maintenance window as well.
The management zone is an integral part of the maintenance window configuration.

A maintenance window can be filtered by tags and management zones.

Hoping it adds value.
KR,

Peter

MostafaHussein · ‎22 Oct 2024

Dear @DavidMtz, as @Peter_Youssef mentioned, this may be used as a workaround. However, problems are related to entities directly and to metrics only indirectly, so with this configuration you may still receive problems for the host even when the problem is about CPU utilization, for example; such a problem will not be affected by the maintenance window.

BR,
Mostafa Hussein.

MostafaHussein · ‎22 Oct 2024

As a test, using memory saturation as an example, I included the `memory page fault` metrics in a dedicated management zone and configured a maintenance window with the timeframe "today - tomorrow". I then generated load by running some heavy tasks in the MS Edge browser. A problem was raised and alerted while the maintenance window was in progress, although under the workaround's assumption no problems related to these metrics should have been reported.

** Management zone configuration

** Maintenance window configuration

** Raised problem (in the root cause you'll find the page fault metrics, which were supposed to be excluded by the maintenance window)

Finally, the conclusion is that the suggested workaround is not the proper way and is mostly not feasible. But we're here in the community to brainstorm together and find the right solutions.

BR,

Mostafa Hussein.

MichalStefanik · ‎07 Nov 2024

It would be easier to start from a slightly different angle. Understand at what level the problem occurs.

If we are talking about CPU or memory usage, the problem always occurs at the host level. Going further, you cannot mute the CPU usage alert for just one process; it makes no sense. Let's say we have a CPU alert at 80% and somehow disable CPU alerts for one process. The other processes consume 79%, and the muted one 20%. Is that OK? How would we decide whether the alert should be raised or not?
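
The arithmetic behind that example can be written down as a small Python sketch. The process names and the helper functions are made up for illustration; only the numbers (80% threshold, 79% and 20% usage) come from the example above.

```python
HOST_CPU_THRESHOLD = 80.0  # percent, the alert level from the example above

# Per-process CPU share in percent; process names are made up.
process_cpu = {"other-processes": 79.0, "muted-process": 20.0}


def host_cpu(usage: dict) -> float:
    """Host CPU usage is the sum of what all processes consume."""
    return sum(usage.values())


def host_cpu_without(usage: dict, muted: str) -> float:
    """Host CPU usage if the muted process were simply ignored."""
    return sum(v for name, v in usage.items() if name != muted)


total = host_cpu(process_cpu)
visible = host_cpu_without(process_cpu, "muted-process")

print(total >= HOST_CPU_THRESHOLD)    # True: the host really is saturated
print(visible >= HOST_CPU_THRESHOLD)  # False: the alert would silently vanish
```

The host is at 99% either way, so hiding one process's share only hides the saturation rather than removing it.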

The only rational option is to enable the MW (maintenance window) for the entire host, or to disable CPU monitoring at the host level via an API call or the UI settings.

If the problem concerns a PGI (process group instance), you can freely select PGI as the entity type and target the given instance via tags. However, remember to check whether the PG (process group) also has alerting settings (availability monitoring), because then you need to temporarily mute the alert at both the PG and the PGI level.
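
As a rough sketch of what such a tag-scoped window could look like as a settings value, here is some Python that only builds the JSON and sends nothing. The field names and enum values are my recollection of the `builtin:alerting.maintenance-window` schema and should be treated as assumptions to verify against the schema documentation; the function name, the tag, and the times are invented.

```python
import json


def build_window(name: str, entity_types: list, tag: str, start: str, end: str) -> dict:
    """Build a maintenance-window settings value with one filter per entity type.

    Field names and enum values are assumed from the
    builtin:alerting.maintenance-window schema; verify before use.
    """
    return {
        "enabled": True,
        "generalProperties": {
            "name": name,
            "maintenanceType": "PLANNED",
            "suppression": "DONT_DETECT_PROBLEMS",
            "disableSyntheticMonitorExecution": False,
        },
        "schedule": {
            "scheduleType": "ONCE",
            "onceRecurrence": {"startTime": start, "endTime": end, "timeZone": "UTC"},
        },
        # One entity-level filter per type; there is no metric field to set.
        "filters": [
            {"entityType": entity_type, "entityTags": [tag]}
            for entity_type in entity_types
        ],
    }


# Cover both the PGI and its PG, in case availability alerting is set on the PG.
window = build_window(
    "PGI patching",
    ["PROCESS_GROUP_INSTANCE", "PROCESS_GROUP"],
    "maintenance:patching",  # hypothetical tag
    "2024-11-07T22:00:00",
    "2024-11-07T23:00:00",
)
print(json.dumps(window, indent=2))
```

Listing both entity types in one window keeps the PG and PGI muting in a single place, so neither is forgotten when the window ends.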

The service can also be targeted directly in the MW; then alerts will not be created. But if we only want to mute the error rate on the service, it is easier to turn off that monitoring for this period if other data is still to be monitored.
On the other hand, applying the MW and leaving monitoring on has a potentially better effect on monitoring: baselining then skips this period when calculating expected values.
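
To illustrate why skipping the window matters for expected values, here is a deliberately simplified Python sketch that uses a plain mean as the "baseline". Real baselining is more sophisticated than this; the sample values, the window, and the function names are invented for illustration.

```python
from statistics import mean

# (minute, error_rate_percent) samples; values are invented for illustration.
samples = [(0, 1.0), (1, 1.2), (2, 9.0), (3, 8.5), (4, 1.1), (5, 0.9)]
maintenance_window = (2, 3)  # inclusive start/end minute of the window


def in_window(minute: int, window: tuple) -> bool:
    """True if the sample's minute falls inside the maintenance window."""
    start, end = window
    return start <= minute <= end


def baseline(points, window=None) -> float:
    """Mean of the samples, optionally skipping those inside the window."""
    values = [v for t, v in points if window is None or not in_window(t, window)]
    return mean(values)


print(round(baseline(samples), 2))                      # window data included
print(round(baseline(samples, maintenance_window), 2))  # window data skipped
```

With the window excluded, the spike caused by the maintenance work does not inflate the expected error rate, so later anomalies stay detectable.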