09 May 2024 09:37 PM - last edited on 10 May 2024 08:10 AM by MaciejNeumann
Hi all,
We had a business outage which Dynatrace detected and correlated into a "Multiple service problems" card. The problem card correlated 7 different services. Unfortunately, no problem notification was sent because one of the services was in maintenance. The other 6 services were not in maintenance and had alerting profiles and problem notifications set up for them, but the associated problem notifications did not trigger.
So Dynatrace's logic appears to be this: if even 1 service correlated by Davis into a "Multiple" problem card is in maintenance, no problem notification is triggered for any of the other services. All the correlated services are treated as being in maintenance.
The more Davis correlates (we've had problem cards with 50+ services correlated in the past), the greater the risk that an alert is never sent because a single service is in maintenance at that time.
Has anyone else had to deal with this problem? Any ideas on how to avoid this situation?
18 Nov 2024 12:47 AM
That is really interesting, and I have not seen it in any of the organizations I have worked with. Have you had this issue occur again since this posting?
27 Nov 2024 03:55 PM
Hi Chad,
We stopped using "Disable problem detection during maintenance" and use "Detect problems but don't alert" for maintenance mode. This prevents services from being aggregated in multiple problem cards until the end of maintenance.
This workaround has a downside as we have no visibility on the status of our services when doing deployments or changes.
What we would need is a maintenance mode which does not impact the "multiple" problem cards and lets us keep visibility while making our changes, without raising alerts for the specific services that are under deployment.
In other words: no alerts for services in maintenance, but problem detection stays active so we keep visibility. Alerts still fire for "multiples", since other services not in maintenance could be impacted.
It seems there is currently no combination of maintenance mode settings in the product that allows for our use case.
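To make the request concrete, here is a small sketch of the two decision rules as described in this thread. It only illustrates the behavior we observed and the behavior we are asking for; it is not how Dynatrace evaluates maintenance windows internally, and the function names and the `inMaintenance` field are hypothetical.

```javascript
// Hypothetical entity shape: { name: string, inMaintenance: boolean }.

// Behavior as observed in this thread: a single correlated entity in
// maintenance suppresses the notification for the whole "Multiple" card.
function shouldAlertObserved(entities) {
  return entities.length > 0 && entities.every((e) => !e.inMaintenance);
}

// Requested behavior: alert as long as at least one correlated entity
// is NOT in maintenance.
function shouldAlertRequested(entities) {
  return entities.some((e) => !e.inMaintenance);
}

// Requested behavior: only the entities outside maintenance are alerted on.
function entitiesToAlertOn(entities) {
  return entities.filter((e) => !e.inMaintenance).map((e) => e.name);
}

const correlated = [
  { name: "service-a", inMaintenance: true },
  { name: "service-b", inMaintenance: false },
  { name: "service-c", inMaintenance: false },
];

console.log(shouldAlertObserved(correlated));
console.log(shouldAlertRequested(correlated));
console.log(entitiesToAlertOn(correlated));
```

With one of the three services in maintenance, the observed rule suppresses the alert, while the requested rule would still alert for the two remaining services.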
18 Nov 2024 07:53 AM
Hi @AlanZ ,
This happens because, whichever integration we use, there is no option to use {root cause entity} as a placeholder. Even if there were, we still could not send individual notifications for each impacted service, which is a drawback.
We had the same issue. Right now we handle it with a workflow that triggers when a problem titled "Multiple service problems" / "Multiple application problems" / "Multiple infrastructure problems" occurs. Inside the workflow we use JS to split the impacted entities into an array, then loop over that array and send a notification for each entity.
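A minimal sketch of that split-and-loop step, in plain JavaScript. This is not the actual workflow code: the problem payload shape (`id`, `title`, `impactedEntities`), the function names, and the optional maintenance filter are all assumptions for illustration, and the actual sending is left out because it depends on your integration.

```javascript
// Titles of the correlated problem cards the workflow reacts to.
const MULTIPLE_TITLES = [
  "Multiple service problems",
  "Multiple application problems",
  "Multiple infrastructure problems",
];

// True when the problem is one of the correlated "Multiple ..." cards.
function isMultipleProblem(problem) {
  return MULTIPLE_TITLES.includes(problem.title);
}

// Normalize impacted entities into an array of trimmed, non-empty names.
// Accepts an array or a comma-separated string (assumed input formats).
function splitImpactedEntities(impacted) {
  const items = Array.isArray(impacted)
    ? impacted
    : String(impacted ?? "").split(",");
  return items.map((s) => String(s).trim()).filter((s) => s.length > 0);
}

// Build one notification per impacted entity. Entities listed in the
// optional inMaintenance set are skipped (an assumption, not required).
function buildNotifications(problem, inMaintenance = new Set()) {
  if (!isMultipleProblem(problem)) return [];
  return splitImpactedEntities(problem.impactedEntities)
    .filter((entity) => !inMaintenance.has(entity))
    .map((entity) => ({
      entity,
      subject: `[${problem.id}] ${problem.title}: ${entity}`,
    }));
}

// Hypothetical example payload.
const example = {
  id: "P-0001",
  title: "Multiple service problems",
  impactedEntities: "checkout-service, payment-service, cart-service",
};

const notifications = buildNotifications(example, new Set(["cart-service"]));
for (const n of notifications) {
  console.log(n.subject); // replace with the real send step of your integration
}
```

Keeping the splitting and filtering in small pure functions makes it easy to test the logic outside the workflow before wiring in the real notification call.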
Hope this helps.
Regards,
@Maheedhar_T
27 Nov 2024 03:43 PM
Thanks Maheedhar,
Creating and developing a separate workflow seems to be the way we need to go as well, to make sure those important "multiple" problem cards are handled properly.
We opened an RFE as well, but it does not seem that the reported use case will be considered:
27 Nov 2024 04:39 PM
Hello @AlanZ
Could you check whether the configurations below are enabled? That would help pinpoint the main root cause.
Hoping it adds value.
KR,
Peter