Alerting
Questions about alerting and problem detection in Dynatrace.

The future of Alerting profiles

henk_stobbe
DynaMight Leader

Hello,

I see that the classic alerting profiles are missing features I would like to have, so looking at workflows, they seem a more flexible way to send out alerts.

I am guessing Dynatrace is developing a new alerting app as we speak?

KR Henk

16 REPLIES

henk_stobbe
DynaMight Leader

Sorry,

I think I found the answer myself (-;

henk_stobbe_0-1718387911451.png

KR Henk

r_weber
DynaMight Champion

Careful @henk_stobbe! Workflows are quite expensive (too expensive to use as an alerting profile replacement, IMO).
A single workflow is billed for every hour it exists and then also for every execution!

Certified Dynatrace Master, Dynatrace Partner - 360Performance.net

Julius_Loman
DynaMight Legend

@henk_stobbe as far as I know, another solution is in development, so alerting profiles as they are will be phased out.

Dynatrace Ambassador | Alanata a.s., Slovakia, Dynatrace Master Partner

That is completely right. We will introduce a license-included notification option to replace the existing notification channels and alerting profiles. Of course, we will only sunset the existing ones once the new feature has fully landed.

The plan at the moment is as follows:

- In October, the first license-included, one-click "notify me" option will appear directly in the new Problems app; it sends out emails automatically. See the screenshot below:

wolfgang_beer_0-1726046303775.png

- The next step is to offer a general-purpose 'Automate' action on top of this, allowing you to call any REST endpoint or trigger any automation step. The plan is that this will also be license-included as long as you stick to a 'simple' workflow with a single action. Once you start programming on top of it, @r_weber, it will become a workflow 😉

Best greetings,

Wolfgang 

 

Hi, I have a query for a setup with hundreds of microservices across 50+ clusters, where the aim is to set up alerts as monitoring as code.

What's the recommendation: alerting profile + workflow, or workflow only?

 

sonja
Dynatrace Champion

Hi! Workflows are the solution for setting up your notifications.
For a single-action notification (e.g. sending Slack messages), you can use a simple workflow; simple workflows don't cost any workflow hours.
For more complex notifications (e.g. gathering additional data and proceeding only under certain conditions), you can use standard workflows.

Here is an example of a simple workflow for sending Slack messages:
https://docs.dynatrace.com/docs/analyze-explore-automate/workflows/use-cases/workflows-tutorial-prob...

 

Alerting profiles will be phased out at some point, so it is best not to use them if you want to stay future-proof.

I hope this helps,
Sonja

@sonja actually, there is one gap: delayed notifications are not easy to replicate with just simple workflows. Both the execution delay and the additional check of whether the problem is still active require a full workflow.
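
To illustrate the two pieces mentioned above (wait for a delay, then re-check that the problem is still active), here is a minimal Python sketch. It is not Dynatrace code: the function name and its parameters are hypothetical, and in a real workflow the "still open" flag would come from looking the problem up again after the wait.

```python
from datetime import datetime, timedelta


def should_notify(opened_at: datetime, now: datetime,
                  delay: timedelta, still_open: bool) -> bool:
    """Notify only if the delay has elapsed AND the problem is still open.

    Hypothetical helper for illustration; all inputs are supplied by the caller.
    """
    return still_open and (now - opened_at) >= delay
```

A problem that closes itself within the delay window would therefore never produce a notification, which is the behaviour delayed notifications are usually wanted for.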

Are there any limits on the number of running workflows for an environment?

Dynatrace Ambassador | Alanata a.s., Slovakia, Dynatrace Master Partner

@Julius_Loman Wolfgang mentioned the way forward on these delays on the Problem app feedback thread. 

Certified Dynatrace Master, Dynatrace Partner - 360Performance.net

Thanks @r_weber , I missed that post. Good that this is on the way and will work with simple workflows.

Dynatrace Ambassador | Alanata a.s., Slovakia, Dynatrace Master Partner

sonja
Dynatrace Champion

That's right, we know that's the most important feature that is still missing. You can watch this product idea; we will update it as soon as the feature is available (coming next year): https://community.dynatrace.com/t5/Product-ideas/Delay-on-the-Davis-Problem-trigger-in-workflow/idi-...

Hello Sonja,

I stumbled on this post by chance.
I really think you should make an announcement about this decision, as this was not clear so far and many are still using Alerting Profile + Problem Notification.

sonja
Dynatrace Champion

Hello @y_buccellato! Just to be extra clear: it's totally fine for customers to keep using problem notifications and alerting profiles for the moment.

But we recommend starting to adopt our new and improved offering to reduce future migration efforts (no deadline for the moment). 

r_weber
DynaMight Champion

I'd go with workflows. I'm currently in the process of migrating off the previously set up Notification + Alerting Profiles to workflows and came up with the attached workflow (much more powerful!):

Background:
We've set up Teams and Ownerships via K8s labels, tags etc. so that every entity/service/host/pod/namespace/... has an ownership label (where possible).
For every team there is an MS Teams connection configured, all via Monaco.
The attached workflow (as an example) uses that ownership information, determines the relevant MS Teams connection and then sends out the problem notification to the team. It also adds a simple comment to the problem as confirmation (not visible in the new Problems app).

Additionally, if no ownership can be determined, it ingests a business event, which allows us to track problems that do not have owners. This in turn can be used by another workflow to take other actions (e.g. trigger a generic notification, or detect "orphaned" problems).
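
For readers without access to the attachment, the routing logic described above could be sketched roughly as follows in Python. This is only an illustration of the decision flow, not the attached workflow: the `Problem` class, the `TEAM_CONNECTIONS` mapping, and the `problem.orphaned` event type are all made up for the example and are not Dynatrace APIs or schemas.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical mapping from team name to an MS Teams connection ID.
# In a real setup this would come from the configured connections.
TEAM_CONNECTIONS = {
    "payments": "msteams-conn-payments",
    "checkout": "msteams-conn-checkout",
}


@dataclass
class Problem:
    # Simplified stand-in for a problem record; not the Dynatrace schema.
    problem_id: str
    title: str
    owner_team: Optional[str]  # None when no ownership could be determined


def route_problem(problem: Problem, connections: dict) -> dict:
    """Return a description of the step the workflow would take next."""
    connection = connections.get(problem.owner_team) if problem.owner_team else None
    if connection is not None:
        # Owner found: notify that team's channel.
        return {
            "action": "notify_team",
            "connection": connection,
            "message": f"Problem {problem.problem_id}: {problem.title}",
        }
    # No owner (or no connection for that owner): record an orphaned problem
    # so that another workflow can pick it up.
    return {
        "action": "ingest_business_event",
        "event_type": "problem.orphaned",  # hypothetical event type
        "problem_id": problem.problem_id,
    }
```

Calling `route_problem(Problem("P-1", "High CPU", "payments"), TEAM_CONNECTIONS)` takes the notification branch, while a problem with `owner_team=None` falls through to the business event branch.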

Certified Dynatrace Master, Dynatrace Partner - 360Performance.net

mailvk23
Newcomer

Tx @sonja @r_weber. That helps, much appreciated.

I felt that workflows are meant for custom automations or tasks and are not a direct replacement for alerting notifications. Any views to enlighten me here?

For out-of-the-box alerts, it seems the only choice is a workflow, as ownership details need to be mapped.

When it comes to other custom metrics, such as OTel-sourced Prometheus metrics, does the suggestion remain the same?

For non-DT metrics / custom metrics like Prometheus, are you suggesting workflows only?

r_weber
DynaMight Champion

@mailvk23 the approach here is two-fold.
For custom metrics and similar, you can choose between two approaches: Davis Anomaly Detectors or evaluation via Site Reliability Guardians. I use this rule of thumb:
If the metric is a "fast" updated metric (e.g. a timeseries with frequent measurements, let's say a datapoint every minute), create a Davis Anomaly Detector. You can also define details of the problem created by this anomaly detector, attach it to entities etc. Then use normal problem routing workflows.
For "slow" updated metrics (e.g. you get a datapoint only every hour or even less often), or if you are more interested in report-style evaluation (SLO reporting), use a Site Reliability Guardian to evaluate your metric/KPI. Then use a workflow to trigger the Guardian and evaluate the result. This workflow can itself take an action to notify users, or it can ingest a problem-creating event, which in turn can then be processed by a generic problem notification workflow (I prefer the direct reporting in such SRG cases).
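
The rule of thumb above can be written down as a tiny decision function. This is a sketch only: the post gives "every minute" and "every hour or less often" as examples but no exact boundary, so the 5-minute cut-off below is an arbitrary assumption, and the function and constant names are hypothetical.

```python
from datetime import timedelta

# Assumed cut-off between "fast" and "slow" metrics. The rule of thumb only
# names one-minute and one-hour examples, so this boundary is illustrative.
FAST_METRIC_MAX_INTERVAL = timedelta(minutes=5)


def choose_evaluation(datapoint_interval: timedelta,
                      report_style: bool = False) -> str:
    """Pick an evaluation approach following the rule of thumb."""
    if report_style or datapoint_interval > FAST_METRIC_MAX_INTERVAL:
        # Slow or report-style metrics: evaluate via a Site Reliability Guardian.
        return "site_reliability_guardian"
    # Frequently updated metrics: use a Davis Anomaly Detector.
    return "davis_anomaly_detector"
```

Where exactly to draw the line for intervals between a minute and an hour is a judgment call for each metric.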

Certified Dynatrace Master, Dynatrace Partner - 360Performance.net

sonja
Dynatrace Champion

Hi @mailvk23,

The simple workflow (1 trigger with 1 action) was introduced as a replacement for problem notifications + alerting profiles.

You can think about it like this:

- problem trigger => replacement for the alerting profile. This is the place where you define which problems you are interested in.

- workflow action (send Slack message, send e-mail, ...) => replacement for the problem notification. This is the place where you define what needs to happen (which message, where to send it, ...).
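
As a mental model of that mapping, a simple workflow can be pictured as one trigger plus exactly one action. The Python sketch below is purely illustrative: the `Workflow` class and the trigger and action strings are hypothetical and do not reflect the actual Dynatrace workflow schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Workflow:
    # Illustrative model only; not the Dynatrace workflow schema.
    trigger: str  # which problems to react to (the alerting profile's role)
    actions: List[str] = field(default_factory=list)  # what to do (the notification's role)


def is_simple(workflow: Workflow) -> bool:
    """A simple workflow has one trigger and exactly one action."""
    return bool(workflow.trigger) and len(workflow.actions) == 1
```

For example, `Workflow("problem: category=availability", ["send_slack_message"])` counts as simple in this model, while adding a second action such as a data lookup would make it a standard workflow.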

We are planning some improvements to make it easier to use:  

- providing workflow templates (Slack notifications for problems, e-mails for problems, ...) so you don't have to start from scratch every time you create a workflow

- showing on the problem card which workflows were triggered for that specific problem

I hope this helps,
Sonja

 
