PRO TIP - Creating Global alerts with the ability to remove entities for use cases.

ChadTurner
DynaMight Legend

As companies move more and more to cloud-based solutions and adopt OTel standards, there is a bigger focus on creating custom Metric Events for alerting. This guide showcases a solution where custom events are built at a global scope and target each monitored entity, with the ability to remove any entity from the global alert.

Step 1 - Collect the Metrics you want to alert on that are being ingested/streamed into Dynatrace. 

I highly recommend making some type of Wiki that outlines the metrics on which you will be creating global alerts. This lets staff visit the Wiki to understand which metrics are or aren't included in your metric alert rules. For this use case, we will target AWS API Gateway. Here is a list of the metrics that we collect by way of the cloud extension:

ChadTurner_0-1739646771023.png

You'll notice we are calling out the metric name along with the Metric ID, the unit of measurement, and the "GBL (Global) Negate". We worked with the cloud team to define which of these metrics they want to be woken up in the middle of the night for. The metrics they need to be alerted on included the Count Sum. We will use this as the basis to build the Global Alerting Design.
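If you want to keep that Wiki table in sync with whatever you script later, the inventory can also live as a small data structure. This is only a sketch: the `MetricEntry` class and `wiki_rows` helper are hypothetical names, the metric ID is a placeholder, the unit is a guess, and I'm assuming the "GBL Negate" column holds the exclusion tag name used later in this guide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricEntry:
    name: str        # human-readable metric name
    metric_id: str   # Metric ID as shown in Dynatrace
    unit: str        # unit of measurement
    gbl_negate: str  # assumption: the exclusion tag key for the global alert


# Placeholder row. The metric ID is NOT a real key and the unit is assumed;
# copy the real values from your own environment.
INVENTORY = [
    MetricEntry(
        name="API Gateway Count Sum",
        metric_id="<metric-id-from-your-environment>",
        unit="Count",
        gbl_negate="AWS Global Alert Exclusion-API Gateway Count Sum",
    ),
]


def wiki_rows(entries):
    """Render the inventory as simple pipe-separated lines for a Wiki page."""
    header = "Metric name | Metric ID | Unit | GBL Negate"
    rows = [f"{e.name} | {e.metric_id} | {e.unit} | {e.gbl_negate}" for e in entries]
    return [header, *rows]


for line in wiki_rows(INVENTORY):
    print(line)
```

Keeping one row per metric makes it easy to loop over the same list later when you build tags or metric events in bulk.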

Step 2 - Build the Metric Event. You can do this one of two ways, depending on your skill set with Dynatrace, DQL and advanced properties. I'll break the methods down here.

Using V2 Metrics to create a Custom Event for alerting. (If you are good at Davis Anomaly Detectors, skip this step and go to step 3)

Before we go into the creation, I recommend opening two tabs. One will be the Data Explorer and the other will be Settings > Anomaly Detection > Metric Events. We will start with the Data Explorer tab.

As mentioned, the Count Sum for API Gateway is what we are looking to create an alert on. So I'm going to grab the metric ID or the metric name as listed in the above image and paste it into the Data Explorer:

ChadTurner_1-1739647295077.png

As you can see, I'm using the metric name but confirming that the ID matches the one I've listed in the Wiki for this metric. Then I'm adding a split by the custom device. This split lets me look at all the entities that are API Gateways and end up with a custom baseline for each one. But you can design this however you want. Much like a process group, you can look at the group overall, or split out the instances so each of them has its own line. Once that is set, flip it over to Advanced mode :). If you are fluent with Advanced mode, you can build the above directly there. Click Run Query to verify you see the data the way you envision it. For me, I can see all the individual gateways as their own line item. So I just drop the limit from the string:

ChadTurner_2-1739647604648.png
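If you end up doing this for a lot of metrics, the string handling can be scripted. Below is a minimal sketch, assuming the Advanced mode string follows the usual metric selector shape (`:splitBy(...)`, `:sort(...)`, `:limit(N)`). The metric key is a made-up placeholder and `build_selector` / `drop_limit` are hypothetical helper names, not anything Dynatrace ships.

```python
import re

# Hypothetical metric key, for illustration only. Use the Metric ID from your Wiki.
METRIC_KEY = "example.apigateway.count_sum"


def build_selector(metric_key: str, split_dimension: str = "dt.entity.custom_device") -> str:
    """Build a selector that splits the metric by the given dimension."""
    return f'{metric_key}:splitBy("{split_dimension}")'


def drop_limit(selector: str) -> str:
    """Remove any :limit(N) transformation so every entity keeps its own line."""
    return re.sub(r":limit\(\d+\)", "", selector)


# Roughly what a copied Advanced mode string might look like, limit included.
copied = build_selector(METRIC_KEY) + ":sort(value(auto,descending)):limit(20)"

print(drop_limit(copied))
```

Only the limit is removed here, matching the step above. The split and sort are left as they were.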

Copy that Advanced String (Metric Key) and let's shift over to the other tab: the creation of a metric event.

For the next steps, you will want to adapt to your organization's standards when it comes to naming these events. Here is a mock-up for this Pro Tip:

ChadTurner_3-1739647871178.png

Just make sure you paste that Advanced Query in as the Metric Key on the Metric Event you are creating. Name the Metric Event, add your alert criteria as desired, adjust the alert title and payload as needed, and add all the properties you see fit. Then just save the metric event 🙂

Step 3 - Posting your Metric Event into a Davis Anomaly Detector (D.A.D). 

Navigate to the latest Dynatrace UI and search for Davis Anomaly Detector, then select "+ Anomaly Detector" and select "Improve Metrics Events with DQL".

ChadTurner_4-1739648201199.png

This will allow you to convert the metric event we just made over to a D.A.D. Just search for the title of the metric event, select it (or all of the ones you want to convert), and click "Transform":

ChadTurner_5-1739648314727.png

This will now turn your Metric Event into a D.A.D and allow you to alert on all monitored entities that have the metric you defined, in this case the API Gateway Count Sum. But I talked about having a negation segment, which allows you to strip monitored entities out of this global rule. This is where tags come into play.

Step 4 - Building your Negate Tag structure - Optional. I've called my tags "AWS Global Alert Exclusion-<Metric Name>", so for this use case it's called "AWS Global Alert Exclusion-API Gateway Count Sum".

ChadTurner_6-1739648545267.png

You'll notice I have two entries. The first serves as an audit trail: it records the Ptask from our ServiceNow instance in which it was requested to remove the given custom device (API Gateway X) from the global alert. The second entry is a placeholder for team members to use if another request comes in to exclude an entity from this global alert.
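To keep the naming consistent across metrics, the tag can be generated rather than typed by hand. This is just a sketch of the convention described above: the helper names are hypothetical, the task ID `PTASK0000000` is a dummy value, and I'm assuming the tag takes the `key:value` form with the ServiceNow task as the value.

```python
TAG_PREFIX = "AWS Global Alert Exclusion-"


def exclusion_tag_key(metric_name: str) -> str:
    """Tag key for one metric, e.g. 'AWS Global Alert Exclusion-API Gateway Count Sum'."""
    return f"{TAG_PREFIX}{metric_name.strip()}"


def exclusion_tag(metric_name: str, task_id: str) -> str:
    """Full key:value tag; the value carries the ServiceNow task for auditing."""
    if not task_id.strip():
        raise ValueError("a ServiceNow task ID is required for the audit trail")
    return f"{exclusion_tag_key(metric_name)}:{task_id.strip()}"


# Dummy task ID, for illustration only.
print(exclusion_tag("API Gateway Count Sum", "PTASK0000000"))
# prints: AWS Global Alert Exclusion-API Gateway Count Sum:PTASK0000000
```

Refusing an empty task ID is a design choice on my part: it keeps every exclusion tied to a documented request.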

Once you build this negate segment and target an entity with it, go check the entity and ensure the tag is applied:

ChadTurner_7-1739648780311.png

So now you have built the optional negate segment to remove an entity from the global alert. If you need a specific threshold for that entity, you can create one by repeating the steps above. Otherwise, we will continue to the next step.

Step 5 - Adding a Tag Negate to your overall D.A.D rule for the Target metric. 

Going back to your D.A.D rule, I'll then add a filter to NOT include entities that have the tag "AWS Global Alert Exclusion-API Gateway Count Sum:*", which lets me exclude all tagged entities regardless of what the ServiceNow task ID is.

ChadTurner_8-1739649041237.png
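If it helps to reason about what the NOT filter should return before you check it in a notebook, here is the same logic simulated locally in Python. This is not DQL and it does not talk to Dynatrace; the entity names, the extra tags and the task ID are all made up, and `is_excluded` / `alert_scope` are hypothetical helpers.

```python
from fnmatch import fnmatchcase

# Wildcard pattern from the step above: any task ID after the colon matches.
EXCLUSION_PATTERN = "AWS Global Alert Exclusion-API Gateway Count Sum:*"


def is_excluded(tags, pattern=EXCLUSION_PATTERN):
    """True if any tag on the entity matches the exclusion pattern."""
    return any(fnmatchcase(tag, pattern) for tag in tags)


def alert_scope(entities):
    """Return the names of entities the global rule should still alert on."""
    return [name for name, tags in entities.items() if not is_excluded(tags)]


# Made-up entities and tags, for illustration only.
entities = {
    "API Gateway X": ["AWS Global Alert Exclusion-API Gateway Count Sum:PTASK0000000"],
    "API Gateway Y": ["env:prod"],
    "API Gateway Z": [],
}

print(alert_scope(entities))
# prints: ['API Gateway Y', 'API Gateway Z']
```

The entity you tagged should drop out of the list, and everything else should stay in. That is the same check to make against the real query results.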

If you are having trouble with DQL, don't forget you also have Copilot, which can help you add the NOT filter. Once completed, toss the query into a notebook and verify that the entity you want to exclude is indeed excluded. Save the D.A.D and you are good to go. You now have a single rule that alerts on everything in Prod that has the given metric, and you can exclude entities while documenting the official request on the monitored entity, without impacting any other metrics. If you were to leverage a Maintenance Window instead, it would apply to all the metrics on the given entity, so the tag approach gives you the most flexibility.

If you do proper planning in Step 1 and identify all the metrics, you can do a bulk API posting of tags, metric events etc. and then convert them all over. It drastically reduces the time to complete all of this.
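As a rough idea of what the bulk tag posting could look like, the sketch below only builds the requests and does not send anything. It assumes the custom tags endpoint of the Environment API v2 (`/api/v2/tags` with an `entitySelector` query parameter and a `tags` list in the body), so verify that against the API documentation for your environment before using it. The base URL, entity names and task ID are placeholders, and `tag_request` is a hypothetical helper.

```python
import json
from urllib.parse import urlencode

# Placeholder base URL, for illustration only.
BASE_URL = "https://example.invalid"


def tag_request(base_url, entity_name, tag_key, tag_value):
    """Build (url, body) for tagging one custom device. Nothing is sent."""
    # Assumed entity selector syntax; check it against your environment.
    selector = f'type("CUSTOM_DEVICE"),entityName.equals("{entity_name}")'
    url = f"{base_url}/api/v2/tags?{urlencode({'entitySelector': selector})}"
    body = json.dumps({"tags": [{"key": tag_key, "value": tag_value}]})
    return url, body


# Made-up exclusion requests: (entity name, metric name, ServiceNow task ID).
requests_to_plan = [
    ("API Gateway X", "API Gateway Count Sum", "PTASK0000000"),
]

for entity, metric, task in requests_to_plan:
    url, body = tag_request(BASE_URL, entity, f"AWS Global Alert Exclusion-{metric}", task)
    print(url)
    print(body)
```

Printing the planned requests first gives you something to review with the team before anything is actually posted with your API token.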

 

I hope this helps everyone with the building of global alerts for ingested metrics etc. 

 

-Chad