Workflow DQL for problems opened a few minutes ago

AsafAx
Observer

Hello community!

My end goal is to trigger a Jenkins job to restart entities using the Dynatrace "root_cause_entity_name". I want to use a Workflow to send the root cause of problems that are:

  1. Currently open
  2. Open for more than 6 minutes
  3. Sharing the same "root_cause_entity_name" with at least one other such problem (2 or more problems per root cause)

Examples:

  • If a problem is opened and then closed within 6 minutes, do nothing.
  • If two or more problems are opened with the same "root_cause_entity_name", but are closed within 6 minutes of the first one opening, do nothing.
  • If two or more problems are opened with the same "root_cause_entity_name", and are NOT closed within 6 minutes of the first one opening, send a webhook with the "root_cause_entity_name" of those problems.

 

As a start, I think I've been able to create a query in a Notebook that gets close to the problems I want, but there are some gaps I can't figure out how to fix. Here is the current "Fetch events" query I'm using in my Notebook. It does not yet use the count of problems per "root_cause_entity_name", and it also does not look at the events from -6m to now(). How can I get all the problem events that meet the criteria?

// Fetches all problems in timeframe. 
fetch events, from: -1h, to: -6m
| filter event.kind == "DAVIS_PROBLEM"

// Sorts the list so "CLOSED" is prioritized (top of the list)
| sort event.status == "CLOSED" desc

// Reads through the list ("CLOSED" first), and removes duplicate "display_id" values later on (already closed problems).
// This leaves a clean list of "CLOSED" and "ACTIVE" problems. This only works since the "CLOSED" status is sorted at the top of the list.
| dedup display_id

// Removes the "CLOSED" problems from the list
| filter event.status != "CLOSED"

// Set the displayed columns to the relevant information needed.
| fields timestamp, display_id, event.category, event.start, event.end, event.name, event.status, event.status_transition, root_cause_entity_id, root_cause_entity_name

 

  1. Afterwards, how can I transfer this query to the "trigger node" of the Workflow? Or, is it "best practice" to trigger the Workflow on every "ACTIVE" problem, followed by an "Execute DQL Query" node afterward?
  2. After triggering the Workflow and running the query, how can I extract the list of remaining "root_cause_entity_name" values to send?

 

Thank you all for your help 🙂

Asaf

Asaf Axelrod
4 REPLIES

PacoPorro
Dynatrace Leader

Could this filter work as a starting point for your workflow?

event.kind == "DAVIS_PROBLEM" AND event.status == "ACTIVE" AND toDuration(event.end - event.start) > duration(6, "m")

 

 

Hi Paco, thanks for the quick reply!
I didn't think about using the toDuration() function, that's very useful. The only problem is that if "event.end" has a value, the problem is no longer "ACTIVE", so the query always returns empty results. I tested a bit and updated my original query: it now uses toDuration(), replaces the "event.end" that you wrote with "now()", adds a count() by root cause with a filter on that count, and filters out "null" root causes...

 

// Fetches all the problems that have been open longer than 6 minutes
fetch events
| filter event.kind == "DAVIS_PROBLEM" AND (toDuration(now() - event.start) > duration(6, "m"))

// Sorts the list so "CLOSED" is prioritized (top of the list)
| sort event.status == "CLOSED" desc

// Reads through the list ("CLOSED" first), and removes duplicate "display_id" values later on (already closed problems).
// This leaves a clean list of "CLOSED" and "ACTIVE" problems. This only works since the "CLOSED" status is sorted at the top of the list.
| dedup display_id

// Removes the "CLOSED" problems from the list
| filter event.status != "CLOSED"


// Set the displayed columns to the relevant information needed.
| fields timestamp, display_id, event.category, event.start, event.end, event.name, event.status, event.status_transition, root_cause_entity_id, root_cause_entity_name

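// Counts the remaining problems per root cause and keeps root causes with 2 or more problems
| summarize count(), by: {root_cause_entity_name}
| filter `count()` >= 2

// Removes rows without a root cause
| filterOut isNull(root_cause_entity_name)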


This seems to properly bring me a list that meets all the requirements! 😄
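
For comparison, the same selection can be sketched more compactly. This is an untested sketch rather than a verified query. It assumes that the newest event per "display_id" carries the problem's current status and that "now() - event.start" yields a duration that can be compared with the literal 6m. The 2h lookback and the output names (status, started, problem_count, problems) are illustrative choices, not anything required by Dynatrace.

```
// Assumed lookback window; widen it if problems can stay open longer.
fetch events, from: -2h
| filter event.kind == "DAVIS_PROBLEM"

// Newest event first, so takeFirst() returns each problem's latest state.
| sort timestamp desc
| summarize {
    status = takeFirst(event.status),
    started = takeFirst(event.start),
    root_cause_entity_name = takeFirst(root_cause_entity_name)
  }, by: {display_id}

// Keep problems that are still open, older than 6 minutes, and have a root cause.
| filter status == "ACTIVE" and now() - started > 6m and isNotNull(root_cause_entity_name)

// Count per root cause and keep entities with 2 or more matching problems.
| summarize {problem_count = count(), problems = collectDistinct(display_id)}, by: {root_cause_entity_name}
| filter problem_count >= 2
```

Naming the aggregate (problem_count) avoids the backtick-quoted `count()` field, and "problems" lists the matching display IDs per root cause for reference.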

 

How can I now pass this list of root causes to the node that sends it to the webhook?

 

Thanks again for your help!

Asaf Axelrod

Great, I think I'll be able to figure it out based off of what was discussed there. 👍

 

Thanks again Paco! 😀

Asaf Axelrod
