Solved: Re: Workflow DQL for problems opened a few minutes ago

AsafAx · 27 Mar 2024

Hello community!

My end goal is to trigger a Jenkins job to restart entities using the Dynatrace "root_cause_entity_name". I want to use a Workflow to send the root cause of problems that are:

Currently open
Open for more than 6 minutes
Sharing the same "root_cause_entity_name" with at least one other such problem (2 or more in total)

Examples:

If a problem is opened and then closed within 6 minutes, do nothing.
If two or more problems are opened with the same "root_cause_entity_name", but are closed within 6 minutes of the first one opening, do nothing.
If two or more problems are opened with the same "root_cause_entity_name", and are NOT closed within 6 minutes of the first one opening, send the "root_cause_entity_name" of those problems to a webhook.
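The three examples above boil down to one rule, sketched here in Python purely to make the edge cases concrete. The `Problem` record and its field names are assumptions for illustration, not the Dynatrace event schema, and the 6-minute window is measured from the earliest open problem of each root cause, which is one reading of "the first one opening".

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

THRESHOLD = timedelta(minutes=6)


@dataclass
class Problem:
    # Hypothetical record for illustration, not the Dynatrace event schema.
    display_id: str
    root_cause_entity_name: Optional[str]
    start: datetime
    end: Optional[datetime] = None  # None means the problem is still open


def root_causes_to_restart(problems, now):
    """Return root causes that have 2 or more open problems, where the
    earliest of those open problems started more than 6 minutes ago."""
    open_by_cause = defaultdict(list)
    for p in problems:
        # Closed problems and problems without a root cause are ignored.
        if p.end is None and p.root_cause_entity_name is not None:
            open_by_cause[p.root_cause_entity_name].append(p)

    selected = []
    for cause, group in open_by_cause.items():
        first_start = min(p.start for p in group)
        if len(group) >= 2 and now - first_start > THRESHOLD:
            selected.append(cause)
    return sorted(selected)
```

A single open problem, a pair that is younger than 6 minutes, or a pair where one problem has already closed all yield an empty list for that root cause.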

As a start, I think I've created a query in a Notebook that gets close to the problems I want, but there are some gaps I can't figure out. Here is the current "Fetch events" query I'm using in my Notebook. It doesn't yet use the count of problems per "root_cause_entity_name", and it ignores events from the last 6 minutes (-6m to now()). How can I get all the problem events that meet the criteria?

// Fetches all problem events from 1 hour ago up to 6 minutes ago.
fetch events, from: -1h, to: -6m
| filter event.kind == "DAVIS_PROBLEM"

// Sorts the list so "CLOSED" is prioritized (top of the list)
| sort event.status == "CLOSED" desc

// Reads through the list ("CLOSED" first) and keeps only the first record per "display_id", so a problem that has already closed is represented by its "CLOSED" record.
// This leaves one record per problem, either "CLOSED" or "ACTIVE". It only works because the "CLOSED" records are sorted to the top of the list.
| dedup display_id

// Removes the "CLOSED" problems from the list
| filter event.status != "CLOSED"

// Set the displayed columns to the relevant information needed.
| fields timestamp, display_id, event.category, event.start, event.end, event.name, event.status, event.status_transition, root_cause_entity_id, root_cause_entity_name

Afterwards, how can I transfer this query to the "trigger node" of the Workflow? Or, is it "best practice" to trigger the Workflow on every "ACTIVE" problem, followed by an "Execute DQL Query" node afterward?
After triggering the Workflow and running the query, how can I extract the list of remaining "root_cause_entity_name" values to send?

Thank you all for your help 🙂

Asaf

Asaf Axelrod

PacoPorro · 27 Mar 2024

Could this filter work as a starting point for your workflow?

event.kind == "DAVIS_PROBLEM" AND event.status == "ACTIVE" AND toDuration(event.end - event.start) > duration(6, "m")

AsafAx · 27 Mar 2024

Hi Paco, thanks for the quick reply!
I hadn't thought of using toDuration(); that's very useful. The only problem is that if "event.end" has a value, the problem is no longer "ACTIVE", so the filter always returns empty results. After some testing, I updated my original query: it now uses toDuration() with now() in place of the "event.end" you wrote, adds a count() by root cause, filters on that count, and filters out "null" root causes...

// Fetches all the problems that have been open longer than 6 minutes
fetch events
| filter event.kind == "DAVIS_PROBLEM" AND (toDuration(now() - event.start) > duration(6, "m"))

// Sorts the list so "CLOSED" is prioritized (top of the list)
| sort event.status == "CLOSED" desc

// Reads through the list ("CLOSED" first) and keeps only the first record per "display_id", so a problem that has already closed is represented by its "CLOSED" record.
// This leaves one record per problem, either "CLOSED" or "ACTIVE". It only works because the "CLOSED" records are sorted to the top of the list.
| dedup display_id

// Removes the "CLOSED" problems from the list
| filter event.status != "CLOSED"

// Set the displayed columns to the relevant information needed.
| fields timestamp, display_id, event.category, event.start, event.end, event.name, event.status, event.status_transition, root_cause_entity_id, root_cause_entity_name

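// Counts the remaining open problems per root cause.
| summarize count(), by: {root_cause_entity_name}

// Keeps root causes shared by 2 or more problems, then drops the group that has no root cause.
| filter `count()` >= 2
| filterOut isNull(root_cause_entity_name)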

This seems to properly bring me a list that meets all the requirements! 😄
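To see why the sort-then-dedup step matters, here is a minimal Python model of the same pipeline run over plain dictionaries. It is only an illustration: the records are made up, the key names simply mirror the fields used in the query, and it assumes `dedup` keeps the first record it encounters per `display_id`, as the query comments describe. The count threshold is written as 2 or more, matching the requirement stated at the top of the thread.

```python
from collections import Counter
from datetime import datetime, timedelta


def open_root_causes(events, now, min_age=timedelta(minutes=6), min_count=2):
    """Model of the query: returns {root_cause_entity_name: open problem count}."""
    # filter: keep records of problems that started more than min_age ago
    old = [e for e in events if now - e["event.start"] > min_age]

    # sort: "CLOSED" records first (False sorts before True; the sort is stable)
    old.sort(key=lambda e: e["event.status"] != "CLOSED")

    # dedup: keep the first record seen per display_id
    first_per_problem = {}
    for e in old:
        first_per_problem.setdefault(e["display_id"], e)

    # filter: drop problems whose surviving record is "CLOSED"
    active = [e for e in first_per_problem.values() if e["event.status"] != "CLOSED"]

    # summarize count() by root cause, skipping null root causes
    counts = Counter(
        e["root_cause_entity_name"]
        for e in active
        if e["root_cause_entity_name"] is not None
    )
    return {name: n for name, n in counts.items() if n >= min_count}
```

Without the "CLOSED"-first ordering, the dedup step could keep an older "ACTIVE" record of a problem that has since closed, and that problem would wrongly be counted as open.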

How can I now pass this list of root causes to the node that sends it to the webhook?

Thanks again for your help!

PacoPorro · 27 Mar 2024

Check this out
https://community.dynatrace.com/t5/Developer-Q-A-Forum/How-can-I-access-the-quot-loop-quot-variable-...
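Independently of how the Workflow exposes the query result to a later step, the remaining work is to pull the names out of the records and wrap them in a request body. The sketch below shows only that step in Python. The record shape, the function name, and the payload key `root_cause_entity_names` are assumptions for illustration and are not part of the Workflow or Jenkins APIs.

```python
import json


def build_webhook_payload(records):
    """Build a JSON body listing the distinct, non-empty root cause names.

    `records` is assumed to be a list of dicts, one per row of the
    summarized query result (hypothetical shape).
    """
    names = sorted(
        {
            r["root_cause_entity_name"]
            for r in records
            if r.get("root_cause_entity_name")  # skips missing or null names
        }
    )
    return json.dumps({"root_cause_entity_names": names})
```

Sending one payload with the full list, rather than one request per entity, keeps the receiving job free to decide the restart order.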

AsafAx · 27 Mar 2024

Great, I think I'll be able to figure it out based off of what was discussed there. 👍

Thanks again Paco! 😀