27 Mar 2024 10:52 AM - last edited on 27 Mar 2024 11:48 AM by MaciejNeumann
Hello community!
My end goal is to trigger a Jenkins job to restart entities using the Dynatrace "root_cause_entity_name". I want to use a Workflow to send the root cause of problems that are:
Examples:
As a start, I think I've built a query in a Notebook that gets close to the problems I want, but there are some gaps I can't figure out. Here is the current "fetch events" query I'm using in my Notebook. It doesn't yet use the count per "root_cause_entity_name", and it also ignores the events from -6m to now(). How can I get all the problem events that meet the criteria?
// Fetches all problems in timeframe.
fetch events, from: -1h, to: -6m
| filter event.kind == "DAVIS_PROBLEM"
// Sorts the list so "CLOSED" is prioritized (top of the list)
| sort event.status == "CLOSED" desc
// Reads through the list ("CLOSED" first), and removes duplicate "display_id" values later on (already closed problems).
// This leaves a clean list of "CLOSED" and "ACTIVE" problems. This only works since the "CLOSED" status is sorted at the top of the list.
| dedup display_id
// Removes the "CLOSED" problems from the list
| filter event.status != "CLOSED"
// Sets the displayed columns to the relevant information needed.
| fields timestamp, display_id, event.category, event.start, event.end, event.name, event.status, event.status_transition, root_cause_entity_id, root_cause_entity_name
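To see why the sort-then-dedup trick leaves only open problems, here is a minimal Python sketch of the same logic outside Dynatrace. It assumes that dedup keeps the first record it meets for each display_id, as the query comments describe; the function name and the sample records are made up for illustration.

```python
from typing import Iterable


def active_problems(events: Iterable[dict]) -> list:
    """Mimic: sort CLOSED first, dedup by display_id, then drop CLOSED."""
    # Stable sort: the key is False (sorts first) for CLOSED records.
    ordered = sorted(events, key=lambda e: e["event.status"] != "CLOSED")
    seen = set()
    kept = []
    for event in ordered:
        # Keep only the first record seen for each display_id.
        if event["display_id"] in seen:
            continue
        seen.add(event["display_id"])
        kept.append(event)
    # A problem that was ever CLOSED is now represented by its CLOSED record.
    return [e for e in kept if e["event.status"] != "CLOSED"]


# Hypothetical sample: P-1 was closed, P-2 is still open.
sample = [
    {"display_id": "P-1", "event.status": "ACTIVE"},
    {"display_id": "P-1", "event.status": "CLOSED"},
    {"display_id": "P-2", "event.status": "ACTIVE"},
    {"display_id": "P-2", "event.status": "ACTIVE"},
]
print(active_problems(sample))
```

Only P-2 survives, because the CLOSED record for P-1 wins the dedup and is then filtered out.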
Thank you all for your help 🙂
Asaf
27 Mar 2024 11:21 AM - edited 27 Mar 2024 11:22 AM
Could this filter work as a starting point for your workflow?
event.kind == "DAVIS_PROBLEM" AND event.status == "ACTIVE" AND toDuration(event.end - event.start) > duration(6, "m")
27 Mar 2024 01:37 PM
Hi Paco, thanks for the quick reply!
I didn't think about using the toDuration() function; that's very useful. The only problem is that when "event.end" has a value, the problem is no longer "ACTIVE", so the query always returns empty results. I tested a bit and updated my original query: it now uses toDuration() with "now()" in place of the "event.end" that you wrote, adds a count() by root cause, keeps only counts > 2, and filters out "null" root causes...
// Fetches all the problems that have been open longer than 6 minutes
fetch events
| filter event.kind == "DAVIS_PROBLEM" AND (toDuration(now() - event.start) > duration(6, "m"))
// Sorts the list so "CLOSED" is prioritized (top of the list)
| sort event.status == "CLOSED" desc
// Reads through the list ("CLOSED" first), and removes duplicate "display_id" values later on (already closed problems).
// This leaves a clean list of "CLOSED" and "ACTIVE" problems. This only works since the "CLOSED" status is sorted at the top of the list.
| dedup display_id
// Removes the "CLOSED" problems from the list
| filter event.status != "CLOSED"
// Sets the displayed columns to the relevant information needed.
| fields timestamp, display_id, event.category, event.start, event.end, event.name, event.status, event.status_transition, root_cause_entity_id, root_cause_entity_name
| summarize count(), by: {root_cause_entity_name}
| filter `count()` > 2
| filterOut isNull(root_cause_entity_name)
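The last steps of the query (age check, count per root cause, threshold, null removal) can be sketched in Python as follows. This is only an illustration of the logic, not how Dynatrace evaluates it; the function name, the parameter names, and the sample data are assumptions, and the input is taken to be the already deduplicated open problems.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def restart_candidates(problems, now, min_age=timedelta(minutes=6), min_count=2):
    """Count open problems per root cause and keep the frequent ones."""
    counts = Counter(
        p["root_cause_entity_name"]
        for p in problems
        # Same idea as toDuration(now() - event.start) > duration(6, "m")
        if now - p["event.start"] > min_age
        # Same idea as filterOut isNull(root_cause_entity_name)
        and p.get("root_cause_entity_name") is not None
    )
    # Same idea as filter `count()` > 2
    return {name: n for name, n in counts.items() if n > min_count}


# Hypothetical data: host-a has three problems older than 6 minutes.
now = datetime(2024, 3, 27, 12, 0, tzinfo=timezone.utc)
problems = [
    {"root_cause_entity_name": "host-a", "event.start": now - timedelta(minutes=m)}
    for m in (10, 20, 30)
] + [
    {"root_cause_entity_name": "host-a", "event.start": now - timedelta(minutes=2)},
    {"root_cause_entity_name": "host-b", "event.start": now - timedelta(minutes=10)},
    {"root_cause_entity_name": None, "event.start": now - timedelta(minutes=15)},
]
print(restart_candidates(problems, now))
```

With this sample only host-a is returned: its two-minute-old problem is too recent to count, host-b has a single problem, and the record without a root cause is dropped.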
This seems to properly bring me a list that meets all the requirements! 😄
How can I now pass this list of root causes to the Workflow node that sends it to the webhook?
Thanks again for your help!
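For the hand-off itself, one option is to reduce the query records to a plain list of names before sending them. The Python sketch below only shows that shaping step; the payload field "entities_to_restart" and the function name are hypothetical, and how the Workflow passes the records to the webhook task is not covered here.

```python
import json


def build_webhook_body(records):
    """Turn summarize output records into a JSON body for the webhook."""
    names = sorted(
        {
            r["root_cause_entity_name"]
            for r in records
            if r.get("root_cause_entity_name")  # skip null or empty names
        }
    )
    # "entities_to_restart" is a made-up field name for illustration.
    return json.dumps({"entities_to_restart": names})


# Hypothetical records shaped like the result of the summarize step.
records = [
    {"root_cause_entity_name": "host-b", "count()": 4},
    {"root_cause_entity_name": "host-a", "count()": 3},
]
print(build_webhook_body(records))
```

The Jenkins job would then read the list from that field, or from whatever parameter name it actually expects.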
27 Mar 2024 01:58 PM
Great, I think I'll be able to figure it out based off of what was discussed there. 👍
Thanks again Paco! 😀