Solved: Workflow triggering too fast?

DavidGallay · ‎06 Jul 2023

We have built a workflow to detect app pool process availability problems and kick off a pipeline to query the problem details and then run an HTTP request to restart them. It works great, however we have found that in some instances when the workflow is triggered the problem details don't seem to be available in Grail yet so the query returns no records. If it's run a minute later, it works fine. Is there a way to put in a pause or a loop in the DQL query to keep running until it returns a record?

DavidGallay · ‎06 Jul 2023

Adding a query filter to the event trigger so that it only matches matchesValue(event.status_transition, "UPDATED") does seem to help. I assume that by the time the problem is updated, all of the tags and entity info are fully populated.

ChristopherHejl · ‎07 Jul 2023

Hi David!
Problems evolve over time and might not include all relevant data from the beginning, and the storage may sometimes take a little longer to persist this info. I therefore recommend filtering the event trigger as narrowly as you can. If you rely on data on the DAVIS_PROBLEM event, you could match for that data specifically (like affected entity tags).

Filtering for the UPDATED status transition might reduce the likelihood of your issue, but it's not a guarantee, as there might be multiple updates to a Problem throughout its lifetime. Matching on specific data of course requires that this data is present on the DAVIS_PROBLEM event in the first place, and we are working on adding more info there.

As for looping / retrying in the workflow: you can define the retry behaviour of tasks in the task options, but a retry only occurs if the task fails. The Grail query does not fail when it finds no data (only when there is a problem running the query), so you could use a JavaScript task instead to verify that the data you are looking for is present and fail if it is not. In combination with the task retry option (which also lets you define a wait period), this should hopefully serve as a robust solution to this challenge.

I've prepared a sample script that runs a DQL query, uses the event context as part of the query, and fails if no records are returned.

import { queryExecutionClient } from '@dynatrace-sdk/client-query';
import { execution } from "@dynatrace-sdk/automation-utils";

export default async function ({ execution_id }) {

  const exe = await execution(execution_id);
  const eventContext = exe.event();

  // print workflow event context
  console.log(eventContext)
  
  var my_query = `fetch events
    | filter event.kind == "DAVIS_PROBLEM"
    | filter event.id == "${eventContext['event.id']}"`

  var query_config = {
    body: {
      query: my_query,
      fetchTimeoutSeconds: 30,
      requestTimeoutMilliseconds: 35000
    }
  }
  
  var query_result = await queryExecutionClient.queryExecute(query_config)

  // print query result
  console.log(query_result)

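  // The query succeeds even when nothing matches, so fail explicitly on an
  // empty (or missing) result to let the task retry option kick in.
  if (query_result.result?.records?.length > 0)
    return query_result.result
  else throw new Error('No Problem data found in Grail.')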
}

Besides this option, we are also planning to add a task option to wait before executing a task in the workflow in the coming months.
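
Until then, if you would rather wait inside the script itself than rely on the task retry option, a loop with a short pause between attempts is another way to do it. The following is only a minimal sketch, not an official pattern: pollForRecords and its attempts / delayMs options are hypothetical names, fetchRecords is a placeholder for a function that runs the Grail query and returns its records, and it assumes setTimeout is available in the runtime.

```javascript
// Hypothetical helper (not part of any Dynatrace SDK): calls fetchRecords()
// until it returns a non-empty array or the attempts run out.
// fetchRecords is a placeholder for a function wrapping the Grail query.
async function pollForRecords(fetchRecords, { attempts = 5, delayMs = 15000 } = {}) {
  for (let attempt = 1; attempt <= attempts; attempt++) {
    const records = await fetchRecords();
    if (Array.isArray(records) && records.length > 0) {
      return records;
    }
    // Pause before the next attempt, but not after the last one
    if (attempt < attempts) {
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  // Failing here still lets the task-level retry option take over
  throw new Error(`No records found after ${attempts} attempts.`);
}
```

Inside the task you would pass a function that executes the query and returns its records. Keep the total wait (attempts times delayMs) well within whatever runtime limit applies to the task.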

DavidGallay · ‎07 Jul 2023

Hey Christopher!

Thanks for that detailed answer, especially the example. I knew the JavaScript task could run queries but didn't quite understand how to link them up to the event trigger. I'm definitely going to try a new workflow with this option to see how it works. It will be great when the task wait option is delivered too; I can see using that a lot, especially for async calls to other automations.