host availability state not changing

Troy
Participant

Hello,

I'm trying to create an alert based on a host's availability.state not being "up".
The issue is that when I turn off OneAgent, it doesn't always set the state to "unmonitored_agent_stopped". Sometimes it just leaves gaps, as shown below, so my alert does not always work.

troymaxwell_0-1753224340524.png

 

We want to be able to alert on the agent being turned off or the host being offline, and to customize the alert, which is why we are not using the out-of-the-box (OOTB) option.

DQL I tried using in Davis Anomaly Detection:

timeseries availability = sum(dt.host.availability),
    nonempty:true,
    filter:{availability.state != "up" },
    by:{dt.entity.host, availability.state}

 

7 REPLIES

p_devulapalli
Leader

@Troy If your intention is to trigger an alert whenever the availability state != up, you can use the timeseries command with the nonempty parameter to calculate host availability. This parameter ensures that you get a result even when no data matches the filter, such as when no hosts are up.

https://docs.dynatrace.com/docs/discover-dynatrace/references/dynatrace-query-language/commands/metr...

Here is an example you can refer to:

https://docs.dynatrace.com/docs/shortlink/metrics-on-grail-examples#example-10-monitoring-host-avail...

 

timeseries Availability = sum(dt.host.availability, default:0),
    nonempty:true,
    by:{dt.entity.host},
    filter:{availability.state == "up"}
    //| filter dt.entity.host == "HOST-XXXX"

 

You need to set a threshold that alerts when the value is less than 1, and that should do the trick.
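To make the idea concrete, here is a minimal Python sketch of what the query plus threshold is meant to achieve. It is not Dynatrace code: `fill_gaps` and `violates` are hypothetical helpers, and the sample values are made up. It only illustrates that filling missing samples with 0 (the role of `default:0`) turns gaps into values that a "less than 1" threshold can catch.

```python
from typing import Optional, Sequence


def fill_gaps(samples: Sequence[Optional[float]], default: float = 0.0) -> list[float]:
    """Replace missing samples (None) with a default value.

    Hypothetical stand-in for the effect of `default:0` in the query.
    """
    return [default if s is None else s for s in samples]


def violates(samples: Sequence[float], threshold: float = 1.0) -> list[bool]:
    """Flag every sample that is below the threshold (alert if less than 1)."""
    return [s < threshold for s in samples]


# Made-up one-minute samples: the host reports "up" (1), then goes silent (None).
raw = [1, 1, None, None, None, 1]
filled = fill_gaps(raw)
flags = violates(filled)
```

Without the gap filling, the three silent minutes would carry no value at all, so there would be nothing for the threshold to compare against.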


 

Phani Devulapalli

Ah! I was close. Thank you, this seems to be working.

@p_devulapalli So this works to a degree. I'm seeing the alert come in, but the problem closes after some time even though the value is still 0 and the agent is still down. Any idea how I can fix this?

@Troy That should not be happening. The problem should not close unless there is a change in the threshold. Do you still see this happening?

Phani Devulapalli

Hello, yes, I still see this happening. The problem is created, stays open for a few minutes, and then closes on its own. The alerting is set to 3, 5, 5, which are the default values. The problem closes even though the agent is still down and the availability value is 0.
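For readers unfamiliar with the "3, 5, 5" setting, here is a rough Python sketch of how such a sliding-window rule is usually understood. It assumes the three numbers mean violating samples, sliding window size, and dealerting samples. That reading, the `evaluate` function, and the sample data are all assumptions for illustration, not the actual Dynatrace implementation, and the sketch does not diagnose the behaviour reported above.

```python
from collections import deque
from typing import Iterable


def evaluate(violations: Iterable[bool],
             violating_samples: int = 3,
             sliding_window: int = 5,
             dealerting_samples: int = 5) -> list[bool]:
    """Return, per sample, whether a problem would be open under the assumed rule.

    Open:  at least `violating_samples` violations in the last `sliding_window` samples.
    Close: at least `dealerting_samples` non-violations in the last `sliding_window` samples.
    """
    window: deque = deque(maxlen=sliding_window)
    is_open = False
    states = []
    for v in violations:
        window.append(bool(v))
        bad = sum(window)
        good = len(window) - bad
        if not is_open and bad >= violating_samples:
            is_open = True
        elif is_open and good >= dealerting_samples:
            is_open = False
        states.append(is_open)
    return states


# Host goes down after two minutes and stays down (value 0 < 1 every minute).
still_down = evaluate([False] * 2 + [True] * 10)

# Five violating minutes followed by five minutes that do not violate.
recovered = evaluate([True] * 5 + [False] * 5)
```

Under this assumed model, a series that keeps violating the threshold never closes the problem. It only closes once five samples in the window stop counting as violations.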

@Troy That's a bit odd. Did you add any dimensions that change over time to the anomaly detector's event name or description?

 

Phani Devulapalli

Nope. My detector is essentially the same as yours, except it filters on OS instead of a single machine. I may open a case with Dynatrace if this behaviour is not expected.
