Solved: Re: RabbitMQ Monitoring Flatlines

idudneymitchell · ‎24 Apr 2026

I'm trying to alert on a common problem with our RabbitMQ queues. In our environment, we need to alert off of BOTH of these conditions being true:
1. Count of messages > 0
2. Slope of message count == 0

That is, if the message count is not 0 and no messages are coming in or going out, we need an alert. The acknowledge/confirm rates don't really cover this: in previous cases where we should have been alerted, those rates were still fluctuating, and they're 0 when the message count is 0 anyway.

So, we need a way to alert on these combined cases. Right now, the best solution we have is DQL that looks something like this:
```
timeseries
  ready = max(cloud.aws.amazonmq.message_ready_count_average),
  by:{`dt.entity.custom_device`},
  filter:`dt.entity.custom_device` == "CUSTOM_DEVICE-XYZ",
  interval:10m
| fieldsAdd
    band_min = arrayMovingMin(ready, 10),
    band_max = arrayMovingMax(ready, 10)
| fieldsAdd
    stuck_signal = iCollectArray(
      if(
        ready[] > 0 and (band_max[] - band_min[]) <= 1,
        1,
        else: 0
      )
    )
| fieldsKeep
    timeframe,
    interval,
    `dt.entity.custom_device`,
    stuck_signal
```

(This is for an all-queue max, but it would be made specific to individual queues once we have that set up.)
This has several issues. For one, the signal is always set to 1 at the beginning of whatever time window it tracks. It's also fiddly and very manual: I have to tweak a lot of the parameters to get something that *mostly* works to alert us when we need it to. And it's not very idiomatic.

Is there a better way for Dynatrace to alert us on these conditions?

MaximilianoML · ‎25 Apr 2026

Hello @idudneymitchell,

Your approach is valid, but I would not try to alert on “slope == 0” directly. Instead, I would convert the problem into a binary signal that is 1 when the queue depth is greater than zero and the spread between the min and max values over the last N samples stays within a small tolerance.

The key improvement is to ignore the beginning of the timeframe, because the moving min/max window is not fully populated yet. For example, if you use a 10-sample window, only evaluate from index 9 onward:

```
timeseries
  ready = max(cloud.aws.amazonmq.message_ready_count_average, default: 0),
  by: { `dt.entity.custom_device` },
  filter: `dt.entity.custom_device` == "CUSTOM_DEVICE-XYZ",
  interval: 10m
| fieldsAdd
    window_min = arrayMovingMin(ready, 10),
    window_max = arrayMovingMax(ready, 10)
| fieldsAdd
    stuck_signal = iCollectArray(
      if(
        iIndex() >= 9
        and ready[] > 0
        and window_max[] - window_min[] <= 1,
        1,
        else: 0
      )
    )
| fieldsKeep timeframe, interval, `dt.entity.custom_device`, stuck_signal
```

Then create an advanced custom alert on stuck_signal > 0.

Once you have queue-level dimensions available, split by queue instead of only by the custom device. Also, I’d recommend treating the flatness threshold as a tolerance rather than exact zero, because exact slope checks tend to be fragile in real queue metrics.
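
If it helps to see why the first samples misfire, here is a minimal plain-Python sketch of the same logic (not DQL, just a simulation). It assumes the moving min/max is computed over a trailing window and that the early windows are partial, which is consistent with the signal always being 1 at the start of the timeframe. The function names, the sample data, and the 3-sample window are made up for illustration.

```python
def moving_min_max(values, window):
    """Trailing-window min and max; early windows are partial (assumed behavior)."""
    mins, maxs = [], []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        mins.append(min(chunk))
        maxs.append(max(chunk))
    return mins, maxs


def stuck_signal(values, window, tolerance=1, skip_warmup=True):
    """1 where the queue is non-empty and flat within tolerance, else 0."""
    mins, maxs = moving_min_max(values, window)
    signal = []
    for i, depth in enumerate(values):
        # Only trust the window once it holds `window` real samples.
        warmed_up = i >= window - 1 or not skip_warmup
        flat = maxs[i] - mins[i] <= tolerance
        signal.append(1 if warmed_up and depth > 0 and flat else 0)
    return signal


# Hypothetical queue depths: busy, then stuck at 7, then drained.
ready = [5, 9, 2, 7, 7, 7, 7, 0, 0, 0]
naive = stuck_signal(ready, window=3, skip_warmup=False)
masked = stuck_signal(ready, window=3)
print(naive)   # [1, 0, 0, 0, 0, 1, 1, 0, 0, 0]
print(masked)  # [0, 0, 0, 0, 0, 1, 1, 0, 0, 0]
```

Without the warm-up check, index 0 fires because a one-sample window has min equal to max. With it, only the real flatline at 7 is flagged, and the drained queue at the end stays at 0 because of the depth > 0 condition.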

I hope it helps you 😄

Max Lopes

idudneymitchell · ‎27 Apr 2026

This worked perfectly, thank you! The most difficult part was dropping the beginning of the timeframe, and that fix was exactly what we needed.

MaximilianoML · ‎27 Apr 2026

Hi, @idudneymitchell

I'm happy to see that helped!

Could you kindly give the answer a Kudo? I'm aiming to be member of the month someday 😁

Max Lopes