How to act on "Failure rate too high" alerts


Hi All,

The "Failure rate too high" alert is firing very frequently.

We started ignoring it because there are too many alerts to act on. Has anyone experienced the same with the "Failure rate too high" and "Average response time degraded" alerts?

Is anyone else in the same boat?




Hello Karthikayini,

In my opinion, receiving these alerts so frequently points to a serious issue with the application or its business transaction(s). Instead of ignoring the alerts, you should understand the cause and address it.

The failure rate is the average percentage of failed requests. A violation is indicated if at least one Business Transaction currently produces more failures than expected, based on the data for the last week.

This is a predefined incident with an evaluation timeframe of 10 seconds. It is active by default, and you can deactivate it under System Profile > Incidents.

You can also configure both alerts directly from the System Profile > Error Detection tab. These alerts are based on the rate of failed transactions or user actions within a certain time frame. Use incidents based on failure measures for more advanced alerts.

Both alerts (High Overall Failed Transaction Rate and High Overall Failed User Action Rate) are enabled by default with a time range of five minutes and a threshold of 3%, and they trigger an e-mail to the Incident Email Group.
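
To make the "five minutes, 3%" rule concrete, here is a minimal Python sketch of a rate check over a time window. It is only an illustration of the arithmetic, not how Dynatrace evaluates the alert internally: the function names, the (timestamp, failed) record shape, and the strict "greater than" comparison are all assumptions of mine.

```python
from datetime import datetime, timedelta

def failure_rate(requests, now, window=timedelta(minutes=5)):
    """Percentage of failed requests among those seen in the last `window`.

    `requests` is a list of (timestamp, failed) tuples; this shape is hypothetical.
    """
    recent = [failed for ts, failed in requests if now - window <= ts <= now]
    if not recent:
        return 0.0  # no traffic in the window, so nothing to report
    return 100.0 * sum(recent) / len(recent)

def violates(requests, now, threshold_pct=3.0):
    """True if the windowed failure rate is above the threshold (assumed strict)."""
    return failure_rate(requests, now) > threshold_pct

now = datetime(2020, 1, 1, 12, 0, 0)
# 100 requests in the last 100 seconds, 4 of them failed.
sample = [(now - timedelta(seconds=i), i < 4) for i in range(100)]
# One old failure from 10 minutes ago; it falls outside the window and is ignored.
sample.append((now - timedelta(minutes=10), True))

print(failure_rate(sample, now))  # 4.0
print(violates(sample, now))      # True
```

With a 3% threshold, even a handful of failures on a low-traffic application is enough to cross the line, which may be one reason the alert fires so often.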



Hi @Babar Q.

Thank you for the reply.

I have created an error detection alert directly from the System Profile tab.

I do the monitoring and have no functional knowledge of the applications that Dynatrace monitors, so there is a gap in passing exact, useful information from Dynatrace to the application team.

I have watched the troubleshooting video as well.

I am drilling down accordingly.

It looks like the application team is not ready to act on the output Dynatrace gives.

Can you please help me get on the right path for troubleshooting?

High Overall Failed Page Action Rate incident ->

drill down -> user action PurePath -> PurePath, contributor, error ->

There are many errors in the Error tab. When I click on a single error, it highlights a node with a yellow arrow.

Please confirm whether my understanding is correct.

How should I generally act on it when there are so many red marks?

Thank you so much in advance.

Hello Karthikayini,

You are following the right steps. The fact is that you will have to work hard to convince the development team to cooperate with you; this is common everywhere.

Instead of looking at the client-side issues, I would recommend that you open the 'Exceptions' and 'Errors' dashlets and then drill down to the PurePaths, Response Time Hotspots, etc. to find something concrete that others will accept and act on.



Hi @Babar Q.

I tried hard to provide the exact parameter the application team can work on.

It is still a bit tedious for me to give an exact point, because the troubleshooting output is so broad.

The error count is more than 300 in the last 30 minutes.

When I analyse only the 404 errors, there are many yellow-highlighted nodes.

So can you please help me by sharing some examples (the output you gave to an application team as the root cause of an issue) that you have come across and fixed? I am trying to understand which points we can highlight, such as methods, entry points, nodes, or Web Request details.

The application team is also not sure which details they are looking for, so I am trying to close this gap.

Thank you so much in advance.


Hello Karthikayini,

404 errors are only "not found" errors, e.g. for abc.gif.

Look for the 500 errors, drill down from there to the PurePaths, and analyze the errors.

Also drill down from the PurePaths to the Response Time Hotspots to find out which API and tier are responsible, and vice versa.
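
If the error list can be copied out of the dashlet, one way to turn 300+ errors into something the application team can act on is to count them by status code and URL and put the server-side (5xx) ones first. The Python below is a rough sketch of that idea only; the (status, url) record shape and the sample URLs are made up for illustration and do not come from any Dynatrace export format.

```python
from collections import Counter

def summarize(errors):
    """Count errors per (status, url), split into server-side (5xx) and client-side (4xx).

    `errors` is a list of (status, url) tuples; this shape is hypothetical.
    Each returned list is sorted with the most frequent entry first.
    """
    server, client = Counter(), Counter()
    for status, url in errors:
        if 500 <= status <= 599:
            server[(status, url)] += 1
        elif 400 <= status <= 499:
            client[(status, url)] += 1
    return server.most_common(), client.most_common()

# Made-up sample data.
errors = [
    (404, "/img/abc.gif"),
    (500, "/checkout"),
    (404, "/img/abc.gif"),
    (500, "/checkout"),
    (500, "/checkout"),
    (503, "/search"),
]

server, client = summarize(errors)
print(server)  # [((500, '/checkout'), 3), ((503, '/search'), 1)]
print(client)  # [((404, '/img/abc.gif'), 2)]
```

A short list like "500 on /checkout, 3 times in 30 minutes, here is one PurePath" is usually easier for a development team to accept than a raw error count.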



Hi @Babar Q.

Thank you for the reply. I will work on it.

Thanks & regards

Karthikayini M