Solved: Detect failure rate increase for specific requests

yair_mashmor · 10 Oct 2021

Hello all,

We are getting many 'Problems' on services with lots of traffic, but they are raised for specific requests that have very low traffic (below the baseline in the anomaly detection rules).

I'm trying to figure out how the failure rate anomaly detection works for requests within a service.

Assuming the service is not a low-load service, but many of its requests are, how is the anomaly detection calculated?

Does it check whether the request itself is a low-load request, or is it enough that the service as a whole is not?

Is there a way to control this behavior (without marking them as key requests, since there are many)?

Thank you,

Yair

ChadTurner · 27 Oct 2021

AI detection is baselined, so if the failure rate falls outside the baseline, an alert is sent. If you have a set of failures every day at midnight, that gets added into the baseline over a week, and it will then start to say "ahh, these midnight failures of 50% are normal." But if the failure rate increases to 60, 70, or even 80%, that will trigger the alerting.
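
To make the idea concrete, here is a rough Python sketch of baseline-style detection. It is only an illustration of the principle, not the product's actual algorithm: the function name `is_anomalous`, the simple mean baseline, and the `tolerance` margin are all assumptions made up for this example.

```python
from statistics import mean


def is_anomalous(history, current, tolerance=0.10):
    """Return True if `current` exceeds the learned baseline by more than `tolerance`.

    history:   failure rates (0.0 to 1.0) seen in the same time slot on earlier days.
    current:   failure rate observed now.
    tolerance: hypothetical margin above the baseline; not a real product setting.
    """
    baseline = mean(history)  # assumption: the baseline is a plain average
    return current > baseline + tolerance


# A week of roughly 50% failures at midnight becomes the "normal" level.
midnight_history = [0.50, 0.48, 0.52, 0.50, 0.49, 0.51, 0.50]

print(is_anomalous(midnight_history, 0.50))  # the usual midnight failures
print(is_anomalous(midnight_history, 0.80))  # a jump well above the baseline
```

With this toy rule, the usual 50% midnight failures are treated as normal, while a jump to 80% is flagged.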

-Chad