When looking at the service data for external services ("Requests to public networks" etc.) and opaque services, it seems that all 4xx HTTP responses are automatically regarded as failures. For services where deep monitoring is active, however, only 5xx errors raise the failure rate by default; 4xx client-side errors do not.
Can anyone explain the logic behind that? For example, one of our opaque Python-based services was reporting a high failure rate because several requests ended with HTTP 404. I had to edit the client-side error detection so that 404 is no longer regarded as a client-side error. If deep monitoring had been enabled, these 404s would not have raised the failure rate to begin with, and no extra configuration would have been required.
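To make the difference concrete, here is a minimal Python sketch of the default behaviour as I observe it. It is only a model of what I described above, not the tool's actual implementation; the names `is_failed_request`, `failure_rate` and `ignored_client_codes` are made up for illustration.

```python
from typing import FrozenSet, Sequence


def is_failed_request(
    status: int,
    deep_monitored: bool,
    ignored_client_codes: FrozenSet[int] = frozenset(),
) -> bool:
    """Hypothetical model of the observed defaults, not a real API."""
    # 5xx responses count as failures for every kind of service.
    if 500 <= status <= 599:
        return True
    # 4xx responses count only for opaque/external services,
    # unless the code was explicitly excluded in the settings.
    if 400 <= status <= 499:
        if deep_monitored:
            return False
        return status not in ignored_client_codes
    return False


def failure_rate(
    statuses: Sequence[int],
    deep_monitored: bool,
    ignored_client_codes: FrozenSet[int] = frozenset(),
) -> float:
    """Share of requests classified as failed (0.0 for no requests)."""
    if not statuses:
        return 0.0
    failed = sum(
        is_failed_request(s, deep_monitored, ignored_client_codes)
        for s in statuses
    )
    return failed / len(statuses)


statuses = [200, 200, 404, 500]
print(failure_rate(statuses, deep_monitored=False))  # 0.5
print(failure_rate(statuses, deep_monitored=True))   # 0.25
print(failure_rate(statuses, deep_monitored=False,
                   ignored_client_codes=frozenset({404})))  # 0.25
```

With the same traffic, the opaque service reports twice the failure rate of the deep-monitored one until 404 is excluded by hand, which is exactly the extra configuration step I had to do.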
I'm quite sure something has changed recently, because we suddenly started getting problem notifications from our HAProxy services, and all of the errors are of the "404 Not Found" type.
I'm also having trouble with the exclude rules. I am trying to exclude certain situations from Service1, but the rules do not work. According to the answer I got from support: "It's marked as failed requests as Service2 returned Server-side failure reason with HTTP response - 500 - Internal Server Error." This server-side exception also propagates to the caller service.
My problem is that Service2 is a REST interface that uses HTTP 500 as a generic error response, so this is causing quite a challenge at the moment. I'm not sure whether the logic behind the exclude rules has always been like this or whether it has changed at some point.
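Here is a toy Python model of how I read support's answer. It is purely my interpretation: the `Call` type, the `failure_reason` function and the rule that an exclusion only applies to the service that produced the response are all assumptions, not confirmed behaviour of the tool.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Call:
    """One request in a call chain (hypothetical structure)."""
    service: str
    status: int
    children: List["Call"] = field(default_factory=list)


def failure_reason(call: Call, excluded: Dict[str, Set[int]]) -> Optional[str]:
    """Return why the request counts as failed, or None if it does not.

    Assumption: an exclude rule only suppresses a failure on the service
    that produced the response; failures of called services propagate up.
    """
    own_excluded = excluded.get(call.service, set())
    if call.status >= 500 and call.status not in own_excluded:
        return f"{call.service} returned HTTP {call.status}"
    # A failure in any called service marks the caller as failed too.
    for child in call.children:
        reason = failure_reason(child, excluded)
        if reason is not None:
            return reason
    return None


# Service1 answers fine, but the Service2 call underneath returns 500.
request = Call("Service1", 200, [Call("Service2", 500)])

# Exclude rule placed on Service1: the request is still marked as failed.
print(failure_reason(request, {"Service1": {500}}))  # Service2 returned HTTP 500

# Exclude rule placed on Service2: nothing is left to propagate.
print(failure_reason(request, {"Service2": {500}}))  # None
```

If this reading is right, it would explain why my rules on Service1 have no effect: the failure reason belongs to Service2, so Service1's rules never get a chance to match it.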
I don't have HAProxy running anywhere for reference, but is that perhaps seen as an opaque service? That would then explain why 4xx errors are affecting the failure rate. As for something changing recently regarding these - I have no idea 🙂