Solved: Re: Problem not raised for a service for 100% failure

Vikas_g199 · ‎03 Mar 2024

Hi Team,

We observed a 100% failure rate (HTTP 500 Internal Server Error) on a specific service, but no problem was raised during that timeframe. The baseline is set to auto-adaptive, and no anomaly was detected over the 7-day timeframe. Can anyone please guide me on how to approach and resolve this issue? PFB snapshots.

AntonioSousa · ‎03 Mar 2024

@Vikas_g199,

It seems this is due to the service not receiving at least 10 requests per minute. You are looking at 72 hours, so the chart is aggregated, but I would say the traffic doesn't meet the requirement Davis needs to raise the problem automatically.
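To make the aggregation point concrete, here is a minimal Python sketch. It is only an illustration of the idea, not how Davis evaluates traffic internally: the helper name `minutes_meeting_threshold`, the sample counts, and the hard-coded threshold of 10 are all assumptions made for this example.

```python
from typing import List

# Threshold mentioned in this thread; hard-coded here as an assumption,
# not read from any Dynatrace setting.
MIN_REQUESTS_PER_MINUTE = 10


def minutes_meeting_threshold(
    requests_per_minute: List[int],
    threshold: int = MIN_REQUESTS_PER_MINUTE,
) -> List[int]:
    """Return the indices of minutes whose request count is at least the threshold."""
    return [i for i, count in enumerate(requests_per_minute) if count >= threshold]


# Hypothetical sample: six minutes of traffic where every request fails.
sample = [2, 3, 1, 12, 15, 4]
eligible = minutes_meeting_threshold(sample)

print(f"total requests in the window: {sum(sample)}")
print(f"minutes at or above the threshold: {eligible}")
```

The total over the window can look significant on an aggregated chart even though only a couple of individual minutes carry enough traffic to qualify.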

Given that this is at the service level, you can adjust the sensitivity of anomaly detection by following the configuration hints in:
https://docs.dynatrace.com/docs/platform/davis-ai/anomaly-detection/adjust-sensitivity-anomaly-detec...

Antonio Sousa

AntonioSousa · ‎03 Mar 2024

@Vikas_g199,

Looking closely at the graph, you can see that a problem was generated yesterday at around 3 PM, in a period where the 10 requests/min threshold was probably exceeded.

Antonio Sousa

rgarzon1 · ‎03 Mar 2024

Can you please share your anomaly detection settings for services? This happened to us too, and after far too many configuration changes we came to the conclusion that the "Avoid over-alerting" setting was causing it: problems weren't being raised the way we wanted.

To do a "reset", we set the minimum requests to 0 for the first week; after that, 0.1 was a value that worked for us.

PS: did you use Monaco in that environment to migrate or apply any settings?

Hope it helps.

fuelled by coffee and curiosity. ☕ searching for a job,

PacoPorro · ‎04 Mar 2024

Could it be this is detected as a "frequent issue"?
https://docs.dynatrace.com/docs/platform/davis-ai/anomaly-detection/detection-of-frequent-issues

DenisL · ‎04 Mar 2024

I agree with Antonio here: I can see that a problem was in fact raised when the request count also appears to have gone up. Perhaps the 'Avoid over-alerting' feature is having an effect here and needs to be evaluated. Check that you have at least 10 requests per minute and see if that applies.
Secondly, Paco suggested another very important thing to check, the 'Frequent issue' detection. Open the service, expand the timeframe (7 or even 30 days), and see if you notice any raised 'Frequent issue: Failure rate increase'.
I believe one or both of those is what's affecting the raising of new problems.

Let us know! 🙂
