
Problem not raised for a service for 100 % failure

Vikas_g199
Visitor

Hi Team,

We have observed a 100% failure rate, with HTTP 500 Internal Server Error responses, on a specific service, but no problem was raised during that timeframe. The baseline is set to auto-adaptive, and no anomaly was observed during the 7-day timeframe either. Can anyone guide me on how to approach this issue and help me resolve it? Please find the screenshots below.

[Four screenshots attached]

5 REPLIES

AntonioSousa
DynaMight Guru

@Vikas_g199,

It seems this is because the service does not reach at least 10 requests per minute. You are looking at a 72-hour timeframe, so the data is aggregated, but I would say the service doesn't meet the requirement Davis needs in order to raise a problem automatically.
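A rough way to see why the 72-hour chart can be misleading: a 100% failure rate says nothing about how much traffic sits behind it. The sketch below treats the over-alerting rule as a simple per-minute minimum of 10 requests, which is a simplifying assumption for illustration only; Davis's actual detector is more involved. The function names and traffic numbers are hypothetical.

```python
# Simplified illustration (assumption): models "avoid over-alerting" as a plain
# per-minute minimum-request threshold. This is NOT Dynatrace's real algorithm.


def average_rpm(per_minute_requests: list[int]) -> float:
    """Mean requests per minute across the whole chart window."""
    if not per_minute_requests:
        return 0.0
    return sum(per_minute_requests) / len(per_minute_requests)


def minutes_above_threshold(per_minute_requests: list[int], min_rpm: float = 10) -> int:
    """Count the individual minutes whose traffic reaches the threshold."""
    return sum(1 for r in per_minute_requests if r >= min_rpm)


if __name__ == "__main__":
    window_minutes = 72 * 60  # the 72-hour timeframe from the screenshots
    # Hypothetical low-traffic service: 2 requests/min, every one failing (100% failures)
    traffic = [2] * window_minutes
    print("total requests:", sum(traffic))                            # 8640
    print("average req/min:", average_rpm(traffic))                   # 2.0
    print("minutes >= 10 req/min:", minutes_above_threshold(traffic))  # 0
```

Thousands of failed requests over three days look significant on the chart, yet no single minute clears the threshold, so under this simplified model nothing would be raised.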

Given that this is at the service level, you can adjust the sensitivity of anomaly detection by following the configuration hints in:
https://docs.dynatrace.com/docs/platform/davis-ai/anomaly-detection/adjust-sensitivity-anomaly-detec...

Antonio Sousa

@Vikas_g199,

Looking closely at the graph, you can see that a problem was generated yesterday around 3 PM, in a period where the 10 requests/min threshold was probably surpassed.

Antonio Sousa

rgarzon1
Pro

Can you please share your anomaly detection settings for services? This happened to us, and after many, many configuration attempts we concluded that the 'Avoid over-alerting' setting was the reason problems weren't being raised as we expected.

  • To "reset" things, we set the minimum requests to 0 for the first week; after that, 0.1 worked for us.
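The tuning described above can be sketched as a threshold schedule. This is a hypothetical illustration only: the function names are invented, "days since reset" is an assumed way of expressing "the first week", and in practice the value is entered in the service anomaly detection settings rather than computed in code.

```python
# Hypothetical sketch of the "reset" strategy: minimum requests/min of 0 during the
# first week, then 0.1 (values from the post). Names are illustrative, not a Dynatrace API.


def min_requests_threshold(days_since_reset: int) -> float:
    """Threshold to configure: 0 during days 0-6, 0.1 from day 7 onwards."""
    return 0.0 if days_since_reset < 7 else 0.1


def traffic_gate_open(requests_per_minute: float, threshold: float) -> bool:
    """True when traffic is high enough for a failure-rate problem to be considered."""
    return requests_per_minute >= threshold


if __name__ == "__main__":
    low_traffic_rpm = 0.5  # hypothetical low-traffic service
    for day in (0, 6, 7, 30):
        t = min_requests_threshold(day)
        print(f"day {day}: threshold={t}, gate open={traffic_gate_open(low_traffic_rpm, t)}")
    # With a 10 req/min threshold the same service never passes the gate
    print("at 10 req/min:", traffic_gate_open(low_traffic_rpm, 10))
```

Under this model, a low-traffic service only becomes eligible for alerting once the threshold drops below its actual request rate.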

P.S.: Did you use Monaco in that environment to migrate or apply any settings?

Hope it helps.

 

fuelled by coffee and curiosity.

PacoPorro
Dynatrace Leader

DenisL
Dynatrace Participant

I would agree with Antonio here: a problem was in fact raised at the point where the request count also appears to have gone up. Perhaps the 'Avoid over-alerting' feature is the cause and needs to be evaluated; check that you have at least 10 requests per minute and see whether that applies.
Secondly, Paco also suggested a very important thing to check, which is 'Frequent issue': open the service, expand the timeframe (7 or even 30 days), and see whether any 'Frequent issue: Failure rate increase' has been raised.
I believe one or both of these are what's preventing new problems from being raised.

Let us know! 🙂 
