Solved: Frequent issues

YuvalKonstanti · 18 May 2021

Hello,

We had a production issue with a host, but it was not alerted on because Dynatrace classifies this kind of problem as a Frequent Issue.

It says:

"Dynatrace reports recurring problem patterns as 'frequent issues.' Alerts are sent out only if severity increases."

Can you please advise if there is a way to disable this feature?

Thanks!

waikeat_chan · 18 May 2021

Hey, perhaps this is what you want to look at?

Just go to Settings > Anomaly detection > Frequent issue detection.

There are three switches there that you can toggle.
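If you would rather script these switches than click through the UI, here is a minimal sketch that builds the request without sending it. The endpoint path `/api/config/v1/frequentIssueDetection`, the three `frequentIssueDetection*Enabled` field names, and the `Api-Token` authorization header are assumptions based on the Dynatrace Configuration API v1; verify them in your environment's API explorer before relying on them. The environment URL and token are placeholders. Keep in mind that these switches are environment-wide.

```python
import json
import urllib.request

# ASSUMPTION: endpoint and field names follow the Dynatrace Configuration API v1
# "frequentIssueDetection" resource; check them against your environment's API docs.
ENDPOINT = "/api/config/v1/frequentIssueDetection"


def build_frequent_issue_request(base_url: str, api_token: str,
                                 applications: bool, services: bool,
                                 infrastructure: bool) -> urllib.request.Request:
    """Build (but do not send) a PUT request that sets the three environment-wide switches."""
    payload = {
        # Each flag is assumed to map to one switch under
        # Settings > Anomaly detection > Frequent issue detection.
        "frequentIssueDetectionApplicationEnabled": applications,
        "frequentIssueDetectionServiceEnabled": services,
        "frequentIssueDetectionInfrastructureEnabled": infrastructure,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Api-Token {api_token}",
            "Content-Type": "application/json",
        },
        method="PUT",
    )


# Placeholder environment URL and token; only infrastructure detection is switched off here.
req = build_frequent_issue_request("https://example.live.dynatrace.com", "TOKEN",
                                   applications=True, services=True, infrastructure=False)
```

To actually apply it, pass the request to `urllib.request.urlopen(req)`. Building the request separately from sending it makes it easy to inspect the payload before changing a setting that affects the whole environment.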

YuvalKonstanti · 19 May 2021

Thank you for your reply!

Has anyone here tried it?

Basically, what I want is to disable this "Frequent Issues" option, so that when DT detects an issue it alerts normally.

I just want to make sure that if I disable those three options, it will affect only this behavior 🙂

Thanks!

Anonymous · 19 May 2021

It will affect the whole environment, not just that host.

Karen_Duxbury · 02 Jun 2021

Not sure if this helps, but there's also the option to reset the baseline within a service (instead of applying it to the whole system).

Service -> Edit -> Anomaly Detection

YuvalKonstanti · 03 Jun 2021

Thanks! I didn't know that, good to know.

AntonioSousa · 02 Jun 2021

Detection of frequent issues is quite an interesting topic, and it has tricked me a few times in the past. It is especially tricky when a problem appears in the "middle" of the problem. It seems confusing, and sometimes it is. I would recommend reading the following (several times):

https://www.dynatrace.com/support/help/how-to-use-dynatrace/problem-detection-and-analysis/problem-d...

In the end, it's an interesting solution to alert spam, but you have to understand it...

Antonio Sousa

YuvalKonstanti · 03 Jun 2021

Thanks for the reply! I will surely investigate it a bit more, as it is indeed a complicated issue.

ct_27 · 31 Jan 2022

I found this thread while searching for "Frequent Issues". We're having problems with DT alerting us when an issue occurs in the middle of the night. It's not critical, so we leave it until morning, but by morning (2 hours later) DT has already auto-closed the problem as a Frequent Issue. Then, during business hours, the issue goes from bad to worse, but we get no alerts because it flipped to a Frequent Issue overnight. Other situations have baffled us too. I feel the "Frequent Issues" logic is just not working in a useful way.

If we turn it off, alerts go off everywhere and we're back to configuring the environment down to very minute details, and then we run into cases where DT doesn't give us the flexibility we need. If we turn it on, we miss alerts for critical production events. Again, this comes down to a lack of detailed configuration, but this time because Frequent Issue detection is either on or off for everything, with no per-entity option to adjust it.

I read the linked article above. It feels outdated, because I see no mention of the 2-hour mark after which a problem flips to a Frequent Issue. Either way, the current logic is causing issues.

HigherEd

YuvalKonstanti · 02 Feb 2022

Hey,

I agree with you on this; the logic for these Frequent Issues is a bit weird. More than once we have missed actual issues like this.

But since you raised this topic, I actually have a question: what happens if I go to the Anomaly detection settings and turn "Detect Frequent Issues" off?

Will it raise a problem or not? I'm not actually sure how this setting works.

ct_27 · 02 Feb 2022

I'll provide updates once I get more details. The online documentation does a good job of explaining this very complicated equation, but it doesn't go into enough detail to explain the behaviors we've experienced, so it's difficult to predict.

The situation where a problem that has been open for 2 hours automatically flips to a Frequent Issue is one example I completely don't get. That's not what frequent means: something happening once over a long period of time is not frequent. Sure, if you break that down into data points the argument might change, but the issue at hand isn't data points; it's that a single problem (one occurrence) was marked as frequent.

A few months back we encountered similar issues, so we turned off Frequent Issues, and within minutes our environment blew up with problem cards. We had a rough night, and by morning we decided to just turn it back on, but it took 20% of 7 days (roughly 1.4 days) for it to actually work again.

The reason it blew up, though, was that we hadn't configured the safety nets (configurations) in advance, so this time around we're configuring the environment first, in preparation, and only then turning it off.

HigherEd

YuvalKonstanti · 02 Feb 2022

Interesting to know!

In our situation, some problems that were marked as Frequent Issues escalated and caused an impact. When investigating why there was no alert, we saw that it was due to the Frequent Issue feature masking the problem.

Waiting to hear if you have any news about that.

Thanks!

blackeagle · 08 Feb 2024

Request: Add an option to exclude frequent issue events from multiple (infrastructure/service/application) problems

Related pain Journey: When we have a multiple problem with several hundred events, a single event can cause the problem to suppress alerts because it is a frequent issue, so the multiple-problem notification is delayed by a full hour. This is problematic for major incidents that need a response within 15 minutes. Disabling frequent issue detection globally for the environment is not a good solution, because we need it to suppress false positives in general. So, a feature allowing frequent issues to be excluded from multiple problems would be helpful.
Benefit: Reduce false notification delays in multiple problems.

Solution Journey: It's more of a workaround, but if we're lucky and have unattached problems (first symptoms) before the multiple problem, we get our notifications in time.

We have two closed tickets documenting 2 of these situations.
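To make the request concrete, here is a simplified model of the behavior described above, written in Python purely for illustration. The `Event` record, its `frequent_issue` flag, and both functions are hypothetical; they mirror the poster's description (one frequent-issue event holding back the notification for the whole multiple problem), not Dynatrace's internal logic.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """Hypothetical event record inside a multiple (merged) problem."""
    entity: str
    frequent_issue: bool  # True if this event matched a frequent-issue pattern


def notify_now_as_described(events: list[Event]) -> bool:
    """Behavior as described in the post: any frequent-issue event holds back the notification."""
    return not any(e.frequent_issue for e in events)


def notify_now_as_requested(events: list[Event]) -> bool:
    """Requested behavior: ignore frequent-issue events and notify if any other event remains."""
    return any(not e.frequent_issue for e in events)


# A multiple problem with 299 ordinary events and one frequent-issue event.
events = [Event(f"host-{i}", False) for i in range(299)] + [Event("service-x", True)]
print(notify_now_as_described(events))   # False
print(notify_now_as_requested(events))   # True
```

Under the requested rule, a problem made up only of frequent-issue events still stays quiet, so the general false-positive suppression the poster wants to keep is preserved.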

Thank you

dan_m_smith · 06 May 2024

It would be nice if you could disable Frequent Issues detection at the service or host level instead of globally. Has that been considered as an RFE?

Is there an API call that can be used to reset the reference period, as mentioned? Or do I need to go there once per week and reset it manually?