We are working on integrating Dynatrace with ServiceNow, and it appears that ServiceNow uses different terminology from Dynatrace for alerts, problems, events, and incidents. Do Dynatrace event management terms map to ITIL terminology?
I reviewed the glossary of ITIL terms:
Alert (ITIL Service Operation): A notification that a threshold has been reached, something has changed, or a failure has occurred. Alerts are often created and managed by system management tools and are managed by the event management process.
Problem (ITIL Service Operation): A cause of one or more incidents. The cause is not usually known at the time a problem record is created, and the problem management process is responsible for further investigation.
Dynatrace AI gives you the root cause of a Problem 🙂
Event (ITIL Service Operation): A change of state that has significance for the management of an IT service or other configuration item. The term is also used to mean an alert or notification created by any IT service, configuration item or monitoring tool. Events typically require IT operations personnel to take actions, and often lead to incidents being logged.
Incident (ITIL Service Operation): An unplanned interruption to an IT service or reduction in the quality of an IT service. Failure of a configuration item that has not yet affected service is also an incident – for example, failure of one disk from a mirror set.
I feel that Dynatrace aligns with most of them but deviates on events. An event is any change detected by the OneAgent; for example, an Outlook crash would be listed as an event and can also trigger an alert as needed.
I feel that incidents are lumped into problems: the AI engine detects and alerts on problems and correlates them to determine a root cause.
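As a rough illustration of the mapping suggested in this reply, the lookup below records one possible correspondence between Dynatrace concepts and the ITIL record types from the glossary. It reflects opinions in this thread, not an official Dynatrace or ServiceNow mapping, and key names such as `alerting_notification` are hypothetical labels rather than Dynatrace API identifiers.

```python
from enum import Enum


class ItilRecord(Enum):
    """ITIL record types from the glossary quoted in this thread."""
    EVENT = "event"
    ALERT = "alert"
    INCIDENT = "incident"
    PROBLEM = "problem"


# One possible reading of this thread, NOT an official mapping.
# Keys are hypothetical labels for Dynatrace concepts; values are the
# ITIL record types each one roughly corresponds to.
DYNATRACE_TO_ITIL = {
    # Any change detected by OneAgent, e.g. a process crash.
    "event": (ItilRecord.EVENT,),
    # Notification sent out when Dynatrace raises a problem.
    "alerting_notification": (ItilRecord.ALERT,),
    # Incidents are "lumped into" Dynatrace problems.
    "problem": (ItilRecord.INCIDENT, ItilRecord.PROBLEM),
}


def itil_records_for(dynatrace_concept: str) -> tuple[ItilRecord, ...]:
    """Return the ITIL record types a Dynatrace concept maps to (empty if unknown)."""
    return DYNATRACE_TO_ITIL.get(dynatrace_concept.lower(), ())
```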
I think it depends on your ITIL service model implementation and your sector.
If you look at a problem exported via the API, you can easily see that "event" is used differently there: it classifies the evidence for the problem rather than clustering incidents. So if you have many teams with different specializations, it is not easy to map this onto the escalation process inside your organization.
Take disk space at 95% (an event): it is a possible root cause of a service slowdown. Dynatrace opens a problem the first time because the behavior deviates from the historical baseline (if it happens repeatedly it becomes recurrent, and therefore a Problem in the ITIL sense), but the system is not unstable enough to warrant an incident: customers can still do business and only suffer a poor user experience because of the slowdown.
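To see how "event" shows up as an evidence classification rather than an incident, the sketch below groups the evidence of an exported problem by type, using the disk-space scenario. The payload is a trimmed, hypothetical example; its field names (`evidenceDetails`, `evidenceType`, `rootCauseRelevant`) are modelled on the Dynatrace Problems API v2 response as I understand it, so check them against a real export before relying on them.

```python
import json
from collections import defaultdict

# Trimmed, HYPOTHETICAL payload modelled loosely on a problem exported via the
# Dynatrace Problems API; verify field names against your own export.
SAMPLE_PROBLEM = json.loads("""
{
  "displayId": "P-0000",
  "title": "Response time degradation",
  "severityLevel": "PERFORMANCE",
  "evidenceDetails": {
    "details": [
      {"evidenceType": "EVENT", "displayName": "Low disk space (95%)", "rootCauseRelevant": true},
      {"evidenceType": "METRIC", "displayName": "Response time", "rootCauseRelevant": false}
    ]
  }
}
""")


def _details(problem: dict) -> list[dict]:
    """Evidence items of a problem, or an empty list if none are present."""
    return problem.get("evidenceDetails", {}).get("details", [])


def evidence_by_type(problem: dict) -> dict[str, list[str]]:
    """Group evidence names by evidenceType ("EVENT" is a classification here, not an incident)."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for item in _details(problem):
        grouped[item["evidenceType"]].append(item["displayName"])
    return dict(grouped)


def root_cause_candidates(problem: dict) -> list[str]:
    """Names of evidence items flagged as relevant to the root cause."""
    return [item["displayName"] for item in _details(problem) if item.get("rootCauseRelevant")]
```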
Based on the way Dynatrace works (proactive mode), you have to start with an event, promote it to an incident, and raise the incident level based on your BIA (Business Impact Analysis) until it really becomes a problem: a set of incidents where a workaround was applied while still waiting for a final resolution. Think of a bug where you need to wait for a patch from the application server vendor.
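The escalation path described above can be sketched as a small state machine. This is a simplification under assumptions: the 0 to 5 BIA score, the `Record` type, and the function names are hypothetical, and a single record stands in for the set of incidents that an ITIL problem usually groups.

```python
from dataclasses import dataclass
from enum import IntEnum


class Stage(IntEnum):
    EVENT = 0
    INCIDENT = 1
    PROBLEM = 2


@dataclass
class Record:
    """HYPOTHETICAL ticket moving through the event -> incident -> problem path."""
    title: str
    stage: Stage = Stage.EVENT
    incident_level: int = 0  # 0 = not an incident yet; higher = more business impact


def apply_bia(record: Record, bia_score: int) -> Record:
    """Promote an event to an incident and raise its level per the BIA score.

    bia_score is an assumed 0-5 scale from your own Business Impact Analysis;
    a score of 0 leaves a plain event untouched. Levels only ever go up.
    """
    if bia_score > 0:
        record.stage = max(record.stage, Stage.INCIDENT)
        record.incident_level = max(record.incident_level, bia_score)
    return record


def escalate_to_problem(record: Record, workaround_applied: bool, final_fix_available: bool) -> Record:
    """Become a problem once a workaround is in place but the permanent fix is still pending."""
    if record.stage >= Stage.INCIDENT and workaround_applied and not final_fix_available:
        record.stage = Stage.PROBLEM
    return record
```

For the disk-space case, a slowdown with a low BIA score stays a low-level incident (or an event), and only a case like the vendor-patch bug would reach the problem stage.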
Hope this helps with your design.