Solved: Capture OOM errors in Kubernetes

GAnantula · 19 Nov 2020

We have instrumented an OpenShift cluster with Dynatrace OneAgent in our environment, and one of the services running in a container hit an OOM error on Metaspace. As a result, the process was terminated and restarted automatically.

We do see events related to the process restart, but there is NO event for the OOM error itself, even though it is clearly visible in the application log file. It was really difficult for us to figure out why the process was restarted, because Dynatrace did NOT provide any details.

We have added the following events to be captured in the Kubernetes settings in Dynatrace. Is there anything else that we should be adding to capture the OOM errors? Please advise.

involvedObject.kind=Node

type=Warning

involvedObject.kind=Pod

reason=BackOff

Thanks,

Ganesh
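For readers unfamiliar with event field selectors, here is a minimal sketch of how expressions like the four listed above filter events. It assumes the standard Kubernetes field selector semantics (`=`, `==` and `!=`, with comma-separated terms combined as AND) and only mimics the matching logic in plain Python; it is not how Dynatrace or the Kubernetes API server implements it. The sample event and the pod name `orders-7d9f` are made up for illustration.

```python
import re
from typing import Any, Dict, List, Optional, Tuple

# One selector term: field name, operator (!=, == or =), value.
_TERM = re.compile(r"\s*([^=!\s]+)\s*(!=|==|=)\s*(.*?)\s*")


def parse_selector(selector: str) -> List[Tuple[str, str, str]]:
    """Split 'a=b,c!=d' into (field, operator, value) triples."""
    terms = []
    for raw in selector.split(","):
        if not raw.strip():
            continue
        match = _TERM.fullmatch(raw)
        if match is None:
            raise ValueError(f"not a field selector term: {raw!r}")
        field, op, value = match.groups()
        # '==' and '=' mean the same thing, so normalise to '='.
        terms.append((field, "!=" if op == "!=" else "=", value))
    return terms


def lookup(event: Dict[str, Any], dotted: str) -> Optional[Any]:
    """Resolve a dotted path such as 'involvedObject.kind' in a nested dict."""
    current: Any = event
    for key in dotted.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def matches(event: Dict[str, Any], selector: str) -> bool:
    """Return True only if every comma-separated term holds (logical AND)."""
    for field, op, value in parse_selector(selector):
        actual = lookup(event, field)
        equal = actual is not None and str(actual) == value
        if (op == "!=") == equal:
            return False
    return True


# Hypothetical event, shaped like a Kubernetes Event object.
backoff_event = {
    "type": "Warning",
    "reason": "BackOff",
    "involvedObject": {"kind": "Pod", "name": "orders-7d9f"},
}

if __name__ == "__main__":
    for selector in (
        "involvedObject.kind=Node",
        "type=Warning",
        "involvedObject.kind=Pod",
        "reason=BackOff",
    ):
        print(selector, "->", matches(backoff_event, selector))
```

A selector can only capture events that Kubernetes actually emits with matching fields, and none of the four selectors above names an OOM-specific reason.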

ChadTurner · 01 Dec 2020

Sometimes this can be trial and error. I'd make sure that the Events Field Selectors are defined in your Kubernetes settings, as this will guarantee you'll capture the data. Once the data is captured, you can always create a custom event for alerting as well.

-Chad

GAnantula · 01 Dec 2020

Okay, thank you!

tibebe_m_digafe · 05 Jan 2022

Hi Chad,

Could you please elaborate on what needs to be done or configured to create custom events from the captured Kubernetes events?

Thanks. Tibebe

Julius_Loman · 05 Jan 2022

@tibebe_m_digafe you need to create a log event to get alerted on a particular Kubernetes event type.

Certified Dynatrace Master | Alanata a.s., Slovakia, Dynatrace Master Partner

tibebe_m_digafe · 06 Jan 2022

Thanks @Julius_Loman