We have instrumented an OpenShift cluster with Dynatrace OneAgent in our environment, and one of the services running in a container hit an OOM error on Metaspace. As a result, the process was terminated and restarted automatically.
We do see events related to the process restart, but there is NO event for the OOM error itself, even though it is clearly visible in the application log file. It was really difficult for us to figure out why the process was restarted, as Dynatrace was NOT providing any details.
We have added the following events to be captured in the Kubernetes settings in Dynatrace. Is there anything else that we should be adding to capture the OOM errors? Please advise.
Sometimes this can be trial and error. I'd make sure that you have defined the Events Field Selectors in your Kubernetes settings, as this will guarantee you capture the data. Once the data is captured, you can always create a custom event for alerting as well.
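To make the trial and error a bit less blind, here is a rough sketch of how a Kubernetes-style field selector expression narrows down events. It is not Dynatrace code and does not call any Dynatrace or Kubernetes API. The helper names (`parse_field_selector`, `lookup`, `matches`) and the sample events are made up for illustration. It only assumes the standard selector syntax: comma-separated requirements using `=`, `==` or `!=`, all of which must hold.

```python
def parse_field_selector(selector):
    """Split 'a=b,c!=d' into (field, operator, value) triples."""
    requirements = []
    for part in selector.split(","):
        part = part.strip()
        if not part:
            continue
        # Check '!=' and '==' before the bare '=' so they are not mis-split.
        if "!=" in part:
            field, value = part.split("!=", 1)
            op = "!="
        elif "==" in part:
            field, value = part.split("==", 1)
            op = "="  # '==' and '=' mean the same thing
        elif "=" in part:
            field, value = part.split("=", 1)
            op = "="
        else:
            raise ValueError(f"invalid requirement: {part!r}")
        requirements.append((field.strip(), op, value.strip()))
    return requirements


def lookup(event, dotted_field):
    """Resolve a dotted path such as 'involvedObject.kind' in a nested dict."""
    current = event
    for key in dotted_field.split("."):
        if not isinstance(current, dict) or key not in current:
            return ""  # a missing field is treated as an empty string here
        current = current[key]
    return str(current)


def matches(event, selector):
    """Return True only if the event satisfies every requirement."""
    for field, op, value in parse_field_selector(selector):
        actual = lookup(event, field)
        if op == "=" and actual != value:
            return False
        if op == "!=" and actual == value:
            return False
    return True


# Hypothetical sample events, shaped like Kubernetes Event objects.
events = [
    {"type": "Warning", "reason": "BackOff",
     "involvedObject": {"kind": "Pod", "name": "orders-7d9f"}},
    {"type": "Normal", "reason": "Pulled",
     "involvedObject": {"kind": "Pod", "name": "orders-7d9f"}},
    {"type": "Warning", "reason": "NodeNotReady",
     "involvedObject": {"kind": "Node", "name": "worker-1"}},
]

selector = "involvedObject.kind=Pod,type=Warning"
captured = [e["reason"] for e in events if matches(e, selector)]
print(captured)  # ['BackOff']
```

A selector that is too narrow (for example, one that only matches a single `reason`) will silently drop everything else, so it may be worth starting broad, such as all `Warning` events for Pods, and tightening it once you see which events actually arrive. This only illustrates selector matching. Whether a JVM-level Metaspace error produces a Kubernetes event at all is a separate question that the selector cannot answer.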