Summary
We’ve observed a segfault issue affecting customers who use the following ingress controller image:
mcr.microsoft.com/oss/v2/ingress-nginx/controller:v1.13.9
Problem
- Intermittent request failures
- Frequent crashes of nginx ingress worker processes
- Kernel logs showing repeated segfaults:
kernel: nginx[4143235]: segfault at 7 ip 00007feb5b80845b sp 00007ffde8362200 error 4 in liboneagentnginx.so
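To check whether a node is affected, kernel log lines can be filtered for nginx segfaults attributed to the OneAgent module. The following is a minimal Python sketch, assuming log lines follow the format quoted above; the function names and the regular expression are illustrative and not part of any Dynatrace or Microsoft tooling.

```python
import re
from typing import Optional

# Module name taken from the kernel log line in this article.
ONEAGENT_MODULE = "liboneagentnginx.so"

# Assumed layout: "<proc>[<pid>]: segfault at <addr> ip <ip> sp <sp> error <code> in <module>"
SEGFAULT_RE = re.compile(
    r"(?P<proc>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
    r"(?: in (?P<module>\S+))?"
)


def parse_segfault(line: str) -> Optional[dict]:
    """Return the fields of a kernel segfault line, or None if it does not match."""
    match = SEGFAULT_RE.search(line)
    return match.groupdict() if match else None


def is_oneagent_nginx_crash(line: str) -> bool:
    """True if the line reports an nginx segfault inside the OneAgent nginx module."""
    info = parse_segfault(line)
    if info is None:
        return False
    # Kernels may append an address range after the module name, so match by prefix.
    module = info["module"] or ""
    return info["proc"] == "nginx" and module.startswith(ONEAGENT_MODULE)
```

Feeding the output of `dmesg` or `journalctl -k` through `is_oneagent_nginx_crash` line by line gives a quick count of matching crashes per node.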
Root cause (likely)
This image is part of the AKS Application Routing add-on (NGINX ingress managed by Microsoft).
- It uses a patched/custom nginx binary.
- The Dynatrace NGINX code module (liboneagentnginx.so) relies on fixed assumptions about nginx internals.
- Because the patched binary does not match those assumptions, instrumentation can cause segfaults in nginx processes.
- Microsoft documentation: https://learn.microsoft.com/en-us/azure/aks/app-routing
Important notes
- AKS App Routing is fully managed, so:
  - Pod/deployment-level changes are restricted
  - Forced runtime instrumentation is NOT possible
Resolution
Disable OneAgent injection for the ingress pods or their namespace.
For cloudNativeFullStack / applicationMonitoring:
For classicFullStack
Disabling injection successfully stops crashes and restores stability.
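Namespace-level exclusion in Kubernetes is typically expressed as a label selector. The sketch below, in Python, only illustrates how a `NotIn` expression on the automatically set `kubernetes.io/metadata.name` label would leave an ingress namespace out. The namespace name `app-routing-system` and the idea that your DynaKube accepts such a selector are assumptions; where the selector belongs depends on the Dynatrace Operator version, so verify it against the Dynatrace documentation before applying anything.

```python
# Hypothetical selector: match every namespace except the ingress one.
# "app-routing-system" is an assumed namespace name; check your cluster.
EXCLUDE_INGRESS_SELECTOR = {
    "matchExpressions": [
        {
            "key": "kubernetes.io/metadata.name",
            "operator": "NotIn",
            "values": ["app-routing-system"],
        }
    ]
}


def selector_matches(selector: dict, labels: dict) -> bool:
    """Evaluate the matchExpressions of a Kubernetes-style label selector."""
    for expr in selector.get("matchExpressions", []):
        key = expr["key"]
        operator = expr["operator"]
        values = expr.get("values", [])
        if operator == "In":
            ok = key in labels and labels[key] in values
        elif operator == "NotIn":
            # A missing key also satisfies NotIn in Kubernetes semantics.
            ok = key not in labels or labels[key] not in values
        elif operator == "Exists":
            ok = key in labels
        elif operator == "DoesNotExist":
            ok = key not in labels
        else:
            raise ValueError(f"unsupported operator: {operator}")
        if not ok:
            return False
    return True
```

With this selector, a namespace labelled `kubernetes.io/metadata.name: default` is still selected for injection, while the assumed ingress namespace is not.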
What's next
- Azure GitHub issue: https://github.com/Azure/AKS/issues/5796
- Disabling injection is currently the only supported workaround.
- For any other patched NGINX build, please follow the corresponding documentation.
- If you have any issues or concerns, please open a support ticket and attach a OneAgent support archive.