Here is the meeting summary of the incident I am having:
Dynatrace RUM Issue Summary:
All requests to the backend pass through the Azure firewall, are routed to the application gateway (WAF in detection mode), and finally reach the backend.
We tried to reproduce the error on the normal route and also by bypassing the application gateway, to see whether the problem lies with the infrastructure or the web application firewall.
Here I will summarize our findings:
Testing Without Application Gateway
When the application gateway is bypassed and the traffic is forwarded from the Azure firewall straight to the backend, it works fine and the error does not appear.
The response cookie “dtCookie” is returned.
This suggests that the issue may be with the application gateway or the application gateway WAF.
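To make the dtCookie check repeatable on both routes, the Set-Cookie response headers captured in each test (for example, copied from the browser developer tools) could be compared with a small script. This is only a minimal sketch: the helper names and the sample header values are made up for illustration and do not come from the incident.

```python
# Hypothetical helpers: check whether a captured response sets the dtCookie.

def set_cookie_names(set_cookie_headers):
    """Return the cookie names found in a list of Set-Cookie header values."""
    names = set()
    for header in set_cookie_headers:
        # The cookie's name=value pair is the part before the first ';'.
        first_pair = header.split(";", 1)[0]
        name, sep, _value = first_pair.partition("=")
        if sep:
            names.add(name.strip())
    return names


def has_dt_cookie(set_cookie_headers):
    """True if one of the Set-Cookie headers sets a cookie named dtCookie."""
    return "dtCookie" in set_cookie_names(set_cookie_headers)


# Made-up header values, standing in for the two captured responses.
direct_route = ["dtCookie=v_4_srv_1; Path=/", "sid=1; HttpOnly"]
gateway_route = ["sid=1; HttpOnly"]

print("direct route sets dtCookie:", has_dt_cookie(direct_route))
print("gateway route sets dtCookie:", has_dt_cookie(gateway_route))
```

Running the same check against the response captured on each route shows which hop is the first one where the cookie is missing.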
Testing With Application Gateway (with WAF activated)
We changed the routing back to normal (Azure firewall > application gateway > backend).
The error was reproduced.
When the browser cache is disabled, the error does not occur.
Dynatrace seems to add cookies as part of its monitoring. They may be causing issues with the WAF, which might detect them as some sort of threat. Cookie size could also be an issue, since there may be a size limit on the application gateway WAF.
Something is preventing the response cookie from being returned: the Dynatrace response cookie (dtCookie) is not received.
There are many cookies now, so the request is getting quite big. It is possible that the total size of the request exceeds the limit, which is 128 KB.
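The request-size hypothesis can be checked with some quick arithmetic. The sketch below estimates the size of the Cookie request header and compares it with the 128 KB figure quoted above. Whether that limit applies to headers, the body, or the whole request is an assumption that would need to be confirmed against the application gateway documentation. The function names and sample cookies are hypothetical.

```python
# Hypothetical helpers: estimate the Cookie header size against a limit.

LIMIT_BYTES = 128 * 1024  # the 128 KB figure from the meeting notes (assumed scope)


def cookie_header_size(cookies):
    """Return the byte length of the 'Cookie: ...' header line for a name->value dict."""
    value = "; ".join(f"{name}={val}" for name, val in cookies.items())
    return len(f"Cookie: {value}".encode("utf-8"))


def exceeds_limit(cookies, other_bytes=0, limit=LIMIT_BYTES):
    """True if the cookie header plus other_bytes (rest of the request) is over the limit."""
    return cookie_header_size(cookies) + other_bytes > limit


# Made-up cookies; in practice, copy the real ones from the failing request.
sample = {"dtCookie": "v_4_srv_1", "session": "abc123"}
print("cookie header bytes:", cookie_header_size(sample))
print("over limit:", exceeds_limit(sample))
```

If the real cookies from a failing request in Chrome or Edge stay far below the limit, the size hypothesis can be ruled out.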
The WAF rule set could also be blocking the request because of the injection.
There should be a way to exclude the “dtCookie” from evaluation with a custom WAF rule for each application (this is not possible as a global rule; it needs a custom rule per application).
Testing With Application Gateway (with WAF disabled)
After disabling the WAF, the error still appeared. This suggests that the WAF is not causing the problem.
Maybe other backend settings on the application gateway are affecting this, such as cookie affinity.
There are no logs on the backend. In Traefik we can see that the request goes to the user management backend with no error, but the response is never received. Everything has a response status of 200.
The configuration endpoint called in the request is managed by the development team, so we will need their support.
We need more information from Dynatrace about how the injection works (next session).
Additional info about browsers:
Works in Firefox, Brave and Safari with no errors.
Doesn’t work in Edge and Chrome.
Error is present when using Edge and Chrome; it is not present when using Firefox, Brave and Safari
Error is present when application gateway is used
Error is present when WAF is enabled
Error is present when WAF is disabled
Error is not present when browser cache is disabled
Error is not present when application gateway is not used
We don’t understand how the injection works; this needs an explanation from Dynatrace
This is a weird issue. I have a guess that I would like to share with you.
It may be that the WAF is still active even when it is disabled, because the tier of the AppGW is still WAF_v2.
Honestly, I don’t know, and MS would have to confirm this, but my strong guess is that even in this case the “mandatory rules” are still enforced.
I checked the WAF logs and I can see that the backend violates rules 980130 and 949110.
Should I create a WAF policy to exclude these rules?
This is the log query that I used:
AzureDiagnostics
| where Category == "ApplicationGatewayFirewallLog"
| where action_s == "Detected"
| where hostname_s == "test.XXXXXXXXXXXXXXXX"
| project TimeGenerated, hostname_s, requestUri_s, Message, clientIp_s, ruleId_s, details_message_s, details_file_s, policyScope_s, policyScopeName_s
| sort by TimeGenerated desc
| limit 100
Could someone help me?