We recently ran into a situation where the number of requests to unmonitored hosts jumped and has stayed at the higher level. This was noticed by some folks in AppDev when they were chasing some PurePaths and found many broken chains. Some research indicated that "something" happened on 12/10/20 around 4:00 AM. See chart below. Has anyone run into this type of situation? Of course "nothing" has changed. 🙂
Was any host monitoring turned off from the Dynatrace perspective? What about the level of monitoring? Was OneAgent reduced to infrastructure-only monitoring on some hosts?
Good question. We have reviewed both possibilities and neither has been identified as the issue. We look good from a monitoring perspective, and the split of hosts between Full Stack and Infrastructure mode is as expected.
Hey Chad. Thanks for the response. Full stack monitoring is enabled on all servers and in fact we did patching this past weekend and full reboots were done.
Could it be that there's a component (LB/proxy/integration service/etc.) that's stripping the x-dynatrace headers, in effect terminating the PurePaths and making those requests look like they're going to unmonitored hosts? For example, connections to domain names that resolve to load balancer VIPs would not correspond to any specific server with OneAgent running.
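One way to test the header-stripping theory is to compare the headers a caller sends with the headers that actually arrive behind the load balancer. The snippet below is only a minimal sketch of that comparison. The function name `dropped_trace_headers` is made up for illustration, the header list is an assumption (x-dynatrace plus the W3C Trace Context headers), and it assumes you can capture the received headers yourself, for example from a request-echo endpoint or a debug log on the backend.

```python
# Header names assumed to carry trace context; adjust for your environment.
TRACE_HEADERS = ("x-dynatrace", "traceparent", "tracestate")


def dropped_trace_headers(sent, received, names=TRACE_HEADERS):
    """Return trace headers present in `sent` but missing from `received`.

    Both arguments are plain dicts of header name -> value. Header names
    are compared case-insensitively, as HTTP header names are.
    """
    sent_names = {name.lower() for name in sent}
    received_names = {name.lower() for name in received}
    return [n for n in names if n in sent_names and n not in received_names]


# Illustrative values only, not real Dynatrace tags.
sent = {"Host": "service.example", "X-dynaTrace": "placeholder-tag"}
received = {"host": "service.example", "x-forwarded-for": "10.0.0.1"}

print(dropped_trace_headers(sent, received))
```

If the list is non-empty for requests that pass through the LB but empty for requests sent directly to a backend host, that would point at the intermediary rather than at OneAgent.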
Another solid question, and that is where we have been concentrating. The issue seems to involve calls in and out of our Web Service Managed infrastructure, which is front-ended by a LB. We are running fully in AWS on a mix of EC2 and native services, and the ELB in this case is a CLB. We have reviewed changes at the time of the spike (approx. 12/10 at 4:00 AM) and there were no changes scheduled by our organization at that time.
Hi Dante. This is an interesting thought. I'll work with the AppDev team on this. We only keep 10 days of traces in Prod, so going back further for comparison's sake is problematic.
Hello John K.
Can you check the backtrace for this traffic to know who made these calls?
Regards,
Babar
Hi Babar. Good thought. The usual and expected suspects are the callers in all of our situations. We don't see any outliers.
Hi Sanders. Thanks for the response. Full stack monitoring is enabled on all servers and in fact we did patching this past weekend and full reboots were done.
I would recommend opening a support ticket for this. That way support can dig deeper into the issue and hopefully find a resolution or explanation for this.
This has been done.