This is not a question, but a shout-out regarding the new DNS queries functionality:
It is incredibly interesting, and I have already built some very useful dashboards with it. It will probably raise more questions than answers, and even then we have to be careful because of the nature of DNS. But if you haven't noticed it yet, you'll probably be interested in checking it out...
How is everyone finding this feature so far? I love the idea, but currently most of our customers' servers are reporting very high DNS error rates: around 50% on average, and often even between 67% and 100%. So I don't feel I can really rely on this data. Is there any way to drill down into the results to see exactly which lookups are failing? I couldn't find any documentation on this.
Could it be that OneAgent itself is doing nslookups for cluster nodes and ActiveGates, and that is raising the DNS error rate?
Usually that's normal DNS behavior - there are often lots of NXDomain rcodes being reported. Could you please check whether that's what you're seeing?
Unfortunately currently there is no way to check which lookups fail exactly, but we have that on our roadmap.
OneAgent also sends its own requests, but it's hard to tell whether they make up a significant share of your results - at least without seeing which domains are being looked up.
Thanks for the response Dariusz! 🙂 I agree that NXDomain errors are common and to be expected, but the problem is that if we're seeing a constant, extremely high percentage of them, the value of the data becomes minimal, because we can no longer differentiate what's a problem from what's normal. Good to hear that the possibility to see which lookups are failing is on the roadmap - I believe that's a critical feature for this type of monitoring.
Regarding the OneAgent part, did I understand your response correctly that its lookups are indeed included in this? If I may suggest another improvement for the roadmap, please add an option to exclude all OneAgent/Dynatrace-related lookups. I find it misleading to bundle together data about what we're monitoring (servers/apps/etc.) and data about the Dynatrace tooling itself.
I've already done some debugging on my end, and I would say that in the cases where it is 50% it might be related to queries being made for IPv6 addresses that are not being resolved by the nameservers...
I have one server with 1.2 million queries a day, so I had to investigate 😉
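With that volume, breaking the queries down by record type and looking for suspicious names is a good first step. A minimal sketch of the kind of triage I mean - the log format here mimics a dnsmasq-style `query[TYPE] name from client` line, and `domain.com` is a placeholder, so adapt both to whatever your DNS server actually logs:

```python
from collections import Counter

# Illustrative sample; in practice read this from your DNS server's query log.
sample_log = """\
query[AAAA] host1.domain.com from 10.0.0.5
query[A] host1.domain.com from 10.0.0.5
query[AAAA] host2.domain.com from 10.0.0.6
query[A] host2.domain.com.domain.com from 10.0.0.6
"""

by_type = Counter()
doubled_suffix = 0
for line in sample_log.splitlines():
    fields = line.split()
    qtype, name = fields[0], fields[1]
    # "query[AAAA]" -> "AAAA"
    by_type[qtype.removeprefix("query[").removesuffix("]")] += 1
    # Names with the search domain appended twice, e.g. host.domain.com.domain.com
    if name.endswith(".domain.com.domain.com"):
        doubled_suffix += 1

# A big AAAA share on an IPv4-only network means wasted IPv6 lookups.
print(by_type)
print("doubled-suffix queries:", doubled_suffix)
```

On the server described above, this kind of breakdown is what surfaced the IPv6 and doubled-suffix patterns.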
Half of them were IPv6 lookups, even though IPv6 isn't being used (it's an internal server/application). A quarter of them have the domain name appended again, so the queried names look like host.domain.com.domain.com.
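For anyone seeing the same doubled-suffix pattern: it usually comes from the stub resolver's search list. A name without a trailing dot is treated as potentially relative, and if the as-is lookup fails (or the client library always walks the search list), the search domain gets appended - even when the name already contains it. A sketch of the relevant resolver config (`domain.com` stands in for your real search domain):

```
# /etc/resolv.conf (sketch)
search domain.com
options ndots:1

# A lookup for "host.domain.com" (no trailing dot) that fails as-is
# falls through to the search list and is retried as
# "host.domain.com.domain.com", which returns NXDomain.
# A trailing dot ("host.domain.com.") marks the name as fully
# qualified and prevents the search-list walk entirely.
```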
We are still investigating this, but it is pretty clear that there is some issue with FQDNs.
Not sure what impact all of this is having on performance, as nothing on the machine suggests any. But effectively, it's an enormous amount of useless DNS queries...
Regarding the IPv6 queries, we just configured the network interface to only support IPv4. On this particular server, which is internal, we had no problem, as the network is IPv4-only. Public servers are a different story.
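For reference, two standard knobs for cutting the IPv6 (AAAA) traffic - these are sketches, so verify them against your own environment before applying:

```shell
# 1) Disable IPv6 on the host (Linux; persist in /etc/sysctl.d/ to
#    survive reboots):
sysctl -w net.ipv6.conf.all.disable_ipv6=1
sysctl -w net.ipv6.conf.default.disable_ipv6=1

# 2) Or, for a single Java application, make the JVM resolve IPv4 only:
java -Djava.net.preferIPv4Stack=true -jar app.jar
```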
The remaining quarter turned out to come from a JDBC connection pool, where the final dot was missing from the configured hostname. Once we put in the fully qualified name (with the trailing dot), those queries disappeared. None of this had a measurable impact at the service/application level, but we did confirm a very substantial reduction at the DNS server level, and it even led us to shut down more IPv6 across the server base, which reduced the DNS server's load even further. Nice sweep overall 😉
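To make it concrete why the trailing dot mattered: here is a toy model of a glibc-style stub resolver's search-list expansion - not the real resolver, just the candidate-name logic, with the search list and `ndots=1` as assumptions:

```python
def candidate_names(name, search=("domain.com",), ndots=1):
    """Toy model of a stub resolver's search-list expansion (glibc-style).

    A trailing dot marks the name as fully qualified: it is looked up
    exactly once, as-is. Otherwise the name is tried as-is (first if it
    has at least `ndots` dots, last if not) plus once per search suffix,
    which is where queries like host.domain.com.domain.com come from.
    """
    if name.endswith("."):           # absolute name: no search-list walk
        return [name]
    candidates = []
    if name.count(".") >= ndots:     # "dotty" enough: tried as-is first
        candidates.append(name + ".")
    candidates += [f"{name}.{suffix}." for suffix in search]
    if name.count(".") < ndots:      # relative-looking name: as-is last
        candidates.append(name + ".")
    return candidates

# Without the trailing dot, a failed as-is lookup (e.g. an AAAA query on
# an IPv4-only host) falls through to the doubled-suffix query:
print(candidate_names("host.domain.com"))

# With the trailing dot, only the single absolute query is possible:
print(candidate_names("host.domain.com."))
```

This matches what we saw in practice: fixing the JDBC pool's hostname to a fully qualified name removed the host.domain.com.domain.com queries entirely.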