A customer of ours in NZ was looking to use the DNS decode and had a few questions about what it can do. The customer (and I) looked through the APM Community for some insight into what the DNS decode actually does, but the most we could find here is that it exists. I did some digging, and thanks to colleagues around the world we have come up with a somewhat better explanation of what the DNS decode does. I figured it would be good to post our findings on the community for anyone else to reference.
If you have anything additional to add, or have any more questions around the DNS decode, please leave them in the comments below!
From the Customer:
As discussed, we are looking to offer the DCRUM DNS monitoring as a solution to the DNS issues we have been having recently. I could not seem to find enough detailed sales or technical information about the capability in the community library (probably user error on my part).
Essentially looking for information that covers the following items regarding DCRUM DNS integrity / availability monitoring:
Any advice on this would be much appreciated.
Response to Customer:
I have had some feedback from my colleagues around the DNS Decode. Unfortunately there is no one Compuware ‘whitepaper’ we can find that covers all of this, but I have included references to what you can see in a live DC-RUM environment. To answer your questions:
1) What DNS components does it monitor? – It monitors all ‘DNS’ traffic to and from the DNS servers. This allows us to see the number of connected clients (‘users’), the performance of DNS (Fast, Slow, Aborted, or Failed requests), and where time was spent during a request (Client, Network, Server, or Redirect). You can also see a breakdown of the types of DNS errors experienced by a particular client or server.
2) What can it do for my DNS space in a proactive manner? – Monitor availability, performance (response time) and errors. As seen below, you can see the availability drop off sharply, and you can line this up with the two servers that are causing the bulk of the DNS errors.
3) What DNS anomalies can be retrieved with DCRUM? – Refer back to the screenshot below to see the types of DNS errors we pick up (DNS Refused, DNS not Implemented, DNS Name, DNS Server Failure, DNS Format, DNS Timeout, and DNS other errors).
4) What can we do at a reactive level? – Figure out if DNS is down, performing slowly, or throwing errors. You can then break these metrics down by server or by client to see whether the problem affects a single DNS server or all of them, and a single user or all of them. You can also see the impact on the network, as seen below, compared to the baseline for that application.
5) What limitations, if any, does DCRUM have in monitoring the DNS space? – As with all DC-RUM monitoring, we do not see what is going on inside the application (DNS, in this case) on the application server. That would require an application monitoring tool, such as dynaTrace (which you already have).
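For anyone wondering where the error types in item 3 come from on the wire, here is a rough Python sketch. It reads the response code (RCODE) from the standard DNS header and maps it to the labels listed above. The pairing of RCODE values with DC-RUM's labels is my assumption and is not confirmed by product documentation. The function name `classify_response` is made up for illustration, as is the choice to treat a missing response as a timeout.

```python
import struct

# Standard RCODE values from RFC 1035, paired with the error labels from
# item 3. The pairing with DC-RUM's labels is an assumption.
RCODE_LABELS = {
    1: "DNS Format",           # FORMERR
    2: "DNS Server Failure",   # SERVFAIL
    3: "DNS Name",             # NXDOMAIN (name error)
    4: "DNS not Implemented",  # NOTIMP
    5: "DNS Refused",          # REFUSED
}


def classify_response(message):
    """Return an error label for a raw DNS response, or None if RCODE is 0.

    Pass None to represent a query that never received a response.
    """
    if message is None:
        return "DNS Timeout"
    if len(message) < 12:
        raise ValueError("a DNS header is 12 bytes; message is too short")
    # The flags word is the second 16-bit big-endian field of the header.
    (flags,) = struct.unpack("!H", message[2:4])
    rcode = flags & 0x000F  # RCODE is the low 4 bits of the flags word
    if rcode == 0:
        return None
    return RCODE_LABELS.get(rcode, "DNS other errors")
```

Note that a timeout is not an RCODE at all. It is the absence of a response, which is why the sketch handles it separately.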
Great work guys!
It would be good if we could also gather remedies for things such as DNS aborts and others. What did you do to isolate the offender and how did you (or someone else) solve the problem?
In particular, I've seen a huge volume of aborts on some installations but never had time to figure out what to do (not a DNS guru).
Well the screenshots in the post were from our demo site, so we didn't spend much time solving the problem, so much as showing what information DC-RUM provides. The customer was looking for reports along the lines of what is presented here:
Not being a DNS guru myself, I had to look up what most of it meant, but it looks like they were looking for things included in the DNS header, much of which we do not appear to grab with the DNS decode. It would be interesting to hear from development how much work it would take to get the rest of the information in the DNS header into our DNS decode, and whether it would be feasible to get this in a future release.
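For reference, the DNS header itself is small: 12 bytes, laid out as six 16-bit big-endian fields per RFC 1035. The sketch below is only meant to show which fields are in there. It says nothing about which of them the DNS decode actually captures. The names `DnsHeader` and `parse_dns_header` are made up for this example.

```python
import struct
from typing import NamedTuple


class DnsHeader(NamedTuple):
    txid: int     # transaction ID, matches a response to its query
    qr: int       # 0 = query, 1 = response
    opcode: int   # 0 = standard query
    aa: int       # authoritative answer
    tc: int       # message was truncated
    rd: int       # recursion desired
    ra: int       # recursion available
    rcode: int    # response code (0 = no error)
    qdcount: int  # entries in the question section
    ancount: int  # records in the answer section
    nscount: int  # records in the authority section
    arcount: int  # records in the additional section


def parse_dns_header(message: bytes) -> DnsHeader:
    """Decode the fixed 12-byte header at the start of a DNS message."""
    if len(message) < 12:
        raise ValueError("a DNS header is 12 bytes; message is too short")
    txid, flags, qd, an, ns, ar = struct.unpack("!6H", message[:12])
    return DnsHeader(
        txid=txid,
        qr=(flags >> 15) & 0x1,
        opcode=(flags >> 11) & 0xF,
        aa=(flags >> 10) & 0x1,
        tc=(flags >> 9) & 0x1,
        rd=(flags >> 8) & 0x1,
        ra=(flags >> 7) & 0x1,
        rcode=flags & 0xF,
        qdcount=qd,
        ancount=an,
        nscount=ns,
        arcount=ar,
    )
```

Feeding it the first 12 bytes of any captured DNS payload (UDP, without the IP/UDP headers) returns the individual flags and section counts.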
With the current DNS decode, we can see which clients or servers are experiencing the most failures, and what type of DNS failures they are experiencing (see screenshot 3), which may be helpful for someone experienced in the ways of DNS.
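If you export that data, or have your own list of failed requests, the same "who is failing, and how" view is easy to reproduce. This is a minimal sketch assuming the input is a list of (server, error label) pairs. That input shape and the function names are assumptions for illustration and are not a DC-RUM export format.

```python
from collections import Counter, defaultdict


def failures_by_server(records):
    """Group (server, error_label) pairs into per-server error counts."""
    per_server = defaultdict(Counter)
    for server, label in records:
        per_server[server][label] += 1
    return per_server


def worst_servers(records, top=2):
    """Return the `top` servers with the most failures, highest first."""
    totals = Counter(server for server, _ in records)
    return totals.most_common(top)
```

Calling `worst_servers(records)` gives the servers to look at first, and `failures_by_server(records)[server]` then shows which error types dominate on each of them.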
This is amazing! I'm trying to prove the value at my client as we speak and this has been very helpful. Some additional questions though.
Does the screenshot that showed the types of DNS errors require an ADS? Also, I have a client who is seeing half of all the requests to his server showing as aborted. What does that mean in DNS land?