Aug 29, 2009

Posted in Tech Tips | 8 Comments

Server 2008 DNS Resolution Issue

Last week, after making a change to our 2008 DNS infrastructure, we started getting a recurring error where DNS requests for certain top-level domains were not being resolved. After investigating for a bit, it turned out there was a problem with the way Server 2008 DNS handles TTL (Time To Live) values. The issue causes .co.uk domains to become unresolvable (as well as other domains that have circular dependencies).

For this post I have made a screencast that runs through the error and how to fix it, but I will also post the details below. If you have a look at the screencast I would appreciate any feedback (via the comments) so I know whether I should look at recording more videos in future.

Symptoms

Clients are unable to resolve .co.uk domains, and nslookup returns the following error:

*** Unknown can’t find <name>.co.uk: Server failed

The error recurs every two days on Server 2008 DNS servers. It does not occur if the server is configured to use forwarding.
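To catch the symptom before users report it, a small check along these lines could be run from a client that uses the affected DNS server. This is only a sketch: the function name `check_names` and the hostnames in the usage note are my own placeholders, and it relies on whatever resolver the machine is configured with.

```python
import socket
from typing import Callable, Dict, Iterable


def check_names(
    names: Iterable[str],
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> Dict[str, bool]:
    """Return a mapping of hostname -> True if it resolved, False if it failed."""
    results: Dict[str, bool] = {}
    for name in names:
        try:
            resolve(name)
            results[name] = True
        except OSError:
            # socket.gaierror (raised on lookup failure) is a subclass of OSError.
            results[name] = False
    return results
```

For example, `check_names(["www.example.co.uk", "www.example.com"])` (hypothetical names) would show the .co.uk entry as False and the .com entry as True once the server has entered the failed state described below.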

Cause

The .uk top-level domain uses nameservers that are all themselves under .uk (currently NSx.nic.uk). This means the .uk domain has a circular dependency, so the root-hint servers must supply glue records before any .uk query can be resolved.

[Image: Correct records]
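The circular dependency can be expressed as a simple test: a zone depends on glue when every one of its nameservers sits inside the zone itself. The sketch below is illustrative only; the function names are mine, and the nameserver names are examples following the NSx.nic.uk pattern rather than a current list.

```python
from typing import Iterable


def is_in_zone(hostname: str, zone: str) -> bool:
    """True if hostname is the zone itself or a name underneath it."""
    host = hostname.rstrip(".").lower()
    zone = zone.rstrip(".").lower()
    return host == zone or host.endswith("." + zone)


def depends_on_glue(zone: str, nameservers: Iterable[str]) -> bool:
    """True when every nameserver for the zone lives inside that zone.

    In that case the nameserver addresses cannot be looked up without
    already being able to query the zone, so glue records are required.
    """
    servers = list(nameservers)
    return bool(servers) and all(is_in_zone(ns, zone) for ns in servers)


# Example names only, following the NSx.nic.uk pattern.
print(depends_on_glue("uk", ["ns1.nic.uk", "ns2.nic.uk"]))
```

A zone with at least one nameserver outside itself would return False here, because that server's address can be resolved independently.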

This error is caused by Server 2008 ignoring the TTL supplied by the root-hint servers and overriding it with 1 day (discovered by looking through the packet output in the DNS logs).

[Image: Server ready to fail]

This means that the glue records for the .co.uk nameservers (something.nic.uk) expire. After the first expiry the server copies the NS records from .co.uk into its cache for .nic.uk and everything carries on working.

[Image: Server in a failed state]

After the next 24 hours have passed the glue records expire again, but the NS records are left behind. Now, if a client tries to resolve a domain, the server is unable to contact the .uk nameservers and so returns the failure message.

Solution (dirty hack!)

There are a few things that can be done to get around this issue. Firstly, you can clear the server’s DNS cache, which resets everything. However, this isn’t really a fix as the problem will reoccur in another two days’ time. Secondly, you can configure your DNS server to use a forwarder so that all DNS requests are forwarded to another server; again, this might not be appropriate depending on your requirements.

Lastly, you can configure the max TTL value for a record to be 2 days, which overrides the default behaviour of capping it at 1 day. I don't regard this as a proper fix because if the TTL for the .uk nameserver records is ever modified then the problem will just resurface. A correct fix would be to make sure the 2008 DNS server obeys the TTLs given to it by the root-hint servers.
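The arithmetic behind the workaround is just a cap on the TTL. The sketch below assumes the glue records are supplied with a 2 day TTL (which is why 2 days is the value that makes the workaround hold) and that the server keeps the smaller of the supplied TTL and the max cache TTL; the function and constant names are mine.

```python
DEFAULT_MAX_CACHE_TTL = 86_400  # 1 day in seconds, the default cap described above
ASSUMED_GLUE_TTL = 172_800      # 2 days in seconds; assumed TTL of the supplied glue


def cached_ttl(supplied_ttl: int, max_cache_ttl: int = DEFAULT_MAX_CACHE_TTL) -> int:
    """TTL the server actually keeps: the supplied TTL, capped at the maximum."""
    return min(supplied_ttl, max_cache_ttl)


# With the default cap the glue is only kept for 1 day.
print(cached_ttl(ASSUMED_GLUE_TTL))
# With the cap raised to 2 days the supplied TTL is kept in full.
print(cached_ttl(ASSUMED_GLUE_TTL, max_cache_ttl=172_800))
```

This also shows why the workaround is fragile: if the supplied TTL ever rises above the configured cap, the cached value is shortened again.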

To apply the change open up the registry and navigate to the following key:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\DNS\Parameters

Create a new DWORD value called “MaxCacheTTL” and set its value (in decimal) to 172800 (2 days, in seconds).

Restart the DNS service and then everything should function correctly.
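If the change needs to go onto several servers, one option is to generate a .reg file rather than editing the registry by hand. The following is a rough sketch, not a tested deployment method: the helper name `build_reg_file` is hypothetical, and the only details taken from the steps above are the key path, the value name and the 172800 figure. Note that .reg files store DWORD values in hexadecimal, so 172800 becomes 0002a300.

```python
KEY_PATH = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\DNS\Parameters"


def build_reg_file(value_name: str, seconds: int) -> str:
    """Build the text of a .reg file that sets a DWORD value under KEY_PATH."""
    if not 0 <= seconds <= 0xFFFFFFFF:
        raise ValueError("a DWORD must fit in 32 unsigned bits")
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{KEY_PATH}]",
        # .reg files expect the DWORD as 8 hexadecimal digits.
        f'"{value_name}"=dword:{seconds:08x}',
        "",
    ]
    return "\r\n".join(lines)


print(build_reg_file("MaxCacheTTL", 172800))
```

The output can be saved as a .reg file and imported on the server, after which the DNS service still needs restarting as described above.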

  1. Like the webcast, very informative. One slight error though is the demo at the very end at about 7:45 you say the record is valid for 1 hour 23 instead of 1 day 23 hours. Cool though! 🙂

  2. Very nice.
    I have not seen it myself, but a warned person counts for two (Dutch saying).

  3. Excellent webcast, very useful.

  4. Outstanding, and very helpful. Thanks so much!

  5. Ben,
    Great write up and screen cast. We are getting affected by this a lot recently. But like you explained I don’t think Microsoft’s resolution of setting MaxCacheTTL to 2 days is the proper fix. That assumes all name servers use a TTL of 2 days for their NS records. Seems to me they should release a hotfix for this.

    Of course what is the bug? Is it that DNS Server is ignoring the TTL of the glue records or is it that DNS Server is not applying MaxCacheTTL to NS records? I guess this depends on what MaxCacheTTL is designed to do. Is it supposed to be a default TTL when the responding DNS server doesn’t specify a TTL, or is it meant to say I don’t want any records cached on my server for more than X number of seconds (default being 1 day)? If the purpose is the latter I would suggest the bug is that DNS Server is not applying MaxCacheTTL to NS records.

  6. Thanks for your podcast. Having a similar issue so will try what you have shown.
    Thanks

  7. Thanks for your guide.

    It is really good, I think you should add a Donate link as
    you should be rewarded for what you’re doing.

    One hand washes the other, and together they wash the face. = Cooperation leads to accomplishment.

    Thanks

  8. Nice article

    I will also share it with my friends
    Thanks for helping me out!
