Windows Time Service on a domain (referred to as 'Domain Synchronization' or 'Domain Sync' for short) is a huge topic. I will do my best to cover all of its aspects in this article, but some concepts won't be covered until a later date, and others still relate directly to the original RFC for NTP.
Background
As I stated in my previous post, the original reasons for developing w32time stemmed from the requirements imposed by Kerberos. In order for Kerberos to function securely, the time difference between the participating machines needs to be less than five minutes. In time, other components have come to rely on w32time, including Active Directory Replication and Windows Update. In a Windows domain, w32time needs to keep machines synchronized, and it needs to do so in a quick, efficient, and quiet manner.
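To make the five-minute requirement concrete, here is a minimal sketch of a skew check. The function and constant names are hypothetical and are not part of Kerberos or w32time; only the five-minute tolerance comes from the text above.

```python
from datetime import datetime, timedelta, timezone

# Tolerance from the text: clocks must differ by less than five minutes.
MAX_SKEW = timedelta(minutes=5)

def within_kerberos_skew(client_time, server_time, max_skew=MAX_SKEW):
    """Return True if the two clocks differ by less than max_skew."""
    return abs(client_time - server_time) < max_skew

# Illustrative timestamps only.
server = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
print(within_kerberos_skew(server + timedelta(minutes=4), server))
print(within_kerberos_skew(server - timedelta(minutes=6), server))
```

A client whose clock drifts past the tolerance would fail the check, which is the situation w32time exists to prevent.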
Beyond NTP
The NTP protocol described in the RFC goes a long way toward designing a robust time synchronization solution. But in the end, what we are really interested in is just that: the solution. Keeping time synchronized between two machines is possible, but the solution needs to be more robust to deal with computers belonging to a domain. In particular, w32time works to answer these questions (just to name a few):
These questions are important, specifically in the domain scenario (as opposed to the home user scenario), since the needs of the home user and the needs of the domain user are quite different.
Designing Inside the Box
Because many components within Windows depend on w32time to keep the clock synchronized, w32time can take hardly any dependencies itself. If w32time relied on component X to do something fancy, and component X relied on Kerberos, then we would have a problem, since Kerberos relies on w32time. This would create a circular dependency and, well, that's a bad thing.
For this reason, w32time has a simplified mechanism to authenticate time syncs. More information on the authentication mechanism will be covered in a future post.
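The circular dependency problem described above can be illustrated with a small cycle check over a dependency graph. This is only a sketch: the graph is hypothetical ("component_x" stands in for "component X"), and it says nothing about how Windows actually tracks service dependencies.

```python
def find_cycle(deps):
    """Return one dependency cycle as a list of names, or None if there is none."""
    visiting, done = [], set()

    def visit(node):
        if node in done:
            return None
        if node in visiting:
            # We came back to a node on the current path: close the loop.
            return visiting[visiting.index(node):] + [node]
        visiting.append(node)
        for dep in deps.get(node, ()):
            cycle = visit(dep)
            if cycle:
                return cycle
        visiting.pop()
        done.add(node)
        return None

    for name in deps:
        cycle = visit(name)
        if cycle:
            return cycle
    return None

# Hypothetical graphs: each key depends on the names in its list.
bad = {"w32time": ["component_x"], "component_x": ["kerberos"], "kerberos": ["w32time"]}
good = {"kerberos": ["w32time"], "w32time": []}
print(find_cycle(bad))
print(find_cycle(good))
```

With no outgoing dependencies from w32time, the second graph has no cycle, which is the property the design aims for.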
Intelligent Design
The first issue to address is finding someone to synchronize with. Each machine needs to sync with another machine to get its time. To do this efficiently and automatically, w32time uses the hierarchy that already exists within the domain. In the simplest terms, a domain consists of the following distinct entities (aka computers):
The inner workings of a domain are beyond the scope of this post, but this should be enough to provide the groundwork for our discussion.
Time Source Selection
Each member of the domain follows a different set of requirements, based on its role. Let's take a look at those roles:
These are the default rules of where a machine can go looking for a time source. Keep in mind that there are corner cases where the rules can be bent a little. A few additional rules:
Also, you may have noticed that a PDC can only sync from a DC or PDC in the parent domain. Well, what if you are in the parent domain already? This is a special case, which is detailed below in the section "Special Case: The Root PDC".
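As a rough sketch, enumerating possible time sources by role might look like the following. The only rule taken directly from the text is that a PDC looks to a DC or PDC in the parent domain. The rule that other machines may also use DCs in their own domain is an assumption inferred from the examples later in this post, and every name here (Machine, candidates, parent_of) is hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Machine:
    name: str
    domain: str
    role: str  # "client", "dc", or "pdc"

def candidates(me, machines, parent_of):
    """List machines that `me` may consider as a time source (simplified)."""
    parent = parent_of.get(me.domain)  # None when me is in the forest root
    result = []
    for m in machines:
        if m == me or m.role == "client":
            continue  # in this sketch only DCs and PDCs hand out time
        if m.domain == parent:
            result.append(m)  # DC or PDC in the parent domain
        elif m.domain == me.domain and me.role != "pdc":
            result.append(m)  # assumption: non-PDCs may also use their own domain
    return result

machines = [
    Machine("left-pdc", "Left", "pdc"), Machine("left-dc", "Left", "dc"),
    Machine("right-pdc", "Right", "pdc"), Machine("right-dc", "Right", "dc"),
    Machine("parent-pdc", "Parent", "pdc"), Machine("parent-dc", "Parent", "dc"),
    Machine("foo", "Left", "client"),
]
parent_of = {"Left": "Parent", "Right": "Parent"}

for me in (machines[-1], machines[0], machines[4]):
    print(me.name, [m.name for m in candidates(me, machines, parent_of)])
```

Note that the root PDC ends up with an empty list, which is exactly the special case mentioned above.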
The time source selection mechanism works great to enumerate the possible machines to sync from. The problem is that this usually leaves more than one machine as a possible partner. We need a way to pick the "best" one of the group, and that is what scoring does for us.
Score!
Each possible machine is given a score, based on certain criteria. Once all of the candidates have a score, w32time simply chooses the machine with the highest score. Here is what the scoring looks like:
So why are these points given? Let's look at the rules individually. Machines that are in the same site as the one in question have the best chance of providing us with good time.
From this, we can derive a score for each machine, and then choose the machine with the highest score.
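Here is a minimal sketch of scoring and picking a winner. Only the +8 same-site bonus is stated in this post; the other weights (+4 reliable, +2 parent domain, +1 PDC) are assumptions chosen so that an in-site PDC scores 9, matching the examples that follow. The Candidate type is hypothetical.

```python
from dataclasses import dataclass

# +8 for same site is from the text; the remaining weights are assumptions.
SAME_SITE, RELIABLE, PARENT_DOMAIN, IS_PDC = 8, 4, 2, 1

@dataclass(frozen=True)
class Candidate:
    name: str
    same_site: bool = False
    reliable: bool = False
    in_parent_domain: bool = False
    is_pdc: bool = False

def score(c):
    """Sum the bonus for every criterion the candidate satisfies."""
    return (SAME_SITE * c.same_site + RELIABLE * c.reliable
            + PARENT_DOMAIN * c.in_parent_domain + IS_PDC * c.is_pdc)

def best(cands):
    """Return the candidate with the highest score."""
    return max(cands, key=score)

local_pdc = Candidate("local PDC", same_site=True, is_pdc=True)
remote_dc = Candidate("remote parent DC", in_parent_domain=True)
print(score(local_pdc), score(remote_dc), best([local_pdc, remote_dc]).name)
```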
Examples
When a machine boots up, it will go looking for a time source. Depending on its role, it will be required to choose from a subset of possible machines to sync with. But how do we prioritize between the available choices? Let's take a look at the following examples:
Example 1:
This example uses the graphic above. The domains will be referred to as the "Left Domain", the "Right Domain", and the "Parent Domain".
Computer foo has just been joined to the Left Domain as a regular client (not a DC), and it is booting up for the first time on a domain. First, we need to enumerate which machines are possible partners to sync with. We will look at each machine to see if it is a possible sync partner.
Which machines aren't valid? Let's take a look (and find out why):
Ok, so we have our possible choices, but now we need to prioritize them to pick the best one. To do this, we will utilize the scoring system. Assuming that our entire forest is in one site, and we don't have any machines configured as "reliable":
So there we have it. The PDC in the parent domain will be our time source. But what if the [Left Domain] were put into a separate site?
Example 2:
Assume the same scenario as the above example, except that [Left Domain] exists in a different site from the rest of the forest. We will use the same logic applied above to determine a time source.
So the [Left Domain] is in a different site. Since the first part of time source selection does not take site location into consideration, we will get the same possible machines to sync with. However, the scoring system will provide us with a different machine when all is said and done. Let's look at how the scoring would now occur:
Because the DC and PDC in the [Parent Domain] are in a different site, they don't get the +8 to their score. This leaves us with the PDC of the current domain, with a score of 9. But what about the PDC of the [Left Domain]?
Example 3:
Assume the same scenario as Example 2, but now the machine looking for a time source is the PDC of the [Left Domain]. Again, we will use the same logic applied above to determine a time source.
With the [Left Domain] in a different site from the rest of the forest, and with the PDC of the [Left Domain] being the authoritative time source for the [Left Domain], we will need to go out of site for a time source; we have no other choice. So we will look at the scores for the various eligible time sources:
We cannot sync with any time sources in our own domain, so we only have the time sources from the [Parent Domain]. The scoring will give us the PDC of the [Parent Domain].
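The three examples above can be replayed in a few lines. This is a sketch under stated assumptions: the +8 same-site bonus comes from the text, while the +2 (parent domain) and +1 (PDC) weights are assumed values that reproduce the score of 9 quoted in Example 2. The machine names, the site names, and the presence of a plain DC in each domain are hypothetical, since the original graphic is not reproduced here.

```python
def score(same_site, in_parent_domain, is_pdc):
    # +8 (same site) is from the text; +2 and +1 are assumed weights.
    return 8 * same_site + 2 * in_parent_domain + 1 * is_pdc

def pick(candidates, my_site):
    """Score (name, in_parent_domain, is_pdc, site) tuples; return (winner, scores)."""
    scores = {name: score(site == my_site, in_parent, is_pdc)
              for name, in_parent, is_pdc, site in candidates}
    return max(scores, key=scores.get), scores

def sources(left_site):
    """Candidate DCs in the client's own domain and in the parent domain."""
    own = [("Left DC", False, False, left_site), ("Left PDC", False, True, left_site)]
    parent = [("Parent DC", True, False, "main"), ("Parent PDC", True, True, "main")]
    return own, parent

own, parent = sources("main")            # Example 1: the whole forest in one site
example1 = pick(own + parent, "main")
own, parent = sources("branch")          # Example 2: [Left Domain] in its own site
example2 = pick(own + parent, "branch")
example3 = pick(parent, "branch")        # Example 3: the Left PDC sees only the parent domain

for label, (winner, scores) in [("Example 1", example1),
                                ("Example 2", example2),
                                ("Example 3", example3)]:
    print(label, "->", winner, scores)
```

Under these assumed weights, the winners agree with the outcomes described in each example.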
Plan B: Fail over
So what happens when things don't go as planned? Windows Time Service has been built to handle fail over situations from the beginning. For a generic example, assume that a client is currently synchronizing with a time source. If the time source goes away for one reason or another, the client will need to go looking for another time source.
For this reason, we use the scoring system illustrated above. The client will reassess the available time sources, score each of them, and choose the best one. Since the previous time source (which was probably the best first choice) has gone away, w32time will pick the next highest scoring time source.
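A minimal sketch of that fail over behavior: rescore (here, reuse precomputed scores) and skip any source that has gone away. The scores and machine names are illustrative only, and the function is hypothetical rather than part of w32time.

```python
def choose_source(scores, reachable):
    """Pick the highest-scoring source that is still reachable, or None."""
    alive = {name: s for name, s in scores.items() if name in reachable}
    return max(alive, key=alive.get) if alive else None

# Illustrative scores for four candidate time sources.
scores = {"Parent PDC": 11, "Parent DC": 10, "Left PDC": 9, "Left DC": 8}
reachable = set(scores)

first = choose_source(scores, reachable)
reachable.discard(first)                  # the current time source goes away
second = choose_source(scores, reachable)
print(first, "->", second)
```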
Special Case: The Root PDC
The PDC for the domain at the root of the forest (the root PDC) poses a problem. Since it has no time sources that are more authoritative than it, it cannot choose a time source automatically. Thus, the administrator will need to set one up manually, or the domain will operate in a "standalone" mode. In the case of a standalone domain, the root PDC will still be the authoritative time source, but its time will come from its own clock.
Wrap Up
We have taken a look at how w32time operates in a domain at a very high level. Future posts will dive deeper into specific areas of w32time, and this post will provide the groundwork for those articles. If you have specific thoughts or questions about this post, please feel free to leave a comment. For general questions about w32time, especially if you have problems with your w32time setup, I encourage you to ask them in the Windows Vista Applications section of the Microsoft TechNet forums. One way or another, questions posted there should make their way to my inbox, and I will do my very best to answer them.
References