Almost every virtualization platform, Hyper-V included, provides a way to capture the complete state of a VM as a snapshot. You can later restore this snapshot quickly to get back to what you were doing earlier. This is a powerful technique with many uses: changing and testing applications, preparing demo setups, and creating quick backups, to name a few.
However, if the VM is joined to an Active Directory domain, you occasionally find after restoring a snapshot that all authentication involving the VM seems to fail. From this VM, you cannot access any network shares outside it. From your other machines, you cannot access any files shared by this VM. You cannot even log in to the VM as yourself, a domain user: you get an error like “Windows cannot connect to the domain, either because the domain controller is down or otherwise unavailable, or because your computer account was not found. Please try again later.”
What is happening here?
The answer is machine account password mismatch: the VM thinks its machine account password to be something, while the domain controller believes it to be something else. The VM cannot authenticate itself to the domain controller.
In this blog, I will describe some simple experiments to understand the problem better and suggest a few strategies for dealing with it.
Just like a user account password, the machine account password is a secret that a Windows domain member uses to authenticate itself to the domain controller and establish a secure channel.
When the computer is started, a service called NetLogon uses the machine account password and tries to establish a secure session with the domain controller. Other services running on this computer with LocalSystem or NetworkService credentials require this authenticated secure channel to get access to domain resources. The usual CTRL+ALT+DEL Winlogon process also relies on this authenticated secure channel to send user credentials to the domain controller for verification and log them into the computer.
The password is first created when the computer is joined to a domain. It is shared by the domain controller and the computer. After that, for security reasons, the computer periodically (usually every 30 days) negotiates with the domain controller and changes its machine account password. After this change, both the domain controller and the computer use the new password for authentication.
When a domain member is restored to an older snapshot, it loses track of any password changes made after the snapshot was taken and tries to use an older password. Hence it fails to authenticate itself, and various things fail as a consequence.
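The mechanism can be illustrated with a small toy model in Python. This is only a sketch, not how Netlogon is implemented: the class names, the plain-string "passwords", and in particular the assumption that the domain controller still honors the immediately preceding password are all made up for illustration. That last assumption is chosen to be consistent with the experiment later in this post, where a single password change is not enough to break a restored VM.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class DomainController:
    # Assumption (hypothetical model): the DC honors the current
    # password and the one immediately before it.
    accepted: list = field(default_factory=list)

    def set_password(self, pw):
        # Keep only the two most recent passwords, newest first.
        self.accepted = ([pw] + self.accepted)[:2]

    def authenticate(self, candidates):
        return any(pw in self.accepted for pw in candidates)


@dataclass
class Member:
    current: str
    previous: str

    def change_password(self, dc, new_pw):
        # The member initiates the change and remembers its last two passwords.
        dc.set_password(new_pw)
        self.previous, self.current = self.current, new_pw

    def can_authenticate(self, dc):
        return dc.authenticate([self.current, self.previous])


dc = DomainController()
dc.set_password("pw0")
vm = Member(current="pw0", previous="pw0")

snapshot = copy.deepcopy(vm)            # take a snapshot of the VM state

vm.change_password(dc, "pw1")
after_one_change = copy.deepcopy(snapshot).can_authenticate(dc)

vm.change_password(dc, "pw2")
after_two_changes = copy.deepcopy(snapshot).can_authenticate(dc)

print(after_one_change, after_two_changes, vm.can_authenticate(dc))
```

In this model the restored snapshot still authenticates after one password change but not after two, while the VM that was never rolled back keeps working throughout.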
Next, I will describe a simple experiment that forces a machine password problem so that we can analyze it. For this purpose, we will need a Windows VM that is a member of some domain and is hosted in Hyper-V.
To study and manipulate machine passwords, we will use a tool called nltest.exe. It is available as part of the Windows Support Tools, which need to be downloaded and installed. Underneath, nltest.exe uses Windows APIs such as I_NetLogonControl2 and interacts with the Netlogon service.
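If you want to drive nltest.exe from a script rather than by hand, a minimal Python sketch could look like the following. It relies only on the "Status = N 0xN NAME" lines that nltest prints in the transcripts shown in this post; the function names parse_status and secure_channel_ok are made up for illustration, and the wrapper assumes nltest.exe is on the PATH of a Windows machine.

```python
import re
import subprocess

# Matches lines such as "... Status = 0 0x0 NERR_Success"
# or "I_NetLogonControl failed: Status = 5 0x5 ERROR_ACCESS_DENIED".
STATUS_RE = re.compile(r"Status\s*=\s*(\d+)\s+0x[0-9a-fA-F]+\s+(\w+)")


def parse_status(output):
    """Return (code, name) from the last status line in the output, or None."""
    matches = STATUS_RE.findall(output)
    if not matches:
        return None
    code, name = matches[-1]
    return int(code), name


def secure_channel_ok(domain, reset=False):
    """Run nltest /sc_query (or /sc_reset) and report whether status is 0.

    Assumption: nltest.exe is installed and on the PATH (Windows only).
    """
    switch = "/sc_reset:" if reset else "/sc_query:"
    result = subprocess.run(
        ["nltest.exe", switch + domain], capture_output=True, text=True
    )
    status = parse_status(result.stdout + result.stderr)
    return status is not None and status[0] == 0
```

Parsing the text rather than trusting the process exit code keeps the sketch tied to output that is actually visible in the transcripts.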
We will log in as administrator, launch a command prompt, and then run the following commands:
(1) Find out the domain the computer is part of.
>> nltest.exe /parentdomain
The command completed successfully
(2) Since we have already logged in, this computer would have authenticated with its domain controller and established a secure channel. Verify the health of this channel.
>> nltest.exe /sc_query:mydomain.corp.mycompany.com
Flags: 30 HAS_IP HAS_TIMESERV
Trusted DC Name \\HYD-FE-DC-02.mydomain.corp.mycompany.com
Trusted DC Connection Status Status = 0 0x0 NERR_Success
(3) At times, the computer may be using a cached secure channel that is no longer valid. Just to be sure, force the computer to re-authenticate with its domain controller and establish a new channel.
>> nltest.exe /sc_reset:mydomain.corp.mycompany.com
Trusted DC Name \\hyd-fe-dc-01.mydomain.corp.mycompany.com
The new secure channel has been established correctly as well.
(4) Now look at the machine passwords the machine is actually using. For security reasons, the actual password is never shown. We can supply a message that will be hashed with the password and the digest will be displayed.
>> nltest.exe /cdigest:MESSAGETOHASH /domain:mydomain.corp.mycompany.com
Account RID: 0xce947
New digest: 50 40 f1 b1 9d 2d 0f 80 dc 46 e4 78 a7 ee 43 e9 P@±¦¥-.Ç_FSxºeCT
Old digest: 50 40 f1 b1 9d 2d 0f 80 dc 46 e4 78 a7 ee 43 e9 P@±¦¥-.Ç_FSxºeCT
There is one interesting thing to note here: two digests are shown, corresponding to two different passwords. For the sake of reliability, the computer always remembers its last two machine passwords: the current one and the preceding one. This is necessary because a password change may take time to propagate to all domain controllers in the forest, so at times the old password has to be used as well.
(5) At this point, take a snapshot using Hyper-V. Let’s call it Time1. We will use this snapshot later to reproduce password mismatch problems.
(6) Now, for the sake of the experiment, force a machine password change. Machine password changes are always initiated by the domain member (and not by the domain controller). In general, the machine password is changed once every 30 days. But we don’t want to wait that long here, so we take matters into our own hands.
>> nltest.exe /sc_change_pwd:mydomain.corp.mycompany.com
Connection Status = 0 0x0 NERR_Success
(7) Look at the new passwords used.
New digest: 27 ef e6 b9 f3 24 e9 8a 17 a0 c2 f3 12 28 5e ca 'nµ¦=$Tè.á-=.(^-
As you can see, the previous digest 50* has become the old one, and a new digest 27* is the current active password.
(8) Since the last two passwords are tracked, we need to change the password once more, using the same command as in step (6), to force a problem.
(9) Look at the new passwords again
New digest: 13 b1 5f 2b ac b0 35 43 cc 00 81 d8 bf 68 5e 76 .¦_+¼¦5C¦.ü++h^v
Old digest: 27 ef e6 b9 f3 24 e9 8a 17 a0 c2 f3 12 28 5e ca 'nµ¦=$Tè.á-=.(^-
Now, the original digest 50* that was used in our snapshot Time1 is gone.
(10) Now go to Hyper-V and restore to snapshot Time1.
(11) Now, check out the machine account password used here.
Since Hyper-V has restored the snapshot, the computer believes its machine password to be 50*. Obviously this does not match what the domain controller thinks, since the password was changed a little while back.
(12) Now get the computer to re-authenticate with the domain controller
I_NetLogonControl failed: Status = 5 0x5 ERROR_ACCESS_DENIED
As expected, the domain controller has refused to establish a secure channel. Try to access some network shares from here now. Or log off and try to log in again as a domain user. Everything fails. This VM is now busted: it cannot authenticate itself to the domain controller.
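The digest comparison done by eye in the steps above can also be scripted. The following Python sketch parses the "New digest" and "Old digest" lines from /cdigest output and checks whether two captures have any digest in common. The function names are made up for illustration, the sample strings reuse the hex bytes from the transcripts above with the trailing character rendering left out, and the comparison is only meaningful when both captures hashed the same message. It is a client-side heuristic; whether the domain controller accepts the machine is what ultimately matters.

```python
import re

# One "New digest:" or "Old digest:" line followed by exactly 16 hex bytes.
DIGEST_RE = re.compile(
    r"^\s*(New|Old) digest:\s*((?:[0-9a-fA-F]{2}\s+){15}[0-9a-fA-F]{2})",
    re.MULTILINE,
)


def parse_digests(output):
    """Map 'New'/'Old' to a normalized lowercase hex string."""
    return {
        kind: "".join(hex_bytes.split()).lower()
        for kind, hex_bytes in DIGEST_RE.findall(output)
    }


def shares_password(capture_a, capture_b):
    """True if the two captures have at least one digest in common."""
    return bool(set(capture_a.values()) & set(capture_b.values()))


# Digests as shown in step (4), i.e. the state saved in snapshot Time1.
time1 = parse_digests(
    "New digest: 50 40 f1 b1 9d 2d 0f 80 dc 46 e4 78 a7 ee 43 e9\n"
    "Old digest: 50 40 f1 b1 9d 2d 0f 80 dc 46 e4 78 a7 ee 43 e9\n"
)
# Only the "New digest" line is shown for step (7) in this post.
after_first_change = parse_digests(
    "New digest: 27 ef e6 b9 f3 24 e9 8a 17 a0 c2 f3 12 28 5e ca\n"
)
# Digests as shown in step (9), after the second change.
after_second_change = parse_digests(
    "New digest: 13 b1 5f 2b ac b0 35 43 cc 00 81 d8 bf 68 5e 76\n"
    "Old digest: 27 ef e6 b9 f3 24 e9 8a 17 a0 c2 f3 12 28 5e ca\n"
)

print(shares_password(after_first_change, after_second_change))
print(shares_password(time1, after_second_change))
```

With the digests from this experiment, the state after the first change still overlaps with the state after the second, while snapshot Time1 has nothing in common with it.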
There are a few broad strategies that I am aware of for dealing with machine account password problems:
(1) Increase the machine account password age or disable password changes altogether: Both can reduce the likelihood of the problem. These settings are available on the domain member (and not on the domain controller), so you can change them on your own computer. However, some domain administrators use the machine account password age to run scavenging scripts on the domain controller. When this happens, your machine can get knocked off the domain and you are in trouble.
(2) When you restore a snapshot, detect whether the machine account password is out of sync with the domain controller. If it is, remove the machine from the domain and join it back. You will need a privileged domain account to do this. Then you can start using the machine.
Detect the problem
Correct it by removing the machine from the domain and joining it back. You can use the netdom.exe tool from the Support Tools, or you can do this from Windows Explorer.
>> "netdom.exe" remove frodo50 /Domain:mydomain /userd:mydomain\sudhakar /passwordd:***************
The command completed successfully.
>> "netdom.exe" join frodo50 /Domain:mydomain /userd:mydomain\sudhakar /passwordd:***************
Reboot the VM. Now it will work well.
(3) The final strategy is a bit more sophisticated. Create your own domain controller VM and host it alongside the domain member VM you are using. Snapshot and restore both of them together so that there is never any mismatch. As a bonus, since you have your own domain controller, there are a lot of other powerful things you can do. The product I work on, Visual Studio Lab Management 2010, provides a feature called Network Isolation to make this process easier.
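For strategy (2), the detect-and-repair flow can be sketched as a small Python script. This is a hedged outline rather than a finished tool: the function names are made up, success is detected by looking for NERR_Success in the nltest output as seen in the transcripts above, and the netdom arguments simply mirror the two commands shown earlier. The command runner is passed in as a parameter so the logic can be tried without touching a real domain. A reboot is still needed after the rejoin.

```python
import subprocess


def run_and_capture(cmd):
    """Run a command and return its combined output (Windows tools assumed on PATH)."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.stdout + result.stderr


def build_rejoin_commands(machine, domain, user, password):
    """Build the netdom remove/join command lines used in this post."""
    common = ["/Domain:" + domain, "/userd:" + user, "/passwordd:" + password]
    return [
        ["netdom.exe", "remove", machine] + common,
        ["netdom.exe", "join", machine] + common,
    ]


def repair_if_broken(machine, domain, user, password, run=run_and_capture):
    """Reset the secure channel; if that fails, remove and rejoin the machine.

    Returns the list of repair commands that were executed (empty if healthy).
    """
    output = run(["nltest.exe", "/sc_reset:" + domain])
    if "NERR_Success" in output:
        return []  # secure channel is fine, nothing to repair
    commands = build_rejoin_commands(machine, domain, user, password)
    for cmd in commands:
        run(cmd)
    return commands
```

Passing the password on the command line follows the commands shown above; in a real script you would probably want to obtain it in a less exposed way.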
Views and opinions expressed in this blog are my own and do not reflect that of my employer.