I wrote earlier today a reply on this topic on the public forums, but now that I checked, the reply appears to have got lost, although I still entertain the hope it may only have got delayed and will appear there in 24 hours. Anyway, this is the reason why I prefer to write longer posts on this blog rather than on forums - they don't get lost as easily, either due to forum bugs or to threads going old and forgotten.

So, the problem I want to discuss is about what can be done if the SMK, for some reason or another, becomes inaccessible. I had previously touched on this in a previous article, but this time I want to go in more details.

First, let's look into why such a thing could happen. For the SMK to become inaccessible, something out of the ordinary must happen - either a disk corruption, which would most likely impact more data than just the SMK, or a change in the system configuration that would invalidate the DPAPI encryption of the SMK, or some other problem that I don't know yet about - maybe some obscure software error we have yet to see. In the years since SQL Server 2005 has shipped, I have seen only one case where the SMK became inaccessible and the reason for that was that the master database had been restored on a different system, such that the DPAPI encryptions of the SMK were invalid - even this issue had happened in an internal testing environment and not in a real production scenario. So, there is a way you can realistically end up with an inaccessible SMK, but it is not an ordinary scenario.

The following discussion assumes that something happened to the SMK and just to the SMK - I don't make any assumption on what happened, I just assume that this SMK problem is not compounded by the existence of other problems affecting other areas than the SMK. Also, before launching yourself into a recovery operation, make sure that you can at least recover from that - you should backup your database files such that you don't get in a worse situation than when you started.

Let's review the encryption hierarchy around the SMK, because this helps us understand what we need to fix. On one hand, the SMK has two DPAPI encryptions: using the service account and the machine account credentials, so for the SMK to become inaccessible, the system must be incapable of decrypting both these encryptions (also see this article). On the other hand, the SMK is used for encrypting three different classes of entities: credentials, linked server passwords, and database master keys (also see this article). This information allows us to determine what must have happened for the SMK to become inaccessible - both DPAPI encryptions must be undecryptable, and what is the effect of the SMK becoming inaccessible - credentials, linked server passwords, and database master keys become undecryptable by the system. (If you're reading this article while working with future versions of SQL Server, keep in mind that the list of entities encrypted by the SMK may change.) We can also use this information to verify that the SMK is indeed inaccessible; to do this, we could attempt to use the SMK to encrypt a new entity, such as a new database master key (in a database created just for the purpose of this test) or a credential secret - if such operations fail, a final test can be made by attempting to regenerate the SMK - if this fails as well, it becomes pretty clear that the problem we're dealing with is related to the SMK.

So, the best fix for the problem would be to fix the DPAPI encryptions of the SMK. We can do this in two ways:

(A) If we have a backup of the SMK (this is why such backups are recommended), then we can restore that backup. We will need to use the FORCE option because the current SMK cannot be decrypted. This should fix our SMK and we can check this using the tests I mentioned above.

(B) If we don't have a backup of the SMK and if the issue happened because we moved the databases from a machine to another and if that machine is still available, then all is not lost. We can either make a backup of the SMK on the original machine (if it still has the database system on it) and then apply it according to the (A) solution, or we can copy the database files back to the original machine, to make such a backup. Either way, the purpose would be that we want to make a backup of the SMK and then restore it on the machine that has the SMK problem.

If neither of these apply - we don't have a backup of the SMK and there is no original system where we can make one, then we're in some trouble. Not a lot, but we'll have some non-trivial cleanup to do. So let's assess the situation in this case: we don't have a SMK backup and we can't access the SMK, so let's face it: we've lost the SMK; this means we've lost the credentials and linked server passwords encrypted by the SMK - we'll need to regenerate all of these with help from whoever created them. The good part is that we didn't loose the database master keys - unless we managed to also forget the passwords protecting them - that would be quite unfortunate. Because DbMKs are always encrypted by a password, besides a default SMK encryption, the latter can always be fixed as long as we know the DbMK password. So, let's enumerate the steps we need to follow to cleanup the situation:

(a) We need to regenerate the SMK, because the current one is now unrecoverable. So we need to use REGENERATE and the FORCE option to create a new and valid SMK.

(b) For each DbMK that had a SMK encryption, we need to open the DbMK using its password encryption and then we need to re-encrypt the DbMK using the SMK. This is the same kind of step that we would need to do when moving a database from one system to another. Here is the TSQL for it:

OPEN MASTER KEY DECRYPTION BY PASSWORD = 'password'
ALTER MASTER KEY ADD ENCRYPTION BY SERVICE MASTER KEY
CLOSE MASTER KEY

(c) We need now to recreate all credentials and linked server passwords.

The steps above should bring our database system back to normal operational state as far as the SMK is concerned. It's going to be painful to do (c), but that's the price to pay because we didn't have a SMK backup. (b) and (c) need to be done after (a) because they need a valid SMK, but their order can be swapped, i.e. we can do (c) before (b).