Kerberos Error Configuring SQL Server Linked Server for Delegation

Today I want to post about my findings on a case involving a linked server between two SQL Server servers. Before anything else I want to give thanks to my colleague Ignacio Alonso Portillo, who was kindly enough to help me not only with this support case but with many others too.

My customer called Microsoft Support looking for help to configure a linked server using a three-tier client/server model, where a client connects to an intermediate server first and the client credentials are transferred from the intermediate server to the backend server. This is commonly known as “double-hop” scenario; the following diagram illustrates this situation:

MSDN_Image_Double_Hop - Copy

The process that takes care of forwarding the client credentials from the intermediate server (“SQL Server 1” in our example) to the back-end server (“SQL Server 2”) is called delegation.

In our scenario the client was trying to access the remote SQL Server (“SQL Server 2”) by the means of a Linked server connection on the intermediate (“SQL Sever 1”) box. Testing the Linked server was consistently failing with the following error:

===================================
"The test connection to the linked server failed."
===================================
An exception occurred while executing a Transact-SQL statement or batch. (Microsoft.SqlServer.ConnectionInfo)
------------------------------
Program Location:
at Microsoft.SqlServer.Management.Common.ServerConnection.ExecuteNonQuery(String sqlCommand, ExecutionTypes executionType)
at Microsoft.SqlServer.Management.Smo.ExecutionManager.ExecuteNonQuery(String cmd)
at Microsoft.SqlServer.Management.Smo.LinkedServer.TestConnection()
at Microsoft.SqlServer.Management.UI.VSIntegration.ObjectExplorer.LinkedServerConnectionTest.Invoke()
===================================
TCP Provider: An existing connection was forcibly closed by the remote host.
Login failed for user '(null)'. Reason: Not associated with a trusted SQL Server connection.
OLE DB provider "SQLNCLI" for linked server "SRVSQL2000" returned message "Communication link failure". (.Net SqlClient Data Provider)
------------------------------
Server Name: SRVSQL2005
Error Number: 10054
Severity: 16
State: 1

This is how the Security properties of this Linked server were configured:

Linked_Server_Security

In this case we are allowing the user CONTOSO\SQLADMIN to connect to the remote server using the credentials given to the local one. If any other user tries to connect to the remote server by the means of this linked connection, access will be denied (hence the “Not be made” option). Pay attention to this security dialog, it can be a little bit tricky. By "Local Login" we can specify either a SQL Server login or a Windows account however, the “Remote User” cannot be a Windows user, only a SQL Server user. Selecting the “Impersonate” option disables the “Remote User” and “Remote Password” options since we are already using the credentials given to the local server to impersonate (i.e., authenticate) the user on the remote server. The configuration on this dialog box was not responsible of the error we were hitting, but I am showing you this here to illustrate how this dialog works in SQL Server Management Studio.

The error “Login failed for user '(null)'” found is typically caused when Kerberos is not being used or when Kerberos have been configured but Delegation is not working. For this setup to work Kerberos has to be used because NTLM authentication can’t manage a double-hop scenario. We make sure that Kerberos requirements were met in our environment; a summary of these requirements can be found in the following diagram:

MSDN_Image_Double_Hop

We did followed all the steps outlined in this MSDN entry to configure Delegation but the error was still the same so we decide to capture network traces at the intermediate server while reproducing the error. After loading the traces in Network Monitor and applying a filter (use "KerberosV5" keyword as filter in Network Monitor) we found the following network traffic:

452 2010-11-16 12:22:59.334000 10.64.100.37 10.64.101.218 KRB5 TGS-REQ
453 2010-11-16 12:22:59.334000 10.64.101.218 10.64.100.37 KRB5 KRB Error: KRB5KDC_ERR_S_PRINCIPAL_UNKNOWN
537 2010-11-16 12:23:01.599625 10.64.112.104 10.64.100.37 DCERPC Bind: call_id: 70 IOXIDResolver V0.0
539 2010-11-16 12:23:01.599625 10.64.100.37 10.64.112.104 DCERPC Bind_ack: call_id: 70 accept max_xmit: 5840 max_recv: 5840
540 2010-11-16 12:23:01.599625 10.64.112.104 10.64.100.37 DCERPC Alter_context: call_id: 70 IOXIDResolver V0.0

The two typical reasons why you would get a KRB5KDC_ERR_S_PRINCIPAL_UNKNOWN error, provided DNS registration was working fine and direct/reverse name resolution were returning correct information, are:

  1. Actually the SPN is not registered for the account under which the SQL Server service is running
  2. The SPN is registered for the account under which SQL is running, but it is also registered under a different account (i.e. duplicate SPNs).

We confirmed previously that the correct SPNs existed for the SQL Server service account so we then needed to find for duplicate SPNs. One way to find these duplicate entries is by using the setspn tool included in Windows Server 2008 and later, or the updated setspn version for Windows Server 2003. Personally, I prefer to use the not-so-intuitive Windows ldifde utility. Using ldifde.exe /? will show you the required syntax to output a list of SPNs for your domain. In the following example we use ldifde to provide only those SPNs that has been registered for SQL Server service:

LDIFDE -f filename.txt -t 3268 -d "DC=contoso,DC=com" -l serviceprincipalname -r (serviceprincipalname=MSSQLSvc/*) -p subtree

Here was the output for this instruction in our case:

dn: CN=SQLServer,OU=Contoso Users,OU=Managed Accounts,DC=contoso,DC=lab
changetype: add
servicePrincipalName: MSSQLSvc/SRVSQL2005.CONTOSO.lab:1433
servicePrincipalName: MSSQLSvc/SRVSQL2005.CONTOSO.lab
servicePrincipalName: MSSQLSvc/SRVSQL2005:1433
servicePrincipalName: MSSQLSvc/SRVSQL2005
servicePrincipalName: MSSQLSvc/newhost10.fabrikam.lan:1433
servicePrincipalName: MSSQLSvc/fabrikam.intranet.lan:1433
servicePrincipalName: MSSQLSvc/fabrikam.intranet.lan
servicePrincipalName: MSSQLSvc/newhost05.fabrikam.lan:1433
servicePrincipalName: MSSQLSvc/newhost05.fabrikam.lan

dn: CN=SRVSQL2005,OU=Cluster,OU=Servers,OU=Managed Computers,DC=contoso,DC=lab
changetype: add
servicePrincipalName: MSSqlSvc/SRVSQL2005.CONTOSO.lab:1433
servicePrincipalName: MSSQLSvc/SRVSQL2005:1433
servicePrincipalName: MSClusterVirtualServer/SRVSQL2005.CONTOSO.lab
servicePrincipalName: MSClusterVirtualServer/SRVSQL2005
servicePrincipalName: HOST/SRVSQL2005.CONTOSO.lab
servicePrincipalName: HOST/SRVSQL2005

As you can see the SPN required for our SQL Server 2005 server did exists not only for the SQL Server service account running on our server but also for another service account, resulting in a duplicated SPN on the network. We removed the duplicated SPN using setspn –D.

The following MSDN entry lists some of the actions that can lead to duplicate SPNs on the network:

There are several scenarios where an administrator can duplicate the service-principal-names (SPN) in the domain directory that will cause Kerberos authentication to fail. These include the following:

- Making changes to the domain account under which the instance of SQL Server runs
If SetSpn.exe is run while running an instance of SQL Server as one domain account, such as DOMAIN\User1, and then the domain account that is used to run SQL Server is changed, such as DOMAIN\User2, when SetSPN.exe is run again, it will cause the same SPN to be inserted in the directory under both accounts.

- Installing multiple instances of SQL Server that run under different accounts
If you install multiple instances of SQL Server and then run each instance under a different account, if SetSpn.exe is run on each instance, there will be duplicate accounts in the directory under each SQL Server service account. This applies for both instances that are running under a domain user and also the local System account.

- Removing and reinstalling an instance of SQL Server under a different account
If you install SQL Server under one account, register the SPNs, remove and reinstall SQL Server under a different account, and then reregister the SPNs, each domain account will have the same SPNs. This means the SPNs will be duplicated.

Bear into mind that Kerberos tickets are stored in cache for certain of time so the Kerberos ticket provided originally by the KDC although invalid, remains in cache. If we test the Linked server connection right after fixing the duplicate SPN we will still be unable to connect because of these cached tickets. The Windows Resource Kit contains the Kerbtray utility we can use to purge cached Kerberos tickets. The tool has to be executed via "run as" (from Windows GUI or Command Prompt) using the SQL Server Service account, otherwise we will be deleting the tickets for our current account and not for the SQL Server one.