Tianxiang Chen's Tech Blog

Readability counts.

WCF on intranet with windows authentication: Kerberos or NTLM (Part 1)

The issue

When we build an enterprise-level SOA system on top of Windows servers in an Active Directory environment, Windows authentication is probably the most appropriate authentication mechanism: it is secure, straightforward to build, and easy to maintain. Underneath WCF's Windows authentication implementation, two SSPs (Security Support Providers) are used: Kerberos and NTLM. You might run into the same issues I did, so here I want to share some experience from my troubleshooting.

If we host a WCF service under a domain user account and call it from another machine, we will very likely get this error:

A call to SSPI failed, see inner exception. ---> System.ComponentModel.Win32Exception: The target principal name is incorrect

There are quite a lot of articles (1, 2, 3, 4) discussing this issue. Basically, their solution is to:

  1. use setspn.exe to create an SPN for the domain account
  2. on the client side, set that SPN as the endpoint identity

However, the odd thing is that we can actually skip step 1 and set a dummy string in step 2, and it still works. Why? I am not the first one to ask: these 2 posts (1, 2) raise exactly the same question. So I decided to dig deeper and find the root cause.

Analysis

From the result of the workaround, we can 'feel' that the key point is which protocol is used for authentication: NTLM or Kerberos makes the difference. This KB article tells us how WCF chooses the protocol, as shown below (left column is the client, top row is the server):

 

Client \ Server   Local User       Local System     Domain User      Domain Machine
Local User        NTLM             NTLM             NTLM             NTLM
Local System      Anonymous NTLM   Anonymous NTLM   Anonymous NTLM   Anonymous NTLM
Domain User       NTLM             NTLM             Kerberos         Kerberos
Domain Machine    NTLM             NTLM             Kerberos         Kerberos

In our case, we are using a domain user and a domain machine (Network Service across machines), so both should use Kerberos as the first choice.
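The table can also be written down as a small lookup, which is handy when a test harness needs to assert which protocol it expects. This is only a sketch that encodes the table above; AccountKind and ProtocolTable are names I made up for illustration, they are not part of WCF.

```csharp
using System;

// Hypothetical names: this only encodes the KB table, it is not a WCF API.
public enum AccountKind { LocalUser, LocalSystem, DomainUser, DomainMachine }

public static class ProtocolTable
{
    // Returns the protocol the table lists for a given client/server pair.
    public static string Expected(AccountKind client, AccountKind server)
    {
        // The whole "Local System" client row is Anonymous NTLM.
        if (client == AccountKind.LocalSystem)
            return "Anonymous NTLM";

        bool clientInDomain = client == AccountKind.DomainUser
                           || client == AccountKind.DomainMachine;
        bool serverInDomain = server == AccountKind.DomainUser
                           || server == AccountKind.DomainMachine;

        // Kerberos appears only when both sides are domain identities.
        return clientInDomain && serverInDomain ? "Kerberos" : "NTLM";
    }
}
```

For our scenario, ProtocolTable.Expected(AccountKind.DomainMachine, AccountKind.DomainUser) gives "Kerberos", which is what we expect to see before the SSPI failure gets in the way.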

How do we know which protocol is finally picked? There are two ways:

  1. Look at the Security Event Log. You can filter like I do:
    [image: event log filter]
    View the 'Logon' event; it tells us whether Kerberos or NTLM was used:
    [images: Logon event details]
  2. Disable NTLM on the client side and call the service.
    We can configure this in code:
       // Pass false for allowNTLM to forbid the client from using NTLM.
       channelFactory = new ChannelFactory<TChannel>(binding);
       channelFactory.Credentials.Windows.AllowNtlm = allowNTLM;
    Or in the config file:
       <system.serviceModel>
         <behaviors>
           <endpointBehaviors>
             <behavior name="WcfTestBehavior">
               <clientCredentials>
                 <windows allowNtlm="false" />
               </clientCredentials>
             </behavior>
           </endpointBehaviors>
         </behaviors>
       </system.serviceModel>
    If NTLM is picked while we have disabled it, we get this exception:
    System.ServiceModel.Security.SecurityNegotiationException: The remote server did not satisfy the mutual authentication requirement.

Now we are ready to run the experiment and see the logic inside WCF.

As I said before, this "A call to SSPI failed" exception only occurs when the service runs under a domain account and the client is on another machine. If we put the client on the same machine as the server, the call succeeds because it uses NTLM. This behavior is what the protocol selection table leads us to expect.

If we set a dummy string as the SPN on the client side, we can see that it actually uses NTLM to authenticate. Why can't it fall back to NTLM when we don't specify the SPN?

I tried to dig out the reason via Reflector, but unfortunately I failed to get a 100% certain answer because I could not follow the fancy WCF design: too many factories, base classes, reflection, etc. It just made me dizzy :)

Based on my experiments, my assumption is this: if the identity is not specified on the client side, WCF automatically creates one from the host name in the Uri. Suppose we are calling net.tcp://remotemachine1:port/MyService; the WCF client will use the machine name as the SPN, that is, EndpointIdentity.CreateSpnIdentity("remotemachine1"), to call the service.
The tricky logic is: if the SPN specified on the client is valid, WCF uses it to do Kerberos authentication. If the SPN is not a correct one, it picks NTLM. That's why it works when we specify any dummy string as the SPN: it actually falls back to NTLM because the SPN is invalid.
This does not seem right to me, because the logic is not like Negotiate, as described in this document:

Your application should not access the NTLM security package directly; instead, it should use the Negotiate security package. Negotiate allows your application to take advantage of more advanced security protocols if they are supported by the systems involved in the authentication. Currently, the Negotiate security package selects between Kerberos and NTLM. Negotiate selects Kerberos unless it cannot be used by one of the systems involved in the authentication.

WCF's fallback logic is crappy and has cost thousands of poor .NET developers a huge amount of time investigating the failure and the workaround. If it used the logic "try Kerberos; if that fails, try NTLM", I bet very few people would ever see this ugly SSPI exception.

To make it clear, I drew a diagram to illustrate the logic (I don't guarantee it is 100% correct):

[image: diagram of the authentication logic]
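The same inferred logic can be sketched in code. Like the diagram, this is only a model of what I observed, not WCF's real implementation: AuthOutcome, InferredWcfLogic and their members are hypothetical names, and the two boolean inputs stand for facts about Active Directory that the sketch does not look up itself.

```csharp
using System;

// Hypothetical labels for the results described in this post.
public enum AuthOutcome
{
    Kerberos,
    Ntlm,
    TargetPrincipalNameIncorrect, // the "A call to SSPI failed" error
    MutualAuthNotSatisfied        // NTLM was needed but had been disabled
}

public static class InferredWcfLogic
{
    // Assumption from my experiments: with no identity configured,
    // the host name of the address is used as the SPN.
    public static string EffectiveSpn(Uri address, string configuredSpn)
    {
        return configuredSpn ?? address.Host;
    }

    // spnRegistered: the effective SPN exists in Active Directory.
    // serviceAccountOwnsSpn: the SPN belongs to the account running the service.
    public static AuthOutcome Predict(bool spnRegistered, bool serviceAccountOwnsSpn, bool allowNtlm)
    {
        if (!spnRegistered)
        {
            // Invalid or dummy SPN: NTLM is picked if the client allows it.
            return allowNtlm ? AuthOutcome.Ntlm : AuthOutcome.MutualAuthNotSatisfied;
        }

        // Valid SPN: Kerberos is attempted and there is no fallback on failure.
        return serviceAccountOwnsSpn
            ? AuthOutcome.Kerberos
            : AuthOutcome.TargetPrincipalNameIncorrect;
    }
}
```

The failing case from the beginning of this post is Predict(true, false, true): the machine name resolves to a registered SPN, but that SPN belongs to the machine account and not to the domain user running the service.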

Solutions

Now we know that in most cases we are actually using NTLM to work around the issue.

If you don't care whether Kerberos or NTLM is used, you can create the SPN identity on the client side from String.Empty (or any dummy string), and you will always go to NTLM.

What if we want to stick to Kerberos? What should we do?

We got the original "A call to SSPI failed" exception because the remote service account (a domain account) has no access to the key of the machine SPN. If we run the server side as "Network Service" or "Local System", those accounts do have access, so Kerberos authentication can complete.

An arbitrary SPN has the syntax <ServiceClass>/<ServiceName>, and the machine SPN is HOST/MACHINENAME. We can run setspn.exe -L machinename$ to see the set of SPNs the machine account has.

So, can we create an SPN for the service account using HOST/MACHINENAME, so that the client side needs no change?

setspn.exe -U -A HOST/SERVERNAME DOMAIN\SERVICEACCOUNT

Unfortunately the answer is NO. This command creates a duplicate SPN record, and Kerberos will then always fail.

To check for duplicate SPNs, use setspn.exe -X

We have to create the SPN for the user under a different name, for example: MySystem/Service1.

setspn.exe -U -A MySystem/Service1 DOMAIN\SERVICEACCOUNT
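To avoid picking a colliding name by accident, a deployment script could sanity-check the SPN before calling setspn.exe. The helper below is a hypothetical sketch (SpnCheck is not a Windows or WCF API): it only splits the <ServiceClass>/<ServiceName> syntax and compares against the machine's short HOST SPN, so it does not replace a real duplicate check with setspn.exe -X.

```csharp
using System;

public static class SpnCheck
{
    // Returns true when the SPN is HOST/<machineName>, i.e. the machine SPN
    // that must not be registered a second time for a user account.
    // Assumption: only the short machine name is compared, not the FQDN.
    public static bool CollidesWithMachineSpn(string spn, string machineName)
    {
        int slash = spn.IndexOf('/');
        if (slash <= 0 || slash == spn.Length - 1)
            throw new FormatException("Expected <ServiceClass>/<ServiceName>: " + spn);

        string serviceClass = spn.Substring(0, slash);
        string serviceName = spn.Substring(slash + 1);

        return serviceClass.Equals("HOST", StringComparison.OrdinalIgnoreCase)
            && serviceName.Equals(machineName, StringComparison.OrdinalIgnoreCase);
    }
}
```

With this, "HOST/SERVERNAME" is reported as a collision for the machine SERVERNAME, while "MySystem/Service1" is not.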

Then we come back to the client side and set the identity:

  1. In the config file, set it inside the endpoint node:
       <endpoint name="winservicenettcp"
                 binding="netTcpBinding"
                 bindingConfiguration="netTcp"
                 address="net.tcp://myserver:12345/WcfPerfTest"
                 contract="Contract.IPerfTest"
                 behaviorConfiguration="WcfTestBehavior">
         <identity>
           <servicePrincipalName value="MySystem/Service1"/>
         </identity>
       </endpoint>
  2. Or in code, set it on the EndpointAddress:
       // The SPN must match the one registered with setspn.exe.
       new EndpointAddress(new Uri(ServiceUrl), EndpointIdentity.CreateSpnIdentity("MySystem/Service1"));

If the SPN was not created successfully, the call could still pass because it can fall back to NTLM, so we can disable NTLM on the client side to be sure.
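Putting the two pieces together, a client that insists on Kerberos could look roughly like the sketch below. IPerfTest here is a hypothetical stand-in for Contract.IPerfTest (the real contract is not shown in this post), KerberosOnlyClient is a made-up helper name, and the code needs a reference to System.ServiceModel. As far as I know, AllowNtlm is marked obsolete in newer .NET Framework versions, so expect a compiler warning there.

```csharp
using System;
using System.ServiceModel;

// Hypothetical contract standing in for Contract.IPerfTest.
[ServiceContract]
public interface IPerfTest
{
    [OperationContract]
    string Ping(string message);
}

public static class KerberosOnlyClient
{
    // Builds an address whose identity is the custom SPN registered with setspn.exe.
    public static EndpointAddress BuildAddress(string serviceUrl, string spn)
    {
        return new EndpointAddress(new Uri(serviceUrl),
                                   EndpointIdentity.CreateSpnIdentity(spn));
    }

    // Creates a channel factory that refuses to fall back to NTLM.
    public static ChannelFactory<IPerfTest> CreateFactory(string serviceUrl, string spn)
    {
        var binding = new NetTcpBinding(SecurityMode.Transport);
        binding.Security.Transport.ClientCredentialType = TcpClientCredentialType.Windows;

        var factory = new ChannelFactory<IPerfTest>(binding, BuildAddress(serviceUrl, spn));

        // With NTLM disabled, a wrong or missing SPN fails loudly
        // instead of silently authenticating over NTLM.
        factory.Credentials.Windows.AllowNtlm = false;
        return factory;
    }
}
```

A call would then be KerberosOnlyClient.CreateFactory("net.tcp://myserver:12345/WcfPerfTest", "MySystem/Service1").CreateChannel(). If it succeeds, the Logon event in the Security Event Log should show Kerberos.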

Another solution is to use a UPN: in config it is userPrincipalName, and in code it is EndpointIdentity.CreateUpnIdentity. Just pass the service account's name (I used mydomain\accountname; the canonical UPN form is accountname@mydomain) and it also succeeds.

Another useful tool for debugging Kerberos is klist.exe. It shows us the Kerberos tickets currently cached on the system. Make sure you run klist.exe purge after changing something, or the cached tickets will give bogus results.

Conclusion

Because of the weird design of WCF Windows authentication, it is very challenging to write a generic client wrapper, because you cannot tell how to set the identity (not set, correct SPN, correct UPN, or dummy SPN) just from the Uri. Personally I would call this behavior a bug; if it is by design, it is a very bad design :) If it could change to "Kerberos first; if that fails, try NTLM", that would be fantastic, although it does not seem very likely ^_^

Comments
  • Great article!

  • Fixed my problem and learned something new today. Thank you!

  • Excellent! Helped a lot. Where is part 2?
