Chris Gideon's WebLog

  • Where did it go...

  • This blog is moving to SharePoint!

    This week I will be moving my blog over to SharePoint, and I will post the URL shortly when it goes live. I know I haven't been blogging much lately, but if you're one of my customers you know why. :) We've been busy upgrading to MOSS or WSSv3.

    When I move to my new digs I will start posting my upgrade learnings as I go. Until then I want to pass on a very important post for SPS 2003 users upgrading to MOSS.

    Boxin Li, a developer on the SharePoint product team, has described the process very well. Enjoy!

    How to upgrade a custom area definition

  • Understanding and Troubleshooting SharePoint Explorer View Whitepaper released

    Steve Sheppard has released an excellent white paper on SharePoint Explorer View. You can find it here:

    http://www.microsoft.com/downloads/details.aspx?FamilyId=C523AC7A-5724-48BE-B973-641E805588F4&displaylang=en

  • Central Admin and Kerberos

    I recently came across a problem with trying to use Kerberos for Central Administration Virtual Server/Web Application/IIS Web Site. In short here is the problem:

    Symptoms: Unable to get to Central Admin with 401.2

    Troubleshooting: Machine name = SpServer, App Pool Identity= SPServiceAcct, Domain= Contoso

    • Used ADSUTIL to check the NTAuthenticationProviders metabase key for the Central Admin WebApp. It was set to "Negotiate, NTLM"
    • Ran SetSPN -L on the Application Pool Identity SPServiceAcct (NT account). This revealed that we had http/SPServer:<centraladminport> and http/SPServer.Contoso.com:<centraladminport>.
    • Ran SetSPN -L on the SPServer account. This revealed that we had the expected HOST/SPServer and HOST/SPServer.Contoso.com records.
    • Checked the Event Viewer for errors. Kerberos was reporting a duplicate registration.
      • Deleted SPN for http/SPServer:<centraladminport> and http/SPServer.Contoso.com:<centraladminport> from the Contoso\SPServer account.

    Root Cause: Registration of SPN for Central Admin caused a duplicate. Integrated authentication failed to fall back to NTLM since our target service had a valid registered Kerberos entry point.

     

    Why did this happen? To paraphrase a KB on ASP.NET…

    If multiple Web sites are reached by the same URL but on different ports, Kerberos will not work. To make this work, you must use different hostnames and different SPNs. When Internet Explorer requests either http://www.Contoso.com or http://www.Contoso.com:81, Internet Explorer requests a ticket for SPN HTTP/www.contoso.com. Internet Explorer doesn't add the port or the virtual server/Web Application to the SPN request. This behavior is the same for http://www.contoso.com/app1 or http://www.contoso.com/app2. In this scenario, Internet Explorer will request a ticket for SPN HTTP/www.contoso.com from the Key Distribution Center (KDC). Each SPN can be declared for only one identity. Therefore, you would also receive a KRB_DUPLICATE_SPN error message if you try to declare this SPN for each identity.
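
    To make the paraphrased behavior concrete, here is a minimal Python sketch. The function name spn_for_url is made up for illustration; it only models the rule described above (service class HTTP plus the host name, with port and path dropped) and is not how Internet Explorer is actually implemented.

```python
from urllib.parse import urlsplit


def spn_for_url(url: str) -> str:
    """Model the SPN requested for a URL: HTTP/<host>, ignoring port and path."""
    host = urlsplit(url).hostname  # .hostname drops the port and lower-cases the name
    if host is None:
        raise ValueError(f"no host name in URL: {url!r}")
    return f"HTTP/{host}"


# The four URLs mentioned in the paragraph above.
urls = [
    "http://www.Contoso.com",
    "http://www.Contoso.com:81",
    "http://www.contoso.com/app1",
    "http://www.contoso.com/app2",
]

for url in urls:
    print(url, "->", spn_for_url(url))
```

    All four URLs map to one SPN, which is why sites that differ only by port or path cannot each get their own identity.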

     

    There are built-in SPNs that are registered for computer accounts. These SPNs are recognized for computer accounts if the computer has a HOST SPN. The Troubleshooting Kerberos Delegation white paper explains this very well in Table 1, Built-in SPNs Recognized for Computer Accounts. Adding the registration on the App Pool Identity account created a duplicate because, by default, HOST/SPServer already maps to HTTP/SPServer regardless of port.
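
    The duplicate can be illustrated with a small Python sketch. Everything here is a simplified model of the explanation above, not of Active Directory itself: the function name find_duplicates is hypothetical, HOST is treated as implying HTTP, ports are ignored when comparing, and 12345 is just a placeholder for <centraladminport>.

```python
from collections import defaultdict


def find_duplicates(registrations: dict) -> dict:
    """Return {(service, host): [accounts]} for SPNs claimed by more than one account."""
    owners = defaultdict(set)
    for account, spns in registrations.items():
        for spn in spns:
            service, _, rest = spn.partition("/")
            host = rest.split(":", 1)[0].lower()  # ignore the port, per the post
            classes = {service.lower()}
            if service.lower() == "host":
                classes.add("http")  # assumption: HOST covers HTTP as a built-in SPN
            for service_class in classes:
                owners[(service_class, host)].add(account)
    return {key: sorted(accts) for key, accts in owners.items() if len(accts) > 1}


# The registrations found during troubleshooting (port is a placeholder).
registrations = {
    "Contoso\\SPServer": ["HOST/SPServer", "HOST/SPServer.Contoso.com"],
    "Contoso\\SPServiceAcct": [
        "http/SPServer:12345",
        "http/SPServer.Contoso.com:12345",
    ],
}

for (service_class, host), accounts in find_duplicates(registrations).items():
    print(f"{service_class}/{host} is claimed by: {', '.join(accounts)}")
```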

    Ok, back to WSSv3/MOSS upgrade…

     

  • Focus on WSSv3/MOSS upgrade…keeping Kerberos in mind

    This is my first post in a while because I have been intently learning and experimenting with WSSv3 and MOSS 2007. I will be focusing on upgrade for a while. In fact, I am going to put together a series of short posts on getting ready for WSSv3/MOSS.

    First quick note: If you have followed my earlier advice and are running with Kerberos instead of NTLM for WSSv2/SPS then this is important. Make sure if you choose to perform a gradual upgrade that you have registered your servicePrincipalNames (SPN) for the new URL you will be using during the redirect prior to running setup. Why? It is necessary because clients ask for a ticket based on the URL of the server (or load balancer). Once you start redirecting for the gradual upgrade your tickets will be invalid for the v2 environment.

    For example, if your pre-upgrade URL is http://SharePoint.Contoso.com and you choose http://SharePointOld.Contoso.com as the redirect URL, then you will need two SPNs: one for the v2 redirect URL, which will now be http://SharePointOld.Contoso.com, and another for v3, which is http://SharePoint.Contoso.com. It is recommended that you use the same Application Pool Identity (account) in v3 that you used in v2. Therefore, you will end up with both SPNs registered on the same account. To sum up:

    App Pool Identity= Contoso\SpPoolAcct

    Domain= Contoso

    Pre upgrade URL for v2= http://SharePoint.Contoso.com

    Post upgrade URL for v2 = http://SharePointOld.Contoso.com

    Post upgrade URL for v3 = http://SharePoint.Contoso.com

    Setspn command to register the new SPN for upgrade= setspn -A http/SharePointOld.Contoso.com contoso\SpPoolAcct
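
    If you have several URLs to prepare, the command line can be generated rather than typed. This is only a sketch: the helper name setspn_add_command is hypothetical, it handles plain http URLs like the ones above, and it prints the command instead of running it.

```python
from urllib.parse import urlsplit


def setspn_add_command(url: str, account: str) -> str:
    """Build a 'setspn -A' command line for the host part of url."""
    netloc = urlsplit(url).netloc
    host = netloc.split("@")[-1].split(":")[0]  # drop user info and port, keep case
    if not host:
        raise ValueError(f"no host name in URL: {url!r}")
    return f"setspn -A http/{host} {account}"


# Values from the summary above.
account = "contoso\\SpPoolAcct"
redirect_url_v2 = "http://SharePointOld.Contoso.com"

print(setspn_add_command(redirect_url_v2, account))
```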

  • New Best Practices: Using Disposable WSS Objects document released

    This is a document that I have been eagerly awaiting. If you are doing SharePoint OM development you definitely need to check this out.

    Best Practices: Using Disposable Windows SharePoint Services Objects

    Although it's posted under the V3 SDK, the methods are just as valuable for V2 development. If someone were to create a series of code snippets for these techniques…

  • #50070: Unable to connect to the database <Database Name>

    During my trek through Windows SharePoint Services I frequently hear about this error message. I like to refer to “Unable to connect to the database” as one of the dreaded errors of SharePoint. Why would I do this? Because in the majority of cases this is not a WSS problem. It comes from a variety of environmental factors which are difficult to diagnose and troubleshoot. Rather than get into a philosophical debate on the how or why of WSS error reporting in V2 it’s better to just list the causes that I know about.

    1. The Ad Hoc Query Plan bug. Most easily identified with SpSitemanager, first fixed and explained here.
    2. Speed & Duplex Settings NOT Specified for both the NIC and switch.
    3. NIC Drivers misconfigured or a bad NIC driver (no naming names).
    4. A damaged NIC or very rarely a bad/damaged network cable.
    5. Using one VLAN on the switch and then putting both the Internal NIC and the Public NIC on the same subnet.
    6. Unicast mode for WLBS without following the steps in the whitepaper or article. This results in the switch perceiving WLBS as port flooding.
    7. Multicast mode with WLBS without setting up a static ARP entry in the router/switch.
    8. Setting more than one default gateway on a multi-homed server when both NICs are on the same subnet.
    9. Hardware based Load balancing not configured correctly. For an example on configuring this correctly see this.
    10. SynAttackProtect turned on after installing SP1 for Windows Server 2003. This is explained in the SQL Server 2005 release notes but affects SQL 2000 as well.
    11. Not using Aliases when accessing SQL on a port other than 1433.
    12. The SharePoint application pool account locked out.
    13. SQL Server Paused.
    14. A poorly architected web part behaving badly.
    15. High CPU on SQL from another application sharing the same server.
    16. Blocking on SQL Server as the result of backup software or another application.
    17. Anti-Virus gone awry. Corrected with SP2 for WSS.
    18. App Pool Recycling under load.
    19. Incorrect permissions in SQL server for the SharePoint App Pool account (Need Security Admin and Database Creators).
    20. NTLM bottleneck; see my previous posts for details.

    I am certain there are other possible causes. The point is that many of these items are beyond the control of SharePoint. However, I am very pleased with the work the WSS Dev team has done with Beta 2 of V3. Many of the items listed above are caught and reported. I encourage you to try it for yourself.

  • NTLM Authentication with SharePoint Part 2

    In my last post I laid out the basic flow of NTLM authentication with SharePoint when all the accounts (user, service and machine) reside in the same domain. In this post I will discuss the implications of multiple domains in two different scenarios.

    Scenario 1:

    Active Directory Forest=Fabrikam.com; Domain for users= CHILD.Fabrikam.com; SharePoint WFE, SQL DB have machine accounts in Fabrikam.com; SharePoint Application Pool and SQL Service accounts are in Fabrikam.com.

    In this scenario the secure channel DC servicing SharePoint has to contact its peer DC in the CHILD domain via the trust. By default the MaxConcurrentApi for a domain controller over a trust is one. That’s right: one concurrent request (one user at a time) will be processed over the trust. That’s why adjusting the MaxConcurrentApi on the DCs servicing SharePoint (or any other high volume application; ISA comes to mind) is important. Again, profile and test; don’t just jump to ten.

    Scenario 2:

    Active Directory Forest=Fabrikam.com; Domain for users= CHILD.Fabrikam.com; SharePoint WFE, SQL DB have machine accounts in Fabrikam.com; SharePoint Application Pool and SQL Service accounts are in GrandChild.Fabrikam.com.

    In this scenario you have the same need to walk the trust for users but you also have a new need to walk the trust for the service accounts.

    These two scenarios bring up another item to consider under high volumes of authentication: Secure Channel “float”. There are a handful of reasons why a secure channel resets to a different DC. The first is a response time greater than or equal to 45 seconds. This is usually the result of a secure channel being established over a slow link, or a secure channel to a DC that is overloaded (high CPU). The second is a network failure on the way to the secure channel DC. This can be caused by a physical network failure; Spanning Tree running on the switch, which is outlined here; a hiccup from auto negotiate (determining the speed and duplex settings) between the NIC and the switch, outlined here; or the secure channel DC being rebooted.

    Once the secure channel is unbound from a DC it goes through the DC Locator process to find a DC. If you have multiple geographical sites in your environment it is important to designate Active Directory Sites to keep your SharePoint servers using local DCs. Under a high load the last thing you want is your secure channel DC sitting across a slow WAN link, and this can happen if you don’t architect for it in your design. It can also happen if you place the DC/GCs for the domains you are authenticating over slow links. For example, in Scenario 1, if the DC/GC for the CHILD domain is over a slow link, a bottleneck is possible. The better design would have DC/GCs for the Fabrikam.com and CHILD domains close (high speed links) to the SharePoint servers and an Active Directory Site specified to keep secure channels local if the DC Locator process is called.

    To sum up my recommendations for best performance:

    1. Consider creating an Active Directory site just for the SharePoint boxes (if in the same forest) and add GC’s for each domain going against SharePoint.
    2. Make certain that the DC/GC’s are physically as close (high speed links) as possible to the SharePoint boxes.
    3. If possible, make all DCs GCs if in Native Mode.
    4. Hard set the speed and duplex settings on NICs and switches to avoid loss of connectivity during auto negotiate.
    5. Check with your switch vendor on the settings for spanning tree to avoid Secure Channel drops. Most vendors have an option to keep this from happening while still benefiting from Spanning Tree.
    6. Increase MaxConcurrentApi and profile DC/GC (for domains in play) with SPA to see if they can handle the load. Make certain to do this on the SharePoint servers and DC/GC for all domains in play.
    7. Monitor Secure Channels with NLTest.exe after patches that cause a reboot to ensure that secure channels don’t float to slow link DC/GCs.
    8. For extreme performance consider the use of x64 DC/GCs. See the impressive results here.
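
    Recommendation 7 can be partly scripted. The sketch below parses the text printed by nltest /sc_query:<domain> and checks whether the secure channel DC is on a list of local DCs. The sample output is an assumption about the format (verify it against your own servers), and the DC names, LOCAL_DCS and both function names are hypothetical.

```python
import re

# Assumed shape of "nltest /sc_query:<domain>" output; DC name is made up.
SAMPLE_OUTPUT = r"""Flags: 30 HAS_IP  HAS_TIMESERV
Trusted DC Name \\DC1.Fabrikam.com
Trusted DC Connection Status Status = 0 0x0 NERR_Success
The command completed successfully
"""

# Hypothetical DCs that sit on a high speed link to the SharePoint servers.
LOCAL_DCS = {"dc1.fabrikam.com", "dc2.fabrikam.com"}


def secure_channel_dc(nltest_output: str) -> str:
    """Extract the secure channel DC name (lower-cased) from nltest output."""
    match = re.search(r"Trusted DC Name\s+\\\\(\S+)", nltest_output)
    if match is None:
        raise ValueError("no 'Trusted DC Name' line found")
    return match.group(1).lower()


def has_floated(nltest_output: str, local_dcs: set) -> bool:
    """True if the secure channel points at a DC outside the local list."""
    return secure_channel_dc(nltest_output) not in local_dcs


print("Secure channel DC:", secure_channel_dc(SAMPLE_OUTPUT))
print("Floated to a non-local DC:", has_floated(SAMPLE_OUTPUT, LOCAL_DCS))
```

    On a real server you would capture the command output (for example with subprocess.run) and pass it to the same functions.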

    To see a good explanation as to the troubleshooting process check out SPAT’s blog post on the subject.

    Why am I taking the time to point this out with regard to SharePoint specifically? Because slow NTLM authentication is one of the leading causes of the dreaded “Cannot connect to the configuration/site database” error, and it is rarely considered when troubleshooting that error. It is also a factor in slow portal search crawls because of the number of group membership evaluations that are required for security trimming.

  • NTLM Authentication with SharePoint

    NTLM Authentication in SharePoint

    Most SharePoint environments today are using NTLM (the default) as the authentication protocol. This has some hidden performance implications that are not widely known. But before we dive into the world of NTLM performance bottlenecks with SharePoint, let's address the NTLM authentication process first.

    Order of actions for inbound user authentication for all accounts in the same domain (roughly)

    1. Internet Explorer (IE) first connects to SharePoint (HTTP Get /) and presents anonymous. Three way TCP handshake here.
    2. SharePoint (IIS-401) rejects anonymous and returns the WWW-Authenticate: NTLM header.
    3. IE calls AcquireCredentialsHandle and passes the appropriate Security Support Provider (SSP), in this case NTLM. IE prompts (depending on the zone the site lives in and the options configured) for username and password.
    4. IE calls InitializeSecurityContext which constructs the Auth token containing the Domain Name and Machine name.
    5. SharePoint (IIS) is already listening for connections and parses the inbound request.
    6. AcceptSecurityContext is called by IIS and an Auth Token containing an NTLM challenge (16-byte random number) is sent to the client (401).
    7. IE parses the reply and InitializeSecurityContext is called again. An Auth Token containing the NTLMChallengeResponse (the challenge encrypted with a hash of the user’s password) is sent to SharePoint.
    8. SharePoint (IIS) parses the request. At this point the Secure Channel (SC) Domain Controller (DC) is contacted with the user name, the challenge sent to the client, and the response received from the client. This is where MaxConcurrentApi comes into play; more on that in a bit.
    9. The DC takes the user name, retrieves the password hash (the one used to encrypt in step 7), encrypts the original challenge with it, and compares the result to the challenge response. If they are identical, authentication is successful.
    10. The DC returns success to the SharePoint (IIS).
    11. SharePoint (IIS) accepts the connection as authenticated.
    12. What happens here will be another post.
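
    The challenge/response part of the flow (steps 6 through 9) can be sketched in a few lines of Python. This is NOT the real NTLM cryptography: SHA-256 and HMAC-SHA256 are swapped in as stand-ins for the NTLM password hash and response computation, and all function names are made up. It only shows the shape of the exchange: the server issues a random challenge, the client answers with a value derived from the password hash, and the DC recomputes and compares.

```python
import hashlib
import hmac
import secrets


def password_hash(password: str) -> bytes:
    """Stand-in for the stored password hash (real NTLM uses a different hash)."""
    return hashlib.sha256(password.encode("utf-8")).digest()


def make_challenge() -> bytes:
    """Step 6: the server generates a 16-byte random challenge."""
    return secrets.token_bytes(16)


def client_response(pw_hash: bytes, challenge: bytes) -> bytes:
    """Step 7: the client derives its response from the challenge and its hash."""
    return hmac.new(pw_hash, challenge, hashlib.sha256).digest()


def dc_verify(stored_hash: bytes, challenge: bytes, response: bytes) -> bool:
    """Step 9: the DC recomputes the response from its stored hash and compares."""
    expected = hmac.new(stored_hash, challenge, hashlib.sha256).digest()
    return hmac.compare_digest(expected, response)


# The DC holds the user's hash; the password itself never crosses the wire.
stored = password_hash("correct horse")
challenge = make_challenge()

good = client_response(password_hash("correct horse"), challenge)
bad = client_response(password_hash("wrong guess"), challenge)

print("right password accepted:", dc_verify(stored, challenge, good))
print("wrong password accepted:", dc_verify(stored, challenge, bad))
```

    Note that the web server cannot do step 9 on its own, because it does not hold the password hash. That is why every authentication costs a round trip to the DC.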

    MaxConcurrentApi is set by default to a value of 0, which equals two concurrent NTLM authentications. Under load, SharePoint can end up waiting on authentication. By adjusting the MaxConcurrentApi value on the SharePoint Web Front End (WFE) you can increase the number of concurrent authentications and thereby increase WFE performance. This does, however, increase the load on your DCs, and that’s why I recommend you slowly increase the value while profiling your DCs with Server Performance Advisor. This way you can avoid overloading the DCs.
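
    A rough back-of-the-envelope model shows why this setting matters. The sketch assumes each authentication holds one slot for the full round trip to the DC, so the ceiling is simply slots divided by round-trip time. The 50 ms figure and the function name are hypothetical; measure your own environment before drawing conclusions.

```python
def max_auths_per_second(concurrent_slots: int, dc_round_trip_ms: int) -> float:
    """Upper bound on NTLM authentications per second for a given slot count."""
    if concurrent_slots < 1 or dc_round_trip_ms < 1:
        raise ValueError("slots and round trip must be positive")
    return concurrent_slots * 1000 / dc_round_trip_ms


ROUND_TRIP_MS = 50  # hypothetical DC response time, not a measured value

for slots in (1, 2, 5, 10):
    ceiling = max_auths_per_second(slots, ROUND_TRIP_MS)
    print(f"{slots:>2} concurrent slot(s): at most {ceiling:.0f} authentications/sec")
```

    The model ignores queuing and DC capacity, which is exactly why the value should be raised gradually while profiling the DCs.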

    If your users are in a domain that is in Native mode, a Global Catalog (GC) server is required. You can disable this requirement with the IgnoreGCFailures registry key. However, it’s redundant with the group membership caching in Windows Server 2003 (and unless you are dealing with a branch office deployment it’s unnecessary). That said, it’s important to point out that if you are in Native mode (users’ domain) and your secure channel DC is not a GC, then the authentication will be proxied to a GC over the GC channel. Therefore it is a best practice to make sure a GC is placed close (high speed link) to the SharePoint servers.

    It is important to note that this is not a SharePoint bottleneck, it’s an NTLM bottleneck. In my next posts I will cover how multiple domains affect this process.


© 2008 Microsoft Corporation. All rights reserved.