SharePoint Strategery

Best used when *strategy* breaks down... (blog by Brian Pendergrass, Microsoft SharePoint - Premier Field Engineer)

SP2010: Removing/Re-joining Server to a Farm Can Break Search


Removing or re-joining a SharePoint Server that hosts a Search component will typically break the Search Topology and lead to an inconsistency between the ServerID being referenced by the applicable Search component(s) and the SharePoint farm Configuration object for the applicable server. 

Translation, you ask? Each Search component object has a reference to the SharePoint Server on which it runs. The component references its Server both by name and by the GUID assigned at the Farm level to identify each Server (i.e. the ServerID).

For example, if I run:

Get-SPServer | SELECT Name, ID
    Name          Id
    ----          --
    FIRSTAND10    441318c0-476e-4c7e-af76-34d14b5c7067
    MADDEN2K10    06d6c8be-32b8-4a24-ae1b-4081edf6ca6b
    THEVETTE      1561c381-7fe8-4a2d-b521-0bd894c49a94

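And then for the components (with $SSA holding the Search Service Application object, e.g. as returned by Get-SPEnterpriseSearchServiceApplication):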

$SSA.CrawlTopologies.ActiveTopology.CrawlComponents | Select ServerName, ServerId
    ServerName    ServerId
    ----------    --------
    FIRSTAND10    441318c0-476e-4c7e-af76-34d14b5c7067

The GUID listed for the SPServer object should match the one referenced by the Search component. The problems occur when a Server is removed from the Farm (by selecting "Remove..." via PSConfig, uninstalling SharePoint, or removing it via Central Admin), at which point the SPServer object is removed from the Farm Configuration database. However, the Server references in each Search component object do not get updated (in fact, the components will probably still show as online in the Search topology). For example, the same PowerShell commands above would now show the following (notice that FIRSTAND10 is missing from the Get-SPServer output, but the component still references this no-longer-existing server):

Get-SPServer | SELECT Name, ID
    Name          Id
    ----          --
    MADDEN2K10    06d6c8be-32b8-4a24-ae1b-4081edf6ca6b
    THEVETTE      1561c381-7fe8-4a2d-b521-0bd894c49a94

$SSA.CrawlTopologies.ActiveTopology.CrawlComponents | Select ServerName, ServerId
    ServerName    ServerId
    ----------    --------
    FIRSTAND10    441318c0-476e-4c7e-af76-34d14b5c7067

Even if you re-join the Server to the Farm (by running PSConfig and selecting "Join server to the farm.."), a new SPServer object will be created at the Farm level and this Server will have a new GUID (i.e. a new ServerID). Again, the references held by each of the applicable Search components will not be updated with the new ServerID. For example, in this case, the same PowerShell commands would show:

Get-SPServer | SELECT Name, ID
    Name          Id
    ----          --
    FIRSTAND10    200a138c-529b-4a2f-1ee7-5c706718c04b
    MADDEN2K10    06d6c8be-32b8-4a24-ae1b-4081edf6ca6b
    THEVETTE      1561c381-7fe8-4a2d-b521-0bd894c49a94

$SSA.CrawlTopologies.ActiveTopology.CrawlComponents | Select ServerName, ServerId
    ServerName    ServerId
    ----------    --------
    FIRSTAND10    441318c0-476e-4c7e-af76-34d14b5c7067

Simply removing a server from a farm does not remove the applicable Search components from the Search Topology (i.e. the components will typically report as “Online” even after their server has been removed from the farm), which underscores the fact that the Farm Topology and the Search Topology are ultimately different structures. This is why the TechNet page “Remove a server from a farm in SharePoint 2010” (and the similar page for SP2013) notes the following:

“Removing a server that contains a search topology component can affect future search activities. The extent of that effect depends on the farm search topology. We recommend that you remove or relocate any search topology components from a server before removing the server from the farm.” 

The bewildering part of these ServerID inconsistencies is that they typically manifest in some downstream failure that probably doesn't make you think, "hey, I have a mismatched server reference here". This behavior tends to have a very broad impact, so it can appear to manifest in different ways across various environments, but the most common symptoms include:

  • You have an SSA that seems completely beyond repair
  • Topology changes always fail, such as:
    • From PowerShell, the Topology Activation most commonly reports a message similar to the following about 20 minutes after starting the activation. (In this case, the "Object reference" is to a server that does not exist. Keep in mind that this particular error message is actually generic, so it can occur for any number of reasons. But if it occurs when removing a Crawl component, then you're likely encountering this issue):

The Execute method of job definition Microsoft.Office.Server.Search.Administration.CrawlTopologyActivationJobDefinition 
(ID f34c9b59-0d77-41bc-af7d-024b06ecefe6) threw an exception. More information is included below.
Object reference not set to an instance of an object

    • From the UI, the Topology change may spin indefinitely
    • Or in ULS, you see messages such as:

Topology timer job has failed. Timer job name: 'CrawlTopologyCleanupJob-7f71e76d-c335-443f-a60a-a00add9d8731'.
Error message: The server FIRSTAND10 could not be found in the farm.

  • The SSA or a Search Component cannot be deleted from either the UI or PowerShell
  • A Search Component cannot be moved (implicitly, this involves a Component deletion)
  • The “Application Server Administration Service Timer Job” (job-application-server-admin-service) job reports “Unable to find server” failures in ULS
  • When upgrading SharePoint, you encounter errors stating that Search-related DBs cannot be upgraded
  • The SSA cannot be removed from the UI 

Sure, these problems may occur for other reasons as well, so I'm not saying that this is the only cause of a busted topology, hung crawls, or any other symptom above. But should you encounter a completely unexplainable problem with Search, comparing ServerIDs is something I'd check using PowerShell (the script can be found further below). On the flip side, because the problems are so generic, the troubleshooting steps for this are useful even if the root cause is not a mismatched ServerID.

In many cases, the underlying root problem(s) can be resolved, but a single defunct component can prevent the SSA or Crawl from moving to its next expected state (e.g. a crawl component blocking the crawl state from moving from crawling to completing). In some cases, an Index reset may get you beyond the roadblock and back to an overall healthy state. For what it's worth, the Index reset tends to work because it bypasses most of the Search Admin's normal state-handling processing to truncate the Search database tables (along with the actual index files). This tends to unblock hung crawls, but the impacted components will still be in a problematic state. That said, it's a bit of a sledgehammer approach, so it is best to avoid resetting the index until all other options have been attempted.

It's worth reiterating that these symptoms are not exclusive to this particular scenario and may occur for other reasons as well. In a follow-up post, I'll provide additional detailed troubleshooting steps regarding some of the most common manifestations. In the meantime, to help confirm that you've encountered this, look for ULS messages such as the following occurring once a minute on the Server hosting the Search Admin component (in my case, this was my MADDEN2K10 server):

03/25/2013 12:20:55.68 OWSTIMER.EXE (0x06A8)  0x034C  SharePoint Foundation
   Monitoring     nasq    Medium
   Entering monitored scope (Timer Job job-application-server-admin-service)
03/25/2013 12:20:55.68 OWSTIMER.EXE (0x06A8)  0x034C  SharePoint Server Search
   Administration dkd5    High
   synchronizing search service instance
03/25/2013 12:20:55.68 OWSTIMER.EXE (0x06A8)  0x034C  SharePoint Server Search
   Administration eff0    High
   synchronizing search data access service instance   
03/25/2013 12:20:56.78 OWSTIMER.EXE (0x06A8)  0x034C  SharePoint Server Search
   Administration fel1    High
   Unable to find server 441318c0-476e-4c7e-af76-34d14b5c7067 
03/25/2013 12:20:56.84 OWSTIMER.EXE (0x06A8)  0x034C  SharePoint Foundation
   Monitoring     b4ly    Medium 
   Leaving Monitored Scope (Timer Job job-application-server-admin-service).

From the output above and this ULS snippet, we can correlate the message "Unable to find server 441318c0-476e-4c7e-af76-34d14b5c7067" with the ServerID of the missing FIRSTAND10 server (i.e. compare the server GUID listed here with the first Get-SPServer output above).
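Rather than scanning the ULS files by eye, the same correlation can be sketched with Get-SPLogEvent. This is only a rough sketch under a few assumptions: it is run from the SharePoint 2010 Management Shell on the server hosting the Search Admin component, the 30-minute window is an arbitrary choice, and the message text matches the "Unable to find server" entry shown above.

```powershell
# Sketch only: looks for the "Unable to find server <GUID>" ULS entries shown above
# and checks each GUID against the servers the farm currently knows about.
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# Assumption: a 30-minute window is enough, since the message repeats about once a minute
$since = (Get-Date).AddMinutes(-30)

# ULS entries from the local server whose message starts with the telltale text
$hits = Get-SPLogEvent -StartTime $since |
    Where-Object { $_.Message -like "Unable to find server*" }

# ServerIDs currently registered in the farm configuration, as strings
$farmIds = Get-SPServer | ForEach-Object { $_.Id.ToString() }

# Pull the GUID out of each message, de-duplicate, and flag whether the farm knows it
$hits | ForEach-Object {
    if ($_.Message -match '[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}') {
        $matches[0]
    }
} | Sort-Object -Unique | ForEach-Object {
    New-Object PSObject -Property @{
        ServerId    = $_
        KnownToFarm = ($farmIds -contains $_)
    }
}
```

Any ServerId reported with KnownToFarm set to False is a candidate for the stale reference described in this post.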

Also, you can use the following PowerShell to check the Crawl Components…

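# Assumes $ssa holds the Search Service Application, e.g.:
#   $ssa = Get-SPEnterpriseSearchServiceApplication
foreach ($cc in $ssa.CrawlTopologies.ActiveTopology.CrawlComponents) {
    # Look up the farm-level SPServer by name; a missing server yields $null
    # (-ErrorAction SilentlyContinue keeps the failed lookup from writing an error)
    $farmServerId = (Get-SPServer $cc.ServerName -ErrorAction SilentlyContinue).Id
    if ($null -eq $farmServerId) {
        # The component references a server that no longer exists in the farm
        "********************************************************"
        "[" + $cc.Name + "] ServerId mismatch found"
        "    - " + $cc.ServerName + " was removed from the farm"
        "********************************************************"
        ""
    }
    elseif ($cc.ServerId -ne $farmServerId) {
        # The server exists (e.g. it was re-joined) but now has a different GUID
        "********************************************************"
        "[" + $cc.Name + "] ServerId mismatch found"
        "    - ServerId Per the Farm: " + $farmServerId
        "    - And per the Component: " + $cc.ServerId
        "********************************************************"
        ""
    }
}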

And for Query Components…

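# Assumes $ssa holds the Search Service Application, e.g.:
#   $ssa = Get-SPEnterpriseSearchServiceApplication
foreach ($qc in $ssa.QueryTopologies.ActiveTopology.QueryComponents) {
    # Look up the farm-level SPServer by name; a missing server yields $null
    # (-ErrorAction SilentlyContinue keeps the failed lookup from writing an error)
    $farmServerId = (Get-SPServer $qc.ServerName -ErrorAction SilentlyContinue).Id
    if ($null -eq $farmServerId) {
        # The component references a server that no longer exists in the farm
        "********************************************************"
        "[" + $qc.Name + "] ServerId mismatch found"
        "    - " + $qc.ServerName + " was removed from the farm"
        "********************************************************"
        ""
    }
    elseif ($qc.ServerId -ne $farmServerId) {
        # The server exists (e.g. it was re-joined) but now has a different GUID
        "********************************************************"
        "[" + $qc.Name + "] ServerId mismatch found"
        "    - ServerId Per the Farm: " + $farmServerId
        "    - And per the Component: " + $qc.ServerId
        "********************************************************"
        ""
    }
}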
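Since the two loops above differ only in the collection they walk, the check could also be folded into a single helper that emits objects instead of text. The following is a minimal sketch, not something shipped with SharePoint: the function name Get-SearchServerIdMismatch and its parameter names are made up for illustration, and it assumes a single SSA in the farm. It avoids PowerShell 3.0 syntax so that it should run in the PowerShell 2.0 shell that SP2010 uses.

```powershell
# Hypothetical helper (not part of SharePoint): compares each component's ServerId
# against the farm's SPServer objects and emits one object per problem found.
function Get-SearchServerIdMismatch {
    param(
        $Components,   # objects exposing Name, ServerName and ServerId
        $FarmServers   # objects exposing Name and Id (e.g. Get-SPServer output)
    )

    # Index the farm servers by name (PowerShell hashtable keys are case-insensitive)
    $farmIdByName = @{}
    foreach ($server in $FarmServers) {
        $farmIdByName[$server.Name] = [string]$server.Id
    }

    foreach ($component in $Components) {
        $componentId = [string]$component.ServerId
        $farmId = $farmIdByName[$component.ServerName]

        if ($null -eq $farmId) {
            $problem = "ServerRemovedFromFarm"   # no SPServer with that name
        }
        elseif ($farmId -ne $componentId) {
            $problem = "ServerIdMismatch"        # server re-joined with a new GUID
        }
        else {
            continue                             # IDs agree; nothing to report
        }

        New-Object PSObject -Property @{
            Component         = $component.Name
            ServerName        = $component.ServerName
            ComponentServerId = $componentId
            FarmServerId      = $farmId
            Problem           = $problem
        }
    }
}

# Usage (assumes exactly one Search Service Application in the farm)
$ssa = Get-SPEnterpriseSearchServiceApplication
$farmServers = @(Get-SPServer)
$components = @($ssa.CrawlTopologies.ActiveTopology.CrawlComponents) +
              @($ssa.QueryTopologies.ActiveTopology.QueryComponents)

Get-SearchServerIdMismatch -Components $components -FarmServers $farmServers |
    Format-Table Component, ServerName, Problem, FarmServerId, ComponentServerId -AutoSize
```

Returning objects rather than strings makes it easier to filter or export the results, and an empty result means no mismatches were detected for the active crawl and query topologies.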

Should you find a component impacted by a Server removal, the recommendation is to remove (or move to another server) any component with a mismatched ServerId, then optionally re-add the component(s) to the original server. In another post, I've discussed some scenarios and tactics to overcome this.
