Following on from the blog post by Martin Kearn, I wanted to expand on some of the mysteries of the communication that SharePoint uses for enterprise search. While we were putting together the material for the TechEd talk, this was by far the most interesting communication section to work on.

Almost all administration communication within SharePoint is conducted over web services (HTTP/HTTPS traffic). By and large, Enterprise Search is the same, with the unsurprising exception of search index propagation, and the rather surprising exception of search queries.

Note: Search in this article refers to the Microsoft Office SharePoint Search service, which is distinct from the Windows SharePoint Services search service.

Administration

Administration of the search service takes place over the Search Administration web service. The service is located in an IIS Web site called “Office Server Web Services” on each server that is part of a SharePoint farm. The site holds entries for each service such as Search or Excel Services, for each Shared Service Provider present on the farm.

The web site is configured by default to run on port 56737, or 56738 if SSL is being used. This can be changed with the STSADM command:

stsadm -o setsspport -httpport <HTTP port number> -httpsport <HTTPS port number>

The Search administration web service is specified in the file SearchAdmin.asmx. The full path to the search admin web service is therefore (for http traffic):

http://<FQDN>:56737/<SharedServiceProviderName>/Search/SearchAdmin.asmx
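As a small illustration, the address can be assembled from its parts. This is only a sketch in Python: the function name, the example host name and the Shared Service Provider name are hypothetical, and only the default ports and the path layout are taken from the text above.

```python
# Default ports described above; they differ if "stsadm -o setsspport" has been run.
DEFAULT_HTTP_PORT = 56737
DEFAULT_HTTPS_PORT = 56738


def search_admin_url(fqdn, ssp_name, use_ssl=False, port=None):
    """Build the SearchAdmin.asmx address for one Shared Service Provider."""
    scheme = "https" if use_ssl else "http"
    if port is None:
        # Fall back to the default port that matches the chosen scheme.
        port = DEFAULT_HTTPS_PORT if use_ssl else DEFAULT_HTTP_PORT
    return "{}://{}:{}/{}/Search/SearchAdmin.asmx".format(
        scheme, fqdn, port, ssp_name
    )


# Hypothetical server and SSP names, for illustration only.
print(search_admin_url("moss01.contoso.local", "SharedServices1"))
```

Passing use_ssl=True switches both the scheme and the default port, and an explicit port argument covers farms where the defaults have been changed.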

The administration service provides all the methods necessary to control the Search service, such as starting content source index crawls, updating scopes, etc. The web service is available to be called by custom applications as well as by the system.

 

Crawling

The protocols used during search crawling depend on the content source being crawled. The choice of protocol is handled by a protocol handler, an object responsible for fetching the content to be indexed. By default, SharePoint comes with protocol handlers for the following protocols (from the TechNet article Plan to crawl content (Search Server 2008) (http://technet.microsoft.com/en-us/library/cc280343.aspx)):

Protocol handler    Used to crawl
File                File shares
http                Web sites
https               Web sites over Secure Sockets Layer (SSL)
Notes               Lotus Notes databases
Rb                  Exchange public folders
Rbs                 Exchange public folders over SSL
Sps                 People profiles from Windows SharePoint Services 2.0 server farms
Sps3                People profile crawls of Windows SharePoint Services 3.0 server farms only
Sps3s               People profile crawls from Windows SharePoint Services 3.0 server farms only over SSL
Spsimport           People profile import
Spss                People profile import from Windows SharePoint Services 2.0 server farms over SSL
Sts                 Windows SharePoint Services 3.0 root URLs (internal protocol)
Sts2                Windows SharePoint Services 2.0 sites
Sts2s               Windows SharePoint Services 2.0 sites over SSL
Sts3                Windows SharePoint Services 3.0 sites
Sts3s               Windows SharePoint Services 3.0 sites over SSL

Custom protocol handlers can be written to fetch content from disparate sources. For more information, refer to Creating a Protocol Handler (http://msdn.microsoft.com/en-us/library/ms947581.aspx). Each protocol handler is free to use whichever communication protocol it wishes. For accessing external data, searching Business Data Catalog information is in most cases the preferred solution. For more details, refer to Enabling Business Data Search (http://msdn.microsoft.com/en-us/library/ms492695.aspx).
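To make the relationship between a content source address and its handler concrete, the default handlers listed above can be treated as a lookup keyed by URL scheme. The following Python sketch is an illustration only and is not how SharePoint itself selects a handler; the function name and the example addresses are hypothetical.

```python
from urllib.parse import urlsplit

# Scheme -> what the handler crawls, copied from the default handlers listed above.
PROTOCOL_HANDLERS = {
    "file": "File shares",
    "http": "Web sites",
    "https": "Web sites over Secure Sockets Layer (SSL)",
    "notes": "Lotus Notes databases",
    "rb": "Exchange public folders",
    "rbs": "Exchange public folders over SSL",
    "sps": "People profiles from Windows SharePoint Services 2.0 server farms",
    "sps3": "People profile crawls of Windows SharePoint Services 3.0 server farms only",
    "sps3s": "People profile crawls from Windows SharePoint Services 3.0 server farms only over SSL",
    "spsimport": "People profile import",
    "spss": "People profile import from Windows SharePoint Services 2.0 server farms over SSL",
    "sts": "Windows SharePoint Services 3.0 root URLs (internal protocol)",
    "sts2": "Windows SharePoint Services 2.0 sites",
    "sts2s": "Windows SharePoint Services 2.0 sites over SSL",
    "sts3": "Windows SharePoint Services 3.0 sites",
    "sts3s": "Windows SharePoint Services 3.0 sites over SSL",
}


def handler_for(start_address):
    """Return the description of the default handler for an address, or None."""
    # The scheme is the part before "://"; compare it case-insensitively.
    scheme = urlsplit(start_address).scheme.lower()
    return PROTOCOL_HANDLERS.get(scheme)


# Hypothetical start addresses, for illustration only.
print(handler_for("sts3://intranet/sites/hr"))
print(handler_for("ftp://files.example.com"))  # no default handler, so None
```

A scheme with no entry in the lookup is where a custom protocol handler would come in.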

 

Index Propagation

Indexing and querying both make use of the Server Message Block (SMB) protocol to transfer data.

The SMB protocol was originally invented at IBM with the intention of making network file access as easy as local file access. Around 1990, Microsoft merged the protocol with the LAN Manager product, and continued to develop it as a means for sharing files and folders, printers and miscellaneous other communication.

The SMB protocol was originally intended to run over NetBIOS, but from Windows 2000 onwards it was modified to run directly over TCP port 445, which it currently uses. With Windows Vista, Microsoft released SMB 2.0, which has several enhancements over the original protocol.

Given that SMB was designed for file and folder sharing, it comes as no surprise that the index propagation is done over SMB, and consists of partial file copies to the search index shared folder location.

This is a shared folder created on each Query server in a SharePoint farm. Although the location can be configured when the Search Query role is activated on a server, it is usually \\<servername>\searchindexpropagation. By default, this share points to the folder at C:\Program Files\Microsoft Office Servers\12.0\Data\Applications\<shared service provider GUID>.

Search propagation is a co-ordinated effort between the Search Service on the Index server, the Search Service on the Query server, the database, and the file system, using the SMB protocol. The following diagram, taken from the public document describing the Search Index Propagation protocol [MS-CIPROP]: Index Propagation Protocol Specification (http://msdn.microsoft.com/en-us/library/cc313077.aspx), describes the interaction.

 

In this diagram, the top-right block refers to the SMB propagation of index files, which takes place as a standard file share copy.

 

Search Querying

Perhaps the biggest surprise is that search queries are issued from the Web Front-End (WFE) to the Search Query server using the SMB protocol. It would seem that this is a prime candidate for a web service query, and the fact that SMB is used has implications for extranet server topology design.

For example, if you design a SharePoint infrastructure architecture where the WFEs are located in a separate segment of the perimeter network, and the rest of the servers in the farm are located within a more secure segment of the network (a form of the back-to-back perimeter topology), the SMB protocol will need to be opened in the firewall between the two network segments.

In the above diagram, Router A will need SQL Server ports and SMB ports to be allowed through. This means essentially that file-share access is enabled through Router A.
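When troubleshooting a topology like this, a quick first check is whether the WFE can open a TCP connection to the Query server on the SMB port at all. The Python sketch below shows one way to do that; the function name is hypothetical, and a successful connection only shows that the port is reachable, not that search queries will succeed.

```python
import socket

SMB_PORT = 445  # SMB directly over TCP, as described under Index Propagation


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection resolves the name and attempts the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

Run from a WFE, a call such as can_connect("query01.contoso.local", SMB_PORT) (the host name is a placeholder) indicates whether the firewall lets SMB traffic through. The same function can be pointed at the SQL Server port in use on the farm.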

So why would the Search service use SMB?

The answer is performance: it turns out that SMB is used as the transport-level protocol for Named Pipes. Named Pipes is a Microsoft Inter-Process Communication (IPC) mechanism which is binary and fast. For a long time it was the de facto communication mechanism in and across Windows servers. It was also for a long time the default communication mechanism for SQL Server, and is still available as a protocol for that product. By using SMB as the transport layer, Microsoft provided Named Pipes as an IPC mechanism that was fast and efficient.

Perhaps some clue can be gathered from the Win32 API: to open an IO device, the CreateFile function is called. This API call is responsible for opening files, directories and physical volumes, as well as IO devices such as tape writers, parallel ports, and pipes.
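The file-like nature of pipes can be sketched in a few lines of Python. On Windows a named pipe is addressed with a path of the form \\<server>\pipe\<name>, where a server of "." means the local machine, and it can be opened with the ordinary file API. The pipe name below is hypothetical (this article does not document the pipe name used by the Search service), and both helper functions are assumptions made for illustration.

```python
def pipe_path(pipe_name, server="."):
    r"""Return a named pipe path, e.g. \\.\pipe\demo for the local machine."""
    return "\\\\" + server + "\\pipe\\" + pipe_name


def try_open_pipe(path):
    """Try to open a named pipe like a file; return the file object, or None."""
    try:
        # Unbuffered binary read/write, opened through the ordinary file API.
        return open(path, "r+b", buffering=0)
    except OSError:
        # No such pipe, no permission, or not running on Windows.
        return None


# Hypothetical pipe and server names, for illustration only.
print(pipe_path("demo"))
print(pipe_path("demo", server="query01"))
```

Opening a pipe on another server goes through the same path syntax as a file share, which is consistent with SMB being the transport underneath.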

 

Summary

Inter-server communication is something which almost always turns out to be slightly more complex than it first seems, and this is absolutely the case for Enterprise Search within MOSS. Enterprise search involves several processes and communication mechanisms. This has impact on all aspects of server farm design and maintenance, and is crucial to understand when troubleshooting search problems.

The first port of call to understand all of this should be the SharePoint Back-end protocol documents (http://msdn.microsoft.com/en-us/library/cc339473.aspx), which detail each of the processes and interactions, as well as communication mechanisms.

 

Peter Reid

SharePoint Consultant

Microsoft Consulting Services UK

Peter.Reid@Microsoft.com
