In broad terms, SharePoint Search is comprised of three main functional process components:
In this post, we’ll create a reference baseline by defining key concepts and terminology. In a following post, we’ll take a step-by-step look at the Crawling process, which is leveraged in both SharePoint Search and FAST Search for SharePoint (FS4SP). Finally, we’ll dig further into the Indexing process for SharePoint Search by reviewing the protocol handlers, iFilters, and Search Plug-ins used in building the search index.
Avoiding the double-speak
Unfortunately, like much of SharePoint, the nomenclature for Search may vary depending on your background. For example, in SharePoint 2007 (primarily MOSS, but also WSS), the server with the Indexer Role performed the crawling processes, so you may see other references using ‘Indexer’ and ‘Crawler’ interchangeably.
The term ‘Indexer’ is also muddied because the SP2007 Indexer server holds the master copy of the full-text index, whereas in SP2010 the master index resides as partitions across the query components. With this, SP2010 decouples the Crawl Component from the master copy of the index, which can make ‘Indexer’ a misleading term to describe a Crawler.
Another point of confusion comes from the SP2007 option to define Crawling Servers for the Office SharePoint Search Service (e.g. the Web Front End And Crawling section). This option specifies the web front ends to be crawled, but these WFEs perform no direct role in the crawling process – they act only indirectly, as the web server(s) that fulfill HTTP GET requests from the Crawler …err… the Indexer in SP2007-speak.
Concepts, Terminology, and Working Definitions
For this series (as well as all future posts unless otherwise noted), I’ll only refer to Indexing when discussing the specific process of building the index. For the Crawling process, I’ll emphasize the gathering aspects or interchangeably use Gathering. Otherwise, any reference to a Crawl (e.g. a full or incremental crawl) generally refers to the overall crawling process, which implies the full pipeline with both crawling and indexing.
To further consistency, consider the following my Rosetta stone of sorts (and I’ll try to note other points of cross-over along the way). Also, I’ve included links where possible (many of which are to Office 12 documentation, but are relevant to SharePoint 2010 nonetheless).
Update (11/2/2012) I've noticed that a lot of the content linked below has been "permanently removed from the website". I'm working internally to find out whether these links were replaced and will try to get them updated as soon as possible.
This post was intended to describe each of the puzzle pieces involved with SharePoint Search. In coming posts, I plan to further show how these pieces fit together – first by explaining the crawling process and then by looking into the Indexing process for SharePoint Search.
Update (10-29-2012): This post was written prior to the release of SP2013. In hindsight, I should have held off on starting this in order to include SP2013, but as is, the content is most applicable to SP2010.