SharePoint Indexing Basics
SharePoint search is built on a content index, which is organized and maintained to support searching. SharePoint Portal Server (SPS) provides four options for maintaining content indexes, as shown in Table 1. It is important for developers and administrators to understand each of these indexing options and how it affects the maintenance and performance of an SPS environment. In this article, I will discuss each of these types and provide some guidelines on when each should be used.
Table 1: Types of indexing available
· Full Update – Crawls all content on a site.
· Incremental Update – Crawls only new and changed content.
· Incremental (inclusive) Update – Crawls new and changed content and detects deleted entries in Windows SharePoint Services (WSS) document libraries and lists.
· Adaptive Update – Crawls content that is likely to have changed based on the site history.
A full update can be performed on either a content index or a specific content source. A content index is a full-text index that enables searching content and retrieving document-level properties. A content source is used to populate content indexes with information stored in a particular location. During a full update, SPS updates all content located within the content source or index, adding, changing, or deleting site content within the index as needed. The full update is the most system-intensive of all the indexing options.
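As a rough illustration of the behavior described above, the following Python sketch models a full update as a complete resynchronization of an index with its content source. The dictionaries and the full_update function are hypothetical stand-ins for this article only and are not part of any SPS API.

```python
def full_update(index, source):
    """Toy model: resynchronize the index with every item in the source.

    Both arguments are plain dicts mapping a document path to its text.
    Returns counts of added, changed and deleted entries.
    """
    added = [p for p in source if p not in index]
    changed = [p for p in source if p in index and index[p] != source[p]]
    deleted = [p for p in index if p not in source]

    # Every document in the source is visited, whether it changed or not.
    for path in source:
        index[path] = source[path]
    # Entries that no longer exist in the source are dropped from the index.
    for path in deleted:
        del index[path]

    return {"added": len(added), "changed": len(changed), "deleted": len(deleted)}


index = {"a.doc": "old text", "gone.doc": "stale"}
source = {"a.doc": "new text", "b.htm": "hello"}
result = full_update(index, source)
```

Because the loop visits every document in the source, the cost of this model grows with the total size of the content rather than with the number of changes, which is why a full update is the most expensive option.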
Typically, a full update is performed in the following situations:
· An area is renamed
· The content index is reset
· A file is renamed
· The rules for a content source are changed
· A noise word is changed
Full updates are also the most inclusive of all the indexing types. For example, only the full update picks up changes made to local groups that affect the underlying content. This is important to remember, and it is one of the reasons why using local groups to secure SPS content isn’t always recommended.
An incremental update of a content source is designed to include only new and changed content. By design, it ignores unchanged content within the index, so an incremental update is generally faster than a full update. A periodic incremental update is typically used to keep the content index current when content has changed, without using the system resources or time needed for a full update. As a general rule, unless there is a specific reason to perform a full update, administrators should prefer incremental updates. A good site design will typically run an incremental update once per day for non-portal site content and once every ten minutes for portal site content.
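To make the contrast with a full update concrete, here is a minimal Python sketch of an incremental pass. It assumes that each document carries an integer modification stamp; that stamp, the dictionaries, and the incremental_update function are illustrative assumptions, and the way SPS actually detects changes may differ.

```python
def incremental_update(index, source):
    """Toy model: re-index only documents that are new or have a newer stamp.

    index:  dict path -> (stamp, text) recorded at the previous update
    source: dict path -> (stamp, text) currently in the content source
    Returns the sorted list of paths that were actually re-indexed.
    """
    touched = []
    for path, (stamp, text) in source.items():
        known = index.get(path)
        # Skip documents whose stamp has not advanced since the last update.
        if known is None or known[0] < stamp:
            index[path] = (stamp, text)
            touched.append(path)
    return sorted(touched)


index = {"a.doc": (1, "alpha"), "b.htm": (1, "beta")}
source = {"a.doc": (1, "alpha"), "b.htm": (2, "beta v2"), "c.ppt": (1, "gamma")}
touched = incremental_update(index, source)
```

In this model the unchanged a.doc is never rewritten, so the work done is proportional to the number of new and changed documents. The sketch does not model removals.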
Note: To conserve system resources, SPS is often configured to perform an incremental update daily and a full update weekly. This gives site users daily updates of changed content and periodic full updates of all content.
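The daily and weekly pattern in the note can be expressed as a simple rule. The Python sketch below is only a planning aid: the scheduled_update_type function is hypothetical, and running the full update on Sunday is an arbitrary assumption. In SPS itself, schedules are configured through the administration pages rather than through code like this.

```python
from datetime import date


def scheduled_update_type(day, full_weekday=6):
    """Return which update to run on a given date.

    full_weekday uses date.weekday() numbering (Monday=0 ... Sunday=6).
    Sunday is an arbitrary choice for this sketch.
    """
    return "full" if day.weekday() == full_weekday else "incremental"


# Monday 1 January 2024 through Sunday 7 January 2024
week = [date(2024, 1, d) for d in range(1, 8)]
plan = [scheduled_update_type(d) for d in week]
```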
Incremental updates will also remove content that has been excluded using the rules settings of the content index. For example, suppose you have a content source that indexes several different document types: .doc, .htm, and .ppt. First, a full update is performed to populate the underlying content index. Then, using the rules settings, the underlying content source is modified to exclude files with a .ppt extension. When the next incremental update is performed, this change is incorporated into the content index.
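The .ppt example can be sketched in a few lines of Python. The apply_exclusion_rules function and the sample file names are hypothetical and only model the effect of an exclusion rule on an already populated index; they do not represent how SPS stores or evaluates its rules.

```python
import os


def apply_exclusion_rules(index, excluded_extensions):
    """Toy model: drop indexed paths whose extension is now excluded.

    index is a dict of path -> text; excluded_extensions is a set such as {".ppt"}.
    Returns the sorted list of paths that were removed.
    """
    removed = [p for p in index
               if os.path.splitext(p)[1].lower() in excluded_extensions]
    for path in removed:
        del index[path]
    return sorted(removed)


# Index as populated by the initial full update.
index = {"plan.doc": "text", "home.htm": "text", "deck.ppt": "text"}
# The rule excluding .ppt files takes effect on the next incremental update.
removed = apply_exclusion_rules(index, {".ppt"})
```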
Note: If the first content update that occurs on a new content index is an incremental update, SPS actually performs a full update. Once the content index is populated, subsequent incremental updates revert to true incremental updates.
SPS 2003 has added several indexing optimizations designed to improve performance over the previous version. These optimizations are intended to lower system load and speed up execution. For example, to achieve additional efficiency, the incremental update will not delete removed portal content from the index, but will index both Web Part pages and SPS application pages.
Incremental (Inclusive) Update
SPS 2003 introduces another type of index update called the incremental (inclusive) update. It is similar to the incremental update except that it indexes only Web Part pages and application pages. The incremental (inclusive) update is designed to detect deleted entries in Windows SharePoint Services (WSS) document libraries and lists. It requires the fewest system resources when used against WSS sites.
An adaptive update, like the incremental update, indexes content that has changed since the previous update. Unlike the incremental update, the adaptive update increases its efficiency by attempting to access only those documents determined likely to have changed, based on an analysis of historical site information. One limitation of the adaptive update is that it can be performed only on a content index, not on a content source.
The adaptive update uses the accumulated historical information from all previous index updates. The efficiency of the system increases over time and across multiple updates as more statistical samples are made available to the algorithm. Usually, after a week of daily adaptive updates, the system settles into a steady state. In this state, the system has acquired enough information to allow the adaptive updates to function at optimal efficiency.
Statistical information is computed regardless of the type of update SharePoint Portal Server 2003 performs. You can therefore perform incremental updates and later switch to adaptive updates. Performance improves immediately because the system is already in a steady state, meaning that SharePoint Portal Server has already accumulated enough statistical information to apply the algorithm. As a general rule, an adaptive update is unlikely to give a significant performance improvement in collections of fewer than 2,500 documents.
Note: An adaptive update is faster than an incremental or full update, but it could miss some updated content. However, SharePoint Portal Server always indexes documents that haven’t changed within two weeks, so no changes would go unnoticed for longer than that.
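The selection logic described above can be approximated with a small Python sketch. This article does not describe the actual SPS algorithm, so everything here is an assumption made for illustration: the adaptive_candidates function, the per-document history records, the change rate heuristic (changes divided by crawls), and the 0.2 threshold. Only the two-week safety net comes from the note above.

```python
def adaptive_candidates(history, today, threshold=0.2, max_age_days=14):
    """Pick the documents an adaptive-style update would visit.

    history: dict path -> {"changes": int, "crawls": int, "last_changed": int}
             where last_changed is a day number and crawls is greater than 0.
    A document is visited if its observed change rate meets the threshold,
    or if it has gone max_age_days or more without a change (safety net).
    """
    selected = []
    for path, h in history.items():
        change_rate = h["changes"] / h["crawls"]
        stale = today - h["last_changed"] >= max_age_days
        if change_rate >= threshold or stale:
            selected.append(path)
    return sorted(selected)


history = {
    "news.htm": {"changes": 9, "crawls": 10, "last_changed": 29},     # changes often
    "policy.doc": {"changes": 1, "crawls": 10, "last_changed": 25},   # rarely changes
    "archive.doc": {"changes": 0, "crawls": 10, "last_changed": 10},  # caught by safety net
}
selected = adaptive_candidates(history, today=30)
```

In this model policy.doc is skipped, which is where both the speed gain and the risk of missing an update come from: if policy.doc did change today, the update would not notice until the document's history or age brought it back into the selection.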
The performance improvement of an adaptive update over an incremental update depends entirely on the number of documents and how frequently they change. Generally, the more documents that change frequently, the greater the system load.
If you haven’t performed any other types of index updates, the first adaptive update is equivalent to a full update, and the second is equivalent to an incremental update. The adaptive update performance gain actually occurs on the third update.
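The fallback rules for new indexes can be summarized as a small decision function. The Python sketch below is a simplification: effective_update and its completed_updates counter are hypothetical, and the sketch assumes that no other type of update has been run against the index, so it ignores the statistics that earlier incremental updates would have accumulated.

```python
def effective_update(requested, completed_updates):
    """Return the update type that actually runs, per the rules in the text.

    requested: "full", "incremental" or "adaptive"
    completed_updates: number of updates already run against this index
    """
    if requested == "full" or completed_updates == 0:
        return "full"          # an empty index is always fully populated first
    if requested == "adaptive" and completed_updates == 1:
        return "incremental"   # the second adaptive run behaves like an incremental one
    return requested


# Four adaptive updates in a row against a brand-new index.
runs = [effective_update("adaptive", n) for n in range(4)]
```

The same function also covers the incremental case: a first incremental update on a new content index is reported as a full update, and later ones run as true incremental updates.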
Indexing and content searching are an important part of SPS. It is essential for developers and administrators to understand the impact that each type of update has on the content index. The information presented in this article is designed to show the indexing basics. It is important to continually validate the indexing strategy for your SPS site to maintain optimum system performance.