Now that we have discussed some of the main counters, let's take a closer look at how to use this information to improve our crawl performance.

Resource Usage/Performance Level

The value shown in the Search Gatherer\Performance Level counter correlates directly with the Resource Usage setting found in the Central Administration pages for Search. This selector ranges from Background to Dedicated, and it shows up in the counter as a value from 1 to 5. The default is the center 'dot' in this setting, which equates to 3.

This setting configures the gatherer to make better use of the resources you have made available, specifically memory, CPU, and so on; in other words, how "beefy" the machine is. The bigger the machine, the higher you will want to make this setting. If you are running on a less than optimal box, however, you may want to reduce this number so as not to punish your index server too much. That is the fun of tuning.

Assuming that you have a large "beefy" machine, you will want to move the Resource Usage setting to the 4th selection at a minimum. The biggest change that occurs here is that all of the filtering threads are allowed to start at normal thread priority rather than below normal thread priority. Moving to the 5th selection, Dedicated, does not increase this priority any further.

Increasing the Resource Usage setting to 4 or 5 also increases the maximum number of filter threads allowed on the indexer machine. A setting of 4 allows up to 48 filtering threads to be created on the machine, and a setting of 5 maxes out the machine at 64 possible filtering threads. Having more threads potentially means that more items will be crawled at one time, but it does not guarantee that all 48 or 64 threads will be in use at any given time. More on this in a later posting.
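
To keep the numbers above straight while tuning, here is a minimal Python sketch of a lookup table. It is only a note-taking aid, not anything the gatherer exposes: the names `FilterThreadSettings`, `SETTINGS_BY_LEVEL`, and `describe_level` are made up for illustration. The thread limits for levels 1 to 3 are not given in this post, so they are left empty, and the "below normal" priority for those levels is inferred from the text rather than stated per level.

```python
from typing import NamedTuple, Optional


class FilterThreadSettings(NamedTuple):
    """Hypothetical record of what a Performance Level implies for filter threads."""
    thread_priority: str                 # "below normal" or "normal"
    max_filter_threads: Optional[int]    # None where the post gives no figure


# Levels 4 and 5 use the figures from the text. Limits for levels 1-3 are not
# stated, so they stay None instead of being guessed. The priority for levels
# 1-3 is an inference from the text, not a documented value.
SETTINGS_BY_LEVEL = {
    1: FilterThreadSettings("below normal", None),
    2: FilterThreadSettings("below normal", None),
    3: FilterThreadSettings("below normal", None),  # default (center 'dot')
    4: FilterThreadSettings("normal", 48),
    5: FilterThreadSettings("normal", 64),          # Dedicated
}


def describe_level(level: int) -> FilterThreadSettings:
    """Return the recorded settings for a Performance Level counter value (1-5)."""
    if level not in SETTINGS_BY_LEVEL:
        raise ValueError(f"Performance Level must be 1-5, got {level}")
    return SETTINGS_BY_LEVEL[level]


if __name__ == "__main__":
    for level in (3, 4, 5):
        print(level, describe_level(level))
```

Remember that the maximum is only a ceiling; the table says nothing about how many threads are actually busy during a crawl.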

Document, Document, Document...

Before making any of these changes, you will need to document your current crawl and how it is running, so that when you make changes you can determine what impact they have in your environment. There is no silver bullet for performance tuning with the gatherer, but if you take methodical steps you can tune it to get the greatest performance in your environment.

When I start tuning a machine, I document many things in the environment, and then I track what I have changed and how it affects the overall crawl performance and time to run.

The things that I collect include the following:

  • Resource Usage/Performance Level
  • Site Hit Frequency Rules
  • Timeout settings - Connection
  • Timeout settings - Request Ack
  • Current crawl time from the gatherer log summary (Site Settings->Configure Search and Indexing->Manage Content Indexes, then use the View gatherer log menu option on the specific index that you are working with)
  • Documents in Index (Site Settings->Configure Search and Indexing->Manage Content Indexes)
  • Content Index size (Site Settings->Configure Search and Indexing->Manage Content Indexes)
  • # of documents marked for Retry (Site Settings->Configure Search and Indexing->Manage Content Indexes, then use the View gatherer log menu option on the specific index that you are working with)
  • Unique errors in the gatherer log summary and the # of occurrences of each (Site Settings->Configure Search and Indexing->Manage Content Indexes, then use the View gatherer log menu option on the specific index that you are working with)

I like to document each of these things so that I know whether a change increases or decreases the time it takes to crawl, changes the number of errors, etc.
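
One way to keep this kind of before-and-after record is sketched below in Python. This is just one possible approach, not a tool that ships with the product: `CrawlSnapshot`, its field names, and `compare` are all hypothetical, the fields cover only part of the list above, and the values in the example are made-up numbers purely for illustration. You would fill them in by hand from the admin pages and the gatherer log.

```python
from dataclasses import dataclass, asdict


@dataclass
class CrawlSnapshot:
    """Hypothetical hand-entered record of one crawl; field names are illustrative."""
    performance_level: int        # Resource Usage / Performance Level (1-5)
    connection_timeout_s: int     # Timeout settings - Connection
    request_ack_timeout_s: int    # Timeout settings - Request Ack
    crawl_minutes: float          # crawl time from the gatherer log summary
    documents_in_index: int
    index_size_mb: float
    retry_count: int              # documents marked for Retry
    error_count: int              # total errors in the gatherer log summary


def compare(before: CrawlSnapshot, after: CrawlSnapshot) -> dict:
    """Return {field: (before, after)} for every field whose value changed."""
    old, new = asdict(before), asdict(after)
    return {name: (old[name], new[name]) for name in old if old[name] != new[name]}


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    baseline = CrawlSnapshot(
        performance_level=3, connection_timeout_s=20, request_ack_timeout_s=20,
        crawl_minutes=240.0, documents_in_index=100000, index_size_mb=900.0,
        retry_count=120, error_count=45,
    )
    after_change = CrawlSnapshot(
        performance_level=4, connection_timeout_s=20, request_ack_timeout_s=20,
        crawl_minutes=200.0, documents_in_index=100000, index_size_mb=900.0,
        retry_count=120, error_count=45,
    )
    print(compare(baseline, after_change))
```

Changing one setting per crawl keeps the output of `compare` easy to read: the setting you changed plus whatever it affected.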

In a later post we will discuss the items to use for tuning.