From time to time, I get asked how to determine why the SPS 2003 crawler/gatherer takes a long time to crawl data and build the index. There are many possible reasons, and it takes some digging to determine the exact cause. In this series I will endeavor to explain the steps to isolate the cause and improve the performance of the gatherer.
Hot Fixes
We need to start with the low-hanging fruit. Before performance tuning the gatherer, ensure that your farm is in a good state. The first part of this is to ensure that you are running a current hotfix level for SPS and WSS 2003. While these hotfixes will not fix all problems, they will help the overall health of the gatherer in your farm. I recommend that you apply at least the following hotfix packages:
Registry Key Changes
There are some registry key changes that you may want to include as part of the environmental configuration. These are not required, but they make it easier for you to review the data. If you choose to make these registry key changes, you will need to make them on the server(s) acting as the Index server(s) in the SPS farm. Since we are talking about registry keys, it is important to reiterate the words from Microsoft regarding updates to the registry.
Warning: Serious problems might occur if you modify the registry incorrectly by using Registry Editor or by using another method. These problems might require that you reinstall your operating system. Microsoft cannot guarantee that these problems can be solved. Modify the registry at your own risk.
Setting up Perfmon
When troubleshooting performance problems with a crawl, it is essential that you collect the right performance counters on the Index server(s). Here is a list of counters that you should collect for these types of problems.
If you are not familiar with the (*)\* and \* designations, they mean the following:
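As a rough illustration of the two notations, here is a minimal sketch that assumes the standard Perfmon counter path syntax `\Object(Instance)\Counter`. In that syntax, `(*)\*` selects every counter for every instance of an object, while `\*` selects every counter of an object that has no instances. The helper `counter_path` is hypothetical, and the object names `Processor` and `Memory` are generic Windows examples, not the SharePoint counters from this post.

```python
def counter_path(obj, has_instances):
    """Build a wildcard Perfmon counter path for a performance object.

    Hypothetical helper for illustration only.
    """
    if has_instances:
        # (*) = all instances; trailing * = all counters of each instance
        return "\\" + obj + "(*)\\*"
    # No instance part; trailing * = all counters of the object
    return "\\" + obj + "\\*"


# Generic Windows objects used as examples (assumption, not from the post)
for path in (counter_path("Processor", True), counter_path("Memory", False)):
    print(path)
```

Run as-is, this prints one path of each form, which you can compare against the entries in your own counter list.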
These counters are specific to SharePoint Portal Server 2003 and do not include any performance counters for SQL Server. The sample interval for these counters will depend on how long it takes to reproduce the problem. If it is a long-running crawl, you might want to set it to 15 seconds per sample. A higher or lower interval could be argued, but this should give you a reasonable picture of what is going on during the crawl.
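To get a feel for the trade-off behind the sample interval, the following sketch estimates how many samples a log will hold for a given crawl length. The function name `sample_count` and the crawl durations are assumptions chosen for illustration; only the 15-second interval comes from the text above.

```python
def sample_count(crawl_hours, interval_secs=15):
    """Approximate number of samples recorded over a crawl of the given length."""
    return int(crawl_hours * 3600 // interval_secs)


# Hypothetical crawl durations, in hours
for hours in (1, 8, 24):
    print(hours, "hour crawl:", sample_count(hours), "samples at 15 seconds")
```

A shorter interval gives finer detail but a larger log to review, so for crawls that run for many hours it may be reasonable to lengthen the interval.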
In a later post we will discuss the counters and what they indicate.