To continue the discussion of Crawl Performance, we will look at some additional counters that will help with further analysis of the state of the crawler.

Here we will expand on the counters we have reviewed before and see what information can be collected from them.

What are the counters trying to convey?

In the Search Gatherer performance object, some additional counters offer a glimpse into the bigger picture. We will also look at a few counters in the Search Gatherer Projects performance object. This particular performance object is associated with the individual content index projects that you have created, for example Portal_Content and Non_Portal_Content. For the purposes of this discussion I have chosen Portal_Content. If you want to watch these counters without keeping Perfmon open, see the first sketch after the list below.

  • Search Gatherer\Performance Level
    • This is a very important indicator of the amount of resources that can and will be consumed by the crawler. The range of values is 1 through 5, and the default value is 3.
    • This number determines how many single-threaded and multi-threaded daemons (mssdmn.exe) will be started. If it is set to 1 or 2, far fewer daemons are started, so crawling does not have as large an impact on your server. Another important point is that this number also affects the thread priority of the filtering threads: at a setting of 1-3, the filtering threads are started with below-normal thread priority, which means they do not get as many CPU cycles as you might want.
    • It can also affect the multi-threaded daemon by adjusting the number of threads available for calling IFilters: as the Performance Level goes down, fewer threads are available for calling IFilters.
  • Search Gatherer\Server Objects 
    • This counter reflects the number of servers that the gatherer is accessing at one time during a crawl. When a server is accessed for the first time, a server object is created, so a server object exists for each unique host server that you are configured to crawl. For example, if you were configured to crawl http://mysharepoint and http://www.microsoft.com, you would have at least 2 server objects created. More would be created if there were links to other servers.
    • That server object holds many pieces of information about the server that is being contacted, one of which is the number of concurrent connections that are allowed. In a later post we will discuss the Site Hit Frequency rule and how it impacts this connection limit.
  • Search Gatherer\Threads Accessing Network
    • This counter shows how many of the filtering threads are currently accessing the network. When a filtering thread receives the next item to crawl, that thread is responsible for loading the protocol handler that connects to the target. If the item to be crawled is on a file share, for example, we load the protocol handler responsible for connecting to file servers.
    • Depending on the item type, it is possible that the item will be copied from the target machine to the local machine in the farm, specifically the indexer that is requesting the item.
    • The item will be stored in the temporary directory on the indexer machine. This directory is typically located at <drive>:\Program Files\SharePoint Portal Server\Data\temp and is configured in Central Administration on the Manage Server Settings -> Search Server Settings page; the File Locations section allows you to set the temporary path. It is VERY important that you do not have anti-virus scanning enabled for this directory, as that will only serve to slow down the overall crawl. You still need to ensure that no infected files are stored within SharePoint, but this can be done on upload using a virus product that integrates with SharePoint. Many of these are available, and you should find one that fits your needs.
    • During the time that the item is being copied from the target, this counter is incremented. When the copy has completed and we are no longer hitting the network this counter is decremented.
    • After the item is on the local drive in the temp directory, this thread will load the IFilter and begin calling the GetChunk method to extract the contents of the item. This work is performed by the daemon, and as the data is fed back to the MSSearch process for final processing, this counter is incremented as well. In other words, as long as data is coming back from the daemon and being passed upstream to the MSSearch process, this counter reflects that activity. It will in essence show the thread as still accessing the network even when it is not really impacting the network; this is simply due to the design of the counter.
  • Search Gatherer\Threads In Plug-ins 
    • This counter shows the number of threads that are in plug-ins at any one time. The designers of Search made two very cool choices: 1) make the daemons sacrificial to account for misbehaving IFilters, and 2) make MSSearch extensible by designing plug-ins to handle various functions.
    • One of these plug-ins is the Subscription Plug-in (SUBPI). When data is fed back upstream from the daemon, certain actions must be taken to complete the overall work for the gatherer. One of those actions, and the function of this plug-in, is to create SPS alerts (not WSS alerts, which are handled by a very different mechanism).
    • Threads spending too much time in plug-ins, reflected as a persistently high value for this counter, may indicate a slow connection to the back-end SQL server. It is expected that this counter will increment and decrement regularly during a crawl.
  • Search Gatherer Projects\Crawls in Progress
    • This counter indicates that a crawl is in progress. It does not tell you what type of crawl it is, but it at least clues you in to the fact that a crawl is running.
    • When looking at perfmon data from a historical perspective, you can determine the start and stop times of the crawl by using this counter.
    • Many times when customers call in, they know that a crawl is running but not which index is crawling, because they have so many of them. This counter helps us focus in on which specific one is running.
    • I use this counter to restrict the view in the perfmon capture when looking at historical information. For example, when I have 3-4 crawls in one perfmon log, I can move the start and end time bars in Perfmon to include just the data for a specific crawl. The second sketch after this list shows a scripted version of the same idea.
  • Search Gatherer Projects\Incremental Crawls 
    • This counter indicates that the crawl currently running for this Project (Index) is an incremental crawl. When the counter is 1, the crawl is incremental; when it is 0, it is a full crawl or perhaps an adaptive crawl. I have not seen many people run adaptive crawls, so you won't see much of that here.
    • Again, this counter is used to identify the crawl in question when looking at historical information. It is also a good sanity check when looking at live data, to determine whether it is an incremental or full crawl that is currently running, if you don't already know.
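
If you would rather watch these counters from a script than keep Perfmon open, you can poll them through the PDH API. Here is a minimal sketch in Python; it assumes the pywin32 package is installed and that the script runs on the index server itself, and the Portal_Content instance name is just the example project used above.

```python
import time

import win32pdh  # pywin32; assumes this runs on the index server itself

# Counter paths exactly as they appear in Perfmon. Portal_Content is the
# example content index project used above; substitute your own name.
PATHS = [
    r"\Search Gatherer\Performance Level",
    r"\Search Gatherer\Server Objects",
    r"\Search Gatherer\Threads Accessing Network",
    r"\Search Gatherer\Threads In Plug-ins",
    r"\Search Gatherer Projects(Portal_Content)\Crawls in Progress",
    r"\Search Gatherer Projects(Portal_Content)\Incremental Crawls",
]

query = win32pdh.OpenQuery()
counters = {path: win32pdh.AddCounter(query, path) for path in PATHS}

try:
    while True:
        win32pdh.CollectQueryData(query)
        for path, handle in counters.items():
            # GetFormattedCounterValue returns (counter type, value).
            # These are instantaneous counters, so one sample is enough.
            _, value = win32pdh.GetFormattedCounterValue(
                handle, win32pdh.PDH_FMT_LONG)
            print(f"{path} = {value}")
        print("-" * 60)
        time.sleep(5)
finally:
    win32pdh.CloseQuery(query)
```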
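
The same idea works for historical data. This second sketch, again just an illustration, assumes you have converted a Perfmon log to CSV (for example with relog crawl.blg -f CSV -o crawl.csv, so that the first column is the timestamp and the headers contain the counter names). It reports each crawl window along with whether the Incremental Crawls counter marked it as incremental; the file name and helper names are hypothetical.

```python
import csv

LOG_FILE = "crawl.csv"  # hypothetical name for a relogged Perfmon CSV

def find_column(header, counter_name):
    """Locate the column whose header contains the given counter name."""
    return next(i for i, name in enumerate(header) if counter_name in name)

def crawl_windows(path):
    """Yield (start, end, kind) for each interval where a crawl was running."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        in_progress_col = find_column(header, "Crawls in Progress")
        incremental_col = find_column(header, "Incremental Crawls")
        start = incremental = prev_ts = None
        for row in reader:
            ts = row[0]  # first column of a relogged CSV is the timestamp
            try:
                running = float(row[in_progress_col]) > 0
                incremental_now = float(row[incremental_col]) > 0
            except ValueError:
                continue  # skip blank or malformed samples
            if running and start is None:
                start, incremental = ts, incremental_now
            elif not running and start is not None:
                yield start, prev_ts, "incremental" if incremental else "full/adaptive"
                start = None
            prev_ts = ts
        if start is not None:  # crawl still running at the end of the log
            yield start, prev_ts, "incremental" if incremental else "full/adaptive"

for begin, end, kind in crawl_windows(LOG_FILE):
    print(f"{kind} crawl: {begin} -> {end}")
```

This is the same "move the time bars" exercise described above, just automated, so a log containing several crawls can be broken into per-crawl windows in one pass.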

In a later post we will discuss additional counters and what they indicate.