Hello again and welcome to the SharePoint Server 2010 Search series of blog postings dedicated to detailing the lessons I have learned in the process of hosting our dogfood system.  In this posting I will be talk about the high level concepts utilized to optimize query performance.  For reference and a complete description of the hardware please refer to the SearchBeta Hardware post to see the hardware specs and topology layout.

   

Index partitions, query components & mirrors

The first item that one needs to understand when building out a SharePoint 2010 Search service is how the index is split into sub pieces to provide redundancy and sub second query latencies.  The following are the building blocks used:

  • Index Partitions - An index partition is a logical fractional piece of the entire index.  Documents are added in a round-robin fashion across components serving each partition to ensure that each partition is equivalent in size.  Each user query must span all partitions that make up the full index.
  • Query Components - The modules that satisfy a query.  Each partition must have a component to provide end-user queries over it.  The SharePoint 2010 Search system does not allow a partition to exist without having an associated component.
  • Mirrors - A redundant copy of a component; there can be more than one mirror for any partition.  Mirrors can be active (FailoverOnly == false) or passive.  Active mirrors address throughput (if your bottleneck is CPU in the query component) as end user queries will be sent to one of the active mirrors using a round robin algorithm.  Passive mirrors address availability, as a passive mirror will serve queries if no other active components exists for the given partition.  (See below for more details).

 

So the main way to improve query latency is to reduce the size of each partition by adding more of them.  With more partitions you add more parallelization for the query and reduce the size of the index that each query component must process. 

 

If you need to improve query throughput you do so by simply adding more mirrors.  Thus increasing the number of components that can concurrently satisfy incoming queries.  

     

Determining the need for more partitions

The above is a simplistic look at how to improve latency and throughput.  There are several additional details that you should use to determine the need to add more components into the system.  The capacity planning document identifies three main guidelines to direct the number of partitions you should have:

  • 33% of the active partitions contained on a given machine should fit into the available memory on the machine
  • Each partition is recommended to be smaller than 10 million documents
  • Active components will consume 2 cores; one for servicing queries and one for merging the index. Passive components only  consume 1 core, for merging the index.  

 

Determining the need for more mirrors

There are two reasons to add mirrors, redundancy and increased query throughput.  Adding the first set of mirrors (passive or active) will allow one or more query servers to be taken off-line with limited impact on end user queries.  Taking a look at the SearchBeta Hardware the farm is fully redundant allowing me to patch half of the query servers simultaneously while still servicing end user queries.  Building redundancy in the fashion that SearchBeta has with each machine containing 2 active components and 2 passive components (all serving different partitions) and a second machine that is an exact copy with the active/passive components partitions flipped is the implementation that most people will use.  This provides the best cost versus benefit since it allows for good query performance during normal operations and still provides service during failure cases.  But when a machine is taken out of service the mirrored machine is out of capacity due to memory constraints.  Queries don't fail in this case, but latencies do suffer as the passive component starts servicing queries it must first read its portion of the  index into memory.  Also the common deployment will only have enough memory to cache the active components, so in the failed over state the machine will page the index in and require more memory to be performant.    

 

While adding more partitions to a given index increases the degree of parallelization for a single query; adding more mirrors enables more concurrent queries to be executed.  This assumes that the mirrors are marked as active (FailoverOnly == false) and the mirror has 2 dedicated cores.  For comparison the query load on SearchBeta is 120 queries per minute.  This load causes limited CPU usage and we are not close to needing more throughput capacity.  I suspect this will be the common case and the addition of multiple active query components serving the same partition will be rare.

 

Proof of concept

As you can see sizing a query topology relies heavily on knowing ahead of time how big your index is going to be so you can cache 33% of it in memory.  To figure the size of your index you can either estimate the size with the formula in the capacity planning document (Size of Content DB(s) indexed * 0.035) or you can crawl a representative portion of your content and building an estimating formula specific to your data.  Generally the 0.035 ratio is assumed to be correct.  But your mileage may vary based specifically on the type of content you have.  If your content is rife with images and sparse with text then the Content DB will be quite a bit larger than average while the resulting index will be smaller than average.  The inverse, content that is verbose with prose, may also cause inaccuracies in the formula.  To create your own ratio you will need to calculate the ratio of the index (all files within the cifiles directory on the query server after a Master Merge) to the overall size of your content database (i.e. Index Size/Content Size.)

 

If you are building out a large scaled environment in the range of 40 million or more documents I would strongly encourage you to look at implementing some type of proof of concept (POC) phase. In this process you should build out a smaller SharePoint 2010 Search farm to index approximately 10 million documents that are representative of your corpora.  With the POC you can compare your content to index ratios to confirm how closely your data conforms to average formula we have published. 

  

Thanks for your time, in the next post I plan to delve into scaling of property store and touch on some of the advanced topics for tuning the system for optimal query latencies

 

Dan Blood

Senior Test Engineer

Microsoft Corp