Hi all, this topic is an area that has caused me much pain and work.  My goal for this was to follow the recommended SQL guidelines while minimizing the impact that these maintenance jobs have on Crawling and Queries.  We know from the SQL Monitoring an I/O post that Search is extremely I/O intensive .  As it turns out so is all of the regular maintenance that SQL recommends, so finding the right balance between the two is an interesting scheduling task.

As a starting point much information about SQL maintenance and MOSS is covered in the following paper:

There are some key areas from the above paper that I would like to augment here.

  1. The stored procedure (proc_DefragIndexes) identified in this paper will work, but it is extremely expensive to run on the Search DB as it defrags all of the indexes in the table.
  2. Maintenance plans generated with the Maintenance Plan Wizard in SQL Server 2005 can cause unexpected results (KB 932744.)  While this was fixed in SQL 2005 SP2 these maintenance plans also do more work than is necessary to have a healthy functional system.   
  3. Shrinking  the Search DB  should not be a necessary task that you need to perform.  The process of Shrinking the database does not provide a performance benefit.  SQL best practices for DBCC SHRINKFILE suggest that this operation is most effective after an operation that creates lots of unused space.  Search does not regularly perform these types of operations.  The only time that a SHRINKFILE may make sense is after you have cleaned out your index by removing a Content Source.     
  4. Rebuilding an index can cause latency issues with SQL Mirroring if the SQL I/O subsystem is constrained.  If you are using SQL Mirroring, be sure you are following the SQL best practices and the SharePoint mirroring white paper.  Because Search, SQL Mirroring, and defrag are all very I/O intensive you will want to be extra cautious with your deployment plan for this defrag script and make sure you test the script prior to going into production.

DBCC CHECKDB

DBCC CHECKDB is a command used to check the logical and physical integrity of all the objects in a database.  SQL Best practices recommend that you run DBCC CHECKDB periodically.  For a Search deployment we would recommend that you run DBCC CHECKDB WITH PHYSICAL_ONLY on a regular basis.  The PHYSICAL_ONLY option will reduce the overhead of the command.  However, due to the cost of running this you should schedule it during off-peak times.  The frequency of execution depends on your business needs, but a good place to start is once a week just prior to your back-up.  You still need to run DBCC CHECKDB, but less frequently also based on business needs.  Perhaps every other or every third back-up.  

When running these commands make sure that you have a monitoring process in-place.  DBCC only reports errors, it does not fix them unless explicitly specified by other options.  You either want to archive the output of the DBCC command for post processing or make sure you have event log monitoring set-up (for example MOM) to check for DBCC errors.

In very large environments you can run DBCC on an off-line (sandbox) copy of the database.  This will be less intrusive to end-users and the crawl.  In this scenario you would restore your back-up to a separate sandbox and run DBCC CHECKDB in the restored  environment.        

Fragmentation and index statistics freshness

We started with the proc_DefragIndexes script mentioned above.  After running it became obvious that the script was just too expensive to run on a regular basis.  To reduce the load placed on the I/O system we took a look at all of our indexes in the Search DB and defragged them one-by-one to measuring query performance along the way.  Doing this we were able to identify the indices that provided a performance benefit to the system when they were defragmented.  These indexes are listed below:

  • IX_MSSDocProps
  • IX_MSSDocSdids
  • IX_AlertDocHistory
  • IX_MSSDEFINITIONS_DOCID
  • IX_MSSDEFINITIONS_TERM
  • PK_Sdid
  • IX_SDHash
  • IX_DOCID

Optionally there are two additional indexes that you may want to include in your defrag maintenance plan.  These indexes do not see much use in typical out of box situations and are commented out in the script.  But if your environment is built on a custom UI or makes extensive use of the Advanced Search UI you will see improvements in query latencies if you defrag them.

  • IX_int -- defrag this index if you have a lot of queries that using numeric properties in the property store.  The classic case is date rage queries.
  • IX_Str -- defrag this index if you have a lot of queries that using string properties in the property store.  There is not a common case for this but if you have made changes to your managed properties and are driving your search UI off of exact matches for a string based property you will want to regularly defrag this index.

Once we knew which indexes to defrag we looked at the duration it took for the index to reach a 10% defragmentation rate.  From this we adjusted the FILLFACTOR so we could maintain a longer period of time between actually needing a defrag.  At this point we are seeing a duration somewhere around 2+ weeks between defrags.  Do note that by increasing the FILLFACTOR we did grow the size of the database slightly, the growth rate on SearchBeta was not that large.

We then looked at the cost/benefit of doing a Reorganize versus a Rebuild.  This was a interesting discovery for us.  Initially we had a script in place similar to proc_DefragIndexes that would choose to Reorganize or Rebuild based on percent fragmentation with 30% being the decision point (IE greater than 30% would do a Rebuild).  What we found was a Reorganize was taking over 8 hours with a 10% fragmentation rate and during this time end-user queries suffered dramatically.  Out of curiosity and desperation we tried a Rebuild which is supposed to be the more expensive of the two operations.  The Rebuild operation is completing in approximately 1 hour while the Reorganize takes as long as 8 hours.  The Rebuild operation is more expensive in the sense that you will see some failed queries during the hour that it runs, where as the Reorganize doesn't have as drastic of an effect on the queries, but the overall cost is much higher since you have an 8 hour window where the query performance is degraded.  UPDATE STATISTICS:  In the experiments we ran we found that simply doing the rebuild (which also updates statistics) that it was not necessary to regularly use this command.

Finally we deployed the script into an environment that utilized SQL Mirroring.  Unfortunately this didn't work out very well.  The mirror got so far behind that we eventually had to disconnect the mirror and stop the defrag.  Going through an analysis of this it became clear that the root cause was that the environment was heavily I/O bound and the defrag script generated more I/O than the system could keep up with.   While the mirror was behind end-user query latencies suffered dramatically.  To recover from this we ultimately had to improve the hardware by increasing the number of spindles. 

To mitigate this we have added a parameter to the script that allows you to reduce the MAXDOP used in the index rebuild.  Setting this parameter to 1 on a SQL box that is minimally I/O bound helps, but it may not be enough depending on how constrained the system is.  If you are in an environment  that is I/O bound (with or without SQL Mirroring) we strongly recommend that you go through a test of the defrag before you go live with the deployment.  The easiest thing to try is the following SQL statement:

ALTER INDEX IX_MSSDocProps ON [dbo].[MSSDocProps]

REBUILD WITH (MAXDOP = 1, FILLFACTOR = 80, ONLINE = OFF)

The statement above rebuilds the largest index using the lowest possible MAXDOP, this index must be rebuilt OFFLINE so you will need to run this on a test system or during a maintenance window.   While this command is running keep an eye on the state of your mirroring with:

  • The duration of the command.  Will it complete within your service window?  For comparison purposes this command completes in under an hour on the SearchBeta hardware
  • SQL I/O latencies
  • If you have mirroring in place
    • The Database Mirroring Monitor
    • Send and Redo Queues  within perfmon.  The monitor above will tell you if mirroring is too far out of sync, but these counters are useful for comparison if you start changing the MAXDOP parameter.

Bottom line we feel the rebuild is a much better operation to run and recommend that you:

  1. Run the script on a regular basis; once a night or on the weekends depending on your service windows.
    • Weekends or weekly - reduce the fragmentation rate (sproc parameter) to 5.0 or lower to prevent missing the defrag due to a fraction of a percent (IE - 9.5%)
    • Nightly - use the defaults for fragmentation rate. The largest index (MSSDocProps) gets rebuilt approximately every 2 weeks on SearchBeta. Running the script nightly will ensure that your indexes are up to date more often, but gives you less control over the exact time that the index rebuild occurs.
  2. Before running the script the first time test out how your system will behave when rebuilding MSSDocProps.
  3. Reduce MAXDOP - If your environment shows poor I/O response time or unacceptable durations (cannot complete a defrag inside your service window) reducing the MAXDOP value may reduce the duration of the script and put less pressure on the I/O system.  Reducing the MAXDOP will not help enough if the system is very I/O bound. 
  4. SQL Mirroring - SQL mirroring is sensitive to I/O latencies, adding the defrag may be too much I/O for the system handle.
  5. Poor I/O latency - You should focus on improving the I/O subsystem of your SQL environment before you begin running this script.    

Stored Procedure syntax:

exec proc_DefragSearchIndexes [MAXDOP value], 
[fragmentation percent]
  • MAXDOP value - Integer value. Default is 0  which means that all available CPUs will be used.
  • Fragmentation percent - decimal value. Default is 10.0.  This value was explicitly chosen because we able measure query latency improvements on SearchBeta when defragging at the 10% boundary.  

-Thanks

Dan Blood
Senior Test  Engineer
Microsoft Corp