There is a lot of information out there on working with large lists in SharePoint 2007. There is some good info here and here, and Steve Peschka's great whitepaper on working with large lists in SharePoint 2007 has been out for a while now. One topic I haven't seen much information on is crawling large lists. Steve's whitepaper mentions it only briefly: "If the indexer is timing out when crawling large lists, you can increase the time-out value." We had been having trouble crawling a large pages library of over 4,000 items and were getting the error "The item may be too large or corrupt." Increasing the time-out seemed to help, but the indexer still often failed to make it through the whole library, so we kept looking for a better solution. After working with Microsoft Support, we were told to modify three registry values:

All three values live under HKLM\SOFTWARE\Microsoft\Office Server\12\Search\Global\GatheringManager:

Value name                          Default value           New value
DedicatedFilterProcessMemoryQuota   104,857,600 (100 MB)    209,715,200 (200 MB)
FilterProcessMemoryQuota            104,857,600 (100 MB)    209,715,200 (200 MB)
FolderHighPriority                  50                      500
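If you have several index servers to update, a .reg file is less error-prone than editing each value by hand. The Python sketch below prints a Registry Editor 5.00 script containing the three new values. It assumes all three are stored as REG_DWORD, which is an assumption you should confirm in regedit on your own server before importing. The function name `to_reg_file` is purely illustrative.

```python
# Minimal sketch: print a .reg script that sets the three GatheringManager values.
# Assumption (not from Microsoft Support): all three values are REG_DWORD.

GATHERING_MANAGER_KEY = (
    r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12"
    r"\Search\Global\GatheringManager"
)

# New values from Microsoft Support; the defaults are noted in the comments.
NEW_VALUES = {
    "DedicatedFilterProcessMemoryQuota": 209_715_200,  # default 104,857,600 (100 MB)
    "FilterProcessMemoryQuota": 209_715_200,           # default 104,857,600 (100 MB)
    "FolderHighPriority": 500,                         # default 50
}


def to_reg_file(key: str, values: dict[str, int]) -> str:
    """Render a Registry Editor 5.00 script that sets each value as a DWORD."""
    lines = ["Windows Registry Editor Version 5.00", "", f"[{key}]"]
    for name, number in values.items():
        # A DWORD is an unsigned 32-bit integer.
        if not 0 <= number <= 0xFFFFFFFF:
            raise ValueError(f"{name}={number} does not fit in a DWORD")
        # .reg files write DWORD data as 8 hex digits, e.g. 500 -> 000001f4.
        lines.append(f'"{name}"=dword:{number:08x}')
    # Use Windows line endings so the script opens cleanly in regedit.
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    print(to_reg_file(GATHERING_MANAGER_KEY, NEW_VALUES))
```

Save the printed output with a .reg extension and review it before double-clicking it. Generating the file, rather than calling `winreg` directly, lets you inspect exactly what will change.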


Modifying these registry values seems to have done the trick for us so far. Note that our servers were x64 machines with a lot of RAM; doubling the filter process memory quotas might not be a good idea if you are still on x86 and are RAM constrained. I'd like to hear your results: did this work for you? What have your experiences been crawling large pages libraries in SharePoint 2007?