After a bit of a struggle configuring crawling for an anonymous SharePoint site, I decided to write down a few points here on my blog.

Some sources recommend that if you need to crawl an anonymous SharePoint site, you should create a content source of type 'Web Site'. This is not really the best way, because you will lose the intelligent incremental crawl. A 'Web Site' content source can have an incremental crawl scheduled, but it will still crawl through the whole site, creating unnecessary load. If you use the 'Web Site' type, you also need to create crawl rules to include the content you want crawled.
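As a rough illustration of why this matters for load, here is a toy Python model that compares a crawler with no change information (it has to fetch everything again) with one that can filter on modification timestamps. This is only a sketch of the idea, not how SharePoint's crawler is implemented; the item records, the `modified` field, the URLs and both function names are made up for the example.

```python
from datetime import datetime


def full_recrawl(items):
    """Return every URL: without change information, everything is fetched again."""
    return [item["url"] for item in items]


def incremental_crawl(items, last_crawl):
    """Return only the URLs whose (hypothetical) modified timestamp is newer than the last crawl."""
    return [item["url"] for item in items if item["modified"] > last_crawl]


# Made-up sample content; a real site would have thousands of items.
items = [
    {"url": "/pages/home.aspx", "modified": datetime(2011, 1, 10)},
    {"url": "/pages/news.aspx", "modified": datetime(2011, 3, 2)},
    {"url": "/docs/report.docx", "modified": datetime(2011, 2, 20)},
]
last_crawl = datetime(2011, 3, 1)

print("Without change info:", full_recrawl(items))
print("With timestamps:    ", incremental_crawl(items, last_crawl))
```

With this sample data the first call returns all three items, while the second returns only the one item modified after the last crawl.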

What you should do instead is create content sources of type 'SharePoint Sites'. This way crawling happens through SharePoint's web services, and the incremental crawl has a better understanding of what has changed because it examines timestamp information from the content. You also don't have to create any crawl rules to make the crawling work, although you can create them if you need to, for example to exclude something from being crawled.

Authentication configuration is a bit different for 'SharePoint Sites' content sources, and if it is wrong, it can easily lead to the infamous 'Access denied' errors in the crawl logs. Here are the basic requirements for the authentication configuration when using 'SharePoint Sites' content sources:

Enable Windows Authentication for SharePoint web application in anonymous zone

First of all, the anonymous SharePoint site should have Windows Authentication enabled. Make sure it is enabled for the web application in Central Administration (CA):

  1. Open SharePoint 2010 Central Administration
  2. Click Manage web applications under the Application Management section
  3. Select your web application's row
  4. Click Authentication Providers from the ribbon
  5. Click the Internet zone (or whatever zone is defined for your anonymous site)
  6. Make sure that Enable Windows Authentication and Integrated Windows authentication are checked
  7. Click OK

Enable Windows Authentication for anonymous IIS web site

The second thing you need to check is the configuration of your anonymous web site in IIS. Make sure that Windows Authentication is enabled for the IIS web site. You have to perform this task on all crawl servers.

For IIS 7:

  1. Click Start -> Administrative Tools -> Internet Information Services (IIS) Manager
  2. Click Sites
  3. Select your anonymous site
  4. Double-click Authentication under the IIS section in Features View
  5. Select Windows Authentication and click Enable in the Actions pane on the right side of the window
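If you want a quick sanity check from outside the GUI, one option is to look at the `WWW-Authenticate` headers the site sends when it answers with a 401 challenge. IIS announces Windows Authentication with the `Negotiate` and `NTLM` schemes. The Python sketch below is only an illustration under that assumption; the function names are my own, and keep in mind that a URL that allows anonymous access will usually answer 200 with no challenge at all, in which case this check tells you nothing.

```python
from urllib.error import HTTPError
from urllib.request import urlopen

# Schemes IIS uses to announce Windows Authentication.
WINDOWS_SCHEMES = {"negotiate", "ntlm"}


def offered_schemes(header_values):
    """Extract the lowercase scheme names from WWW-Authenticate header values."""
    schemes = set()
    for value in header_values:
        for part in value.split(","):
            tokens = part.strip().split()
            # Skip fragments such as charset="UTF-8" that are parameters, not schemes.
            if tokens and "=" not in tokens[0]:
                schemes.add(tokens[0].lower())
    return schemes


def offers_windows_auth(header_values):
    """Return True if Negotiate or NTLM is among the offered schemes."""
    return bool(offered_schemes(header_values) & WINDOWS_SCHEMES)


def challenge_headers(url):
    """Fetch url without credentials and return the 401 challenge headers, if any."""
    try:
        urlopen(url, timeout=10)
    except HTTPError as err:
        if err.code == 401:
            return err.headers.get_all("WWW-Authenticate") or []
        raise
    return []  # The server answered without a challenge (for example anonymous access).
```

You would call it as `offers_windows_auth(challenge_headers(url))`, where `url` is an address on your site that actually requires authentication; which address that is depends on your environment.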

Give full read access to default content access account

You should have a dedicated domain account that is used as the default content access account. This is important because the crawler uses this account on every server in your farm that has been assigned as a crawl server.

The default content access account needs Full Read permission on the SharePoint web application you are crawling.

  1. Open SharePoint 2010 Central Administration
  2. Click Manage web applications under the Application Management section
  3. Select your web application's row
  4. Click User Policy from the ribbon
  5. Click Add Users
  6. Click Next
  7. Enter your default content access account in the Users box
  8. Select Full Read under Permissions
  9. Click Finish

Set default content access account in Search Administration

You also need to define the default content access account in Search Administration.

  1. Open SharePoint 2010 Central Administration
  2. Click Manage service applications under the Application Management section
  3. Click Search Service Application to open Search Administration
  4. In the System Status panel you will see Default content access account. Click on the right side of that row to define a suitable account
  5. Enter your default content access account and password and click OK

That's it! Start a full crawl for your content source and you should see anonymous items crawled successfully.