Use a Dedicated Web Front End for Crawling
In SPS 2003 we found that indexing could be a pretty heavyweight operation. In a small or medium environment it's not a big deal. In an enterprise environment crawling hundreds of GB to TBs, indexing can potentially generate more traffic than users do. The solution ended up being to set up a web front end that the index server crawls exclusively, so crawling doesn't impact normal user traffic (at least not on the user-facing web front end servers). We used to call this the "Target" box: a dedicated web front end for crawling. This was accomplished by adding the farm's host names, mapped to the Target box's IP, to the hosts file of the index server. One bright customer found that making this "Target" box the Index server itself gave amazing indexing performance, since it didn't have to make requests to another box that would then make requests to SQL. (By the way, this topology, having the index be a WFE, is not a supported topology in SPS 2003, and the topology page will complain.)
In MOSS 2007 the product supports both the combined Index/WFE server role in a topology AND the dedicated web front end for crawling. What you need to know is how the product supports this scenario. When you select this option (Central Admin > Operations > Services on Server > (MOSS) Search configuration) to set up this special role, the Index server adds hosts file entries that map the hostnames of all web apps to the IP address of the WFE. Everything's cool, right? Well, just make sure it's the correct IP. If your server only has one IP, everything's cool. If it's not the correct IP, you'll have communication problems, failed indexing, etc. Even if you try to correct these entries by hand, they will be overwritten. The solution? Make sure the IP you're using is the one on the first NIC. The alternate solution, if you have multiple NICs or alternate DIPs (dedicated static IP addresses) for each site: simply turn off this radio option, which removes the static entries from the hosts file, and maintain the list yourself. It's not rocket science.
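Checking those entries by eye gets old fast, so here is a minimal sketch of how you might script it. It is not anything the product ships: the function names, hostnames and IP addresses are all made up for illustration, and it only assumes the standard hosts file layout (an IP followed by one or more names, with # starting a comment).

```python
def parse_hosts(text):
    """Return a dict mapping each lowercase hostname to the IP on its line."""
    entries = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # strip comments and whitespace
        if not line:
            continue  # skip blank and comment-only lines
        ip, *names = line.split()
        for name in names:
            entries[name.lower()] = ip
    return entries


def find_mismatches(hosts_text, expected_ip, hostnames):
    """List (hostname, found_ip) pairs that do not map to expected_ip.

    found_ip is None when the hostname has no entry at all.
    """
    entries = parse_hosts(hosts_text)
    return [(h, entries.get(h.lower())) for h in hostnames
            if entries.get(h.lower()) != expected_ip]


# Hypothetical sample data: these hostnames and addresses are invented.
SAMPLE_HOSTS = """
# entries added for the dedicated crawl target
10.0.0.25   portal.contoso.example
10.0.0.99   mysites.contoso.example   # wrong NIC
"""

mismatches = find_mismatches(
    SAMPLE_HOSTS,
    "10.0.0.25",
    ["portal.contoso.example", "mysites.contoso.example", "teams.contoso.example"],
)
for host, ip in mismatches:
    print(f"{host}: expected 10.0.0.25, found {ip or 'no entry'}")
```

To run it against a real index server you would read the text from the hosts file itself (on Windows that is %SystemRoot%\System32\drivers\etc\hosts) and pass in your own web app hostnames and the IP you expect the crawl target to answer on.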
So in a nutshell: if you have a 3/4 server farm or 5+, you can configure a dedicated WFE for crawling. Better yet, imagine how much more efficient it would be to simply have your index server be a web front end as well. When the Index/WFE server is its own crawl target, it can hit the web service locally to index the content, which reduces unnecessary network traffic. In a WAN environment, you'll need to put in hosts file entries for the other servers that are "WFEs dedicated for crawling" to reduce unnecessary indexing load on your user-facing web front ends. With enhancements in the product it is now easier to set up this scenario by choosing that the server is a dedicated web front end for crawling, i.e. the target box. By the way, you can use a robots.txt file in the root of the web app on servers that are not to be used for indexing to prevent crawling there, which guards against misconfigurations and rogue crawlers. You really have 2 options: either manage the hosts file yourself to configure the index to crawl itself, or select "use a dedicated web front end for crawling", which will add entries to your hosts file for all web apps. I recommend verifying the IP of those automated entries to avoid long hours of troubleshooting, wondering why you lost connection with the rest of the farm and why sites now won't render.
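On the robots.txt point, here is a rough sketch of what a "block everything" file looks like and how you could sanity check it before dropping it on a server. It uses Python's standard urllib.robotparser. The crawler name and URL are placeholders I made up, and the helper function is just for illustration. Keep in mind robots.txt only stops crawlers that choose to honor it.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt that asks every crawler to stay away from the whole site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_crawlable(robots_text, user_agent, url):
    """Return True if robots_text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical crawler name and URL, used only to exercise the rules.
allowed = is_crawlable(
    ROBOTS_TXT,
    "ExampleCrawler",
    "http://portal.contoso.example/sites/hr/default.aspx",
)
print("crawl allowed" if allowed else "crawl blocked")
```

The file only belongs on the web front ends you do NOT want crawled. Put it on the crawl target too and you will block your own index server.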
Cool. Enjoy. The Target server scenario is a very cool one when executed well. It can really decrease crawl times and allow you to run more threads without impacting end user traffic (well, slightly, since it's still traffic against SQL, but it keeps more of the traffic off the network).
Let me give Daniel Webster credit for finding that the logic for adding these hosts file entries doesn't work all the time. It's also a topic that hasn't received the posts that it deserves. No longer a backend network hack, this "target" server concept is now a real, considered and tested scenario and one of my first recommendations for scaling an environment with more than 500GB. Why have an Index only server?
<update 2/8/07>
I was thinking about this more and had a few threads on this topic. One question: why make a WFE an Index box? I think you need to look at it the other way around. It's basically giving your index server a WFE role as well, which removes one of the hops. You won't want to put this particular WFE in the load balancing rotation.
Another thought: it also makes a good Central Admin and SSP Admin box. You can extend the SSP and Central Admin on more than 1 box. Even if you provision these web apps on your other WFEs, configuring the IIS app pool setting to shut down when idle will reduce memory usage and the number of worker processes.
Dedicated Index... (2 hops)
Index -> WFE -> SQL
Index/WFE (1 hop)
Index/WFE -> SQL
</update>