Hello again, Dan Blood here. To lay out the lessons I've learned in hosting Search 2010, I will first give you a full picture of the hardware behind SearchBeta. Be aware that the hardware is a little underpowered for the number of items in the index. As such, you should not take the hardware listed below verbatim and implement your solution on top of it; rather, use this hardware and these lessons as a starting point. Coupled with the capacity planning document, you can then tailor your hardware to your needs and business-specific SLA requirements.
The starting point for the hardware behind SearchBeta was to provide a search experience over roughly 60 million items and keep these items freshly crawled within a 24 hour time window. Over time the corpus has grown to include ~72 million items. With 3 Crawl databases the system is almost able to meet a 4 hour freshness target for the majority of the 72 million items.
The ~72 million items are broken out across the following content sources:
The query load on the system is not extreme, with peaks of 120 queries per minute. Given the query load and the desire to reduce costs, I have implemented query redundancy with an active/passive scheme across the 6 query servers, meaning that each server hosts 2 active partitions and 2 passive partitions, providing full redundancy. This configuration is typical for 60 million items; however, with ~72 million items we are out of capacity. We recommend having enough memory to fit 33% of the index in RAM; with a combined active index size of 106GB, we are only able to fit 30% of the index in memory. Because of this, 95% of our queries complete within 1.1 seconds over a 24 hour period; to reach sub-second latencies we would need to meet the 33% guideline.
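To make the memory guideline concrete, here is a rough sketch of the arithmetic using the figures above (106GB of active index, a 33% target, 30% cached today). The even split of the shortfall across the 6 query servers is my own simplifying assumption, not something measured on SearchBeta, and the function and variable names are purely illustrative.

```python
def ram_for_index(index_size_gb: float, fraction: float) -> float:
    """Return the RAM (in GB) needed to hold `fraction` of an index."""
    return index_size_gb * fraction


ACTIVE_INDEX_GB = 106   # combined active index size, from the post
GUIDELINE = 0.33        # recommended share of the index held in RAM
CACHED_TODAY = 0.30     # share of the index that currently fits in RAM
QUERY_SERVERS = 6       # assumption: shortfall is spread evenly across them

needed_gb = ram_for_index(ACTIVE_INDEX_GB, GUIDELINE)
cached_gb = ram_for_index(ACTIVE_INDEX_GB, CACHED_TODAY)
shortfall_gb = needed_gb - cached_gb

print(f"guideline: {needed_gb:.1f} GB, cached today: {cached_gb:.1f} GB")
print(f"shortfall: {shortfall_gb:.1f} GB total, "
      f"{shortfall_gb / QUERY_SERVERS:.2f} GB per query server")
```

Under these assumptions the gap between 30% and 33% is only a few gigabytes of index across the whole farm. Treat this as a starting point for your own sizing rather than a statement about how much memory to buy.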
Hardware & Topology
The hardware for SearchBeta is a 10 box services farm with the following machines:
There are a few main areas that I would change for SearchBeta hardware if I were to purchase all of the hardware again from scratch:
Query Server machine specs
We initially started with 4 query servers and grew to 6. As a result, we have query servers with different clock speeds. This is discouraged, as the slower machines will degrade the overall query latency because every query must be executed across every unique index partition.
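A small sketch of why mixed clock speeds hurt: since a query fans out to every index partition, it cannot finish before the slowest partition answers. The latency numbers below are hypothetical and chosen only for illustration; the 12 partitions follow from 6 servers hosting 2 active partitions each. The model ignores the time spent merging results, so take it as a simplification.

```python
from statistics import mean


def query_latency_ms(partition_latencies_ms):
    """A query waits for every partition, so the slowest one sets the floor.

    Simplification: result merging and network overhead are ignored.
    """
    return max(partition_latencies_ms)


# Hypothetical per-partition latencies in milliseconds (not measured values).
uniform = [400] * 12               # all 12 partitions on identical machines
mixed = [400] * 8 + [550] * 4      # 4 partitions sit on slower machines

for label, latencies in (("uniform", uniform), ("mixed", mixed)):
    print(f"{label}: mean partition latency {mean(latencies):.0f} ms, "
          f"query latency {query_latency_ms(latencies)} ms")
```

In the mixed case the average partition latency still looks reasonable, but every query pays the price of the slowest machines.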
6 Machines
Property Database SQL machine specs
1 Machine
MD1120 24-10k 148GB spindles
Data                                | Raid Type | Spindles | Space Used (Reserved / Used)
Property database                   | Raid 1+0  | 12       | 405GB / 232GB
Property database log               | Raid 1    | 2        | 9GB
Admin database                      |           | 4        | 12GB / 12GB
Admin log                           |           |          | 6GB
Temp database & log (*8 data files) |           |          | 114MB / 52MB
Crawler machine specs
2 Machines
Crawl Database SQL machine specs
3 - MD1000 45-15k 450GB spindles
Raid Controller
Crawl database 1
6
116GB / 80GB
Crawl database 2
1
157GB / 56GB
Crawl database 3
70GB / 56GB
Crawl database log 1
3
54GB
Crawl database log 2
95GB
Crawl database log 3
33GB
6.7GB / 5.7GB
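One way to read the Reserved / Used figures in the tables above is as a utilization ratio per database. The sketch below does that for the rows whose labels and values are unambiguous. The 75% threshold is a hypothetical value I picked for illustration, not an official guideline.

```python
# (reserved_gb, used_gb) taken from the Reserved / Used figures listed above.
volumes = {
    "Property database": (405, 232),
    "Admin database": (12, 12),
    "Crawl database 1": (116, 80),
    "Crawl database 2": (157, 56),
    "Crawl database 3": (70, 56),
}

THRESHOLD = 0.75  # hypothetical cut-off for "look at this one"


def utilization(reserved_gb: float, used_gb: float) -> float:
    """Fraction of the reserved space that is already used."""
    return used_gb / reserved_gb


flagged = []
for name, (reserved, used) in volumes.items():
    pct = utilization(reserved, used)
    if pct >= THRESHOLD:
        flagged.append(name)
    print(f"{name:<20} {used:>4} / {reserved:>4} GB  {pct:6.1%}")

print("At or above threshold:", ", ".join(flagged))
```

With this threshold the Admin database and Crawl database 3 stand out as the volumes with the least headroom.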
In the coming posts I will dig further into the Crawl and Query sides of the system as well as how SQL is utilized, providing further details about how to monitor the running system and which areas to look at to see whether the system is reaching capacity.
Dan Blood
Senior Test Engineer
Microsoft Corp