This blog posting applies to Microsoft Office SharePoint Server 2007.

I've worked with several companies who have between 100,000 and 200,000 employees.  Naturally, they want to architect a solution for MySites that can accommodate everyone in the company.  TechNet has information in the Plan for software boundaries article, but a few numbers are ambiguous or conflicting.  In this post I'll try to clear some of that up.

The limits I'll respect from this article are:

  • 100 GB maximum size for Content Database (DB)
  • 50,000 site collections per Content DB
Scale-out

The primary scale-out strategy is to use multiple content databases in the MySite web application.  To meet the guideline of 50,000 site collections per DB, you should plan a minimum of 2 content databases to handle 100,000 MySites.

Next, consider Throughput vs. Site Collections discussed in the TechNet article.  The most striking visual impact of this graph (reproduced below) is that performance appears to drop off dramatically as the number of site collections per content database grows to 50,000.  It looks a bit alarming.  But it's important to examine the numbers.

Throughput vs Site Collections

Let's look at some points on the graph.  Note this is a sample from testing; your specific numbers may vary, but we'll assume inflection points are in roughly the same places.

  • At 10,000 site collections per content database, the number of requests per second (rps) is still above 100.
  • From about 14,000 - 16,000 site collections, throughput decreases very rapidly.
  • After 16,000 site collections, throughput "bottoms out" near 50 rps and stays pretty much the same through 50,000 site collections.

As a general approach to optimizing throughput for any arbitrary scenario, we might say that limiting content databases to 10,000 site collections is a good rule of thumb, because adding 50% more databases at this point cuts throughput in half.  But you should think through anticipated usage patterns for the particular web application and site collections.

MySites have a common usage pattern: these site collections tend to have relatively small quotas, and don't get a lot of concurrent traffic.  A relatively small percentage of MySites are used heavily; these tend to be people who enjoy publishing material or experimenting with information techniques.  In many situations, 50 rps will be sufficient for MySite traffic.  This means we're free to set the limit of site collections per content database on the MySite host at any value up to the recommended limit of 50,000.

The next limiting factor is the overall size of the content database: 100 GB is recommended as the maximum.  This, plus the desired MySite storage quota, will let us compute the optimum number of databases.

Individual MySite quota (MB) 2 5 10 20 25
Number of MySites per 100 GB content DB 51,200 20,480 10,240 5,120 4,096
# content DBs per 100,000 users 2 5 10 20 25

 

Cool Rule: Assuming 100GB as the maximum size of a content database, the number of databases you should plan for—per 100,000 users—is equal to the quota of each site collection in MB!

Other combinations of maximum content DB size or number of users will not line up this way.  To compute the number of content databases, the following formulas may be used.

  • # site collections per DB = (Max size of DB in GB * 1024) / (MySite quota in MB)
  • # content DBs = (Number of MySites) / (# site collections per DB)

For the second formula, you must always round fractions up to the next whole number (use the ceiling function in Excel).

Scale-up

To further ensure the overall performance of SharePoint when using multiple content databases, configure SQL Server so that the databases are hosted on different physical drives (spindles).  This is usually possible with SAN storage solutions also.  My instinct is that you'd want to balance this with the scale-out strategies and usage patterns.  If you had 200,000 users with MySites and a large portion really began to use MySites heavily, then you might want to use 20 content databases distributed across 5 or 10 physical drives.

Next, note that MySites usually cause significantly more transactions in the User Profile database than collaboration or publishing sites, because more web parts that present and manipulate profile information are available on MySites.  As you grow the number of MySites, the CPU, memory, storage and IOPS available for your User Profile DB becomes more important.  Therefore you want to consider putting this database on a separate physical drive also.  For very large user populations and/or heavy anticipated use of MySites, you could even consider using a separate instance of SQL Server for the Profile DB.

The whitepaper SharePoint database performance recommendations gives specific recommendations for increasing performance via the database.

What about TechNet’s limit of 150,000 site collections per web app?  Well, as of this writing, the article shows two different numbers for the same limit (the other is 50,000)... clearly, there is still some ambiguity over this.  Recent anecdotal evidence hints that 200,000 site collections can be hosted on one web application and still perform well.  If I find more details about this I'll post another entry.

Areas a performance hit might be seen due to larger numbers are those that enumerate site collections in a web app.  There are pages in Central Administration (CA) that do this, and it can be mitigated by using the "find" function in the CA page or by using stsadm commands to perform some functions such as MySite deletion.

If you want to plan for growth toward heavy MySite usage, you may want to consider using more than one web application to host them.  It's  possible to configure multiple web applications to host MySites.  When doing this, audiences can be defined to map users to MySite web apps.  A very common approach is to use geography as the audience, especially when a company has datacenters around the world that service users in their part of the world.  This has the added benefit of hosting a user's MySite from a datacenter nearer to them.  Refer to TechNet article Manage My Site host locations for much more detailed information on enabling this scenario.

However, this may not be as balanced an approach for a domestic as it is for global companies.  The ideal arrangement is to find a natural organization of people that splits the whole population into just a few categories of reasonably equal quantities, and is reflected in user profile data.  Such a perfect arrangement may not exist; review your user profile database to look for natural categories that can uniquely classify everyone.

Technorati Tags: ,