As mentioned in my previous post, the first step in capacity management is modeling the SharePoint 2010 environment.

Modeling the environment means analyzing and estimating its expected usage in terms of workload and dataset. Workload describes the demand placed on the environment: total number of users, concurrent users, requests per second (RPS), usage distribution, and so on. Dataset describes the volume of content stored in the system and how it is distributed.

The workload information needed to model the new environment is listed at http://technet.microsoft.com/en-us/library/ff758645.aspx and is as follows:

| Characteristics | Value |
|---|---|
| Average daily RPS | |
| Average RPS at peak time | |
| Total number of unique users per day | |
| Average daily concurrent users | |
| Peak concurrent users at peak time | |
| Total number of requests per day | |

| Expected workload distribution | No. of requests per day | % |
|---|---|---|
| Web Browser - Search Crawl | | |
| Web Browser - General Collaboration Interaction | | |
| Web Browser - Social Interaction | | |
| Web Browser - General Interaction | | |
| Web Browser - Office Web Apps | | |
| Office Clients | | |
| OneNote Client | | |
| SharePoint Workspace | | |
| Outlook RSS Sync | | |
| Outlook Social Connector | | |
| Other interactions (Custom Applications/Web services) | | |
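
To make the relationship between these rows concrete, here is a minimal Python sketch of how the average daily RPS and the percentage column could be derived from request counts. The function names and the sample figures are made up for illustration, and averaging over a full 24-hour day is an assumption; you may prefer to average over working hours only.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_rps(total_requests, window_seconds=SECONDS_PER_DAY):
    """Average requests per second over a time window (assumed: a full day)."""
    return total_requests / window_seconds


def workload_distribution(requests_by_type):
    """Return {type: (count, percent of total)} for the distribution table."""
    total = sum(requests_by_type.values())
    if total == 0:
        return {name: (0, 0.0) for name in requests_by_type}
    return {
        name: (count, 100.0 * count / total)
        for name, count in requests_by_type.items()
    }


# Made-up sample figures, for illustration only.
sample = {
    "Web Browser - Search Crawl": 50_000,
    "Web Browser - General Collaboration Interaction": 120_000,
    "Office Clients": 30_000,
}
total = sum(sample.values())
print(f"Average daily RPS: {average_rps(total):.2f}")
for name, (count, pct) in workload_distribution(sample).items():
    print(f"{name}: {count} requests/day ({pct:.1f}%)")
```

Passing a shorter `window_seconds` (for example, the length of the peak period) gives the average RPS at peak time from the number of requests in that period.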

 

The dataset information needed to model the data sizes the system must hold is listed on the same page, http://technet.microsoft.com/en-us/library/ff758645.aspx, and is as follows:

| Object | Value |
|---|---|
| DB size (in GB) | |
| Number of Content DBs | |
| Number of site collections | |
| Number of web apps | |
| Number of sites | |
| Search index size (# of items) | |
| Number of docs | |
| Number of lists | |
| Average size of sites | |
| Largest site size | |
| Number of user profiles | |
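
Some of these rows are simple aggregates of others. As a rough sketch, assuming you already have a per-site size inventory (the `site_sizes_gb` mapping and its keys below are hypothetical), the site-related rows could be computed like this:

```python
def dataset_summary(site_sizes_gb):
    """Summarize per-site sizes (in GB) into a few rows of the dataset table."""
    if not site_sizes_gb:
        raise ValueError("need at least one site size")
    sizes = list(site_sizes_gb.values())
    # The site whose recorded size is the largest.
    largest_site = max(site_sizes_gb, key=site_sizes_gb.get)
    return {
        "Number of sites": len(sizes),
        "Total size (GB)": sum(sizes),
        "Average size of sites (GB)": sum(sizes) / len(sizes),
        "Largest site": largest_site,
        "Largest site size (GB)": site_sizes_gb[largest_site],
    }


# Made-up inventory, for illustration only.
print(dataset_summary({"/sites/hr": 2.0, "/sites/projects": 6.0}))
```

Note that the total of the site sizes is not the same thing as the database size on disk, so treat it as an input to the estimate rather than the DB size row itself.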

 

Generally, the hard part is coming up with values to put in these tables. Business users are usually the ones best placed to supply them, but they often lack the technical knowledge to do so. The technical capacity planner must therefore work with the business users to build a model of the environment based on their anticipated needs.

If SharePoint 2010 is the first version of SharePoint that the customer is using, then filling in the tables above will be like shooting in the dark, as there is very little historical data to base the estimates on. In that case, it is important to come up with reasoned estimates for usage of the new environment, monitor the actual usage patterns once it is live, and scale up or out when needed.

Note: Underestimating the usage of the new environment can result in a farm that is too small and performs poorly. On the other hand, overestimating will result in a larger farm that is not fully utilized.
It is therefore important to decide deliberately whether to err toward overestimating or underestimating, based on the expected usage of the environment.

Estimating the environment usage is easier if there is historical data to use as a basis. Historical data could come from a SharePoint 2007 environment that will be upgraded, or from another existing SharePoint 2010 environment with comparable usage patterns.

Extracting usage data from a current system is therefore crucial to building a correct model for the new environment. A good way to do that is to use LogParser. Its most important advantage is flexibility: it lets you write queries against the IIS logs and get the results in the format you prefer.
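
To show the kind of numbers that can be pulled out of IIS logs, here is a small Python sketch. It is not LogParser and does not use its query language; it is only an illustration that assumes the logs are in the W3C extended format, where a `#Fields:` line names the columns. The function name, the sample log lines, and the user names are made up. Also note that `peak_rps` here is the busiest single second in the log, which is not the same as the average RPS over the peak period.

```python
from collections import Counter


def summarize_iis_log(lines):
    """Compute basic workload figures from W3C extended log lines."""
    fields = []
    per_second = Counter()  # requests seen in each (date, time) second
    users = set()
    total = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # The "#Fields:" directive names the columns of the rows that follow.
            if line.startswith("#Fields:"):
                fields = line[len("#Fields:"):].split()
            continue
        if not fields:
            continue  # rows before any #Fields directive cannot be interpreted
        row = dict(zip(fields, line.split()))
        total += 1
        per_second[(row.get("date"), row.get("time"))] += 1
        user = row.get("cs-username", "-")
        if user != "-":  # "-" marks an anonymous request
            users.add(user.lower())
    return {
        "total_requests": total,
        "unique_users": len(users),
        "peak_rps": max(per_second.values(), default=0),
    }


# Made-up log excerpt, for illustration only.
sample_log = """\
#Software: Microsoft Internet Information Services 7.5
#Fields: date time cs-method cs-uri-stem cs-username sc-status
2011-03-01 09:00:01 GET /sites/team/default.aspx CONTOSO\\alice 200
2011-03-01 09:00:01 GET /sites/team/_layouts/viewlsts.aspx CONTOSO\\bob 200
2011-03-01 09:00:02 GET /sites/team/default.aspx CONTOSO\\alice 200
2011-03-01 09:00:05 GET /favicon.ico - 404
""".splitlines()

print(summarize_iis_log(sample_log))
```

In practice you would feed it the lines of one day's log file and divide `total_requests` by the length of the day in seconds to get the average daily RPS.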

The next blog post will be about LogParser. It will include the URL to download it, the query language options, and the most common queries that you will need to fill in the tables above.