As mentioned in my previous post, the first step in capacity management is modeling the SharePoint 2010 environment.
Modeling the environment means analyzing and estimating its expected usage in terms of workload and dataset. Workload describes how the environment is used: the total number of users, concurrent users, requests per second, usage distribution, and so on. Dataset describes the volume of content stored in the system and how it is distributed.
The workload information required to model the new environment is listed at http://technet.microsoft.com/en-us/library/ff758645.aspx and includes the following:
Average daily RPS
Average RPS at peak time
Total number of unique users per day
Average daily concurrent users
Peak concurrent users at peak time
Total number of requests per day
Expected workload distribution, broken down by the number of requests per day from each of the following client types:
Web Browser - Search Crawl
Web Browser - General Collaboration Interaction
Web Browser - Social Interaction
Web Browser - General Interaction
Web Browser - Office Web Apps
Outlook RSS Sync
Outlook Social Connector
Other interactions (custom applications/web services)
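Most of these workload values can be derived from web server logs. As a rough sketch, here is how the core numbers (total requests, unique users, average and peak RPS) might be computed in Python; the "date time user" record layout is a deliberate simplification, since the fields in real IIS W3C logs depend on your logging configuration:

```python
from collections import Counter

def workload_metrics(log_lines):
    """Compute rough workload-table values from simplified log records.

    Each record is assumed to be "date time user" -- a simplification
    of the IIS W3C format, whose fields vary with configuration.
    """
    per_second = Counter()   # requests per exact timestamp
    users = set()            # unique users seen
    total = 0
    for line in log_lines:
        date, time, user = line.split()
        per_second[(date, time)] += 1
        users.add(user)
        total += 1
    return {
        "total_requests_per_day": total,
        "unique_users_per_day": len(users),
        # total requests spread over the seconds in a day
        "average_daily_rps": total / 86400.0,
        # busiest single second in the log
        "peak_rps": max(per_second.values()),
    }
```

This sketch assumes a single day's log; for multi-day logs you would group by date first. The point is only that each row of the workload table maps to a simple aggregation over the logs.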
The dataset information required to model the data sizes needed by the system is listed at http://technet.microsoft.com/en-us/library/ff758645.aspx and includes the following:
DB size (in GB)
Number of Content DBs
Number of site collections
Number of web apps
Number of sites
Search index size (# of items)
Number of docs
Number of lists
Average size of sites
Largest site size
Number of user profiles
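Several of these dataset values feed into a content database size estimate. Microsoft's SharePoint 2010 storage planning guidance gives a formula along the lines of ((D x V) x S) + (10 KB x (L + (V x D))), where D is the number of documents, V the average number of versions, S the average document size, and L the number of list items; treat the 10 KB metadata constant as an assumption and verify it against the current guidance. A minimal sketch:

```python
def estimate_content_db_size_kb(num_docs, avg_versions, avg_doc_size_kb, num_list_items):
    """Estimate content database size in KB.

    Based on the formula from Microsoft's SharePoint 2010 storage
    planning guidance: ((D x V) x S) + (10 KB x (L + (V x D))).
    The 10 KB term is a rough constant for per-item metadata overhead.
    """
    doc_storage = (num_docs * avg_versions) * avg_doc_size_kb
    metadata = 10 * (num_list_items + (avg_versions * num_docs))
    return doc_storage + metadata

# Example: 200,000 docs, 2 versions each, 250 KB average, 600,000 list items
size_kb = estimate_content_db_size_kb(200_000, 2, 250, 600_000)
size_gb = size_kb / 1024 / 1024  # roughly 105 GB
```

Such an estimate is only a starting point; actual databases also carry indexes, audit data, and recycle-bin content.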
Generally, the hard part is coming up with values to put in these tables. Business users are usually best placed to provide them; however, they often lack the technical knowledge to do so. The technical capacity planner must therefore work with the business users to build a model of the environment based on the anticipated needs.
If SharePoint 2010 is the first version of SharePoint the customer has used, filling in the tables above is like shooting in the dark, since there is little or no historical data to base estimates on. It is therefore important to make reasonable initial estimates, monitor the environment's actual usage patterns, and scale up or out when needed.
Note: Underestimating the usage of the new environment results in an undersized farm and degraded performance. Overestimating, on the other hand, results in an oversized farm that is never fully utilized. It is therefore important to decide deliberately whether to err on the high or the low side, based on the expected usage of the environment.
Estimating the environment's usage is easier when there is historical data to build on. Historical data could come from a SharePoint 2007 environment that will be upgraded, or from an existing SharePoint 2010 environment with comparable usage patterns.
Extracting usage data from a current system is therefore crucial to building an accurate model for the new environment. A preferred tool for this is LogParser. Its main advantage is flexibility: you simply write SQL-like queries against the IIS logs and get the results in whatever format you prefer.
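To give a feel for the kind of aggregation involved (actual LogParser queries are covered in the next post), here is a rough Python sketch of grouping requests by hour to find the busiest period, similar in spirit to a LogParser query that quantizes timestamps and counts rows. The "date time ..." line layout is a hypothetical simplification of the IIS W3C format:

```python
from collections import Counter

def requests_per_hour(log_lines):
    """Count requests per (date, hour) bucket.

    Conceptually similar to a LogParser GROUP BY over quantized
    timestamps. Each line is assumed to start with "date time";
    real IIS log fields depend on configuration.
    """
    hours = Counter()
    for line in log_lines:
        date, time = line.split()[:2]
        hour = time.split(":")[0]   # bucket by hour of day
        hours[(date, hour)] += 1
    return hours

# The busiest bucket drives the peak-time workload estimates:
# peak_hourly_requests = max(requests_per_hour(lines).values())
```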
The next blog post will be about LogParser. It will include the download URL, the query language options, and the most common queries you will need to extract the tables above.