To ensure the availability and reliability of your SharePoint Server 2010 environment, you must actively monitor the physical platform, the operating system, and all important SharePoint Server 2010 services. Preventive maintenance will help you identify potential errors before they cause problems with the operation of your SharePoint environment. Combined with regular backups and disaster recovery planning, it will also help you minimize problems if they occur.

The following sections describe specific monitoring tasks. Each task maps to one or more items in the checklist, as shown below.

Section | Topic | Checklist
Diagnostic logging | Running SharePoint Server | Check Event Logs; Check SharePoint Farm Backups
Usage data and health data collection | View metrics | Check SharePoint Database Health
SharePoint Health Analyzer | Repair problems | Check SharePoint Health Analyzer
Web analyzer | View metrics | Check SharePoint Health Analyzer

Diagnostic Logging

The Unified Logging Service (ULS) provides a single, centralized location for logging error and informational messages related to SharePoint Server and SharePoint solutions.

SharePoint 2010 includes improvements to the management of the ULS that make it easier for administrators to troubleshoot issues. For more information and best practices about diagnostic logging, see http://go.microsoft.com/fwlink/?linkid=194152

 

Diagnostic Logging configuration

Event Throttling

Event throttling enables administrators to control the types of events that SharePoint Server logs, based on their level of severity. The administration of throttling is divided into two sections:

  • Destination - Log entries can be written to two places. The first is the "Event Log", which is the standard Windows Event Log. The second is the ULS or "Trace Log", a text-based log format that is specific to SharePoint Server and is stored on the file system. The default location is C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\14\LOGS
  • Category - The event throttling level can be applied to specific categories, which map directly to SharePoint Server functionality. This enables the administrator to increase the logging detail for individual SharePoint components, thereby managing the size of the log and the amount of information to review.

 The default settings for all categories are as follows:

  • Event Log: Information
  • Trace Log: Medium Level

During normal operation, these settings are an appropriate balance of detail and performance. During substantial reconfiguration of SharePoint Server, during the installation of custom solutions, or when SharePoint Server is experiencing issues, the throttling threshold should be lowered to a more verbose level for the affected categories. This ensures that as much information as possible is available for troubleshooting.

After troubleshooting is complete, logging can be returned to the default by selecting the "Reset to default" option in the throttling drop-downs.
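
To make the effect of a throttling threshold concrete, here is a minimal Python sketch. It is not SharePoint code: the level names mirror the trace levels offered in Central Administration, while the function names and the sample events are made up for illustration.

```python
# Trace severity levels, ordered from least to most verbose.
# The names mirror the trace levels in Central Administration;
# the filtering logic is an illustrative assumption, not SharePoint's code.
TRACE_LEVELS = ["Unexpected", "Monitorable", "High", "Medium", "Verbose"]


def is_logged(event_level: str, threshold: str) -> bool:
    """Return True if an event at event_level passes the configured threshold."""
    return TRACE_LEVELS.index(event_level) <= TRACE_LEVELS.index(threshold)


def filter_events(events, threshold):
    """Keep only the (level, message) pairs that the threshold lets through."""
    return [e for e in events if is_logged(e[0], threshold)]


# Hypothetical sample events, not real ULS output.
sample = [
    ("Unexpected", "Unhandled exception in web part"),
    ("Medium", "Site collection opened"),
    ("Verbose", "Entering method GetItems"),
]

default_view = filter_events(sample, "Medium")    # the default trace threshold
verbose_view = filter_events(sample, "Verbose")   # troubleshooting threshold
print(len(default_view), len(verbose_view))
```

With the default threshold the verbose entry is dropped. Lowering the threshold to Verbose keeps everything, which is why the trace log grows quickly while troubleshooting.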

 

Correlation IDs

Correlation IDs are GUIDs that are assigned to events which occur during the lifecycle of a resource request. This value is surfaced within error messages, the ULS logs, and tools like Developer Dashboard. This value helps an administrator locate and isolate a specific request across the ULS log, Usage Logging database, and SQL Server Profiler data sets for debugging purposes.

For example, administrators can take the Correlation ID that appears on an error page in their browsers and then rapidly locate any related entries in the ULS log through a simple search.
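
The "simple search" can be as small as the following Python sketch. It assumes only that each log entry is a single line of text containing the Correlation ID. The sample lines and the GUIDs are made up and simplified; real ULS entries carry more columns.

```python
def find_by_correlation(lines, correlation_id):
    """Yield log lines whose text contains the correlation ID (case-insensitive)."""
    needle = correlation_id.strip().lower()
    for line in lines:
        if needle in line.lower():
            yield line.rstrip("\n")


# Made-up, simplified log lines for illustration only.
SAMPLE_LINES = [
    "10:00:01\tw3wp.exe\tRequest started\t1f2e3d4c-0000-4000-8000-aabbccddeeff\n",
    "10:00:01\tw3wp.exe\tCache refreshed\t9a8b7c6d-0000-4000-8000-112233445566\n",
    "10:00:02\tw3wp.exe\tUnexpected error rendering page\t1F2E3D4C-0000-4000-8000-AABBCCDDEEFF\n",
]

matches = list(find_by_correlation(SAMPLE_LINES, "1f2e3d4c-0000-4000-8000-aabbccddeeff"))
for entry in matches:
    print(entry)
```

Because the function accepts any iterable of lines, an open log file can be passed in place of the sample list.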

 

Event Log Flood Protection

Event Log Flood Protection prevents the "Event Log" from being overwhelmed with many repetitive events. When Event Log Flood Protection is enabled, it starts trimming events after the same event has been logged five times within two minutes. At that point it suppresses additional entries. After a further two minutes, it logs a summary event that states how many times the event would have been repeated.
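
The behavior described above can be sketched in Python as follows. This is an illustrative model, not SharePoint's implementation: the class name, the message texts, and the event ID "E1" are hypothetical, and the summary is emitted when the next event arrives after the quiet period rather than by a timer.

```python
from collections import deque

THRESHOLD = 5     # occurrences of the same event before suppression starts
WINDOW = 120.0    # seconds in which those occurrences must fall
QUIET = 120.0     # seconds of suppression before a summary is written


class FloodProtector:
    """Simplified model of event log flood protection (illustrative only)."""

    def __init__(self):
        self._recent = {}      # event_id -> deque of recent timestamps
        self._suppressed = {}  # event_id -> [suppression_start, suppressed_count]

    def log(self, event_id, now):
        """Return the entries written to the event log for this occurrence."""
        out = []
        state = self._suppressed.get(event_id)
        if state is not None:
            start, count = state
            if now - start < QUIET:
                state[1] += 1          # still in the quiet period: swallow it
                return out
            if count:
                out.append(f"summary: event {event_id} suppressed {count} times")
            del self._suppressed[event_id]
            self._recent[event_id] = deque()
        times = self._recent.setdefault(event_id, deque())
        while times and now - times[0] > WINDOW:
            times.popleft()            # forget occurrences outside the window
        times.append(now)
        out.append(f"event {event_id}")
        if len(times) >= THRESHOLD:
            self._suppressed[event_id] = [now, 0]
        return out


protector = FloodProtector()
written = []
for t in [0, 10, 20, 30, 40, 50, 60, 170]:   # timestamps in seconds
    written.extend(protector.log("E1", t))
print(written)
```

In this run the first five occurrences are written, the two that follow are suppressed, and the occurrence at 170 seconds produces the summary followed by a normal entry.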

 

ULS or Trace Logging

Trace Logs can quickly consume disk space, especially when configured to use more verbose output settings. To manage this growth, administrators can implement two types of restrictions:

  1. Administrators can set the number of days that log files are kept. By default, this is set to 14 days.
  2. Administrators can also limit the overall disk space that log files can consume. This limit is disabled by default, but it provides an additional layer of protection against excessive disk space consumption.
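
The two restrictions can be modeled with a short Python sketch. It is an assumption-laden illustration: the `LogFile` type, the file names, and the oldest-first cleanup order are hypothetical and are not taken from SharePoint's actual cleanup logic.

```python
from dataclasses import dataclass


@dataclass
class LogFile:
    name: str
    age_days: float
    size_gb: float


def files_to_delete(files, max_age_days=14, max_total_gb=None):
    """Return names of files removed by age, then oldest-first until under the cap."""
    doomed = [f for f in files if f.age_days > max_age_days]
    kept = [f for f in files if f.age_days <= max_age_days]
    if max_total_gb is not None:
        kept.sort(key=lambda f: f.age_days, reverse=True)  # oldest first
        total = sum(f.size_gb for f in kept)
        while kept and total > max_total_gb:
            oldest = kept.pop(0)
            total -= oldest.size_gb
            doomed.append(oldest)
    return [f.name for f in doomed]


# Hypothetical trace log files.
logs = [
    LogFile("a.log", age_days=20, size_gb=1.0),
    LogFile("b.log", age_days=10, size_gb=2.0),
    LogFile("c.log", age_days=5, size_gb=2.0),
    LogFile("d.log", age_days=1, size_gb=2.0),
]

age_only = files_to_delete(logs)                     # 14-day default, no size cap
with_cap = files_to_delete(logs, max_total_gb=4.0)   # add a 4 GB cap
print(age_only, with_cap)
```

With only the age limit, the 20-day-old file is removed. Adding the 4 GB cap also removes the oldest remaining file, because the three retained files total 6 GB.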

To analyze ULS logs, you can download ULS Viewer, a powerful and free tool available on CodePlex: http://ulsviewer.codeplex.com/