To ensure the availability and reliability of your SharePoint Server 2010 environment, you must actively monitor the physical platform, the operating system, and all important SharePoint Server 2010 services. Preventive maintenance helps you identify potential problems before they disrupt the operation of your SharePoint environment. Combined with regular backups and disaster recovery planning, it also helps you minimize the impact of problems when they do occur.
The following sections describe specific monitoring tasks that map to the checklist below.
Check Event Logs
Check SharePoint Farm Backups
Check SharePoint Database Health
Check SharePoint Health Analyzer
Diagnostic Logging
The Unified Logging Service (ULS) provides a single, centralized location for logging error and informational messages related to SharePoint Server and SharePoint solutions.
SharePoint 2010 includes improvements to the management of the ULS that make it easier for administrators to troubleshoot issues. For more information and best practices about diagnostic logging, see http://go.microsoft.com/fwlink/?linkid=194152.
Diagnostic Logging configuration
Event Throttling
Event throttling enables administrators to control which events SharePoint Server logs, based on their severity level. Throttling is administered in two sections: the least critical event to report to the event log, and the least critical event to report to the trace log.
By default, all categories report events of severity Information and above to the event log, and events of severity Medium and above to the trace log.
During normal operation, these settings strike an appropriate balance between detail and performance. During substantial reconfiguration of SharePoint Server, during the installation of custom solutions, or when SharePoint Server is experiencing issues, lower the throttling threshold (for example, to Verbose) so that less critical events are also logged. This ensures that as much information as possible is available for troubleshooting.
Finally, after completing any troubleshooting, logging can be returned to the default by selecting the "Reset to default" option in the throttling drop-downs.
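To make the threshold idea concrete, here is a minimal Python model of a "least critical event" setting. It is a sketch, not SharePoint code: it assumes the trace-log levels are ordered from most to least critical as Unexpected, Monitorable, High, Medium, Verbose, and that a threshold of None disables logging for a category. The names `TRACE_LEVELS`, `is_logged`, and `visible` are hypothetical.

```python
# Assumed ordering of SharePoint 2010 trace-log levels, most critical first.
# This is an illustrative model, not SharePoint's implementation.
TRACE_LEVELS = ["Unexpected", "Monitorable", "High", "Medium", "Verbose"]


def is_logged(event_level: str, threshold: str) -> bool:
    """Return True if an event at event_level passes the 'least critical event' threshold."""
    if threshold == "None":
        return False  # logging disabled for this category
    # An event is kept when it is at least as critical as the threshold.
    return TRACE_LEVELS.index(event_level) <= TRACE_LEVELS.index(threshold)


def visible(events, threshold):
    """Filter (level, message) pairs down to the messages that would be written."""
    return [msg for level, msg in events if is_logged(level, threshold)]
```

With the threshold at Medium, a Verbose event is dropped; lowering the threshold to Verbose lets it through, which is why turning the dial down during troubleshooting yields more detail at the cost of more disk and processing.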
Correlation IDs
Correlation IDs are GUIDs that are assigned to events which occur during the lifecycle of a resource request. This value is surfaced within error messages, the ULS logs, and tools like Developer Dashboard. This value helps an administrator locate and isolate a specific request across the ULS log, Usage Logging database, and SQL Server Profiler data sets for debugging purposes.
For example, administrators can take the Correlation ID that appears on an error page in their browsers and then rapidly locate any related entries in the ULS log through a simple search.
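The "simple search" can be sketched in Python as follows. This assumes a ULS log file is tab-separated text whose first line is a header row that includes a column named `Correlation` (as in SharePoint 2010 trace logs). The function name `find_by_correlation` is hypothetical, and dedicated tools such as ULS Viewer remain the practical choice for day-to-day work.

```python
def find_by_correlation(lines, correlation_id):
    """Return ULS entries (as dicts keyed by header name) whose Correlation matches.

    Assumes the first line is a tab-separated header row that includes a
    'Correlation' column. Matching is case-insensitive because the same GUID
    can appear in different letter case in different places.
    """
    rows = iter(lines)
    header_line = next(rows, None)
    if header_line is None:
        return []  # empty input: nothing to search
    header = [h.strip() for h in header_line.rstrip("\r\n").split("\t")]
    target = correlation_id.strip().lower()
    matches = []
    for line in rows:
        fields = [f.strip() for f in line.rstrip("\r\n").split("\t")]
        # zip truncates short lines, so an entry missing the Correlation
        # column simply never matches.
        entry = dict(zip(header, fields))
        if entry.get("Correlation", "").lower() == target:
            matches.append(entry)
    return matches
```

Because `lines` can be any iterable of strings, an open file object can be passed directly, which keeps memory use flat even for large trace logs.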
Event Log Flood Protection
Event Log Flood Protection prevents the Windows event log from being overwhelmed by repetitive events. When it is enabled and the same event is logged five times within two minutes, SharePoint Server suppresses further instances of that event. After an additional two minutes, it writes a summary event that reports how many times the event would have been repeated.
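The suppression behavior described above can be modeled in a few lines of Python. This is a simplified sketch, not SharePoint's implementation: it assumes events are identified by a key (for example, an event ID), that timestamps are in seconds, and that the summary is emitted on the next call once the two-minute suppression period has elapsed, since the model has no background timer. The class name `FloodProtector` is hypothetical.

```python
from collections import defaultdict, deque


class FloodProtector:
    """Simplified model: after `threshold` identical events within `window`
    seconds, suppress that event; after `summary_delay` more seconds,
    write one summary entry with the suppressed count."""

    def __init__(self, threshold=5, window=120.0, summary_delay=120.0):
        self.threshold = threshold
        self.window = window
        self.summary_delay = summary_delay
        self._recent = defaultdict(deque)  # key -> timestamps of written events
        self._suppressed = {}              # key -> [suppression start, count]

    def flush(self, now):
        """Emit summaries for suppression periods that have ended."""
        out = []
        for key, (start, count) in list(self._suppressed.items()):
            if now - start >= self.summary_delay:
                out.append(f"SUMMARY: {key} suppressed {count} time(s)")
                del self._suppressed[key]
        return out

    def log(self, key, now):
        """Record one event; return the entries actually written to the log."""
        out = self.flush(now)
        if key in self._suppressed:
            self._suppressed[key][1] += 1  # count it, but do not write it
            return out
        recent = self._recent[key]
        # Drop timestamps that fall outside the sliding window.
        while recent and now - recent[0] > self.window:
            recent.popleft()
        recent.append(now)
        out.append(key)
        if len(recent) >= self.threshold:
            self._suppressed[key] = [now, 0]
            recent.clear()
        return out
```

Five identical events at t = 0 to 4 seconds are all written; events at t = 5 and t = 10 are only counted; an event at t = 130 first produces the summary ("suppressed 2 time(s)") and is then written normally.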
ULS or Trace Logging
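Trace logs can quickly consume disk space, especially when configured for more verbose output. To manage this growth, administrators can implement two types of restrictions: a maximum number of days to keep trace log files, and a maximum amount of disk space that the trace logs may use.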
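A retention period in days and a cap on total disk space work together roughly as in the Python sketch below. It is an illustrative model of the policy, not how SharePoint prunes its files: it assumes each trace log file is described by a name, an age in days, and a size in bytes, that the newest files are kept first, and that the 14-day default mirrors SharePoint 2010's default retention. `TraceLogFile` and `files_to_delete` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class TraceLogFile:
    name: str
    age_days: float
    size_bytes: int


def files_to_delete(files, max_total_bytes, max_age_days=14):
    """Return names of files to remove so the rest satisfies both limits.

    Files are considered newest first. A file is deleted if it is older than
    max_age_days, or if keeping it would exceed max_total_bytes; once the
    space cap is hit, every older file is deleted too.
    """
    delete = []
    kept_bytes = 0
    cap_reached = False
    for f in sorted(files, key=lambda f: f.age_days):  # newest first
        too_old = f.age_days > max_age_days
        if not too_old and not cap_reached and kept_bytes + f.size_bytes <= max_total_bytes:
            kept_bytes += f.size_bytes
        else:
            if not too_old:
                cap_reached = True  # space cap hit: older files go as well
            delete.append(f.name)
    return delete
```

Keeping the newest files first matches the troubleshooting use case: the most recent entries are the ones an administrator usually needs when a Correlation ID is reported.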
To analyze ULS logs, you can download ULS Viewer, a powerful free tool from CodePlex: http://ulsviewer.codeplex.com/