SQL Server 2008 introduced what we call the Management Data Warehouse. The Management Data Warehouse is a relational database that contains the data that is collected from a server using the new SQL Server 2008 data collection mechanism. The Warehouse consists of these components:
· An extensible data collector : ms-help://MS.SQLCC.v10/MS.SQLSVR.v10.en/s10de_4deptrbl/html/b2c2bd5e-62c5-4129-a252-ef9439a05487.htm
· A database schema which is indirectly extensible by additions in the data collection: ms-help://MS.SQLCC.v10/MS.SQLSVR.v10.en/s10de_4deptrbl/html/9874a8b2-7ccd-494a-944c-ad33b30b5499.htm
· Stored procedures which allow the DBA to create their own data collection set and own the resultant data collection items: ms-help://MS.SQLCC.v10/MS.SQLSVR.v10.en/s10de_6tsql/html/9dd2824f-ea55-439b-8cd5-3a81fedb1432.htm
· Three Data Collections Sets which are delivered with SQL Server 2008 and which can be enabled at any time: ms-help://MS.SQLCC.v10/MS.SQLSVR.v10.en/s10de_4deptrbl/html/0e49bcc7-3bab-4dd5-b5f5-62efa13864f3.htm
· Standard reports delivered with SQL Server 2008 Management Studio display data collected by the three predefined Data Collection Sets. For DBA created Data Collections, reports need to be generated by the DBA or the data can be queried with normal T-SQL queries.
However, this DBA/management toolset can be used by ISVs as well to collect and monitor performance data of SQL Server through their applications. The Management Data Warehouse as delivered with SQL Server can be used as a framework by an application as a basis for database monitoring functionality. Due to the high flexibility of the Management Warehouse this actually a very easy task! The usage of this new component as a database monitoring framework can make support way easier; it can make it simple to find answers to questions like: ‘Was the performance issue users complained about a few hours ago rooted on the database side, the storage backend, or somewhere in the application?’
In the following sections we’ll describe the principle steps to enable SQL Server Management Data Warehouse and Data Collections:
After the customer deployed the application, the customer would need to configure the Management Data Warehouse with SQL Server Management Studio (Object Explorer à Management à Data Collection). Thereby the customer needs to define which database the database schema of the Management Data Warehouse needs to be deployed in. The customer now can be instructed to create a database of a certain name and a certain database file locations or to use a database the application already created during its installation. The application can take reference on this database and can query the database after the user contexts the application uses are assigned into the ‘dbo’ role of the Managment Warehouse database. SQL Server now will create now the following entities in the Management Warehouse database:
· A user named ‘mdw_check_operator_admin’.
· A schema named ‘core’ with a set of tables (see ms-help://MS.SQLCC.v10/MS.SQLSVR.v10.en/s10de_4deptrbl/html/9874a8b2-7ccd-494a-944c-ad33b30b5499.htm
· A schema ‘snapshot’ which contains the tables needed for the System Data Collector Set which are delivered with SQL Server 2008 already. A closer description can be found here: ms-help://MS.SQLCC.v10/MS.SQLSVR.v10.en/s10de_4deptrbl/html/0e49bcc7-3bab-4dd5-b5f5-62efa13864f3.htm
Other functionality needed for making the whole warehouse working already is deployed in SQL Server’s msdb database.
In order to start the collection of the 3 collectors which are deployed with SQL Server the customer needs to have SQLAgent started and needs to enable the 3 collection sets manually. However these 3 collection sets only cover minimal aspects and often are not sufficient in a support case to detect whether the issue is on the database side or whether the issue is somewhere else. Even worse if the performance problem currently is not present, but happened hours or days ago, there hardly is any way to figure out what happened hours or days ago.
Another possibility to enable the Management Warehouse and to configure it can be done by T-SQL stored procedure as they are delivered in msdb.
The nice thing about the Management Data Warehouse is the fact that the data collector is a framework which allows the definition of own data collection sets and data collection items. The extension of the data collection or definition of an own data collection also will extend the schema of the warehouse automatically. For an application vendor it means collecting what their typical support scenarios are. As a great example on how to extend such a collection set, one could script the 3 collection sets which are delivered into a Query Window. This can be done by marking one of the sets, click the right mouse button and select the options ‘Script Data Collection As’ à ‘Create to’ à ‘Query Window’. Unfortunately all the collections are ‘TSQL Query Collector Type’ and ‘Query Activity Collector Type based, whereas the SQL Server Data Collector also can be enhanced reading Windows Performance Counters and SQL Server Trace information. For different Collector types please see:ms-help://MS.SQLCC.v10/MS.SQLSVR.v10.en/s10de_4deptrbl/html/a7c8a8ed-d989-4d74-a885-188336c37863.htm
In order to demonstrate how to extend the data collector we use one of the cases which keep us busy very often supporting ISV applications running on SQL Server and databases in general. ‘How was the I/O performance during the time the users were complaining of the application being so slow’
Being faced with a support situation where hours or days ago users were complaining about performance of the application, it becomes extremely tricky to figure out what really happened. When everybody points to the database as source of the problems, it is hard to find any evidence at all unless the customer has a comprehensive set of Performance Counters recorded steadily. That however is a rare case all too often. No blame to a customer not having a comprehensive Perfmon Traces since the administration and archiving of the trace can be rather cumbersome. Therefore we want to show in our first case of extending the SQL Server 2008 Data Collector a case where we collect Performance Monitor data. In opposite to the usage of the Perfmon on the OS side, the Data Collector and the Management Data Warehouse take responsibility to administrate the data. As first we’ll show the script to extend the Data Collector and then walk through it step by step.
So the script generating such a collection would look like:
Declare @collection_set_id_1 int
Declare @collection_set_uid_2 uniqueidentifier
@name=N'Disk Performance and SQL CPU',
@description=N'Collects logical disk performance counters and SQL Process CPU',
Select @collection_set_id_1, @collection_set_uid_2
Declare @collector_type_uid_3 uniqueidentifier
Select @collector_type_uid_3 = collector_type_uid From [dbo].[syscollector_collector_types] Where name = N'Performance Counters Collector Type';
Declare @collection_item_id_4 int
@name=N'Logical Disk Collection and SQL Server CPU',
Counters="Avg. Disk Bytes/Read"
Counters="Avg. Disk Bytes/Write"
Counters="Avg. Disk sec/Read"
Counters="Avg. Disk sec/Write"
Counters="Disk Read Bytes/sec"
Counters="Disk Write Bytes/sec"
Counters="% Privileged Time"
Counters="% Processor Time"
DECLARE @ErrorMessage NVARCHAR(4000);
DECLARE @ErrorSeverity INT;
DECLARE @ErrorState INT;
DECLARE @ErrorNumber INT;
DECLARE @ErrorLine INT;
DECLARE @ErrorProcedure NVARCHAR(200);
SELECT @ErrorLine = ERROR_LINE(),
@ErrorSeverity = ERROR_SEVERITY(),
@ErrorState = ERROR_STATE(),
@ErrorNumber = ERROR_NUMBER(),
@ErrorMessage = ERROR_MESSAGE(),
@ErrorProcedure = ISNULL(ERROR_PROCEDURE(), '-');
RAISERROR (14684, @ErrorSeverity, 1 , @ErrorNumber, @ErrorSeverity, @ErrorState, @ErrorProcedure, @ErrorLine, @ErrorMessage);
Now let’s go through it step by step. As mentioned vital functionality already is delivered in the SQL Server database msdb. Therefore we execute the script above in msdb.
First step is to create an own Data Collection Set. The name we give will show up in SQL Server Management Studio and at the end will be the name we use to query for results.
Also very important information to define is the data retention period with the parameter named @days_until_expiration. Since we don’t want to end up blowing the volume of the Management Data Warehouse beyond the size of the application to monitor, this parameter needs to be set. Data beyond that expiration age is getting purged.
Other important information to give is the schedule which should be used to execute the collection. The creation of the Management Data Warehouse created pre-defined schedules with the following names:
The names also can be checked in msdb.dbo.sysschedules. Sure own schedules can also be defined and can be used instead.
Another parameter is worth mentioning. Having read the Books Online Documentation about the Data Collector we pointed out earlier, it got mentioned that one can cache the data for a while before uploading into the Management Data Warehouse. Whether one wants to do so is determined with the parameter called @collection_mode. In our case the value of 1 means a direct upload after the collection without any caching.
The data about the collection set now is entered into a table in msdb. The important thing now is to get the ID and UID of the collection set which are delivered as output of the procedure to create the collection set.
The set of collections already existing also can be retrieved out of msdb.dbo.syscollector_collection_sets_internal
The second step is to get the UID of one of the 4 different Data Collection types we mentioned above already. This is done with this query in the script:
The other 3 Data Collection types would need to be defined by these names:
· Generic T-SQL Query Collector Type
· Generic SQL Trace Collector Type
· Query Activity Collector Type
Now the third and probably least documented step so far. The specific data collection item is going to be defined. The first parameter is the name of the collection. The second parameter which simply has the name ‘parameter’ defines what really has to happen. As one can see it is a XML structure which has kind of a one line header and then the same structure for any of the definition of a counter again. Looks pretty simple as one can see above where we define 6 different counters of Logical Disks over all disk partitions visible to the server. We also added collecting the Privileged CPU and the overall CPU SQL Server is consuming. Everything is set now. One just needs to enable the new Data Collection Set with the stored procedure sp_syscollector_start_collection_set (see: ms-help://MS.SQLCC.v10/MS.SQLSVR.v10.en/s10de_6tsql/html/d8357180-f51e-4681-99f9-0596fe2d2b53.htm ) and have SQLAgent running and the data collection starts.
In order to programmatically read the data one needs to look a bit into the different tables of the Management Data Warehouse schema where the data is getting stored in:
snapshots.performance_counter_values: will store the raw and formatted data identified with a performance_counter_instance_id, the snapshot_id and the date of the snapshot. So we already have two important components of data with a query like this:
select spcv.formatted_value as 'Formatted Value',
spcv.collection_time as 'Collection Time'
from snapshots.performance_counter_values spcv
order by spcv.collection_time desc
In order to get the Path and the name of the counters based on the id, we need to join the table snapshots.performance_counter_instances with the table containing the values. A query could look like this:
select spci.path as 'Counter Path', spci.object_name as 'Object Name',
spci.counter_name as 'counter Name', spci.instance_name,
spcv.formatted_value as 'Formatted Value',
from snapshots.performance_counter_values spcv,
where spcv.performance_counter_instance_id = spci.performance_counter_id
So far so good. However the data we are selecting so far might be from different collection sets even from different database instances (if one allows centralizing). Therefore we need to restrict the data now to the data collected by our custom build data collection set. In order to do so we need to look at several different tables to make this connection. At the end the query looks like:
spcv.collection_time as 'Collection Time',
csii.instance_name as 'SQL Server Instance'
where spcv.performance_counter_instance_id = spci.performance_counter_id and
csii.source_id = csi.source_id and csi.snapshot_id=spcv.snapshot_id and
scsi.name = 'Disk Performance and SQL CPU'
order by spcv.collection_time desc