Troubleshooting Microsoft SQL Server

Blog is a collection of misc troubleshooting tips collected while supporting Microsoft SQL Server 2000 2005 and 2008 customers. Forcus area covered include SSIS, Performance, and Replication

  • How to skip a transaction in SQL 2005/2008 Transactional Replication

    Once in a while you may find the need to skip an indiviual transaction in SQL 2005/2008 Transactional Replication. For example, when Distribution Agent fails because of an error and the database integrity won't be impacted by skipping that transactions.  For instructing the Distribution Agent to skip a transaction, SQL provides 2 very helpful stored procedures.  sp_helpsubscriptionerrors can be used to display the list of invalid transactions.  The key output from this SP is the xact_seqno which is supplied to the second stored procedure sp_setsubscriptionxactseqno as the transaction LSN or sequence number to be skipped. 

    sp_helpsubscriptionerrors

    http://msdn.microsoft.com/en-us/library/ms173427.aspx

     

    sp_setsubscriptionxactseqno

    http://msdn.microsoft.com/en-us/library/ms188764.aspx

    Note the BOL topic incorrectly says "Is the LSN of the next transaction at the Distributor to be applied at the Subscriber" when it is actual the LSN of the transaction the Distributor will SKIPP.  We're working to get BOL updated.

    EXAMPLE

    sp_helpsubscriptionerrors 'SQLw2k8','dbtranpub','dbtranpub_pub','SQLw2k8Subscriber','dbtransub':

     

    id          time                    error_type_id source_type_id source_name   error_code   error_text                                   xact_seqno                         command_id  session_id

    ----------- ----------------------- ------------- -------------- ------------- ------------ -------------------------------------------- ---------------------------------- ----------- -----------

    16          2009-09-08 05:14:04.673 0             0              MSSQL_ENG     8152         String or binary data would be truncated.    0x0000002B00000196000300000000     1           0

    16          2009-09-08 05:14:04.660 0             0              MSSQL_ENG     8152         String or binary data would be truncated.    0x0000002B00000196000300000000     1           0

    16          2009-09-08 05:14:04.660 0             1              MSSQL_ENG                  if @@trancount > 0 rollback tran             0x0000002B00000196000300000000     1           0

     

    Command to skip the transaction:

    sp_setsubscriptionxactseqno 'SQLw2k8','dbtranpub','dbtranpub_pub',0x0000002B000001960003

     

    Many thanks to Amit Banerjee, Microsoft SQL Technical Lead, for the example and testing of these SPs.

    Chris Skorlinski, Microsoft SQL Server Escalation Services

     

  • How to run TableDiff utility for ALL replicated (published) tables in SQL 2005 or SQL 2008

    We recently need to find all the data which was different in 300 SQL Replicated tables on a pair of database.  To find the differences, we were planning on using the TableDiff utility included with SQL 2005. 

    TableDiff Utility http://msdn.microsoft.com/en-us/library/ms162843.aspx

    By default, the TableDiff creates a new *.SQL file for each table.  There is not a way to concatenate output from multiple commands.  Already you can see this wasn't going to be easy.  We came up with a simple solution.

    --Build TableDiff commands for each Article then execute these commands
    select '"C:\Program Files\Microsoft SQL Server\90\COM\tablediff.exe" -sourceserver [SKOR380] -sourcedatabase [TRANPUB] -sourcetable [' + name +'] -sourceschema [dbo] -sourcelocked [TABLOCK] -destinationserver [SKOR380] -destinationdatabase [TRANSUB] -destinationtable [' + name + '] -destinationschema [dbo] -destinationlocked [TABLOCK] -f c:\temp\TableDiff' + cast(artid as varchar(100))+ '.sql' from sysarticles

    Executing the SELECT statement we build a TEXT ouput of the unqiue TableDiff commands to execute in a CMD/DOS batch.  DOS, wow, now I'm dating myself!

    C:\Program Files\Microsoft SQL Server\90\COM\tablediff.exe -sourceserver [SKOR380] -sourcedatabase [TRANPUB] -sourcetable [Order Details] -sourceschema [dbo] -sourcelocked [TABLOCK] -destinationserver [SKOR380] -destinationdatabase [TRANSUB] -destinationtable [Order Details] -destinationschema [dbo] -destinationlocked [TABLOCK] -f c:\temp\1.sql
    C:\Program Files\Microsoft SQL Server\90\COM\tablediff.exe -sourceserver [SKOR380] -sourcedatabase [TRANPUB] -sourcetable [Orders] -sourceschema [dbo] -sourcelocked [TABLOCK] -destinationserver [SKOR380] -destinationdatabase [TRANSUB] -destinationtable [Orders] -destinationschema [dbo] -destinationlocked [TABLOCK] -f c:\temp\2.sql
    C:\Program Files\Microsoft SQL Server\90\COM\tablediff.exe -sourceserver [SKOR380] -sourcedatabase [TRANPUB] -sourcetable [Shippers] -sourceschema [dbo] -sourcelocked [TABLOCK] -destinationserver [SKOR380] -destinationdatabase [TRANSUB] -destinationtable [Shippers] -destinationschema [dbo] -destinationlocked [TABLOCK] -f c:\temp\3.sql

    Executing these commands along with the COPY command builds on SQL file with all the necessary changes to bring the 2 servers in-sync.

     --Concatenate all *.sql into one SQL file
    Copy TableDiff*.sql TableDiffALL.sql

    -- Database: [TRANSUB]
    -- Table: [dbo].[Order Details]
    UPDATE [dbo].[Order Details] SET [Quantity]=5 WHERE [OrderID] = 10250 AND [ProductID] = 51
    -- Host: SKOR380
    -- Database: [TRANSUB]
    -- Table: [dbo].[Orders]
    UPDATE [dbo].[Orders] SET [ShipName]='Pub1' WHERE [OrderID] = 10250

     

  • How Replication setting Immediate_sync may cause Transactional Replication Distribution database growth

    How Replication setting Immediate_sync may cause Transactional Replication Distribution database growth and long running cleanup.

    Distribution Cleanup for highly active servers with 60-100+ millions rows cached, database growth and cleanup will be an ongoing issue.  If we can reduce the number of transactions/commands cached in the distribution database we can improve overall replication performance.  

     

    For these environments we recommend disabling “immediate_sync” if not needed for business reasons.  When enabled, this setting maintains the replicated transactions in the Distribution database, just-in-case, a new subscriber is added or an existing subscriber needs to reinitialize.  If new subscribers are not being added or existing not being reinitialized, this feature can be disabled. When disabled, the transactions are deleted as soon as they are replicated to the subscriber thereby reducing replication metadata overhead. 

     

    This job is responsible for deleting data from MSrepl_transactions and MSrepl_commands tables in the Distribution Database.   Two parameters are passed to the cleanup job. The first one is minimum number of hours to keep transactions and second is to delete if transaction are older.

     

     

        EXEC dbo.sp_MSdistribution_cleanup  @min_distretention = 4, @max_distretention = 72

     

    When old data gets purged is determined by "immediate sync" setting of the Publication.

     

    --distribution database

    select immediate_sync,* from distribution.dbo.MSpublications

     

    --published database

    select immediate_sync, * from syspublications

     

    In the above example:

    If "immediate sync" = TRUE, then keep transaction for at least 4 hours EVEN IF they have been replicated to the subscriber but delete any transaction older the 72 hours.  Metadata that is younger than 72 hours, but older than 4 hours will be kept. This is conservative.

     

    If "immediate sync" = FALSE, then keep transaction for at least 4 hours but if the transaction has already been replicated (distributed) to the subscriber go ahead and delete EVEN IF not older then 72 hours. Metadata that is younger than 72 hours, but older than 4 hours may be deleted quite aggressively.

     

    If the amount of metadata within the retention period is a problem, you could potentially change the immediate_sync flag with sp_changepublication, without reinit required, but there are some differences that you should be aware of. Read BOL for sp_addpublication about immediate_sync.  There are potentially negative side effects for no-sync subptions if they have been marked expired or inactive, and also adding new subscribers and adding new articles will require more manual intervention.

     

    sp_changepublication @publication='repltest', @property='immediate_sync',@value='FALSE'

     

    Chris Skorlinski

    Microsoft SQL Server Escalation Services

     

  • Why is query performance of Replication Subscriber slower than Publisher?

     

    Time to compare indexes on Publisher vs. Subscriber.

     

    We encounter an unusual problem in SQL 2005/2008 where the indexes didn’t get created on the Subscriber when the Snapshot was pushed by the Merge Agent.  Without critical indexes, query optimizer was forced to perform Table Scan.  Solution was easier, add the missing indexes.  The root cause was a bit tricky.

     

    Turns out the customer made a copied a table on the subscriber to a new name “Customers_BACKUP”, but kept the same name for the indexes “PK_Customers”.  When the Snapshot was pushed, the new “Customers” table was created, however the “ADD  CONSTRAINT” step failed.

     

    On Publisher

    -----------------------

    CREATE TABLE [dbo].[Customers](

     [CustomerID] [nchar](5) NOT NULL,

     [CompanyName] [nvarchar](40) NOT NULL,

     [ContactName] [nvarchar](30) NULL,

     CONSTRAINT [PK_Customers] PRIMARY KEY CLUSTERED

    (

     [CustomerID] ASC

    )WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]

    ) ON [PRIMARY]

     

    Backup table on Subscriber

    -----------------------------------

    CREATE TABLE [dbo].[Customers_BACKUP](

     [CustomerID] [nchar](5) NOT NULL,

     [CompanyName] [nvarchar](40) NOT NULL,

     [ContactName] [nvarchar](30) NULL,

     CONSTRAINT [PK_Customers] PRIMARY KEY CLUSTERED

    (

     [CustomerID] ASC

    )WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]

    ) ON [PRIMARY]

     

    Step which failed

    ----------------------------

    ALTER TABLE [dbo].[Customers] ADD  CONSTRAINT [PK_Customers] PRIMARY KEY CLUSTERED

    (

     [CustomerID] ASC

    )

    GO

     

    The error appears in the SQL Profile Trace, however, for SQL 2005 and 2008 the Merge agent skipped the error, continued to apply the snapshot, then started replicating changes without logging the index was skipped.  We're still looking into the "why" part.

     

    Server: Msg 2714, Level 16, State 4, Line 1

    There is already an object named 'PK_Customers' in the database.

     

    So, not getting the performance your expecting when running queries on your subscriber?  Time to check for missing indexes.

     

    Thanks,

    Chris Skorlinski

    Microsoft SQL Server Escalation Services

  • Transactional Replication Conversations

    Transactional Replication Conversations

    When troubleshooting Transactional Replication performance break down the flow of data into 4 simultaneous conversation.  Examine performance of each conversation.  This will help verify where the bottleneck investigation should begin.

     

     

     

    1)      LogReader Reader thread is reading the Transaction Log via stored procedure sp_replcmds, a wrapper for xp_replcmds.  It scans the transaction log for transactions marked for replication, skipping not replicated transactions.

     

     

    2)      LogReader Writer thread writes queued transaction from the Reader thread into the Distribution database using sp_MSadd_replcmds.

     

    3)      Distribution Reader thread executes sp_MSget_repl_commands query to retrieve pending commands from the Distribution database and storing in an internal queue.

     

    4)      Distribution Writer thread writing queue commands to the Subscriber via parameterized stored procedures prefixed with sp_MSupd..., sp_MSins..., sp_MSdel... to apply individual row changes to each article at the subscriber.

     

    5)      Log Reader and Distributor are also executing a “history” thread writing summary data to MSlogreader_history and MSdistribution_history system tables of the distribution database.

    Flow

     

    Tools to assist with troubleshooting performance

     

    SQL Profiler: There are multiple methods to examine in detail the Transaction Replication conversations.   As with all application connecting to SQL Server, the individual commands being executed by Replication components can be captured using SQL Profiler tool.  The Profile trace can be sorted or queried to find long running statements. 

     

    Examples of which ApplicationNames you will find:

    ·         REPLICATION MONITOR -- GUI from Management Studio

    ·         REPL-LOGREADER-#-DatabaseName-# -- Writer thread from Logreader Agent to log changes into distribution tables.

    ·         ServerName\Instance-DatabaseName-PubName-Subscriber -- Reader thread from Distribution Agent to find out which commands/transaction to replication to the Subscriber

    ·         REPLICATION LOGREAD HISTORY -- writer thread from Log Reader agent to log history

    ·         REPLICATION DISTRIBUTION HISTORY -- writer thread from Distribution agent to log history

     

    Example of Replication Stored Procedure calls:

    • Distribution Activity:
      • SP_MSGET_REPL_COMMANDS
        SP_MSHELP_DISTRIBUTION_AGENTID
        SP_MSGET_SUBSCRIPTION_GUID
    • LogReader Activity:
      • SP_MSADD_REPLCMDS
        SP_MSADD_LOGREADER_HISTORY
        SP_REPLCMDS
        SP_MSPUB_ADJUST_IDENTITY
        EXEC SP_MSGET_LAST_TRANSACTION @PUBLISHER_ID = {##}, @PUBLISHER_DB = {STRN}, @FOR_TRUNCATE = {BS}
        SP_REPLDONE
        EXEC SP_REPLCOUNTERS {STRN}
    • SubScriber Activity:
      • SP_MSins%
      • SP_MSupd%
      • SP_MSdel%

     

    Activity Monitor: SQL Management Studio Activity Monitor, Performance Dashboard for SQL 2005, Performance Data Warehouse for SQL 2008, or system stored procedures help find blocking, disk bottlenecks, and statements causing high IO and high CPU.  To help identity Replication components look for the LogReader or Distribution Agent Job Names as the Program_Name connecting to SQL Server.  To see a list of the Replication Jobs, run the query below

     

    SELECT name, description, enabled from MSDB..sysjobs

    WHERE category_id>10 and category_id<20

     

    Tracer Tokens: To gain a quick high-level overview of Replication performance use the Tracer Token feature in Replication Monitor. The features provides “Publisher to Distributor” and “Distributor to Subscriber” latency values.  To use this feature, launch Replication Monitor. In the My Publishers list, drill in and click on the publication you wish to test. In the right pane, choose the Tracer Token tab and click the “Insert Tracer Token” button. There will be a list of subscriptions below. Monitor the “Publisher to Distributor” and “Distributor to Subscriber” values.

     

    This method is only helpful in where times are somewhat real-time. If the topology is behind 3 days, then you will never see the tracer token make it to the subscriber in time, so the only thing you can see is if the Log Reader is keeping up or getting behind. That is still valuable information, but more trivial than if this is a problem where the subscriber is only 5 minutes behind and you can see the full flight of the token from start to finish.

     

    If you’re seeing long time for “Publisher to Distributor” to populate, focus on the troubleshooting the LogReader performance.  If LogReader performance is fast while the “Distributor to Subscriber” values are high, transactions are moving quickly from the Published database transaction log to the Distribution database but slow from the Distribution database to the Subscribers.  The Distribution Agent should be the focus of the bottleneck investigation.

     

    The following links provide sample code to measure latency over time.

     

     

    Performance Monitor: You can also use Windows Performance Monitor's to track replication LogReader 'LogReader:Delivery Latency' counter or the replication Distribution 'Dist:Delivery Latency' counter. However, Windows Performance Monitor's Replication counter values are only updated after the Log Reader or Distribution Agent completes a processing phase and records performance stats.  The counter values will report 0 commands/sec and transactions/sec while a large transaction consisting of millions of rows is being processed.

    Log Reader Reader-Thread Latency

     

    General Troubleshooting:

    Adding the “-OUTPUT” parameter to the Log Reader breaks down the time for both Reader (Fetch time), and the Writer (Write time)conversations. In the sample output below the Reader-Thread took 55 minutes while the Writer-Thread needed only 3 minutes to processes the transaction indicating Log Reader Reader-Thread is the bottleneck.

     

    Replication Log Reader Agent

    http://msdn.microsoft.com/en-us/library/ms146878.aspx

     

    ******************** STATISTICS SINCE AGENT STARTED ***************************
    Execution time (ms): 3401930
    Work time (ms): 3401470
    Distribute Repl Cmds Time(ms): 3354520
    Fetch time(ms): 3343920   <-- read time 55 minutes
    Repldone time(ms): 5520
    Write time(ms): 140070    <-- write time 3 minutes
       Num Trans: 8  Num Trans/Sec: 0.002352
       Num Cmds: 477219  Num Cmds/Sec: 140.297871
    *******************************************************************************

     

    Starting with SQL Server builds below the Log Reader Agent stats are displayed ever 5 minutes and written to the MSlogreader_history table in the Distribution database.

     

    Cumulative Update 12 for SQL Server 2005 Service Pack 2

    Cumulative Update 5 for SQL Server 2008

    Cumulative Update 2 for SQL Server 2008 Service Pack 1

     

     

    Possible Cause: large batch of replicated transaction

    How large is the transaction Log for this published database?  Has it grown to an unexpected size causing longer read times by the Log Reader thread. 

     

    DBCC SQLPERF(LOGSPACE)

     

    The reader thread of the Log Reader executes sp_replcmds to pull commands from the transaction log.  The stored procedures below can be executed in SQL Server Management Studio to track time required to pull next pending replicated transaction.  How long do these command take to execute?  How many commands were returned?  Scanning a large transaction log for a single transaction containing a million+ row modifications  may take 20+ minutes to complete.  The execution time for this query approximates the startup time for the Log Reader to replicate this same transaction. 

     

    Display 1st pending transaction metadata such as publication_id, article id, xactid, and command_type.  Is this command part of normal database operation or an unexpected large batch of commands?

     

    sp_replcmds [ @maxtrans = ] maxtrans

    http://msdn.microsoft.com/en-us/library/ms186983.aspx

     

    example:

    --Return all commands from the next pending transaction

    sp_replcmds @maxtrans = 1

    GO

     

    --Return all commands for the next 500 transactions

    sp_replcmds @maxtrans = 500

    GO

    sp_replflush

    GO

     

    Display 1st transactions metadata along with text of commands yet to replicate.  Commands type = 1073741871 are pending Tracer Tokens.

     

    sp_replshowcmds [ @maxtrans = ] maxtrans

    http://msdn.microsoft.com/en-us/library/ms175114.aspx

     

    example:

    --Return all commands from the next pending transaction with CMD text

    sp_replshowcmds @maxtrans = 1

    GO

     

    --Return all commands for the next 500 transactions with CMD text

    sp_replshowcmds @maxtrans = 500

    GO

    sp_replflush

    GO

     

    Another technique to see summary of a batch of pending transactions/commands is to query sys.dm_repl_traninfo view. This displays summary data on replicated transactions in the log last read by sp_replcmds or sp_replshowcmds, including the “cmds_in_tran.” A high value may indicate a large transaction is being replicated. 

     

    sys.dm_repl_traninfo

    http://msdn.microsoft.com/en-us/library/ms180036.aspx

     

    sp_replcmds @maxtrans = 500  -- retrieve 500 pending transactions

    GO

    SELECT dbid, begin_lsn, commit_lsn, rows, cmds_in_tran

    FROM sys.dm_repl_traninfo

    GO

     

    dbid   begin_lsn                        commit_lsn                       rows        cmds_in_tran

    ------ -------------------------------- -------------------------------- ----------- ------------

    10     0000081A:0000013A:0001           0000081A:0000013A:0004           1           1

    10     0000081A:00000140:0001           0000081A:00000140:0004           1           1

    10     0000081A:00021385:0001           0000081E:00004EA2:0009           6874        6874

     

     

    The following commands can be used to determine the number of non-replicated vs. replicated transactions.  A transaction log with a high percentage of non-replicated transaction will cause latency as the Log Reader scans over transaction to be ignored.  These can be database maintenance transactions such as an online reinidex or data modifications to table not being replicated.  A 25gig transaction log may take up to 15 minutes to scan depending on the IO subsystem.

     

    Use <published database>

    GO

    -- Total records in the log

    SELECT count(*) FROM ::fn_dblog(NULL, NULL)

    GO

    -- Records marked for REPLICATION

    SELECT count(*) FROM ::fn_dblog(NULL, NULL) WHERE Description='REPLICATE'

    GO

     

    If you catch the large transaction in the act of being processed by sp_replcmds, you should also see the long run time of that event in SQL Profiler trace data. Using a captured Profile trace run command below to query the results:

     

    SELECT duration, starttime, endtime, textdata

    FROM ::fn_trace_gettable('C:\PUBLISHERDB_sp_trace.trc', DEFAULT)

    WHERE TEXTDATA LIKE '%SP_REPLCMDS%' AND EVENTCLASS=10

     

            Duration    StartTime                  EndTime                     TextData
            388         2008-10-21 15:00:06.010    2008-10-21 15:00:06.010     exec sp_replcmds 500,0,-1,0x,5000,0,500000
            393         2008-10-21 15:00:11.010    2008-10-21 15:00:11.010     exec sp_replcmds 500,0,-1,0x,5000,0,500000
            397         2008-10-21 15:00:16.010    2008-10-21 15:00:16.010     exec sp_replcmds 500,0,-1,0x,5000,0,500000
            66302488    2008-10-21 15:00:21.010    2008-10-21 15:01:27.290     exec sp_replcmds 500,0,-1,0x,5000,0,500000
            413         2008-10-21 15:01:27.353    2008-10-21 15:01:27.353     exec sp_replcmds 500,0,-1,0x,66344,0,500000

     

    Shown above, processing the large transaction causes sp_replcmds took 66 seconds to execute.

     

     

    Resolution:

    If Reader Latency is caused by large number of pending commands, waiting for the Log Reader to catch up may be the best short-term solution.  Long-term options include replicating batches during non-peak time or as stored procedure executions.

     

    Publishing Stored Procedure Execution in Transactional Replication

    http://msdn.microsoft.com/en-us/library/ms152754(SQL.90).aspx

     

    Another option to consider is configuring the Log Reader to break up the one large transaction into multiple batches.  By committing large transaction in smaller batches we may be able to reduce the Log Reader overhead.  However, performance improvement would be most noticeable when Distributing data to subscribers in multiple parallel STREAMS.

     

    Replication Log Reader Agent

    http://msdn.microsoft.com/en-us/library/ms146878.aspx

     

    Depending on estimated time to read pending transactions and transfer them to the subscribers, it may be faster to mark all transactions as “replicated”, then reinitialize the subscribers with a new snapshot or via backup/restore.  This is step should only be taken if the time to generated a new Replication Shapshot and deliver to subscriber is faster than waiting for individual pending commands to be replicated.

     

    sp_repldone (Transact-SQL)

    http://msdn.microsoft.com/en-us/library/ms173775.aspx

     

    EXEC sp_repldone @xactid = NULL, @xact_segno = NULL, @numtrans = 0, @time = 0, @reset = 1

     

     

    Possible Cause: large number of non-replication transactions:

    This can be more challenging to identify. It helps to have a good knowledge of which articles are being replicated and what batch changes have been recently performed.  The BINARY column in the Profile Trace represents the number of rows processed by sp_replcmds.  Multiple executions with few rows usually indicates LogReader spent time skipping over non-replicated transactions such as large table performing an online reindex.

     

    The output below from the LogReader log clearly shows many rows are being read while only a few are being replicated.

     

     

    2008-11-11 23:41:35.527 The Log Reader Agent is scanning the transaction log for commands to be replicated. Approximately 500000 log records have been scanned in pass # 3, 142 of which were marked for replication, elapsed time 22535010 (ms).

     

    Resolution:

    This is unavoidable when large number of transaction for non-replicating table.  If problem is caused by index maintenance log records,  consider performing offline reindexing.  Another option may be stopping and starting the online reindexing to create smaller batch of index update transactions.

     

     

    Possible Cause: High number of VLFs:

    A large number of Virtual Log Files (VLFs) can contributed to long running read times for the LogReader.  To determine the number of VLFs, execute the following command and note the number of segments (rows) returned.  Segment counts in 100K+ may be contributing to Log Reader Reader-Thread performance problems.

     

    USE <published DB>

    GO

    DBCC LOGINFO

     

    Resolution:

    Shrink the transaction log to move extra VLFs.  Set the database “growth” to value which allows for growing without creating high number of VLFs

     

    907511 How to use the DBCC SHRINKFILE statement to shrink the transaction log file in SQL Server 2005

    http://support.microsoft.com/default.aspx?scid=kb;EN-US;907511

     

     

    Possible Cause: Slow Read IO:

    Slow disk subsystem may be contribution to the long read times.  Good overall measure of disk read performance is Windows Performance Monitor counter Avg Disk Sec/Read.

     

    Resolution:

    Read times in >15ms (.0015) may indicate an IO read bottleneck and should be investigated by the storage subsystem vendor.

     

    Possible Cause: Slow Network IO:

    For remote Distributors using a PULL Log Reader, the logread.exe is executing on the Distributor but Pulling rows from the Publishers transaction log.  Unexpected network IO problem will contribute to the Reader-Thread latency. 

     

    Resolution:

    Using Windows Performance monitor both the Publication and Distribution servers, check the Network Interface 'Output Queue Length' counter for the active NIC - the counter should consistently be under 2.

     

    For more information please see the following MSDN topic titled 'Network Interface Object.'

    http://msdn.microsoft.com/en-us/library/ms803962.aspx

     

    Log Reader Writer-Thread Latency

     

    Possible Cause: Blocking: High values in the Log Reader Agent -OUTPUT log for “Write time(ms):” indicate bottleneck writing commands to the Distribution database.  Blocking may be caused by another SQL Replication Agent such as a the Distribution Cleanup Agent.  Investigate blocking by the Log Reader Writer-Thread using SQL Server built in tools such as SQL Server Management Studio Activity Monitor, Performance Dashboard, Performance Database Warehouse, or stored procedures like sp_who.

     

    For SQL 2005 download and install the Performance Dashboard.  This provide a high-level look at the Distributor performance.  Blocking, long running queries, high IO wait times are displayed in a graphical format.  For SQL 2008, these reports are included in the Performance Data Warehouse.

     

    SQL Server 2005 Performance Dashboard Reports

    http://www.microsoft.com/downloads/details.aspx?familyid=1d3a4a0d-7e0c-4730-8204-e419218c1efc&displaylang=en

     

     

    Resolution:

    If the “head blocker” is the Distribution cleanup, consider stopping this Agent, allowing data to get replicated, then restart cleanup during off-peak hours.  Blocking may indicate IO taking longer than expected to complete.  See “High IO” for additional troubleshooting steps.

     

     

    Possible Cause: High IO: Check the Profile trace collected on the Distributor for the Duration and CPU values for the executions of sp_MSadd_replcmds. 

    A high IO may indicate poor query plan.  Using a captured Profile trace read the CPU and Duration for the completed events using the command below. 

     

    SELECT duration, starttime, endtime, textdata

    FROM ::fn_trace_gettable('C:\DISTRIBUTOR_sp_trace.trc', DEFAULT)

    WHERE TEXTDATA LIKE '%SP_MSadd_replcmds%' AND EVENTCLASS=10

     

    Durations of 10x over the CPU times indicate resource contention. Running SQL Server Management Studio reports on the Distributor look for blocking of the Log Reader agent.

     

    With SQL 2005 and 2008 system DMVs can also be used to retrieve the duration and CPU values for the Log Reader Writer-Thread. Low Duration and CPU but high Logical Reads may indicate a poor query plan caused by out of data table statistics.  The command below will retrieve the query plan along with the execution statistics.

     

    --dm_exec_query_stats for Log Reader Writer-Thread sp_MSadd_replcmds

    -- by top total_worker_time

    SELECT TOP 25

      st.text, qp.query_plan,

        (qs.total_logical_reads/qs.execution_count) as avg_logical_reads,

        (qs.total_logical_writes/qs.execution_count) as avg_logical_writes,

        (qs.total_physical_reads/qs.execution_count) as avg_phys_reads,

      qs.*

    FROM sys.dm_exec_query_stats as qs

             CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) as st

             CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) as qp

    WHERE st.text like 'CREATE PROCEDURE sp_MSadd_replcmds%'

      ORDER BY qs.total_worker_time DESC

    Go

     

     

    Possible Cause: Slow WRITE IO:

    Slow disk subsystem may be contribution to the long write times.  Good overall measure of disk read performance is Windows Performance Monitor counter Avg Disk Sec/Write.

     

    Resolution:

    Write times in >20ms (.0020) may indicate an IO write bottleneck and should be investigated by the storage subsystem vendor.

     

    Possible Cause: Network IO

    Similar to situation above, you may want to investigate Network interface queue length should be confirmed. 

     

    Distribution Agent Reader Latency

    Like the Log Reader, the Distribution Agent is 2 simultaneous conversation going.  Delay in moving rows from the Distributor to the Subscriber may occur in either or both of these conversation.  The Reader-Thread conversation is responsible for querying the Distribution database for the list of transactions to apply at the subscriber.  It first queries the subscriber metadata table to find the last successfully replicated transactions.  Next the Reader-Thread executes sp_MSget_repl_commands to begin reading rows from the Distribution database and populates this rows into an in-memory queue.  As the rows get queued, the Distribution Agent Writer-Thread conversation begins writing these rows to the Subscriber.  Like with the Log Reader, adding the Distribution Agent -OUPUT parameter will provide good overview of the Distribution Agent performance.  As noted above, recent builds of SQL Server now automatically records this information in the Distribution Agent history table MSdistribution_history.

     

    In this output the READS completes in 92ms at a rate of 565 cmds/sec while the WRITES are taking 14,000ms( 14 seconds ) and only processing 3.69 cmds/sec.  This example clearly shows the WRITE are slower then reads.

     

     

    ************************ STATISTICS SINCE AGENT STARTED ***********************

    Total Run Time (ms) : 18828  Total Work Time  : 14110

    Total Num Trans     : 52  Num Trans/Sec    : 3.69

    Total Num Cmds      : 52  Num Cmds/Sec     : 3.69

    Writer Thread Stats

      Time Spent on Exec        : 12063

      Time Spent on Commits (ms): 422  Commits/Sec         : 0.14

      Time to Apply Cmds (ms)   : 14110  Cmds/Sec            : 3.69   <-- Write Thread

      Time Cmd Queue Empty (ms) : 671  Empty Q Waits > 10ms: 2

      Total Time Request Blk(ms): 671

    Reader Thread Stats

      Calls to Retrieve Cmds    : 2

      Time to Retrieve Cmds (ms): 92  Cmds/Sec            : 565.22   <-- Reader Thread

      Time Cmd Queue Full (ms)  : 5486  Full Q Waits > 10ms : 3

    *******************************************************************************

     

     

    Possible Cause: Large Batch of Transactions

    Latency problem are often caused when series of transactions are trying to move a large batch of commands to the Subscribers.  The queries below show overall rows counts and index statistics for commands stored in the Distribution database.

     

    --ROW COUNTS

    --Look for high row counts (>1millions) indicate cleanup not running

    --    or large pending transactions.

    SELECT name, rowcnt, STATS_DATE (id, indid) as 'Last Update Stats'

    FROM distribution.dbo.sysindexes

    WHERE name IN('ucMSrepl_transactions', 'ucMSrepl_commands')

     

    Are the row counts expected or do they now contain millions of rows?  High rows counts (>1 million) may indicate a large transaction is being processes or cleanup procedure is not running.

     

    When performance troubleshooting latency a review of pending commands by day by # commands may uncover helpful pattern.  A breakdown of the commands being stored in the Distribution database can be retrieve by running the following quries.

     

    -- Check the Time associated with those Transaction Counts into temp table

    select t.publisher_database_id, t.xact_seqno,

          max(t.entry_time) as EntryTime, count(c.xact_seqno) as CommandCount

    into #results

    FROM MSrepl_commands c with (nolock)

    LEFT JOIN  msrepl_transactions t with (nolock)

          on t.publisher_database_id = c.publisher_database_id

          and t.xact_seqno = c.xact_seqno

    GROUP BY t.publisher_database_id, t.xact_seqno

     

     

    -- Show each hour and number of commands per Day:

    SELECT publisher_database_id

          ,datepart(year, EntryTime) as Year

          ,datepart(month, EntryTime) as Month

          ,datepart(day, EntryTime) as Day

          ,datepart(hh, EntryTime) as Hour

          --,datepart(mi, EntryTime) as Minute

          ,sum(CommandCount) as CommandCountPerTimeUnit

    FROM #results

    GROUP BY publisher_database_id

          ,datepart(year, EntryTime)

          ,datepart(month, EntryTime)

          ,datepart(day, EntryTime)

          ,datepart(hh, EntryTime)

          --,datepart(mi, EntryTime)

    --order by publisher_database_id, sum(CommandCount) Desc

    ORDER BY publisher_database_id, Month, Day, Hour

     

    In the sample output below, a large batch of transactions were being replicated as result of table updates causing slowdown in the Distribution Agent.

     

    publisher_database_id Year        Month       Day         Hour        CommandCountPerTimeUnit

    --------------------- ----------- ----------- ----------- ----------- -----------------------

    2                     2009        5           14          10          132

    2                     2009        5           14          11          656

    2                     2009        5           14          12          880

    2                     2009        5           14          13          4379

    2                     2009        5           14          14          152

    2                     2009        5           14          15          1478

    2                     2009        5           14          20          161

    2                     2009        5           14          21          145

    2                     2009        5           15          6           1700

    2                     2009        5           15          7           3672

    2                     2009        5           15          8           6266

    2                     2009        5           15          9           329

    2                     2009        5           15          10          5678715

    2                     2009        5           15          11          5637959

    2                     2009        5           15          12          5281732

    2                     2009        5           15          13          5020950

    2                     2009        5           15          14          1252

    2                     2009        5           16          11          732

    2                     2009        5           16          12          178

    2                     2009        5           16          13          725

    2                     2009        5           16          14          186

    2                     2009        5           16          16          72

     

     

    Resolution:

    Large batch of transactions require heavy IO requirements by the Distribution Agent Reader-Thread on the distribution database.  Fast disk subsystem with Transaction log and database on separate drives/LUNs may help improve IO performance. If this will be an ongoing pattern consider replicating the stored procedure EXECUTION instead of the RESULTS.

     

    Publishing Stored Procedure Execution in Transactional Replication

    http://msdn.microsoft.com/en-us/library/ms152754(SQL.90).aspx

     

     

    Troubleshooting Reader Thread: Distribution Agent latency in the Reader Thread occurs when retrieving rows from the DistributionDB. Obtain Agent history or -OUTPUT and examine the Agent stats which are generated every 5 minutes.

     

    ************************ STATISTICS SINCE AGENT STARTED ***********************

    03-22-2009 09:55:19

     

    Total Run Time (ms) : 59189906       Total Work Time  : 492741

    Total Num Trans     : 5     Num Trans/Sec    : 0.01

    Total Num Cmds      : 5     Num Cmds/Sec     : 0.01

    Total Idle Time     : 58660470

     

    Writer Thread Stats

      Total Number of Retries   : 0

      Time Spent on Exec        : 0

      Time Spent on Commits (ms): 16     Commits/Sec         : 0.05

      Time to Apply Cmds (ms)   : 0      Cmds/Sec            : 0.00

      Time Cmd Queue Empty (ms) : 528717         Empty Q Waits > 10ms: 5

      Total Time Request Blk(ms): 59189187

      P2P Work Time (ms)        : 0      P2P Cmds Skipped    : 0

     

    Reader Thread Stats

      Calls to Retrieve Cmds    : 11738

      Time to Retrieve Cmds (ms): 492741         Cmds/Sec            : 0.01

      Time Cmd Queue Full (ms)  : 0      Full Q Waits > 10ms : 0

    *******************************************************************************

     

    This -OUTPUT taken from the Distribution Agent log shows 100% of the duration was spent read rows from the Distribution database before the Agent was stopped.

     

    The Agent Log also records time when events completed.  In the example below, 4 minutes elapsed between time the Distribution Agent queries the Subscriber for the starting transaction to replicate and time it retrieved a batch of transactions from the Distribution database. The latency is caused by Reader Threads are waiting for CMDs to be returned from the Distribution DB. 

     

     

    --Example from Distribution Agent -OUTPUT showing delay in READER thread.

    --Read for Subscription metadata completes

    2009-05-16 01:26:49.229 OLE DB Distributor 'SQL380': {call sp_MSget_subscription_guid(11)}

    --Four minutes later read of replication cmds completes.

    2009-05-16 01:29:31.401 sp_MSget_repl_commands timestamp  value is: 0x0x0000081a00000045000400000000

    --Reader threads are waiting for CMDs to be returned from the Distribution DB. 

     

    To test the READER execute times, run sp_MSget_repl_commands from Query Window, look for Blocking, table scans, timeouts, etc.

     

    1. From the -OUTPUT get the LSN for the long runing sp_MSget_repl_commands

     

             2009-05-16 01:29:31.401 sp_MSget_repl_commands timestamp  value is: 0x0x0000081a00000045000400000000

     

    2. Get the Agent_ID from the same log

     

    3. Execute sp_MSget_repl_commands query against Distribution DB

     

         sp_MSget_repl_commands @agent_id=11, @last_xact_seqno=0x0000081a00000045000400000000

     

     

    How long did this command take to execute?  Was it blocked?  Were they are large number of CMDs to be returned?

     

    You can also retrieve the last successful transaction from the subscriber’s database using query below. The transaction_timestamp value contains the Log Sequence Number (LSN) used for the sp_MSget_repl_commands stored procedure.

     

    USE <subscriber db>

    --Retrieve last successful Transaction

    --    Multiple rows per publication indicate parallel distribution streams

    SELECT publisher,publication, distribution_agent,transaction_timestamp

    FROM dbo.MSreplication_subscriptions

     

     

    For even more details, examine the execution plan using query below.

     

    USE distribution

    GO

     

    DBCC FREEPROCCACHE

    DBCC DROPCLEANBUFFERS

     

    SET STATISTICS PROFILE ON

    SET STATISTICS IO ON

    SET STATISTICS TIME ON

     

    sp_MSget_repl_commands @agent_id=11, @last_xact_seqno=0x0000081a00000045000400000000

     

    SET STATISTICS PROFILE OFF

    SET STATISTICS IO OFF

    SET STATISTICS TIME OFF

     

    In SQL 2005/2008 we can pull the CACHED execution plans from the DMVs.

     

    --dm_exec_query_stats for sp_MSget_repl_commands

    -- by top total_worker_time

    SELECT TOP 25

      st.text, qp.query_plan,

        (qs.total_logical_reads/qs.execution_count) as avg_logical_reads,

        (qs.total_logical_writes/qs.execution_count) as avg_logical_writes,

        (qs.total_physical_reads/qs.execution_count) as avg_phys_reads,

      qs.*

    FROM sys.dm_exec_query_stats as qs

             CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) as st

             CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) as qp

    WHERE st.text like 'CREATE PROCEDURE sys.sp_MSget_repl_commands%'

      ORDER BY qs.total_worker_time DESC

    Go

     

    Possible Cause: Table Statistics

    Long running Reader-Thread query when transactions containing many small batches may also indicate Replication system tables need to have statistics updates.  Table statistics can become out-of-date if AutoUpdateStatistics has been disable for the Distribution database. 

     

    --Updated Replication table statistics 'Updated' date

    DBCC SHOW_STATISTICS(MSrepl_commands,ucMSrepl_commands)

    DBCC SHOW_STATISTICS(MSrepl_transactions,ucMSrepl_transactions)

     

    --Retrieve Distribution database settings

    --    Look for IsAutoUpdateStatistics

    sp_helpdb distribution

     

    --or (returns 1)

    select DATABASEPROPERTYEX('distribution','IsAutoUpdateStatistics')

     

     

    Resolution:

     

    --Updated Replication table statistics

    UPDATE STATISTICS MSrepl_transactions

       WITH FULLSCAN

     

    UPDATE STATISTICS MSrepl_commands

       WITH FULLSCAN

     

    Resolution:

    Using built-in SQL tools, verify Distribution Agent Reader-Thread is not being blocked at the Distribuiton database.

     

    Distribution Agent Writer Latency

    The Writer-Thread calls Replication created stored procedures to apply changes to the subscriber.  These sp’s rely on a user table’s unique index (usually primary key) to locate records to be updated or deleted.  Latency in the Writer thread is usually see in long-running execution time for these stored procedures.

     

    sp_MSIns_articlename

    sp_MSUpd_articlename

    sp_MSDel_articlename

     

     

    Possible Cause: Long execution times in the INS/UPD/DEL stores procedures used to apply transactions to the subscriber.

     

    In the sample output below, the Distribution Agent latency is occurring in the WRITER thread of the Distribution Agent Latency

     

    --dm_exec_query_stats for sp_MSupd (Subscriber Updates)

    -- by top total_worker_time

    SELECT TOP 250

      st.text, qp.query_plan,

        (qs.total_logical_reads/qs.execution_count) as avg_logical_reads,

        (qs.total_logical_writes/qs.execution_count) as avg_logical_writes,

        (qs.total_physical_reads/qs.execution_count) as avg_phys_reads,

      qs.*

    FROM sys.dm_exec_query_stats as qs

             CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) as st

             CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) as qp

    WHERE st.text like '%sp_MSupd_dboemployee%'  -- sp_MSupd_dbo<table name>%'

      ORDER BY qs.total_worker_time DESC

     

    Go

     

    If the stored procedures have already been flushed from cache or SQL Server has been restarted, a Profile trace on the Subscriber taken during the problem will provide the summary execution data.  Once loaded by SQL Nexus or fn_ReadTrace the stored procedure below will summaries the stored procedures called the Distribution Agent.

     

    -- measure ALL spids for performance

    select hashid, count(*) as Num, sum(duration) as sumdur, sum(cpu) as sumcpu, sum(reads) as sumreads, sum(writes) as sumwrites, avg(duration) as avgdur, avg(cpu) as avgcpu, avg(reads) as avgreads, avg(writes) as avgwrites

    INTO ReadTrace.topCPUBatch

    FROM ReadTrace.tblBatches

    group by hashid

    order by sumcpu desc

    GO

     

    -- Get the text for the top CPU batches

    select b.*,a.NormText

    FROM ReadTrace.tblUniqueBatches a

    JOIN ReadTrace.topCPUBatch b

    ON a.hashid=b.hashid

    where NormText like '%SP_MSGET_REPL_COMMANDS%'

    --OR NormText like '%SP_MSHELP_DISTRIBUTION_AGENTID%'

    --OR NormText like '%SP_MSGET_SUBSCRIPTION_GUID%'

    OR NormText like '%SP_MSins%'

    OR NormText like '%SP_MSupd%'

    OR NormText like '%SP_MSdel%'

    order by sumcpu desc

     

    Example Output


    hashid               Num sumdur  sumcpu      sumreads    sumwrites   avgdur  avgcpu avgreads    avgwrites   NormText
    -------------------- --- ------- ----------- ----------- ----------- ------- ------ ----------- ----------- -------------------------------
    -5684440255897011116 69  966058  389         4071        0           14000   5      59          0           SP_MSGET_REPL_COMMANDS
    -2908468862153496396 69  1466258 204         5175        0           21250   2      75          0           SP_MSHELP_DISTRIBUTION_AGENTID
    -7220247869142477571 69  539588  31          1152        0           7820    0      16          0           SP_MSGET_SUBSCRIPTION_GUID

     

     

     

    Possible Cause:

    In the sample output below, the Distribution Agent latency is occurring in the WRITER thread of the Distribution Agent Latency

     

    ************************ STATISTICS SINCE AGENT STARTED ***********************

    Total Run Time (ms) : 18828  Total Work Time  : 14110

    Total Num Trans     : 52  Num Trans/Sec    : 3.69

    Total Num Cmds      : 52  Num Cmds/Sec     : 3.69

    Writer Thread Stats

      Time Spent on Exec        : 12063

      Time Spent on Commits (ms): 422  Commits/Sec         : 0.14

      Time to Apply Cmds (ms)   : 14110  Cmds/Sec            : 3.69

      Time Cmd Queue Empty (ms) : 671  Empty Q Waits > 10ms: 2

      Total Time Request Blk(ms): 671

    Reader Thread Stats

      Calls to Retrieve Cmds    : 2

      Time to Retrieve Cmds (ms): 92  Cmds/Sec            : 565.22

      Time Cmd Queue Full (ms)  : 5486  Full Q Waits > 10ms : 3

    *******************************************************************************

     

    In this output the READS completes in 92ms at a rate of 565 cmds/sec while the WRITES are taking 14,000ms( 14 seconds ) and only processing 3.69 cmds/sec.  This example clearly shows the WRITE are slower then reads.

     

    Possible Cause: User Defined Triggers: High values in the Log Reader Agent -OUTPUT log for “Write time(ms):” indicate bottleneck writing commands to the Distribution database.  Once possible cause are user defined triggers adding theFirst investigate blocking caused by other Replication Agent. 

     

     

     

    Possible Cause: Not Replicating SQL statements as “Parameters” 

     

    Obtain the create publication script and check ALL articles to see if the SQL statements are being sent as batch of 'parameters' statements.  The status = 8 indicates 'parameters' batching is not enabled.

     

        exec sp_addarticle @publication = ... @status = 24

    or

      select name, status from sysarticles

     

    Article status can be updated on-demand using the following Replication stored procedure.  This should be executed on all articles.

     

    EXEC sp_changearticle @publication = N'<pub name>', @article = N'<article name>',

                          @property = 'status',  @value = 'parameters'

     

    Capture Profile RPC:COMPLETE events on the Subscriber.  Look for the time difference between StartTime and EndTime and/or Duration.  Statements should be completing around 100ms.  Longer execution times for single row update/inserts may indicate blocking on the subscriber or user-defined triggers on the subscriber tables contributing to the high durations.

     

    Look at the time difference between when on RPC:COMPLETES and the next PRC:STARTING.  This is the "round trip" time back to the Distribution Agent to say that command completed and next commands is received.  Times difference larger then 500ms may indicate network latency. 

     

     

    Also examine time from when READER thread finished reading a batch of commands until these commands were applied at the Subscriber. Long times between these 2 events also indicate long WRITER thread. 

     

    --reader thread completed -OUTPUT entry

    2009-05-14 01:07:50.795 sp_MSget_repl_commands timestamp value is: 0x0x0000081a00000033000400000000

    --writeer thread commit -OUTPUT entry

    2009-05-14 01:13:50.011 Parameterized values for above command(s): {{N'ZZZZZZZZ', 1}, {N'XXXXXXXX', 2}, {N'YYYYYYYY', 3}}

     

    When troubleshooting long Writer Thread look focus on the time to execute the UPDATE/INSERT/DELETEs on the Subscriber.  Execution times can be found in either PSSDIAG/Profiler trace or from execution DMVs.

     

             Execution time (ms): 3401930
                   Work time (ms): 3401470
    Distribute Repl Cmds Time(ms): 3354520
                   Fetch time(ms): 3343920
    Repldone time(ms): 5520
                   Write time(ms):  140070
       Num Trans: 8  Num Trans/Sec: 0.002352
       Num Cmds: 477219  Num Cmds/Sec: 140.297871

     

     

     

     

     

  • Troubleshooting Slow Distribution Agent performance in SQL 2005/2008

    When troubleshooting SQL 2005/2008 Distribution Agent performance, first determine if the Distribution Agent Latency is from the Reader Thread retrieving rows from the DistributionDB or the Writer Thread writing to the subscriber.  Modify the Agent startup parameters to include the -OUTPUT with agent stats.

     

    ************************ STATISTICS SINCE AGENT STARTED ***********************
    05-01-2009 17:35:54

    Total Run Time (ms) : 18828  Total Work Time  : 14110
    Total Num Trans     : 52  Num Trans/Sec    : 3.69
    Total Num Cmds      : 52  Num Cmds/Sec     : 3.69
    Writer Thread Stats  Time Spent on Exec        : 12063
      Time Spent on Commits (ms): 422  Commits/Sec         : 0.14
     
    Time to Apply Cmds (ms)   : 14110  Cmds/Sec            : 3.69
      Time Cmd Queue Empty (ms) : 671  Empty Q Waits > 10ms: 2
    Reader Thread Stats
      Calls to Retrieve Cmds    : 2
     
    Time to Retrieve Cmds (ms): 92  Cmds/Sec            : 565.22
      Time Cmd Queue Full (ms)  : 5486  Full Q Waits > 10ms : 3
    *******************************************************************************

     

    In this output clearly the READS completes very fast while WRITES are consuming most of the time contributing to the high latency.

     

    What to check:

     

    1) Obtain the create publication script and check ALL articles to see if the SQL statements are being sent as batch of 'parameters' statements.  The status < 16 indicates 'parameters' batching is not enabled.  Default value is 16.

     

        exec sp_addarticle @publication = ... @status = 24

    or

       select name, status from sysarticles

     

    Article status can be updated on-demand using the following Replication stored procedure.  This should be executed on all articles.

     

        EXEC sp_changearticle @publication = N'<pub name>', @article = N'<article name>',

                                          @property = 'status',  @value = 'parameters'

     

    2) Capture Profile RPC:COMPLETE events on the Subscriber.  Look for the time difference between StartTime and EndTime and/or Duration.  Statements should be completing around 100ms.  Longer execution times for single row update/inserts may indicate blocking on the Subcriber or user-defined triggers on the subscriber tables contributing to the high durations.

     

    The default value is 16

  • How to create a SQL Server link server with a different name.

    If select SQL linked server in the GUI you can't specify a different name for the SQL Server.  You can only use the SQL host name as the Link Server name.
     
    Here is an example of creating a Linked Server via script to specify link server name myGX620 for the SQL Server host name GX620.
     
     

    /****** Object:  LinkedServer [myGX620]    Script Date: 04/07/2009 22:49:10 ******/

    EXEC master.dbo.sp_addlinkedserver

        @server = N'myGX620',

        @srvproduct=N'MSSQL',

        @provider=N'SQLNCLI',

        @provstr=N'PROVIDER=SQLOLEDB;SERVER=GX620'

     

    EXEC master.dbo.sp_addlinkedsrvlogin @rmtsrvname=N'myGX620',

        @useself=N'False',@locallogin=NULL,@rmtuser=N'sa',@rmtpassword='P@ssword!'

  • Network Binding Warning while installing SQL 2008 on Windows 2008 Cluster

    We ran into this issue today on a customer's SQL 2008 cluster installation.  This cluster was configured with BASP Broadcom NIC Teaming driver.  This allows for redudent NICs using 1 IP address. We verified the binding order was TEAM then Heartbeat and confirmed there were not disabled or ghosted network adapers as per the following KB article

    955963 You receive a warning about the network binding order on the Setup Support Rules page when you install SQL Server 2008 in a failover cluster
    http://support.microsoft.com/default.aspx?scid=kb;EN-US;955963

     As this was a WARNING and not a failure, we continued the setup which completed without any errors.

  • Example using Transactional Replication with Filestreams in SQL 2008

    I create the following example using Transactional Replication with Filestreams for SQL 2008 using a single server as Publisher/Distributor/Subscriber.

    The script performs the following:

    1) Creates the published and subscribers DBs in directories call c:\test1 and c:\test2 to contain
     database file and log
     on disk filestream storage of data inserted
     for the Publisher and Subscrber files.

    2) Creates table and inserts image data as a filestream file to be persisted on the disk in the filestream filegroup.

    3) Enables Publication for Transactional Replication and created the Publication.
             Change @subscriber = N'<your server name>' to your publisher/subscriber server name

    4) When synchronize, the file located in the filegroup is replicated to the subscriber.  Note that since the transaction log LSN number is used for the Create File command the LSN number on the publisher will be different then the number on the subscriber resulting in different filenames.

     

    /*

    SQL 2008 Example of Replicating Table containing Filestream data

    Chris Skorlinski  (thanks to Bob Dorr for stream example)

    Microsoft SQL Server Escalation Services

    Feb 2009

     

    NOTE: Global Search/Replace <YOUR SERVER NAME> with your SQL Server name.

     

    1) Create 2 directory call c:\test1 and c:\test2 to contain

          database file and log

          on disk filestream storage of data inserted

          for the Publisher and Subscrber files.

    2) Execute script to

          create publisher and subscribers databases

          insert record into filestream field which is then written to disk

          examine directory containing filestream file

          examine transaction log to confirm file name used by filestream

    3) Create a Merge or Transaction Replication publication

          push to subscriber database

          examine directory containing filestream file

          examine transaction log to confirm file name used by filestream

    */

     

    --Create Publisher database

    USE [master]

    GO

    CREATE DATABASE [Pub_db] ON  PRIMARY

    ( NAME = N'Pub_db_dat', FILENAME = N'c:\test1\Pub_db.mdf' , SIZE = 2304KB , MAXSIZE = UNLIMITED, FILEGROWTH = 1024KB ),

     FILEGROUP [filestreamgroup1] CONTAINS FILESTREAM  DEFAULT

    ( NAME = N'Pub_db_filestreamgroup1', FILENAME = N'c:\test1\filestream1' )

     LOG ON

    ( NAME = N'Pub_log', FILENAME = N'c:\test1\Pub_db.ldf' , SIZE = 1024KB , MAXSIZE = 2048GB , FILEGROWTH = 10%)

    GO

     

    /*

      For demonstration purpose the database set to FULL RECOVERY

        and a Full and Log backups executed.

          This prevents the transaction log from being

          automatically truncated and removing the

          Filestream transaction.

    */

    USE master;

    ALTER DATABASE Pub_db SET RECOVERY FULL;

    GO

    -- Back up the Pub_db database to new media set (backup set 1).

    BACKUP DATABASE Pub_db

      TO DISK = 'C:\Test1\Pub_dbFullRM.bak'

      WITH FORMAT;

    GO

    --Create a routine log backup (backup set 2).

    BACKUP LOG Pub_db TO DISK = 'C:\Test1\Pub_dbFullRM.bak';

    GO

     

     

    --Create table to publish

    USE Pub_db

    GO

    CREATE TABLE [dbo].[student](

          [pkID] [uniqueidentifier] ROWGUIDCOL  NOT NULL UNIQUE,

          [fsTextFile] [varbinary](max) FILESTREAM  NULL,

          [picture] [varbinary](max) NULL)

     

    ALTER TABLE [dbo].[student] ADD  CONSTRAINT [PK_student] PRIMARY KEY CLUSTERED

    (

          [pkID] ASC

    )WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY] FILESTREAM_ON [filestreamgroup1]

    GO

     

    /*    Insert row containing

                Unique Identifier (Primary Key)

                Filestream to disk binary value 0x4368726973 = CONVERT(varbinary,'Chris')

                0xaaaaa... as binary to simlulate picture

    */

    INSERT into Pub_db..student values(NEWID(),CONVERT(varbinary,'Chris'),0xaaaaaaaaaaaaaa)

    GO

    --show newly inserted row

    SELECT * from Pub_db..student

    GO

    /*sample output

     

    pkID                                 fsTextFile                               picture

    ------------------------------------ ---------------------------------------- ----------------------------------------

    604A00C3-31CA-44C2-83E2-F8A75163E4B1 0x4368726973                             0xAAAAAAAAAAAAAA

     

    (1 row(s) affected)

    */

     

    /*

    Browse the c:\test1 directory to see the file created.

    The filename is on based on Transaction Log LSN (log sequence number)

    example:

          path to file C:\test1\filestream1\3ea4f478-5bea-4b4a-988d-321c3abea731\7201c6c2-e51e-4186-b736-8e1516b4252f

          file name: 00000013-00000089-0007

         

          WARNING: Never do this with real data!

                1) Copy the file to another directory

                2) Open the file with Notepad.exe

                3) Note it contains the inserted text.

     

    Next read the transaction log for the transaction

          which recorded the creation of the Filestream file

         

    To read portition of the log use WHERE clause of LSN or Create File Operation

          WHERE [Current LSN]= '00000013:00000089:0007'

          WHERE Operation = 'LOP_FS_DOWNLEVEL_OP'

    */

     

    USE Pub_db

    GO

    SELECT [Current LSN],Operation, [Transaction ID],Description

          FROM fn_dblog(null,null)

          WHERE Operation = 'LOP_FS_DOWNLEVEL_OP'

    GO

    /*

    Sample Output:

    [Current LSN] = 00000013:0000007e:0005

    [Operation] = LOP_FS_DOWNLEVEL_OP

    [Transaction ID] = 0000:000001ef

    [Description] = Operation CREATE;File Id 65537;Name 3ea4f478-5bea-4b4a-988d-321c3abea731\7201c6c2-e51e-4186-b736-8e1516b4252f\00000013-00000089-0007

    */

     

    --It is a bit cryptic, but you can view the entire insert transaction

    SELECT * from fn_dblog(null,null)

    WHERE [Transaction ID] Like '0000:000001ef'

     

    /*

    Sample Output:

     

    LOP_BEGIN_XACT

    HoBt 72057594038845440:ACQUIRE_LOCK_IX OBJECT: 11:2105058535:0 ;ACQUIRE_LOCK_IX PAGE: 11:1:159;ACQUIRE_LOCK_X KEY: 11:72057594038845440 (6102e18128b4)

     

    */

     

    --attempt to directly access the file

    SELECT CAST(fsTextFile as varchar(10)), fsTextFile.PathName() from Pub_db..student

    GO

    /*

    Msg 5592, Level 16, State 3, Line 2

    FILESTREAM feature doesn't have file system access enabled.

     

    Configure Filestream to allow file access to the filestream object

     

          sp_configure 'filestream access level',2

          RECONFIGURE WITH OVERRIDE

     

          How to: Enable FILESTREAM

          http://msdn.microsoft.com/en-us/library/cc645923.aspx

    */

    sp_configure 'filestream access level',2

    RECONFIGURE WITH OVERRIDE

     

    --Try again to directly access the file

    SELECT pkID,CAST(fsTextFile as varchar(10)) as '1st 10 characters',

          fsTextFile.PathName() as 'PathName' from Pub_db..student

    GO

     

    /*

    Sample Output:

    pkID                                 1st 10 characters PathName

    ------------------------------------ ----------------- ------------------------------------------------------------------------------------------------

    604A00C3-31CA-44C2-83E2-F8A75163E4B1 Chris             \\<YOUR SERVER NAME>\MSSQLSERVER\v1\Pub_db\dbo\student\fsTextFile\604A00C3-31CA-44C2-83E2-F8A75163E4B1

     

    (1 row(s) affected)

    */

     

    --Create Subscriber database

    USE [master]

    GO

    CREATE DATABASE [Sub_db] ON  PRIMARY

    ( NAME = N'Sub_db_dat2', FILENAME = N'c:\test2\Sub_db.mdf' , SIZE = 1280KB , MAXSIZE = UNLIMITED, FILEGROWTH = 1024KB ),

     FILEGROUP [filestreamgroup1] CONTAINS FILESTREAM  DEFAULT

    ( NAME = N'Sub_db_filestreamgroup1', FILENAME = N'c:\test2\filestream1' )

     LOG ON

    ( NAME = N'Sub_db_log2', FILENAME = N'c:\test2\Sub_db.ldf' , SIZE = 1024KB , MAXSIZE = 2048GB , FILEGROWTH = 10%)

    GO

     

    /*

      For demonstration purpose the database set to FULL RECOVERY

        and a Full and Log backups executed.

          This prevents the transaction log from being

          automatically truncated and removing the

          Filestream transaction.

    */

    USE master;

    ALTER DATABASE Sub_db SET RECOVERY FULL;

    GO

    -- Back up the Sub_db database to new media set (backup set 1).

    BACKUP DATABASE Sub_db

      TO DISK = 'C:\Test2\Sub_dbFullRM.bak'

      WITH FORMAT;

    GO

    --Create a routine log backup (backup set 2).

    BACKUP LOG Sub_db TO DISK = 'C:\Test2\Sub_dbFullRM.bak';

    GO

     

    /*

    Using script below create Transactional Publication

          Subscriber (Push or Pull)

    */

     

    -- Enable Publication for Transactional Replication

    use [Pub_db]

    exec sp_replicationdboption @dbname = N'Pub_db', @optname = N'publish', @value = N'true'

    GO

    -- Adding the transactional publication

    exec sp_addpublication @publication = N'FileStreamTranPublication', @description = N'Transactional publication of database ''Pub_db'' from Publisher ''<YOUR SERVER NAME>''.',

          @sync_method = N'concurrent', @retention = 0, @allow_push = N'true', @allow_pull = N'true', @allow_anonymous = N'false', @enabled_for_internet = N'false',

          @snapshot_in_defaultfolder = N'true', @compress_snapshot = N'false', @ftp_port = 21, @ftp_login = N'anonymous', @allow_subscription_copy = N'false', @add_to_active_directory = N'false',

          @repl_freq = N'continuous', @status = N'active', @independent_agent = N'true', @immediate_sync = N'false', @allow_sync_tran = N'false', @autogen_sync_procs = N'false',

          @allow_queued_tran = N'false', @allow_dts = N'false', @replicate_ddl = 1, @allow_initialize_from_backup = N'false', @enabled_for_p2p = N'false', @enabled_for_het_sub = N'false'

    GO

    exec sp_addpublication_snapshot @publication = N'FileStreamTranPublication', @frequency_type = 1, @frequency_interval = 0, @frequency_relative_interval = 0, @frequency_recurrence_factor = 0,

          @frequency_subday = 0, @frequency_subday_interval = 0, @active_start_time_of_day = 0, @active_end_time_of_day = 235959, @active_start_date = 0, @active_end_date = 0, @job_login = null, @job_password = null, @publisher_security_mode = 1

    GO

    /*

    Default schema_option = 0x000000000803509F for Transactional Replication

          does not replicate Filestream as files but as embedded objects.

          To replicated as files enable schema option 0x100000000

         

          0x00803509F (Tran)

          0x100000000 (BOL FileStreams)

     

    Use:

     

          @schema_option = 0x000000010803509F,

     

    */

    exec sp_addarticle @publication = N'FileStreamTranPublication', @article = N'student', @source_owner = N'dbo', @source_object = N'student', @type = N'logbased', @description = null, @creation_script = null, @pre_creation_cmd = N'drop',

          @schema_option = 0x000000010803509F, @identityrangemanagementoption = N'manual', @destination_table = N'student', @destination_owner = N'dbo', @vertical_partition = N'false', @ins_cmd = N'CALL sp_MSins_dbostudent', @del_cmd = N'CALL sp_MSdel_dbostudent', @upd_cmd = N'SCALL sp_MSupd_dbostudent'

    GO

     

    --Edit @subscriber name as needed

    exec sp_addsubscription @publication = N'FileStreamTranPublication',

    @subscriber = N'<YOUR SERVER NAME>', @destination_db = N'Sub_db', @subscription_type = N'Push', @sync_type = N'automatic', @article = N'all', @update_mode = N'read only', @subscriber_type = 0

    GO

    exec sp_addpushsubscription_agent @publication = N'FileStreamTranPublication', @subscriber = N'<YOUR SERVER NAME>', @subscriber_db = N'Sub_db', @job_login = null, @job_password = null, @subscriber_security_mode = 1, @frequency_type = 1, @frequency_interval = 0,

          @frequency_relative_interval = 0, @frequency_recurrence_factor = 0, @frequency_subday = 0, @frequency_subday_interval = 0, @active_start_time_of_day = 0, @active_end_time_of_day = 0, @active_start_date = 0, @active_end_date = 19950101, @enabled_for_syncmgr = N'False', @dts_package_location = N'Distributor'

    GO

     

    /*

    Using Managment Studio or Replication Monitor synchronize subscriber

          Run Snapshot Agent

          Run Distribution Agent

    */

     

    --SELECT rows on Subscriber to confirm sync completed

    USE Sub_db

    GO

    SELECT pkID,CAST(fsTextFile as varchar(10)) as '1st 10 characters',

          fsTextFile.PathName() as 'PathName' from Pub_db..student

     

    GO

    /*

    Sample Output:

    pkID                                 1st 10 characters PathName

    ------------------------------------ ----------------- ------------------------------------------------------------------------------------------------

    604A00C3-31CA-44C2-83E2-F8A75163E4B1 Chris             \\<YOUR SERVER NAME>\MSSQLSERVER\v1\Pub_db\dbo\student\fsTextFile\604A00C3-31CA-44C2-83E2-F8A75163E4B1

     

    (1 row(s) affected)

    */

     

    /*

    Browse the c:\test2 directory to see the file created.

    The filename is on based on Transaction Log LSN (log sequence number)

    example:

          path to file C:\test2\filestream1\8debcb27-1938-4f59-ad42-c693ba53bd83\3a8fc016-bfc7-4ea5-944b-090da45fca96

          file name: 00000014-0000007d-0008

               

          WARNING: Never do this with real data!

                Copy the file to another directory

                Open the file with Notepad.exe

                Note it contains the inserted text.

    */

     

    --Read transaction log on subscriber

    SELECT [Current LSN],Operation, [Transaction ID],Description

          FROM fn_dblog(null,null)

          WHERE Operation = 'LOP_FS_DOWNLEVEL_OP'

    GO

     

    /*

    Sample Output:

    [Current LSN] = 00000014:0000007d:0008

    [Operation] = LOP_FS_DOWNLEVEL_OP

    [Transaction ID] = 0000:0000024f

    [Description] = Operation CREATE;File Id 65537;Name 8debcb27-1938-4f59-ad42-c693ba53bd83\3a8fc016-bfc7-4ea5-944b-090da45fca96\00000014-0000007d-0008

     

    */

     

    /*

    --Execute the cleanup script below to remove

    --    Pub_db and Sub_db

     

     USE Pub_db

    -- Dropping the transactional subscriptions

    exec sp_dropsubscription @publication = N'FileStreamTranPublication',

    @subscriber = N'<YOUR SERVER NAME>', @destination_db = N'Sub_db', @article = N'all'

    GO

    -- Dropping the transactional publication

    exec sp_droppublication @publication = N'FileStreamTranPublication'

    GO

     

    USE master

    GO

    sp_configure 'filestream access level',1

    RECONFIGURE WITH OVERRIDE

    GO

     

    exec sp_replicationdboption @dbname = N'Pub_db', @optname = N'publish', @value = N'false'

    GO

    Drop database Pub_db

    GO

    DROP DATABASE Sub_db

    GO

     

    */

     

  • Troubleshooting Error: 18452 Login failed for user

    Error: 18452 Login failed for user ‘null‘, …

              “Null” or ‘’ means that client windows token is not trusted

              Kerberos authentication is not available, fallback to NTLM (not in a domain, SPN not registered)

              Solution: Create the same account, password on the SQL Server

              Delegation may be required (at least one hop exists)

              Solution: use SQL Authentication or enable delegation

              SQL Server service account cannot query domain

     

    Error: 18456 Login failed for user ‘name‘

              ‘Name’ means that principal is recognized, but not authorized to connect

              “Login Failed for user 'NT Authority\ANONYMOUS' LOGON“

              Reason: client is running under LocalSystem connects using NTLM

              Solution: Create a dedicated account on both machines

              “Login failed for user '<domain>\<machinename>$' “

              Reason: client is LocalSystem or NetworkService

              Solution: create login in SQL Server for client machine

              “Login failed for user ‘machinename\User’“

              Reason: client is local user and has no access

              Solution: create login in SQL Server for ‘machinename\User’

              “Login failed for user ‘user’“ (no “\”)

              Reason: SQL Login cannot connect

              Solution: Enable mixed mode

              Note that ‘user’ can be ‘null’ if client windows machine is not trusted, i.e. even SQL authentication requires cross-NTbox windows authentication

              SQL authentication can often be used as universal or the last resort solution to all these problems

     

    Error: 18456, Severity: 14, State: XX.

              Client is recognized, but not authenticated

              State: XX is not reported back to the client

              Written into ERRORLOG

     

    State

    Reason

    1

    Generic error

    2

    Login for remote login doesn’t exist in the server

    3,4

    Failed to decrypt encrypted in-memory password

    5

    SQL login doesn’t exist in the server

    6

    Connecting SQL login has mismatched login type catalog

    7

    Login disabled

    8

    Password does not match

    9

    Password is invalid (when changing password at logon)

    10

    Account policy validation failed

    11

    Windows login is not authorized to connect

    12

    Sql login is not authorized to connect

    13

    Server has paused, no logins are allowed

    14

    Login cannot be used with this type of connection interface

    15

    Database specified in the connection string is invalid

    16

    Default database is invalid

    17

    Default language is invalid

    18

    Password cannot be changed for this type of login

    19

    Failed to decrypt encrypted in-memory new password

     

    see also blogs.msdn.com/sql_protocols/archive/2006/12/02/understanding-kerberos-and-ntlm-authentication-in-sql-server-connections.aspx

     

  • Do we need to run UPDATE STATISTICS WITH FULLSCAN

     

    Auto-Update Statistics

     

    For a large majority of SQL Server installations, the most important best practice is to use auto create and auto update statistics database-wide. Auto create and auto update statistics are on by default. If you observe bad plans and suspect that missing or out of date statistics are at fault, verify that auto create and auto update statistics are on. If enable, then check when last updated.  If recently updated, then run-running with FULL SCAN will probably take hours to run and NOT improve query performance.

     

    Question:  Do we really need to troubleshooting Query performance by running UPDATE STATS with FULL SCAN when AUTO UPDATE STATS is enable?

     

    Answer: Probably Not!

     

    1) Use sp_helpdb <database> to see if AUTO STATS are enabled.

     

        sp_helpdb <db name>

        >>>>>IsAutoUpdateStatistics   

     

     

    2) If AUTO is enable, verify when STATS were last updated.

     

    DBCC show_statistics ('Shippers', 'PK_Shippers') WITH STAT_HEADER

     

    Name Updated

    ------------------------------ --------------------

    PK_Shippers Jul 16 2008 11:47AM

     

    select stats_date(object_id('Shippers'),1)

    -----------------------

    2008-07-16 11:47:34.697

     

     

     “Statistics are a histogram consisting of an even sampling of values for the index key (or the first column of the key for a composite index) based on the current data. The histogram is stored in the statblob field of the sysindexes table, which is of type image."  -- Kalen Delaney

     

    If STATS were recently updated, then FULL SCAN may take hours to perform on a large table and may not return any more accurate statistics then already collected.  Also, the next time AUTO runs it will overwrite the all the statistics generated from the FULL SCAN.  Rememer, both FULL SCAN and AUTO STATS are updating the same data in sysindexes, so whomever runs last are stats used by the Query Optimizer.

     

    Rule of thumb!

    --------------------

    Use Update Stats with full scan and disable Auto Stats when doing large batch updates and minimal to none ongoing updates.

    Use Auto Stats (default) and don't run UPDATE STATS Full Scan when data is modified throughout the day.

     

    Additional Information

    -------------------------------

    In SQL 2000 Auto Update stats monitors the rowmodctr value of sysindexes which records number of data changes. If the number of data changes exceeds a threshold the statistics are updated. When Statistics are updated this value is reset to 0. When statistics are update, Query Optimizer will load a new query plan. When the Optimizer threshold is not reach but data modifications have occurred, Update Statistics (FULL or Partial) will force new statistics and new query plan.

     

    In SQL 2005/2008 changes are now tracked by column.  The rowmodctr provides overall indication of data changes to a table/index but it alone doesn't trigger update stats.

     

    An alternative to enabling auto create statistics is enabled or make sure to manually create statistics using CREATE STATISTICS or sp_createstats.  Note that auto-statistics will not work for read-only databases.

     

     

    This info from ianjo's blog (http://blogs.msdn.com/ianjo/archive/2005/11/10/491549.aspx) and from the “Inside SQL Server 2000" book by Kalen Delaney.

     

    --Chris Skorlinski

    --Microsoft SQL Server Escalation Services

     

  • Troubleshooting Transactional Replication Distribution Agent Latency

    Here is a tip I found while troubleshooting a Distribution Agent latency problem.  We were trying to see why on some days the Distribution Agent was "getting behind".  We suspected it was volume related.  Below are a few queries to help uncover workload in the Distribution database.

    The query below returns count of transactions.  It's not fancy, but it helps identify days with higher then average counts.

    --Return count of transactions by day by publication

    select datepart(dayofyear,entry_time) as 'day of year',

    publisher_database_id as 'pub db id',

    count(*) as 'count of trans' from MSrepl_transactions

    where entry_time < getdate() - 5 --count trans older then 5 days

    group by datepart(dayofyear,entry_time), publisher_database_id

    order by datepart(dayofyear,entry_time), publisher_database_id

     

    day of year pub db id   count of trans

    ----------- ----------- --------------

    302         7           2291

    303         7           319369

    304         7           420476

    305         7           4633313

    306         7           411236

    307         7           459866

    308         7           755705

    309         7           937608

    310         1           57326

    310         2           18950

    310         7           462553

    310         8           122002

    311         1           38203

    311         2           24254

    311         7           257801

    311         8           105436

    --Chris Skorlinski

  • Query to return list of Dynamic Management Views

    I don't always remember names of the SQL Server Dynamic Managment views.  Books Online is great, but here is a simple query to get a list of DMVs

     

    SELECT * FROM sys.all_objects

       WHERE [name] LIKE '%dm_%'

                    AND [type] IN ('V', 'TF', 'IF')

                    AND [schema_id] = 4

    ORDER BY [name]

     

  • Performance Dashboard error "Difference of two datetime values caused an overflow at runtime"

     

    Have you tried using SQL Server Performance Dashboard?  We're using this tool on SQL Server Escalation team with customers to troubleshoot real-time performance problems.  It is a great way to quickly find queries with high-CPU, high-reads, waits, etc.

     

    SQL Server 2005 Performance Dashboard Reports

    http://www.microsoft.com/downloads/details.aspx?FamilyID=1d3a4a0d-7e0c-4730-8204-e419218c1efc&DisplayLang=en

     

    We discovered a problem with one of the stored procedure when it calculates idle_connection_time for connections > 24 days.  The revised stored procedure usp_Main_GetSessionInfo corrects this problem.

     

    USE [msdb]

    GO

    /****** Object:  StoredProcedure [MS_PerfDashboard].[usp_Main_GetSessionInfo]    Script Date: 06/19/2008 15:35:54 ******/

    SET ANSI_NULLS ON

    GO

    SET QUOTED_IDENTIFIER ON

    GO

     

    ALTER procedure [MS_PerfDashboard].[usp_Main_GetSessionInfo]

    as

    begin

          select count(*) as num_sessions,

                sum(convert(bigint, s.total_elapsed_time)) as total_elapsed_time,

                sum(convert(bigint, s.cpu_time)) as cpu_time,

                sum(convert(bigint, s.total_elapsed_time)) - sum(convert(bigint, s.cpu_time)) as wait_time,

                --sum(convert(bigint, datediff(ms, login_time, getdate()))) - sum(convert(bigint, s.total_elapsed_time)) as idle_connection_time,

                --FIX to correct for sessions connected >24 days

                sum(convert(bigint, CAST ( DATEDIFF ( minute, login_time, getdate()) AS BIGINT)*60000 + DATEDIFF ( millisecond, DATEADD ( minute, DATEDIFF ( minute, login_time, getdate() ), login_time ),getdate() ))) - sum(convert(bigint, s.total_elapsed_time)) as idle_connection_time,

                case when sum(s.logical_reads) > 0 then (sum(s.logical_reads) - isnull(sum(s.reads), 0)) / convert(float, sum(s.logical_reads))

                      else NULL

                      end as cache_hit_ratio

          from sys.dm_exec_sessions s

          where s.is_user_process = 0x1

    end

     

     

  • I/O requests taking longer than 15 seconds to complete on file

    What does the “I/O request” error below represent?

     

    2008-04-21 13:26:42.480 spid364      Microsoft SQL Server 2005 - 9.00.3177.00 (Intel X86)

    2008-04-22 16:30:02.140 spid6s       SQL Server has encountered 2 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [F:\sql data files\xxx.MDF] in database [xxx] (5). 

    2008-04-22 16:32:08.780 spid6s       SQL Server has encountered 2 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [H:\sql data files\xxx_data_4.NDF] in database [xxx] (5). 

     

    Research and Findings

    ---------------------------------

    These errors may occur as result of “CPU Drift” and can be ignored and disabled, howerver, first verify both SQL DMV sys.dm_io_pending_io_requests and Windows Performance counters don’t indicate any IO delays.

     

    On computers with multiple CPUs, the CPUs are designed to “sleep” during periods of low workload.  When CPUs sleep, SQL may not accurately determine CPUs overall workload and incorrectly report this as IO WARNING shown above, however, this does not represent an actual CPU performance problem.

     

    To confirm if the CPUs entered low-power state, SQL Server Escalation Services created RDTSC (Read Time Stamp Counter) utility to report total CPU sleep time.  The report confirmed the CPUs were sleeping up to 24 seconds.  This would be enough for SQL Server to incorrectly report this a slow IO.  Both AMD’s and INTEL’s web sites describe the ability for CPUs to sleep.

     

    RDTSCTest.exe [-md|-mt]

          -md   Detailed output (default)

          -mt   CPU speeds in table format

     

    -- Current CPU Speeds --

    Runtime              CPU  ExpectedMHz ActualMHz RDTSCTicks           DriftMS         

    -------------------- ---- ----------- --------- -------------------- ----------------

    2008-04-22 17:53:36     0        3502      3503 0x0001564772F87FA72    16123.4

    2008-04-22 17:53:36     8        3502      3506 0x00015647D8B7AE21D    23922.5

    2008-04-22 17:53:36    16        3502      3507 0x00015647B5FEB4A39    21260.9

     

    For more information on  RDTSC can be found at

    http://blogs.msdn.com/psssql/archive/2007/08/19/sql-server-2005-rdtsc-truths-and-myths-discussed.aspx

     

    Starting with SQL 2005 SP2 we’ve included two trace flags to disable the reporting of CPU Drift errors in the SQL Server errorlog. 

     

    Disable this error using Trace Flag 8033

    The time stamp counter of CPU on scheduler id 1 is not synchronized with other CPUs.

     

    Disable this error using Trace Flag 830

    SQL Server has encountered 2 occurrence(s) of I/O requests taking longer than 15 seconds to complete

     

More Posts Next page »

© 2009 Microsoft Corporation. All rights reserved. Terms of Use  |  Trademarks  |  Privacy Statement
Microsoft
Page view tracker