SQL in Stockholm

SQL Server thoughts from the support team in Stockholm

SQL Server troubleshooting thoughts from Graham Kent, (ex) Senior Support Escalation Engineer at Microsoft Sweden.


    Slow recovery times and slow performance due to large numbers of Virtual Log Files


In my previous post about database mirroring configuration problems caused by large numbers of virtual log files, I mentioned that such a scenario can have other unwanted side effects, with or without database mirroring being enabled. Here’s an example that demonstrates one of these scenarios. It shows how recovery times after a disaster can be significantly extended by allowing your log file to grow in small increments, thereby creating thousands of VLFs. It’s also worth noting that this scenario demonstrates the performance impact of doing online file growth during periods of heavy activity. The problem can affect you with or without mirroring enabled; it’s just quicker to demonstrate when configuring mirroring.

I mentioned before that to fully understand this topic you need to be familiar with Transaction Log Physical Architecture and also with Factors That Can Delay Log Truncation.

    There is one simple lesson to be learnt here, so I’ll write it early on in the article:

    Pre-grow your log files (and your data files) in advance. Avoid repeated online auto-growth if at all possible.
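As a minimal sketch of what that looks like in practice (the database and logical file names below are just placeholders), an existing log can be taken to its working size in a single step with ALTER DATABASE rather than being left to auto-grow in small increments:

--sketch only: pre-size an existing log file in one operation
--'mydb' and 'mydb_log' are placeholder names; substitute your own
alter database mydb
modify file (name = mydb_log, size = 5GB)
go

Each growth event adds only a handful of new VLFs, so a log that reaches its working size in one or a few large steps ends up with far fewer VLFs than one that creeps up in thousands of 1MB increments.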

Consider the following example scripts, which are based upon the same routines used in the previous post, but with some small changes.

Both scripts perform the following actions; there is one script for the good scenario and one for the bad scenario.

    1. Create a database named VLF

    2. Create a starting backup point

3. Open a transaction in this database to ensure that the log cannot be truncated (an optional way to verify this is sketched below)

4. Fill up the log with thousands of transactions until the log reaches the space limit set for it – in this case we test with a 5GB log. Running the log out of space is simply a convenient way to guarantee log files of equivalent size for the recovery time comparison; the same behaviour applies to logs that haven’t run out of space.

5. Restart the server without taking a backup or dealing with the log space. This forces SQL Server to recover the entire contents of the active portion of the log, which is, in effect, all of it.

We then start the SQL Server service again, monitor the recovery times and look in the SQL Server error logs.
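As an optional check while the fill in step 4 is running (or at any point before the restart), you can confirm that the open transaction from step 3 really is what is preventing log truncation by looking at the log_reuse_wait_desc column in sys.databases. This is just a sanity check and plays no part in the timing comparison:

--optional check: see what is currently preventing log truncation for vlf
--expect ACTIVE_TRANSACTION here while the explicit transaction is left open
select name, log_reuse_wait_desc
from sys.databases
where name = 'vlf'
go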

    We notice a number of interesting points here:

1. The runtime required to fill the 5GB log is significantly shorter when using the pre-grown log file: less than 2 minutes, compared to over 9 minutes for the incrementally auto-grown file. This demonstrates why auto-growths can be bad; they are expensive operations and can harm your performance. The fact that the log fills more quickly shows that you are pushing transactions through faster. (The sketch after this list shows one way to spot auto-growth events on a busy production system.)

2. In the bad scenario there is a delay of about 2 minutes during which nothing is logged about the recovery phases of the vlf database. This is the period between the database starting up and the analysis phase beginning. In reality a large amount of work is going on here: SQL Server is processing the 20,480 VLFs present in the log file. This delay grows in proportion to the number of VLFs that you allow your log file to hold. I have seen real-world examples where hundreds of thousands of VLFs significantly delayed recovery.

3. The actual recovery times are similar (about 2 minutes) once recovery begins. They take this long because the test deliberately leaves an old transaction open, which forces SQL Server to recover the entire log.

4. As well as filling the log in significantly less runtime, which indicates that transactions are being processed faster, a closer look at the log shows that about 6,700 more transactions were processed in that shorter period. This is further evidence of the better performance a pre-grown log file gives you: roughly 25% more transactions were processed in about a fifth of the time!
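As an aside, if you suspect repeated auto-growth is happening on a production system, one way to look for it (a sketch only, and assuming the default trace is still enabled, as it is out of the box) is to query the default trace for file auto-grow events:

--sketch only: look for recent auto-growth events captured by the default trace
--event class 92 = Data File Auto Grow, 93 = Log File Auto Grow
declare @tracefile nvarchar(260)
select @tracefile = path from sys.traces where is_default = 1
select DatabaseName, FileName, StartTime, Duration
from fn_trace_gettable(@tracefile, default)
where EventClass in (92, 93)
order by StartTime desc
go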

The scripts and error log output are shown below. The scripts use the undocumented DBCC LOGINFO command to examine the number of virtual log files in the physical log. Remember that, as with all undocumented commands, its use is not supported and you use it at your own risk. The number of rows that this command outputs represents the number of VLFs currently in the physical log.
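If you would rather see a single count than scroll through that output, one option (equally unsupported, since it relies on the undocumented shape of the result set, which here matches SQL Server 2005/2008 and may differ in other versions) is to capture the output into a temporary table:

--sketch only: count VLFs by capturing DBCC LOGINFO output
--column layout below matches SQL Server 2005/2008 and is not guaranteed elsewhere
create table #loginfo
(
fileid int,
filesize bigint,
startoffset bigint,
fseqno int,
status int,
parity tinyint,
createlsn numeric(25,0)
)
insert #loginfo exec('dbcc loginfo (vlf)')
select count(*) as vlf_count from #loginfo
drop table #loginfo
go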

    SCRIPT 1 – Bad scenario

    --CONNECT to the test server instance

     

    use master

    go

     

    if exists (select name from sys.databases where name = 'vlf')

    begin

    drop database vlf --(if you need to)

    end

     

     

    --create a database on the root of C with a small starting log size

    --which then grows in small increments

    create database vlf

    on (name=vlf, filename='c:\vlf.mdf')

    log on (name=vlf_log, filename='c:\vlf_log.ldf',size=1MB,maxsize=5GB,filegrowth=1MB)

    go

     

    use vlf

    go

     

    --create a dummy table to fill up

    create table vlf_test

    (

    a int identity(1,1) not null,

    b char(8000) not null,

    c int not null

    )

    go

     

    insert vlf_test values ('vlftest',0)

    go

     

    --create a starting point of backups

    backup database vlf to disk = 'c:\vlf.bak' with stats = 1

    go

    backup log vlf to disk = 'c:\vlf1.bak'

    go

     

    --open a 2nd new connection and open an explicit uncommitted transaction

    --to ensure that the transaction log cannot truncate

     

    use vlf

    go

     

    begin tran

    create table opentran

    (

    opentran int

    )

    Go

     

    --switch back to the first connection

    --now run the following to grow the log to create lots of VLFs

    --run until the log runs out of space - a convenient stopping point

     

    use vlf

    go

    set nocount on

    go

     

    declare @x int

    declare @a char(8000)

    declare @b char(8000)

    set @a = replicate('a',8000)

    set @b = replicate('b',8000)

    set @x = 1

    while 1=1

    begin

    update vlf_test set b = @b, c = @x where a = 1

    update vlf_test set b = @a where a = 1

    set @x = @x + 2

    end

    go

     

    --let this run until the log runs out of space

    --on the 2nd connection, use dbcc loginfo to check how many VLFs you have

     

    DBCC LOGINFO (vlf)

     

--when you have reached 5GB of log file with several thousand VLFs

--simulate a server disaster by shutting SQL Server down abruptly with the command below

--alternatively you could just power off the server ungracefully

     

    shutdown with nowait

     

    --now restart the SQL service or power on the server

    I get the following results from the error log when I run this on my test server.

    Runtime of filling the log: 9 minutes 21 seconds

    Number of VLFs when log full: 20480

Time for the vlf database to recover fully: 3 minutes 40 seconds

    Excerpts from the SQL Server log:

    Server starts:

    2008-05-16 13:22:40.85 Server Microsoft SQL Server 2005 - 9.00.3186.00 (X64)

    VLF db starts up

    2008-05-16 13:22:47.97 spid18s Starting up database 'vlf'.

A pause of around 2 minutes during which nothing is logged about the vlf database; then analysis starts. It is during this period that SQL Server is processing the large number of VLFs.

    2008-05-16 13:24:34.28 spid18s Analysis of database 'vlf' (19) is 3% complete (approximately 8 seconds remain). This is an informational message only. No user action is required.

    Redo has started

    2008-05-16 13:24:45.80 spid18s Recovery of database 'vlf' (19) is 3% complete (approximately 108 seconds remain). Phase 2 of 3. This is an informational message only. No user action is required.

    Note how many transactions have been rolled forward – this is how many we did in the test

    2008-05-16 13:26:21.36 spid18s 26533 transactions rolled forward in database 'vlf' (19). This is an informational message only. No user action is required.

    Undo has started

    2008-05-16 13:26:21.39 spid18s Recovery of database 'vlf' (19) is 99% complete (approximately 0 seconds remain). Phase 3 of 3. This is an informational message only. No user action is required.

    Recovery complete

    2008-05-16 13:26:21.49 spid5s Recovery is complete. This is an informational message only. No user action is required.

    SCRIPT 2 – Good scenario

The script is identical to the bad scenario shown above, apart from the simple fact that we create the log at a pre-expanded size of 5GB, using this command:

    create database vlf

    on (name=vlf, filename='c:\vlf.mdf')

    log on (name=vlf_log, filename='c:\vlf_log.ldf',size=5GB,maxsize=5GB,filegrowth=1MB)

    This gives the following results on the identical server.

Runtime of filling the log: 1 minute 54 seconds

    Number of VLFs when log full: 16

Time for the vlf database to recover fully: 2 minutes 2 seconds

    Excerpts from the SQL Server log:

    Server starts:

    2008-05-16 13:40:16.10 Server Microsoft SQL Server 2005 - 9.00.3186.00 (X64)

    VLF db starts up

    2008-05-16 13:40:22.64 spid18s Starting up database 'vlf'.

A 6 second pause during which nothing is logged about the vlf database; then analysis starts. It is during this period that SQL Server is processing the VLFs, but because there are only 16 in this case, it completes much more quickly. This is the fundamental difference between the examples.

    2008-05-16 13:40:28.99 spid18s Analysis of database 'vlf' (19) is 2% complete (approximately 157 seconds remain). This is an informational message only. No user action is required.

    Redo has started

    2008-05-16 13:40:44.22 spid18s Recovery of database 'vlf' (19) is 4% complete (approximately 106 seconds remain). Phase 2 of 3. This is an informational message only. No user action is required.

    Note how many transactions have been rolled forward – this is how many we did in the test

    2008-05-16 13:42:18.69 spid18s 33216 transactions rolled forward in database 'vlf' (19). This is an informational message only. No user action is required.

    Undo has started

    2008-05-16 13:42:18.69 spid18s Recovery of database 'vlf' (19) is 99% complete (approximately 0 seconds remain). Phase 3 of 3. This is an informational message only. No user action is required.

    Recovery complete

    2008-05-16 13:42:18.89 spid5s Recovery is complete. This is an informational message only. No user action is required.

I hope this demo gives further weight to the case for pre-growing your SQL Server files in advance. Feel free to post a comment if you have a question.


    1413 error when starting Database Mirroring – How many virtual log files is too many?


Recently, in Cumulative Update package 6 for SQL Server 2005 Service Pack 2, we released a fix for a problem where you receive a 1413 error when starting database mirroring. The error that you get looks like this:

    Error: 1413, Severity: 16, State: 1.

    Communications to the remote server instance 'TCP://ComputerName:PortNumber' failed before database mirroring was fully started. The ALTER DATABASE command failed. Retry the command when the remote database is started.

    In addition to this, if you look in the error logs themselves you will normally see the following 2 errors reported as well:

    Error: 1443, Severity: 16, State: 2.

    Database mirroring has been terminated for database 'xxx'. This is an informational message only. No user action is required.

And you will also see:

    Error: 1479, Severity: 16, State: 1.

    The mirroring connection to "TCP://ComputerName:PortNumber" has timed out for database "xxx" after 10 seconds without a response. Check the service and network connections.

Although you can install the hotfix, I think it’s worth considering why this problem actually occurs and how you can potentially avoid it happening at all through good database management practices, which have benefits far beyond the configuration of database mirroring.

Hopefully you are familiar with the architecture of a SQL Server transaction log file. In principle, the primary reason that this 1413 error and its associated timeout errors occur is that when you start database mirroring, SQL Server needs to read through the transaction log file to bring the two databases into a synchronized state, following the restore of the last transaction log backup to the mirror partner. The time it takes to do this depends on the number of virtual log files contained within the physical log file. This sometimes leads to confusion, because a relatively small physical log file can contain thousands of virtual log files if it has been badly managed. In other words, a relatively small physical log file (say 1GB) can be the cause of this 1413 error, while the exact same database on the exact same hardware will set up mirroring successfully when its identically sized physical log has been better managed and doesn’t contain so many virtual log files.

To demonstrate this, try the following reproduction examples. To run these you’ll need 2 SQL Server instances with mirroring endpoints already set up (I’ll assume if you’re reading this far that you already understand this). A good place to try this repro is a test server environment where you already have a mirroring partnership configured. The repros use the undocumented DBCC LOGINFO command to examine the number of virtual log files in the physical log. Remember that, as with all undocumented commands, its use is not supported and you use it at your own risk.
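If you need to set endpoints up from scratch, a minimal mirroring endpoint on each instance looks something like the sketch below; the endpoint name and port number are arbitrary examples, and you still need to arrange authentication between the service accounts:

--sketch only: a minimal database mirroring endpoint, created on each instance
--the endpoint name and port number are arbitrary examples
create endpoint Mirroring
state = started
as tcp (listener_port = 5022)
for database_mirroring (role = partner)
go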

    The example uses a common scenario where a log has been left to grow and runs out of space, but actually there are many different ways that you can cause your log to grow. Some are documented here. This is just a quick way to run the demo.

    Script 1: produces the 1413 failure – note the thousands of VLFs

--CONNECT to the primary server instance

    use master

    go

     

    drop database vlf --(if you need to)

     

    --create a database on the root of C with a small starting log size

    --which then grows in small increments

    create database vlf

    on (name=vlf, filename='c:\vlf.mdf')

    log on (name=vlf_log, filename='c:\vlf_log.ldf',size=1MB,maxsize=1GB,filegrowth=1MB)

    go

     

    use vlf

    go

     

    --create a dummy table to fill up

    create table vlf_test

    (

    a int identity(1,1) not null,

    b char(8000) not null,

    c int not null

    )

    go

     

    insert vlf_test values ('vlftest',0)

    go

     

    --create a starting point of backups

    backup database vlf to disk = 'c:\vlf.bak' with stats = 1

    go

    backup log vlf to disk = 'c:\vlf1.bak'

    go

     

    set nocount on

    go

     

    --grow the log until it runs out of space

    declare @x int

    declare @a char(8000)

    declare @b char(8000)

    set @a = replicate('a',8000)

    set @b = replicate('b',8000)

    set @x = 1

    while 1=1

    begin

    update vlf_test set b = @b, c = @x where a = 1

    update vlf_test set b = @a where a = 1

    set @x = @x + 2

    end

    go

     

    --now alter the log size and backup the log

    alter database vlf modify file (name=vlf_log, maxsize=2GB)

    go

    backup log [vlf] to disk = 'c:\vlf2.bak' with stats=1

    go

     

--use the undocumented DBCC LOGINFO command to check the number of VLFs in our log

    dbcc loginfo (vlf)

    go

    --note the number of VLFs, it should be several thousand

    --for me it was 4100

     

    --connect to the other server which you wish to be the mirror

--copy the 3 backup files across to its C drive and run the following

    use master

    go

     

    restore database [vlf] from disk = 'c:\vlf.bak' with norecovery, replace, file=1,

    move 'vlf' to 'c:\vlf.mdf',

    move 'vlf_log' to 'c:\vlf_log.LDF'

    go

    restore log [vlf] from disk = 'c:\vlf1.bak' with norecovery

    go

    restore log [vlf] from disk = 'c:\vlf2.bak' with norecovery, stats=1

    go

     

    --you now have a pair of databases ready to have mirroring started

    --try to start database mirroring using the SSMS GUI and you will receive the 1413 error

    --the only exception to this might be if you run it on particularly fast hardware
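If you prefer not to use the GUI, the equivalent T-SQL for this step (in either script) is roughly the following, run on the mirror first and then on the principal; the server names and port are placeholders for your own endpoint addresses:

--sketch only: start mirroring from T-SQL instead of the SSMS GUI
--run this on the mirror server first; 'principal_server' and 'mirror_server' are placeholders
alter database vlf set partner = 'TCP://principal_server:5022'
go
--then run this on the principal server - in script 1 this is where the 1413 error appears
alter database vlf set partner = 'TCP://mirror_server:5022'
go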

    Script 2: works fine - identical except that the log has been pre-grown to the same size. This is good practice for all database files, whatever their type.

    --CONNECT to the primary server instance

    --remember to delete any old backup files of the same name from previous tests

    use master

    go

     

    drop database vlf --(if you need to)

     

    create database vlf

    on (name=vlf, filename='c:\vlf.mdf')

    log on (name=vlf_log, filename='c:\vlf_log.ldf',size=1GB,maxsize=1GB,filegrowth=250MB)

    go

     

    use vlf

    go

     

    create table vlf_test

    (

    a int identity(1,1) not null,

    b char(8000) not null,

    c int not null

    )

    go

     

    insert vlf_test values ('vlftest',0)

    go

    backup database vlf to disk = 'c:\vlf.bak' with stats = 1

    go

    backup log vlf to disk = 'c:\vlf1.bak'

    go

    set nocount on

    go

     

    --this will grow the log until it runs out of space

    declare @x int

    declare @a char(8000)

    declare @b char(8000)

    set @a = replicate('a',8000)

    set @b = replicate('b',8000)

    set @x = 1

    while 1=1

    begin

    update vlf_test set b = @b, c = @x where a = 1

    update vlf_test set b = @a where a = 1

    set @x = @x + 2

    end

    go

     

    --now alter the log size and backup the log

    alter database vlf modify file (name=vlf_log, maxsize=2GB)

    go

    backup log [vlf] to disk = 'c:\vlf2.bak' with stats=1

    go

     

--use the undocumented DBCC LOGINFO command to check the number of VLFs in our log

    dbcc loginfo (vlf)

    go

    --note the number of VLFs, it should be only about 16 or so

     

    --connect to the other server which you wish to be the mirror

--copy the 3 backup files across to its C drive

     

    restore database [vlf] from disk = 'c:\vlf.bak' with norecovery, replace, file=1,

    move 'vlf' to 'c:\vlf.mdf',

    move 'vlf_log' to 'c:\vlf_log.LDF'

    go

    restore log [vlf] from disk = 'c:\vlf1.bak' with norecovery

    go

    restore log [vlf] from disk = 'c:\vlf2.bak' with norecovery, stats=1

    go

     

    --you now have a pair of databases ready to have mirroring started

    --try to start database mirroring using the SSMS GUI and it should work

    --the only exception to this might be if you run it on particularly slow hardware

I ran my test on 2 small single-processor servers. As the scripts note, you might get slightly different results depending upon your exact configuration. I used SQL Server 2005 build 3186 (CU#3) for this test.

My main point here is that you should look to control your transaction log proactively. There are obviously times when this is not possible, or other mitigating circumstances arise, which is why we have released a hotfix to improve the speed with which the database mirroring configuration reads the log files. This improvement is also included in SQL Server 2008. However, there are other issues that you can encounter if you let your virtual log files grow unchecked, such as excessive recovery times, so as with all SQL Server file management tasks, proactive is best. We always recommend that our customers analyze their expected usage and pre-grow their database files.
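If you find that an existing log already contains thousands of VLFs, the usual remediation is to back up the log, shrink the log file right down, and then grow it back to its working size in one or a few large steps. A rough sketch, using placeholder names and sizes and assuming a suitable maintenance window:

--sketch only: rebuild a fragmented log with a small number of VLFs
--'mydb', 'mydb_log' and the sizes are placeholders; pick values to suit your workload
backup log mydb to disk = 'c:\mydb_log.bak'
go
use mydb
go
dbcc shrinkfile (mydb_log, 1)
go
alter database mydb
modify file (name = mydb_log, size = 5GB)
go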

    I hope this helps to explain the database mirroring error in more depth, but feel free to post a comment if you have a question.
