Bulk Import Optimizations (Minimal Logging)

In the previous blog I listed all the bulk import optimizations. Now I will describe each of these optimizations in detail in separate blog entries, starting with minimal logging. By the way, minimal logging is sometimes called 'bulk logging', but the correct term is 'minimal logging': bulk-logged refers to the database recovery model that enables minimal logging. Note that minimal logging can also be enabled under the SIMPLE recovery model. This blog describes the behavior of minimal logging as it exists in SQL Server 2005.
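As a quick illustration (a sketch only; 'MyDatabase' is a placeholder database name, not one from this example), switching to a recovery model that permits minimal logging looks like this:

```sql
-- Either BULK_LOGGED or SIMPLE permits minimal logging;
-- under FULL, all operations are fully logged.
ALTER DATABASE MyDatabase SET RECOVERY BULK_LOGGED;

-- Verify which recovery model is currently in effect
SELECT name, recovery_model_desc
FROM sys.databases
WHERE name = 'MyDatabase';
```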

 

Under minimal logging, SQL Server does not log individual rows; it logs only the page and extent allocations. So if each page can fit 100 rows, we are trading 100 log records for 1 log record, a significant improvement. It is useful here to step back and examine how SQL Server ensures the ACID properties even when individual rows are not logged, and how log backup works.

 

When the bulk import transaction commits (i.e. after the number of rows specified by BATCHSIZE has been inserted), SQL Server flushes all the data pages (a subset of these pages is likely to have been flushed already as part of a checkpoint) to disk to guarantee the atomicity of the transaction. This is no more expensive than flushing the log records, since the log records would contain the full image of every row. And if for some reason the bulk import transaction needs to be rolled back, all SQL Server needs to do is traverse the log records and mark the pages/extents as free again. When a transaction log backup is taken, SQL Server also backs up the data pages, including index pages when applicable, as part of the transaction log. This enables a database restore by restoring the database backup followed by zero or more transaction log backups. The only drawback of minimal logging is that it does not support point-in-time recovery. This is because the state of the pages involved in the bulk import may have changed between the time the bulk import was done and the time the transaction log backup is initiated. One can argue that we could possibly do point-in-time recovery to just before the bulk import had started, but that is not supported. If this is of concern, you can take a transaction log backup before/after bulk operations. Please refer to BOL for details of point-in-time recovery and associated restrictions.
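For example, bracketing the bulk operation with log backups keeps the window without point-in-time recovery as narrow as possible (a sketch only; the database name, backup paths, and data file path are placeholders):

```sql
-- Log backup just before the bulk operation:
-- point-in-time recovery remains possible up to here.
BACKUP LOG MyDatabase TO DISK = 'C:\backups\MyDatabase_pre_bulk.trn';

BULK INSERT t_heap
FROM 'C:\t_heap-c-small.dat' WITH (TABLOCK);

-- Log backup just after: this backup contains the bulk-loaded
-- data pages and can only be restored as a whole.
BACKUP LOG MyDatabase TO DISK = 'C:\backups\MyDatabase_post_bulk.trn';
```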

 

So far we discussed minimal logging at a conceptual level. Now let us consider a simple example where we will insert 5 rows into a table both with minimal logging and without it and examine the difference.

 

The table t_heap below has a row length of > 1000 bytes, so the log record representing an insert of a data row will also be > 1000 bytes and will be easy to spot when we examine the log records.

 

create table t_heap (c1 int, c2 int, c3 decimal (38,2), c4 char(1000))

 

Let us execute the following command to insert 5 rows through optimized bulk import:

 

bulk insert t_heap

from 'C:\t_heap-c-small.dat' with (TABLOCK)

 

Now we can look at the top 5 log records by executing the following query:

 

select top 5 operation, context, [log record length]

from fn_dblog(null, null)

where allocunitname='dbo.t_heap'

order by [Log Record Length] Desc

 

Here are the top 5 log records returned. You will notice that all the records are < 100 bytes; in other words, the individual row inserts were not logged. The log records below refer to PFS (Page Free Space) pages, allocation pages that track the free space in each allocated data page.

Operation        context [log record length]

---------        ------- -------------------

LOP_MODIFY_ROW   LCX_PFS   92

LOP_MODIFY_ROW   LCX_PFS   92

LOP_MODIFY_ROW   LCX_PFS   92

LOP_MODIFY_ROW   LCX_PFS   92

LOP_MODIFY_ROW   LCX_PFS   92

 

 

Now let us take a look at the log records when we insert 5 rows through the normal insert code path, as follows:

declare @i int

select @i = 0

while (@i < 5)

begin

     insert into t_heap values

      (@i, @i + 5000, 10.2, replicate ('a', 60))

     set @i = @i + 1

end

                                                                    

And here are the top 5 log records. You can see that in this case each inserted row is fully logged.

operation        context    [log record length]

---------        -------    -------------------

LOP_INSERT_ROWS  LCX_HEAP   1116

LOP_INSERT_ROWS  LCX_HEAP   1116

LOP_INSERT_ROWS  LCX_HEAP   1116

LOP_INSERT_ROWS  LCX_HEAP   1116

LOP_INSERT_ROWS  LCX_HEAP   1116

You can use fn_dblog() as a handy way to see if you are getting minimal logging or not.
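For instance (a sketch using the same fn_dblog() function shown above), comparing the total log bytes generated for the table makes the difference obvious at a glance:

```sql
-- Sum the log bytes generated for the table; a minimally logged
-- import produces dramatically fewer bytes than a fully logged one.
SELECT operation, COUNT(*) AS log_records,
       SUM([log record length]) AS total_log_bytes
FROM fn_dblog(null, null)
WHERE allocunitname = 'dbo.t_heap'
GROUP BY operation
ORDER BY total_log_bytes DESC;
```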

Till now, we have discussed what minimal logging means and how you can check whether you are indeed getting minimal logging, but we have not looked at the conditions that need to be met. The conditions for minimal logging are:

  • The database recovery model must be set to BULK_LOGGED or SIMPLE. Under the FULL recovery model, as the name implies, all operations are fully logged.

  • The target table must not be marked for replication.

  • If a database backup command is running concurrently, minimal logging is temporarily disabled, but this does not stop the bulk import. So you may see that part of the bulk import is fully logged.

  • Minimal logging depends on the indexes defined on the table and on whether the table/index is empty, as described in the table below. You will notice that TABLOCK is required to enable minimal logging.
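As a rough pre-flight check (a sketch only, using the database and table from the example above), the first two conditions can be inspected through the catalog views:

```sql
-- Check the recovery model: BULK_LOGGED or SIMPLE allows minimal logging.
SELECT name, recovery_model_desc
FROM sys.databases
WHERE name = DB_NAME();

-- Check whether the target table is marked for replication;
-- is_replicated must be 0 for minimal logging.
SELECT name, is_replicated
FROM sys.tables
WHERE name = 't_heap';
```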

 

 

Comments
  • Hi,

    Interesting post.

    I've been trying to reproduce it with the script below, but with no success when I change the recovery model.

    USE AdventureWorks;

    CREATE TABLE t_heap (c1 INT, c2 INT, c3 DECIMAL (38,2), c4 CHAR(1000));

    -- HERE OK, I got the same result.

    BULK INSERT t_heap

    FROM 'C:\t_heap.dat' WITH (TABLOCK, FIELDTERMINATOR ='|')

    ------------------------------------------------------------------------------------------

    -- HERE OK, I got the same result.

    ALTER DATABASE AdventureWorks SET RECOVERY FULL

    DECLARE @I INT

    SELECT @I = 0

    WHILE (@I < 5)

    BEGIN

    INSERT INTO T_HEAP WITH (TABLOCK) VALUES

         (@I, @I + 5000, 10.2, REPLICATE ('A', 60))

    SET @I = @I + 1

    END

    SELECT TOP 5 operation,context,[log record length]

    FROM fn_dblog(null, null)

    WHERE allocunitname='dbo.t_heap'

    ORDER BY [Log Record Length] DESC

    With the SIMPLE/BULK_LOGGED recovery model I'm getting this result:

    LOP_INSERT_ROWS LCX_HEAP 1116

    ------------------------------------------------------------------------------------------

    ALTER DATABASE AdventureWorks SET RECOVERY SIMPLE

    -- repeat INSERT and SELECT above

    ------------------------------------------------------------------------------------------

    ALTER DATABASE AdventureWorks SET RECOVERY BULK_LOGGED

    -- repeat INSERT and SELECT above

    My setup:

    Microsoft SQL Server 2005 - 9.00.3200.00 (Intel X86)

    Oct  2 2007 11:33:27

    Copyright (c) 1988-2005 Microsoft Corporation

    Developer Edition on Windows NT 6.0 (Build 6000: )

    If I got it right, the log record length result with SIMPLE/BULK recovery model should be 92 instead of 1116, right?

  • You don't get minimal logging with a regular TSQL INSERT statement, independent of the recovery model. However, we are looking into whether we can remove this restriction for regular inserts in future releases, including SQL2008.


  • Thanks for the answer.

    I was looking for an article discussing the differences among the FULL, SIMPLE and BULK_LOGGED recovery models (what each of these settings actually writes to the log file), but I haven't found one.

    Could you talk about this or blog about it?

  • BOL has a good description of the different recovery models. If you think something is not well covered, please send me a mail and we will update the BOL. In a nutshell, under the SIMPLE recovery model, SQL Server truncates the inactive portion of the log, but otherwise the logging is similar to what we have under the bulk-logged recovery model.

  • If you have a table with a clustered index and existing data, say 30 million rows, and you bulk import 10 million more, each row would be fully logged, correct?

    What's the best way to achieve minimal logging in this scenario, drop the clustered index first, then add the clustered index back after bulk import? For a large table, creating a clustered index might take a while. But is it still a better option than having a clustered index in place when importing?

  • I recommend reading the following white paper, which deals with exactly these kinds of questions. It is still applicable for SQL2005. I will follow up this thread with some changes we have made for minimal logging in SQL Server 2008.

    http://www.microsoft.com/technet/prodtechnol/sql/2000/maintain/incbulkload.mspx

  • Hello Vishal

    It is a very good article.

    Can you elaborate on the below? I have picked these lines from your article.

    "This is because during the transaction log backup, the state of the pages involved in the bulk import may have changed between the time bulk import was done and the transaction log back is initiated. "

  • The pages containing the minimally logged operations are backed up as part of log backup. When these log records are restored as part of restore, we don't have the UNDO information in case the changes in  these pages need to be undone.

  • Nice article.

    I have tried the above scripts, but in the following order:

    After creating the table, I did the bulk insert with (TABLOCK). I can see more LOP_MODIFY_ROW LCX_PFS records (I am not using the top 5 option).

    Can you tell me why there are more LOP_MODIFY_ROW LCX_PFS records?

  • Aljo,

    PFS pages need to be changed to track allocation/deallocation; similarly for GAM pages. The key thing about minimal logging is that we don't log the 'data row'.

    thanks

    Sunil

  • Sunil,

    Thanks for clarifying this.

    Aljo

  • Is it possible to turn logging off temporarily to support bulk imports?
