SQL Server Storage Engine

Bulk Import Optimizations (Minimal Logging)

In the previous blog I had listed all bulk import optimizations. Now, I will describe each of these optimizations in detail in separate blog entries. Let me first start with minimal logging. By the way minimal logging is also called bulk-logging but the correct term is 'minimal logging'.  Bulk logging refers more to the database recovery model that enables minimal logging. Note, that the minimal logging can also be enabled by SIMPLE recovery model as well. This BLOG decribes the behavior for minimal logging as it exists in SQL Server 2005.

 

Under minimal logging, the SQL Server does not log individual rows and logs only the page and extent allocations. So if each page can fit 100 rows, we are trading 100 log records for 1 log record, a significant improvement. It is useful here to step back and examine how SQL Server does ensures ACID properties even when individual rows are not logged or how does the log backup work.

 

When the bulk import transaction commits (i.e. the number of rows as specified in the batchsize), the SQL Server flushes all the data pages (a subset of these pages are likely to have been flushed as part of checkpoint) to disk to guarantee the atomocity of the transaction. This is not any more expensive than flushing the log records as the log records would contain the full image of all the rows. And if for some reason, the bulk import transaction needs to be rolled back, all SQL Server needs to do is to traverse the log records and mark the pages/extents to be free again. When the transaction log backup is taken, the SQL Server also backs up the data pages, including index pages when applicable, as part of the transaction log. This enables a database restore by restoring the database backup followed by zero or more transaction log backups. The only drawback of minimal logging is that it does not support point-in-time recovery. This is because during the transaction log backup, the state of the pages involved in the bulk import may have changed between the time bulk import was done and the transaction log back is initiated.  One can argue that we could possibly do point-in-time recovery to just before the bulk import had started but it is not supported. If this is of concern, you can do transaction log backup before/after bulk operations. Please refer to BOL for details of point-in-time recovery and associated restrictions.

 

So far we discussed minimal logging at a conceptual level. Now let us consider a simple example where we will insert 5 rows into a table both with minimal logging and without it and examine the difference.

 

The table t_heap below has a row length of > 1000 bytes so the log record representing the an insert of the data row will also be > 1000 bytes so it will be easy to spot when we examie the log records

 

create table t_heap (c1 int, c2 int, c3 decimal (38,2), c4 char(1000))

 

Let us execute the following command to insert 5 rows through optimized Bulk Import

 

bulk insert t_heap

from 'C:\t_heap-c-small.dat' with (TABLOCK)

 

Now we can look at the top 5 log records by executing the following query

 

select 5 operation,context,[log record length]

from fn_dblog(null, null)

where allocunitname='dbo.t_heap'

order by [Log Record Length] Desc

 

Here are the top 5 log records returned. You will notice that all records are < 100 bytes. In other words, the individual row inserts were not logged. The log records below refer to PFS, an allocation meta-information related page that tracks free space in the allocated page.

Operation        context [log record length]

---------        ------- -------------------

LOP_MODIFY_ROW   LCX_PFS   92

LOP_MODIFY_ROW   LCX_PFS   92

LOP_MODIFY_ROW   LCX_PFS   92

LOP_MODIFY_ROW   LCX_PFS   92

LOP_MODIFY_ROW   LCX_PFS   92

 

 

Now let us take a look at the log records when we insert 5 rows through the normal insert code path as follows

declare @i int

select @i = 0

while (@i < 5)

begin

     insert into t_heap values

      (@i, @i + 5000, 10.2, replicate ('a', 60))

     set @i = @i + 1

end

                                                                    

And here are the top 5 log records and you can see that in this case, each inserted row is fully logged.

operation        context    [log record length]

---------        -------    -------------------

LOP_INSERT_ROWS  LCX_HEAP   1116

LOP_INSERT_ROWS  LCX_HEAP   1116

LOP_INSERT_ROWS  LCX_HEAP   1116

LOP_INSERT_ROWS  LCX_HEAP   1116

LOP_INSERT_ROWS  LCX_HEAP   1116

 

 

You can use fn_dblog() as a handy way to see if you are getting minimal logging or not.

 

Till now, we have discussed what does the minimal logging mean and how you can checks if you indeed are getting the minimal logging but we have not looked at the conditions that need to be met for minimal logging. The conditions for minimal logging are

·         The database recovery model must be set to BULK_LOGGED or SIMPLE. Under FULL recovery model, as the name implies, all operations are fully logged.

·         The target table should not have been marked for replication

·         If database backup command is being run concurrently, the minimal logging is temporarily disabled but it does not stop bulk import. So you may see that part of bulk import is fully logged.

·         Minimal logging depends on the indexes defined on the table and whether the table/index is empty or not as described in the table below. You will notice that TABLOCK is required to enable minimal logging.

 

Table Schema

Empty

TABLOCK

Logging

Heap

Yes

Yes

minimal-logging

Heap

Yes

No

full logging

Heap

No

Yes

minimal-ogging

Heap

No

No

full logging

Heap + non-clustered index

Yes

Yes

minimal-logging

heap + non-clustered index

Yes

No

full logging

Heap + non-clustered index

No

Yes

data- minimal-logging

index-fulllogging

Heap + non-clustered index

No

No

full logging

clust Index

Yes

Yes

minimal-logging

clust Index

Yes

No

full logging

clust Index

No

Yes

full logging

clust Index

No

No

full logging

 

Published Tuesday, February 05, 2008 3:01 AM by Sunil Agarwal
Filed under:

Comments

 

Alex Rosa said:

Hi,

Interisting post.

I've been trying to reproduce it with the script below, but with no success when I change the recovery model.

USE AdventureWorks;

CREATE TABLE t_heap (c1 INT, c2 INT, c3 DECIMAL (38,2), c4 CHAR(1000));

-- HERE OK, I got the same result.

BULK INSERT t_heap

FROM 'C:\t_heap.dat' WITH (TABLOCK, FIELDTERMINATOR ='|')

------------------------------------------------------------------------------------------

-- HERE OK, I got the same result.

ALTER DATABASE AdventureWorks SET RECOVERY FULL

DECLARE @I INT

SELECT @I = 0

WHILE (@I < 5)

BEGIN

INSERT INTO T_HEAP WITH (TABLOCK) VALUES

     (@I, @I + 5000, 10.2, REPLICATE ('A', 60))

SET @I = @I + 1

END

SELECT TOP 5 operation,context,[log record length]

FROM fn_dblog(null, null)

WHERE allocunitname='dbo.t_heap'

ORDER BY [Log Record Length] DESC

Simple/Bulk recovery model I'm getting this result:

LOP_INSERT_ROWS LCX_HEAP 1116

------------------------------------------------------------------------------------------

ALTER DATABASE AdventureWorks SET RECOVERY SIMPLE

-- repeat INSERT and SELECT above

------------------------------------------------------------------------------------------

ALTER DATABASE AdventureWorks SET RECOVERY BULK_LOGGED

-- repeat INSERT and SELECT above

My setup:

Microsoft SQL Server 2005 - 9.00.3200.00 (Intel X86)

Oct  2 2007 11:33:27

Copyright (c) 1988-2005 Microsoft Corporation

Developer Edition on Windows NT 6.0 (Build 6000: )

If I got it right, the log record length result with SIMPLE/BULK recovery model should be 92 instead of 1116, right?

February 5, 2008 1:59 PM
 

Sunil Agarwal said:

You don't get minimal logging with regular TSQL INSERT statement independent of the recovery model. However, we are looking into if we can remove this restriction for regular inserts in future releases including SQL2008

February 5, 2008 2:21 PM
 

SELECT IsUseless(blabber) from sys.innarwebs CROSS APPLY jmass(blog) said:

The big news of the week is Windows Server 2008 RTM'ing. Read more here. The storage engine team has begin a series on bulk insert. It is definitely recommended reading. Part 1, Part 2, Part 3. The S ...

February 5, 2008 5:03 PM
 

Alex Rosa said:

Thanks for the answer.

I was looking for an article talking about the differences among FULL, SIMPLE and RECOVERY model (which each one of this setting really write in the log file), but I haven't find it.

Could you talk about this or blog about it?

February 7, 2008 5:13 PM
 

Sunil Agarwal said:

BOL has a good description on different recovery model. If you think something is not well covered, please send me a mail and we will update the BOL. In a nutshell, under SIMPLE recovery model, SQL Server truncates the inactive portion of the log but otherwise the logging is similar to what we have under bilk-logging recovery model.

February 7, 2008 5:56 PM
 

jl999 said:

If you have a table with clustered index, and with existing data, say you have 30 million in the table. When you bulk import 10 million more, each one would be fully logged, correct?

What's the best way to achieve minimal logging in this scenario, drop the clustered index first, then add the clustered index back after bulk import? For a large table, creating a clustered index might take a while. But is it still a better option than having a clustered index in place when importing?

February 21, 2008 7:36 PM
 

Sunil Agarwal said:

I recommend reading the following white paper that dealt with exactly these kinds of questions. This is still applicable for SQL2005. I will follow up this thread with some changes we have made for minimal logging in SQL Server 2008.

http://www.microsoft.com/technet/prodtechnol/sql/2000/maintain/incbulkload.mspx

February 21, 2008 7:46 PM
Anonymous comments are disabled

About Sunil Agarwal

Sunil Agarwal is a program manager in the SQL Server Storage Engine Group at Microsoft. He is responsible for concurrency, indexes, tempdb, LOBS, supportability, and bulk import/export. Prior to joining Microsoft, Sunil worked at DEC, Sybase, BMC Software and DigitalThink, focusing primarily on core database engine technologies and related applications.

© 2008 Microsoft Corporation. All rights reserved. Terms of Use  |  Trademarks  |  Privacy Statement
Microsoft
Page view tracker