SQL Server Storage Engine

Bulk Importing data with OPENROWSET in SQL2005

Before SQL2005, you could use Bulk Insert or BCP to import data into SQL Server. While both of these mechanisms have been popular and used widely, they don't provide any processing of input data before bulk importing into the target table. If you need to preprocess the data,  you can use DTS (aka SSIS) or, now in SQL2005, you may be able to use OPENROWSET with BULK rowset provides to accompolish the same, but for simpler tasks using SELECT statement functionality.

Recently, I was asked how to Bulk Import data after removing whitespaces into a varchar column in the target table. The customer desribed the problem as "We're loading data from a text file with fixed column widths into a table in SQL Server 2005.All table fields are varchars of the same length as the source fields. The problem is that the BULK INSERT process doesn't truncate whitespace, and since the source fields can contain variable-length values or nothing at all we wind up with fields ending with or containing nothing but spaces. Our best plan at the moment is to RTRIM() every column and insert the data into another table, but since we'll be loading millions of rows every day we were looking for any more efficient methods"

You can easily accompolish this task using OPENROWSET. Here is one example

create table t_cust (c1 varchar(20), c2 varchar(20))

go

 

 

You can use the following command to remove the whitespaces and importing the data into the target table simultaneously. This Bulk Import follows the bulk optimizations rules (i.e. bulk logging) similar to the ones outlined for Bulk Insert in BOL.

 

INSERT into t_cust with (TABLOCK)

SELECT RTRIM(c1), RTRIM(C2)

from OPENROWSET (BULK <data-file>,<format-file> as t1

 

You will note that OPENROWSET reads the data from the datafile and then you can remove the trailing blanks by calling the function RTRIM for each value.

 

An example format file is

 

  <?xml version="1.0" ?>

<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

<RECORD>

  <FIELD ID="1" xsi:type="CharTerm" TERMINATOR="," MAX_LENGTH="20" COLLATION="SQL_Latin1_General_CP1_CI_AS" />

  <FIELD ID="2" xsi:type="CharTerm" TERMINATOR="\r\n" MAX_LENGTH="20" COLLATION="SQL_Latin1_General_CP1_CI_AS" />

  </RECORD>

<ROW>

  <COLUMN SOURCE="1" NAME="c1" xsi:type="SQLVARYCHAR" />

  <COLUMN SOURCE="2" NAME="c2" xsi:type="SQLVARYCHAR" />

  </ROW>

  </BCPFORMAT>

 

Published Tuesday, May 30, 2006 9:39 PM by Sunil Agarwal
Filed under:

Comments

 

SimonS' SQL Server Stuff said:

Thanks to Euan for pointing out that Sunil from the Storage Engine team has started a blog. This continues...
May 31, 2006 5:28 PM
 

Rolando Ramirez's WebLog said:

ADO.NET 2.0 &amp;amp; SqlBulkCopy : Link1 - Link2
Bulk Importing data with OPENROWSET in SQL2005
May 31, 2006 7:04 PM
 

xie said:

Hi, Sunil,

If you do INSERT INTO ... FROM OPENROWSET (...)
Is this a two stage process or just one? I mean, internally, will SQL server finish the OPENROWSET() step into TempDB (I guess), then start the INSERT INTO? Or this is a streamlined process?

My other question related to BULK INSERT, is how to do BULK INSERT into a table with UDTs? It seems that BULK INSERT won't invoke UDT's Parse() method at all. So what's the best practice one should do?

Thanks!

Dong Xie
June 9, 2006 7:14 AM
 

Sunil Agarwal said:

Dong: thanks for your interest.

Insert into <target> ... OPENROWSET (BULK..) is NOT a two stage process. There is no temp table involved.

UDT can be loaded as varbinary data
June 9, 2006 2:34 PM
 

xie said:

Dear Sunil,

Thanks for the quick response.

When you say 'load as varbinary data', do you mean something like this '4FCC4392DF'?

If I dump a table with UDTs by using BCP OUT in CHAR format, I can see these kind of Char representation of binary data. I can BULK INSERT this file without problem.

My question is: if for example, you have a UDT named Points in a table, you have a text file like this:

Column1 Column2 Column3
abc         abc         12, 23
def         def          34, 32
ghi         ghi          21, 44

Column 3 is the string format of Points, then how to BULK INSERT this text file?

Regards,

dong
June 10, 2006 9:48 AM
 

Sunil Agarwal said:

The way to this will be

INSERT into <target-table> with (TABLOCK)

SELECT c1, c2, Convert-to-udt (c3, c4)

from OPENROWSET (BULK <data-file>,<format-file> as t1

you will need to (x,y) of the data as two separate columns and then convert it to UDT using the conversion (assuming that one is available).

Varbinary comment has more to do with if you export and then import UDT, then you can use VARBINARY. No conversion will be needed.

thanks
Sunil
June 10, 2006 5:12 PM
 

xie said:

Dear Sunil,

Thanks for the quick answer again. Finally it solved my question which I have been asking for a long time.

In fact I realized this solution when I first saw your Blog article. Thanks!

My further questions:

1. Except this solution, any other way to do this? I mean, when you guys designed UDT, were you thinking to use OPENROWSET for the BULK loading?

2. If this is the only way, I would suggest to add this item into BOL.

3. About UDAs, any plan to support multiple columns as input parameter? How long do we need to wait that feature? In a sp or a new SKU?

All in all, CLR in SQL is great and fun. Keep going!

dong
June 12, 2006 7:30 AM
 

Sunil Agarwal said:

Dong:

Herre are some of the answers
(1) We were thinking varbinary for UDT bulk load. Not really with the exact scenario you had.
(2) This is the only way that I can think of. Yes, we will add some of these scenarios in BOL. Thanks for your suggestion.
(3) Don't know. Can you please post this on the SQL Server public newsgrooup.
June 13, 2006 2:24 PM
 

eph_tagh said:

Sunil,

I have a problem with BULK INSERT.  I have a text field that I can import data via bcp without any problems. However when I try to perform a BULK INSERT on the data it is being truncated at 8K.  Is there any way to get around this limitation in BULK INSERT?

Thank you,

EPH
August 29, 2006 10:17 AM
 

Sunil Agarwal said:

I tried the following
create table t_text (c1 text)

bulk insert bulktest..t_text from
'c:\temp\t_text-c.dat'

My data file had 16000 characters. It wroks for me. can you please send me the details (schema, datafile and the actual command used) in a separate email sunila@microsoft.com
September 6, 2006 7:27 PM
 

Dong Xie's Blog said:

The task is to bulk load data from flat files into tables with UDT column. This blog post will show you

November 27, 2006 11:43 AM
Anonymous comments are disabled

About Sunil Agarwal

Sunil Agarwal is a program manager in the SQL Server Storage Engine Group at Microsoft. He is responsible for concurrency, indexes, tempdb, LOBS, supportability, and bulk import/export. Prior to joining Microsoft, Sunil worked at DEC, Sybase, BMC Software and DigitalThink, focusing primarily on core database engine technologies and related applications.

© 2008 Microsoft Corporation. All rights reserved. Terms of Use  |  Trademarks  |  Privacy Statement
Microsoft
Page view tracker