Bulk Importing data with OPENROWSET in SQL2005


Before SQL2005, you could use Bulk Insert or BCP to import data into SQL Server. While both of these mechanisms are popular and widely used, they don't provide any processing of the input data before bulk importing it into the target table. If you need to preprocess the data, you can use DTS (now SSIS) or, in SQL2005, you may be able to use OPENROWSET with the BULK rowset provider to accomplish the same thing for simpler tasks, using ordinary SELECT statement functionality.

Recently, I was asked how to Bulk Import data into a varchar column in the target table after removing whitespace. The customer described the problem as "We're loading data from a text file with fixed column widths into a table in SQL Server 2005. All table fields are varchars of the same length as the source fields. The problem is that the BULK INSERT process doesn't truncate whitespace, and since the source fields can contain variable-length values or nothing at all, we wind up with fields ending with or containing nothing but spaces. Our best plan at the moment is to RTRIM() every column and insert the data into another table, but since we'll be loading millions of rows every day we were looking for any more efficient methods."

You can easily accomplish this task using OPENROWSET. Here is one example:

create table t_cust (c1 varchar(20), c2 varchar(20))

go

You can use the following command to remove the whitespace and import the data into the target table in a single step. This Bulk Import follows the bulk optimization rules (i.e. bulk logging) similar to the ones outlined for Bulk Insert in BOL.

 

INSERT into t_cust with (TABLOCK)
SELECT RTRIM(c1), RTRIM(c2)
from OPENROWSET (BULK '<data-file>', FORMATFILE = '<format-file>') as t1

 

You will note that OPENROWSET reads the data from the data file, and the trailing blanks are then removed by calling the function RTRIM on each value before the insert.

 

An example format file is

 

<?xml version="1.0" ?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <RECORD>
    <FIELD ID="1" xsi:type="CharTerm" TERMINATOR="," MAX_LENGTH="20" COLLATION="SQL_Latin1_General_CP1_CI_AS" />
    <FIELD ID="2" xsi:type="CharTerm" TERMINATOR="\r\n" MAX_LENGTH="20" COLLATION="SQL_Latin1_General_CP1_CI_AS" />
  </RECORD>
  <ROW>
    <COLUMN SOURCE="1" NAME="c1" xsi:type="SQLVARYCHAR" />
    <COLUMN SOURCE="2" NAME="c2" xsi:type="SQLVARYCHAR" />
  </ROW>
</BCPFORMAT>
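Since OPENROWSET (BULK ...) is queried with an ordinary SELECT, you can also filter or transform rows during the load. Here is a sketch; the file paths are hypothetical, and the format file is the one shown above:

```sql
-- Load, trim trailing blanks, and skip rows whose first field is entirely blank.
-- File paths are hypothetical; adjust to your environment.
INSERT into t_cust with (TABLOCK)
SELECT RTRIM(c1), RTRIM(c2)
from OPENROWSET (BULK 'c:\data\customers.txt',
                 FORMATFILE = 'c:\data\customers.fmt') as t1
where LTRIM(RTRIM(c1)) <> ''
```

The WHERE clause runs as part of the same bulk load, so no intermediate staging table is needed.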

 

  • Thanks to Euan for pointing out that Sunil from the Storage Engine team has started a blog. This continues...
  • ADO.NET 2.0 & SqlBulkCopy : Link1 - Link2
    Bulk Importing data with OPENROWSET in SQL2005
  • Hi, Sunil,

    If you do INSERT INTO ... FROM OPENROWSET (...)
    Is this a two stage process or just one? I mean, internally, will SQL server finish the OPENROWSET() step into TempDB (I guess), then start the INSERT INTO? Or this is a streamlined process?

    My other question related to BULK INSERT, is how to do BULK INSERT into a table with UDTs? It seems that BULK INSERT won't invoke UDT's Parse() method at all. So what's the best practice one should do?

    Thanks!

    Dong Xie
  • Dong: thanks for your interest.

    Insert into <target> ... OPENROWSET (BULK..) is NOT a two stage process. There is no temp table involved.

    UDT can be loaded as varbinary data
  • Dear Sunil,

    Thanks for the quick response.

    When you say 'load as varbinary data', do you mean something like this '4FCC4392DF'?

    If I dump a table with UDTs by using BCP OUT in CHAR format, I can see these kind of Char representation of binary data. I can BULK INSERT this file without problem.

    My question is: if for example, you have a UDT named Points in a table, you have a text file like this:

    Column1 Column2 Column3
    abc         abc         12, 23
    def         def          34, 32
    ghi         ghi          21, 44

    Column 3 is the string format of Points, then how to BULK INSERT this text file?

    Regards,

    dong
The way to do this will be

    INSERT into <target-table> with (TABLOCK)

    SELECT c1, c2, Convert-to-udt (c3, c4)

    from OPENROWSET (BULK '<data-file>', FORMATFILE = '<format-file>') as t1

    you will need to read the (x,y) of the data as two separate columns and then convert them to the UDT using the conversion function (assuming that one is available).

    Varbinary comment has more to do with if you export and then import UDT, then you can use VARBINARY. No conversion will be needed.
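    For illustration, assuming a hypothetical CLR UDT dbo.Points whose Parse method accepts strings of the form 'x,y', and a format file that exposes x and y as separate character columns c3 and c4, the load might look like this (table name and file paths are also hypothetical):

    ```sql
    -- Hypothetical table and UDT; CONVERT from a string invokes the UDT's Parse method.
    INSERT into t_points with (TABLOCK)
    SELECT c1, c2, CONVERT(dbo.Points, c3 + ',' + c4)
    from OPENROWSET (BULK 'c:\data\points.txt',
                     FORMATFILE = 'c:\data\points.fmt') as t1
    ```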

    thanks
    Sunil
  • Dear Sunil,

    Thanks for the quick answer again. Finally it solved my question which I have been asking for a long time.

    In fact I realized this solution when I first saw your Blog article. Thanks!

    My further questions:

    1. Except this solution, any other way to do this? I mean, when you guys designed UDT, were you thinking to use OPENROWSET for the BULK loading?

    2. If this is the only way, I would suggest to add this item into BOL.

    3. About UDAs, any plan to support multiple columns as input parameter? How long do we need to wait that feature? In a sp or a new SKU?

    All in all, CLR in SQL is great and fun. Keep going!

    dong
  • Dong:

    Here are some of the answers:
    (1) We were thinking varbinary for UDT bulk load. Not really with the exact scenario you had.
    (2) This is the only way that I can think of. Yes, we will add some of these scenarios in BOL. Thanks for your suggestion.
    (3) Don't know. Can you please post this on the SQL Server public newsgroup?
  • Sunil,

    I have a problem with BULK INSERT.  I have a text field that I can import data via bcp without any problems. However when I try to perform a BULK INSERT on the data it is being truncated at 8K.  Is there any way to get around this limitation in BULK INSERT?

    Thank you,

    EPH
  • I tried the following
    create table t_text (c1 text)

    bulk insert bulktest..t_text from
    'c:\temp\t_text-c.dat'

    My data file had 16000 characters. It works for me. Can you please send me the details (schema, data file and the actual command used) in a separate email: sunila@microsoft.com
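    If the goal is to load the entire contents of a file into a single text value, SQL2005 also offers the SINGLE_CLOB option of OPENROWSET, which returns the whole file as one value in a column named BulkColumn. A sketch, reusing the t_text table and file path from above:

    ```sql
    -- Load the entire file contents into one row; no 8K truncation applies.
    INSERT into t_text (c1)
    SELECT BulkColumn
    from OPENROWSET (BULK 'c:\temp\t_text-c.dat', SINGLE_CLOB) as t1
    ```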
  • The task is to bulk load data from flat files into tables with a UDT column. This blog post will show you...
