Quickly Import and Export Excel Data with LINQ to XML

Quickly Import and Export Excel Data with LINQ to XML

Rate This
  • Comments 92

One really handy thing to do with relational data is to import and export it to Excel. This can be a total snap using LINQ to XML in Visual Basic 2008. For example, let's say I want to import data from an Excel spreadsheet into a local database file like SQL Compact Edition. I also want to be able to export that data back into an Excel spreadsheet, basically allowing me to edit the data from either my application or Excel.

The trick really is getting the format of the Excel spreadsheet as XML and using that sheet as a template to populate your data. First open Excel (for this example I'll use the Excel 2003 format so you can use Excel 2007 or 2003 to follow along). Next let's enter some column headers, for my example I have a list of customers I want to import/export so the fields I'll want are Abbrev, Name, Phone, Country. It's also helpful to enter one line of dummy data (you'll see why in a second). So here' what we've got:

Now what I want to do is get all the customers stored in my little SQL CE local database into this spreadsheet.

(As a side note, you can easily add a SQL CE local database file to your project by selecting the "Local Database" new item template in Visual Studio 2008. You can easily access the data in SQL CE sdf files by adding a new datasource (Data --> Add New Datasource), adding a new connection, and change the data source to "Microsoft Compact SQL Server 3.5" then locate or create the .sdf file. Then you can create typed Datasets like normal.)

So I have a form where I just dragged my CustomersDataSet that I created from my local database file onto it as a DataGridView. This also sets up the BindingSource and BindingNavigator toolstrip like we'd expect. I added a couple buttons to the toolstrip one called "Import" and one called "Export".

Now in order to get the Excel spreadsheet into an XML format you can simply "Save As.." and select "XML Spreadsheet 2003 (.xml)" as the file type. Now navigate to that file and open it with notepad, select and copy all the text into your clipboard. Open up your Visual Basic program, type "Dim sheet =" and then paste the contents of your clipboard directly into the editor. It should look something like this (I collapsed some of the sections we're going to ignore):

Next we need to import the namespaces in the spreadsheet. This is necessary so that when we create and query our XML, Visual Basic will know the correct namespaces we're working with. So at the top of code file for our form we'll add:

Imports <xmlns="urn:schemas-microsoft-com:office:spreadsheet">

Imports <xmlns:o="urn:schemas-microsoft-com:office:office">

Imports <xmlns:x="urn:schemas-microsoft-com:office:excel">

Imports <xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
Imports <xmlns:html="http://www.w3.org/TR/REC-html40">

Now we can write a query that will take all of the customers in our SQL CE database file (technically the typed DataSet that was created for us) and populate the spreadsheet. You should see your dummy data in the XML above. Cut that out (ctrl-X) because we're going to replace that with all our customers using embedded expressions like so:

Dim customers = _

    From customer In Me.CustomersDataSet.Customers _

    Where customer.RowState <> DataRowState.Deleted _

        AndAlso Not customer.IsAbbrevNull _

    Order By customer.Abbrev _

    Select <Row>

               <Cell><Data ss:Type="String"><%= customer.Abbrev %></Data></Cell>

               <Cell><Data ss:Type="String"><%= If(customer.IsNameNull, "", customer.Name)%> </Data></Cell>

               <Cell><Data ss:Type="String"><%= If(customer.IsPhoneNull, "", customer.Phone)%> </Data></Cell>

               <Cell><Data ss:Type="String"><%= If(customer.IsCountryNull, "",customer.Country)%> </Data></Cell>

           </Row>

I'm just selecting the data inside my typed Dataset which is populated in the load of the form, making sure not to grab any deleted rows or rows where the abbreviation is NULL. The customers variable above becomes a collection of XElements (IEnumerable(Of XElement)) and we simply pop that back into the sheet, replacing the dummy line we removed:

<Table ss:ExpandedColumnCount="4" ss:ExpandedRowCount=<%= customers.Count + 1 %> x:FullColumns="1" x:FullRows="1" ss:DefaultRowHeight="15">

<Row ss:StyleID="s62">

               <Cell><Data ss:Type="String">Abbrev</Data></Cell>

               <Cell><Data ss:Type="String">Name</Data></Cell>

               <Cell><Data ss:Type="String">Phone</Data></Cell>

               <Cell><Data ss:Type="String">Country</Data></Cell>

       </Row>

       <%= customers %>

</Table>

Also notice that I had to adjust the ss:ExpandedRowCount attribute as well to include the count of our customers plus one for the header row. Now we can write the results out to disk and start Excel:

sheet.Save("C:\Customers.xml")

Process.Start("Excel.exe", "C:\Customers.xml")

Cool!

Okay so now we want to be able to import this data as well. If any rows were added or edited we want that data back in our database. I'm going to use the Abbrev field as the unique identifier of our data. The query we need to write will group join (outer join) the current (non-deleted) rows in our CustomerDataSet to the rows here in the spreadsheet. That way if the resulting items in the collection contain a group (of one CustomerDataRow in this case), then we know that the CustomerDataRow needs to be updated, otherwise we know it's a new customer that needs to be added. We also need to skip the first row because that's our header row in the spreadsheet, and I'm doing an extra check for empty cells.

Dim sheet = XDocument.Load("C:\Customers.xml")

 

'Skip the first row because that is the header row in the spreadsheet.

'Notice also that we're eliminating deleted rows from the group join.

'We'll use the existence of the group collection to determine if the

' row needs to be added to the database.

Dim customers = From xmlCustomer In sheet...<Row> _

                   Let Abbrev = xmlCustomer.<Cell>(0) _

                   Let Name = xmlCustomer.<Cell>(1) _

                   Let Phone = xmlCustomer.<Cell>(2) _

                   Let Country = xmlCustomer.<Cell>(3) _

                Where _

                   Abbrev IsNot Nothing AndAlso _

                   Name IsNot Nothing AndAlso _

                   Phone IsNot Nothing AndAlso _

                   Country IsNot Nothing _

                Group Join customer In _

                    (From row In Me.CustomersDataSet.Customers _

                       Where row.RowState <> DataRowState.Deleted _

                             AndAlso Not row.IsAbbrevNull _

                       Select row) _

                    On customer.Abbrev Equals Abbrev _

                    Into Group _

                Select _

                    Abbrev = Abbrev.Value, _

                    Name = Name.Value, _

                    Phone = Phone.Value, _

                    Country = Country.Value, _

                    Group _

                Skip 1

 

 

'Add or update the customer row depending on if the

' group collection exists.

For Each customer In customers

    Dim r As CustomersDataSet.CustomersRow

    If customer.Group.Count = 0 Then

        'Add new row ands set the customer abbreviation.

        ' This field is used to determine uniqueness of the data

        r = Me.CustomersDataSet.Customers.NewCustomersRow()

        r.Abbrev = customer.Abbrev

    Else

        r = customer.Group(0)

    End If

 

    r.Name = customer.Name

    r.Phone = customer.Phone

    r.Country = customer.Country

 

    If r.RowState = DataRowState.Detached Then

        Me.CustomersDataSet.Customers.AddCustomersRow(r)

    End If

Next

Now we can make changes to the spreadsheet, import the data into our local database, or make changes to our local database and export those changes to Excel. I've attached the complete example built with Visual Basic 2008. You'll also need Excel 2003 or higher to view the spreadsheet. It's worth reiterating that you don't actually need Excel to create the spreadsheet in Visual Basic via code as long as you have the XML format, you just need it if you want to open it in Excel later. Remember we're not using automation, we're just creating XML in a specific format.

Enjoy!

Attachment: ImportExportExcel.zip
Leave a Comment
  • Please add 3 and 7 and type the answer here:
  • Post
  • thanks for the hint on save as office 2003 xml, I've been struggling with exporting to xml all kinds of other ways.  You are the best Beth!

  • I am trying to import data from a spreadsheet into and SQL database.  I have been trying to use your example as a framework for my situation.  The problem I am experiencing is when a cell in the spreadsheet for a given field is blank, the data in the columns next to it are all shifted over to the left and end up basically in the wrong columns.

    the LINQ statement is as follows:

               Dim Player = From xmlPlayer In sheet...<Row> _

                                          Let Col0 = xmlPlayer.<cell>(0) _

                                          Let Col1 = xmlPlayer.<cell>(1) _

                                          Let Col2 = xmlPlayer.<cell>(2) _

                                          Let Col3 = xmlPlayer.<cell>(3) _

                                          Let Col4 = xmlPlayer.<Cell>(4) _

                                          Let Col5 = xmlPlayer.<Cell>(5) _

                                          Let Col6 = xmlPlayer.<Cell>(6) _

                                          Let Col7 = xmlPlayer.<Cell>(7) _

                                          Let Col8 = xmlPlayer.<Cell>(8) _

                                          Let Col9 = xmlPlayer.<Cell>(9) _

                                          Let Col10 = xmlPlayer.<Cell>(10) _

                                          Let Col11 = xmlPlayer.<Cell>(11) _

                                          Let Col12 = xmlPlayer.<Cell>(12) _

                                        Where _

                                          Col0 IsNot Nothing _

                                          Group xmlPlayer By Col1, Col0, Col2, Col3, Col4, Col5, Col6, Col7, _

                                                             Col8, Col9, Col10, Col11, Col12 Into Group _

                                       Select _

                                           Col0 = If(Col0 Is Nothing, "", Col0.Value), _

                                           Col1 = If(Col1 Is Nothing, "", Col1.Value), _

                                           Col2 = If(Col2 Is Nothing, "", Col2.Value), _

                                           Col3 = If(Col3 Is Nothing, "", Col3.Value), _

                                           Col4 = If(Col4 Is Nothing, "", Col4.Value), _

                                           Col5 = If(Col5 Is Nothing, "", Col5.Value), _

                                           Col6 = If(Col6 Is Nothing, "", Col6.Value), _

                                           col7 = If(Col7 Is Nothing, "", Col7.Value), _

                                           col8 = If(Col8 Is Nothing, "", Col8.Value), _

                                           col9 = If(Col9 Is Nothing, "", Col9.Value), _

                                           col10 = If(Col10 Is Nothing, "", Col10.Value), _

                                           col11 = If(Col11 Is Nothing, "", Col11.Value), _

                                           col12 = If(Col12 Is Nothing, "", Col12.Value), _

                                           Group _

                                       Skip 0

    And a sample of the XML from the Excel Spreadsheet:

    <ss:Worksheet ss:Name="Sheet1">

       <ss:Table ss:ExpandedColumnCount="5" ss:ExpandedRowCount="24" x:FullColumns="1" x:FullRows="1">

         <ss:Row>

           <ss:Cell>

             <ss:Data ss:Type="String">First Name</ss:Data>

           </ss:Cell>

           <ss:Cell>

             <ss:Data ss:Type="String">Last Name</ss:Data>

           </ss:Cell>

           <ss:Cell>

             <ss:Data ss:Type="String">Account Number</ss:Data>

           </ss:Cell>

           <ss:Cell>

             <ss:Data ss:Type="String">Member Type</ss:Data>

           </ss:Cell>

           <ss:Cell>

             <ss:Data ss:Type="String">Gender</ss:Data>

           </ss:Cell>

         </ss:Row>

         <ss:Row>

           <ss:Cell>

             <ss:Data ss:Type="String">Bill</ss:Data>

           </ss:Cell>

           <ss:Cell>

             <ss:Data ss:Type="String">Jones</ss:Data>

           </ss:Cell>

           <ss:Cell>

             <ss:Data ss:Type="Number">1222</ss:Data>

           </ss:Cell>

           <ss:Cell>

             <ss:Data ss:Type="String">Principle</ss:Data>

           </ss:Cell>

           <ss:Cell>

             <ss:Data ss:Type="String">M</ss:Data>

           </ss:Cell>

         </ss:Row>

         <ss:Row>

           <ss:Cell>

             <ss:Data ss:Type="String">Ron</ss:Data>

           </ss:Cell>

           <ss:Cell>

             <ss:Data ss:Type="String">Smith</ss:Data>

           </ss:Cell>

           <ss:Cell ss:Index="4">

             <ss:Data ss:Type="String">Guest</ss:Data>

           </ss:Cell>

           <ss:Cell>

             <ss:Data ss:Type="String">M</ss:Data>

           </ss:Cell>

         </ss:Row>

    Where the blank cell is located the XML returns:

    <ss:Cell ss:Index="4">

    For some reason it is ignored and a blank or "" is not inserted into my datatable.  Hopefully you can shed some light on how to resolve this.

    Thanks!!

    Ted

  • Great 'how to'.

    I'm trying to modify it to work with my app which uses an Access database. Its a multiuser app and the data is on a different machine to the app. Connection strings are set up accordingly and the app works fine in all other respects.

    My dataset contains the data that I want to export - it appears on the form as a child grid as described in one of your earlier posts/videos. It compiles OK and the app runs OK but when I hit the Export button I only get the header row appearing in the spreadsheet.

    Does your example code requery the dataset or is it executing a new query against the .sdf? If so I suspect that this is where I'm having problems using a remote .mdb file.

    I am obviously missing something....but what?

    Hope you can help - keep up the good work!

  • I've encountered an issue while trying to implement this approach (by the way, great article! - it prompted me to switch to VB for the Excel handling, while most of our code is C#!):  if the excel sheet contains formulas in some of the cells, their values are not accessible at the time the rows are enumerated.  Any ideas?  Thanks Beth!

  • Great Stuff...

    Took me a little tweaking to get it to work with my application.  My question is how do I go about converting the r.mydate = whatever.mydate string so it can be inserted into a date container...I keep getting a wrong format error telling me I need to parse the string before inserting it  any help would be great

    Thanks

  • Hi Trenton,

    You should be able to use the VB date functions to check the string and then convert it. Something like

    If IsDate(mydate) Then

       r.MyDate = CDate(mydate)

    End If

    HTH,

    -B

  • Excellent stuff Beth, especially for a starter on LINQ to SQL. Took me some tweaking to get it going (especially the Dim sheet =... statement took me some time to reproduce myself. Saving the Excel file as xml wasn't the problem; pasting it into my own project took a bit of debugging :-( ).

    Anyway, I ran this stuff against Excel 2007, but I saved the Excel file as 2003 XML file. Will you ever make a version for Excel 2007? But thanx anyway!

  • there are problems if you have more than one excel sheet in the file, you should include a filter in the where condition.

    Something like

    AndAlso  sheet...<Worksheet>.@ss:Name = "Sheet1"

    But the above statement does not works if the sheet1 is the second sheet of the file. since the above is the same as

    AndAlso sheet...<Worksheet>(0).@ss:Name = "Sheet1"

    I'm still trying to figure out how to tell the linq statement to pick only the sheet I want

  • Hi winsrp,

    As long as you're importing the XML namespaces above you can change the import code to this and it will select only Sheet1

    Dim book = XDocument.Load(Me.FileName)

    Dim sheet = From s In book...<Worksheet> Where s.@ss:Name = "Sheet1"

    HTH,

    -B

  • This encapsulates nicely a solution I've been searching for - how to pump data between Excel / SQL server without automation or having to install Excel on the server. It has also opened my eyes to the possibilities of LINQ with XML. Worth a thousand pages of lesser material- thanks Beth

  • I have a excel file which has multiple sheets and many rows with number of columns.

    I have written the codes below. How can I read cells from each Row. The code I posted probably results the first row of each sheet.

    Dim sheet = From s In book...<Worksheet> Where (s.@ss:Name = "employee list") _

                       Let OrderId = s...<Row>.<Cell>(0).Value _

                       Let Age = s...<Row>.<Cell>(4).Value _

                       Let sex = s...<Row>.<Cell>(5).Value _

                       Let SEmpDate = s...<Row>.<Cell>(6).Value _

                       Let EEmpDate = s...<Row>.<Cell>(7).Value _

                       Let avgPartTime = s...<Row>.<Cell>(9).Value _

                       Let isClause = s...<Row>.<Cell>(10).Value _

                       Let shutDownMethodReserve = s...<Row>.<Cell>(15).Value _

                       Let policyInPaying = s...<Row>.<Cell>(16).Value _

                       Let currReserveBefore2003 = s...<Row>.<Cell>(17).Value _

                       Let currReserveAfter2003 = s...<Row>.<Cell>(18).Value _

                       Let currReserveBefore2004 = s...<Row>.<Cell>(19).Value _

                       Let currReserveAfter2004 = s...<Row>.<Cell>(20).Value _

                       Let currReserveProvFund = s...<Row>.<Cell>(21).Value

  • I modified the statements as: But getting an error as object reference not set to an instance of an object.

    Only getting the header items.

    Dim sheet = From s In book...<Worksheet> Where (s.@ss:Name = "employee list") _

                       Let rows = s...<Row> _

                       Select rows

           Dim crows = <Worksheet>

                           <%= sheet %>

                       </Worksheet>

           Dim sheet1 = From val In crows...<Row> _

                           Let OrderId = val.<Cell>(0) _

                           Let Age = val.<Cell>(4) _

                           Let sex = val.<Cell>(5) _

                           Let SEmpDate = val.<Cell>(6) _

                           Let EEmpDate = val.<Cell>(7) _

                           Let avgPartTime = val.<Cell>(9) _

                           Let isClause = val.<Cell>(10) _

                           Let shutDownMethodReserve = val.<Cell>(15) _

                           Let policyInPaying = val.<Cell>(16) _

                           Let currReserveBefore2003 = val.<Cell>(17) _

                           Let currReserveAfter2003 = val.<Cell>(18) _

                           Let currReserveBefore2004 = val.<Cell>(19) _

                           Let currReserveAfter2004 = val.<Cell>(20) _

                           Let currReserveProvFund = val.<Cell>(21) _

                           Select _

                               OrderId = OrderId.Value, _

                               Age = Age.Value, _

                               Sex = sex.Value, _

                               SEmpDate = SEmpDate.Value, _

                               EEmpDate = EEmpDate.Value, _

                               avgPartTime = avgPartTime.Value, _

                               isClause = isClause.Value, _

                               shutDownMethodReserve = shutDownMethodReserve.Value, _

                               policyInPaying = policyInPaying.Value, _

                               currReserveBefore2003 = currReserveBefore2003.Value, _

                               currReserveAfter2003 = currReserveAfter2003.Value, _

                               currReserveBefore2004 = currReserveBefore2004.Value, _

                               currReserveAfter2004 = currReserveAfter2004.Value, _

                               currReserveProvFund = currReserveProvFund.Value

  • Hi, I have posted all the details at:

    http://forums.asp.net/p/1535425/3730573.aspx#3730573

    I am trying but not getting the result. Is there any fault in the query?

  • Hi Beth, many thanks for sharing your valuable knowleadge. Being a newbie in vb.net, i was able to create something decent for my marine company with your sample in exporting and importing to xml from datatables.

    I have two different dataset namely open & close.

    open dataset have three datatable which contains the data i need to pull into close dataset datatable. Would greatly appreciate if you can kindly guide me on which area should i read up on and look into to get the above task done.

    Mny Tx

  • Hi Beth,

    greatly enjoyed your post..it was well put together :)

    I have a question that has got me so flustered I cannot think straight!

    IF I have already started a New Project..Windows Form in Visual Studio 2008, is there A way I can import or lol...get..an excel 2003 workbook..or just a worksheet into the windows form project?

    I am so totally new at this but I really need to get this done and in a hurry!

    Thanks,

    Jenn

Page 5 of 7 (92 items) «34567