Quickly Import and Export Excel Data with LINQ to XML

Quickly Import and Export Excel Data with LINQ to XML

Rate This
  • Comments 94

One really handy thing to do with relational data is to import and export it to Excel. This can be a total snap using LINQ to XML in Visual Basic 2008. For example, let's say I want to import data from an Excel spreadsheet into a local database file like SQL Compact Edition. I also want to be able to export that data back into an Excel spreadsheet, basically allowing me to edit the data from either my application or Excel.

The trick really is getting the format of the Excel spreadsheet as XML and using that sheet as a template to populate your data. First open Excel (for this example I'll use the Excel 2003 format so you can use Excel 2007 or 2003 to follow along). Next let's enter some column headers, for my example I have a list of customers I want to import/export so the fields I'll want are Abbrev, Name, Phone, Country. It's also helpful to enter one line of dummy data (you'll see why in a second). So here' what we've got:

Now what I want to do is get all the customers stored in my little SQL CE local database into this spreadsheet.

(As a side note, you can easily add a SQL CE local database file to your project by selecting the "Local Database" new item template in Visual Studio 2008. You can easily access the data in SQL CE sdf files by adding a new datasource (Data --> Add New Datasource), adding a new connection, and change the data source to "Microsoft Compact SQL Server 3.5" then locate or create the .sdf file. Then you can create typed Datasets like normal.)

So I have a form where I just dragged my CustomersDataSet that I created from my local database file onto it as a DataGridView. This also sets up the BindingSource and BindingNavigator toolstrip like we'd expect. I added a couple buttons to the toolstrip one called "Import" and one called "Export".

Now in order to get the Excel spreadsheet into an XML format you can simply "Save As.." and select "XML Spreadsheet 2003 (.xml)" as the file type. Now navigate to that file and open it with notepad, select and copy all the text into your clipboard. Open up your Visual Basic program, type "Dim sheet =" and then paste the contents of your clipboard directly into the editor. It should look something like this (I collapsed some of the sections we're going to ignore):

Next we need to import the namespaces in the spreadsheet. This is necessary so that when we create and query our XML, Visual Basic will know the correct namespaces we're working with. So at the top of code file for our form we'll add:

Imports <xmlns="urn:schemas-microsoft-com:office:spreadsheet">

Imports <xmlns:o="urn:schemas-microsoft-com:office:office">

Imports <xmlns:x="urn:schemas-microsoft-com:office:excel">

Imports <xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
Imports <xmlns:html="http://www.w3.org/TR/REC-html40">

Now we can write a query that will take all of the customers in our SQL CE database file (technically the typed DataSet that was created for us) and populate the spreadsheet. You should see your dummy data in the XML above. Cut that out (ctrl-X) because we're going to replace that with all our customers using embedded expressions like so:

Dim customers = _

    From customer In Me.CustomersDataSet.Customers _

    Where customer.RowState <> DataRowState.Deleted _

        AndAlso Not customer.IsAbbrevNull _

    Order By customer.Abbrev _

    Select <Row>

               <Cell><Data ss:Type="String"><%= customer.Abbrev %></Data></Cell>

               <Cell><Data ss:Type="String"><%= If(customer.IsNameNull, "", customer.Name)%> </Data></Cell>

               <Cell><Data ss:Type="String"><%= If(customer.IsPhoneNull, "", customer.Phone)%> </Data></Cell>

               <Cell><Data ss:Type="String"><%= If(customer.IsCountryNull, "",customer.Country)%> </Data></Cell>

           </Row>

I'm just selecting the data inside my typed Dataset which is populated in the load of the form, making sure not to grab any deleted rows or rows where the abbreviation is NULL. The customers variable above becomes a collection of XElements (IEnumerable(Of XElement)) and we simply pop that back into the sheet, replacing the dummy line we removed:

<Table ss:ExpandedColumnCount="4" ss:ExpandedRowCount=<%= customers.Count + 1 %> x:FullColumns="1" x:FullRows="1" ss:DefaultRowHeight="15">

<Row ss:StyleID="s62">

               <Cell><Data ss:Type="String">Abbrev</Data></Cell>

               <Cell><Data ss:Type="String">Name</Data></Cell>

               <Cell><Data ss:Type="String">Phone</Data></Cell>

               <Cell><Data ss:Type="String">Country</Data></Cell>

       </Row>

       <%= customers %>

</Table>

Also notice that I had to adjust the ss:ExpandedRowCount attribute as well to include the count of our customers plus one for the header row. Now we can write the results out to disk and start Excel:

sheet.Save("C:\Customers.xml")

Process.Start("Excel.exe", "C:\Customers.xml")

Cool!

Okay so now we want to be able to import this data as well. If any rows were added or edited we want that data back in our database. I'm going to use the Abbrev field as the unique identifier of our data. The query we need to write will group join (outer join) the current (non-deleted) rows in our CustomerDataSet to the rows here in the spreadsheet. That way if the resulting items in the collection contain a group (of one CustomerDataRow in this case), then we know that the CustomerDataRow needs to be updated, otherwise we know it's a new customer that needs to be added. We also need to skip the first row because that's our header row in the spreadsheet, and I'm doing an extra check for empty cells.

Dim sheet = XDocument.Load("C:\Customers.xml")

 

'Skip the first row because that is the header row in the spreadsheet.

'Notice also that we're eliminating deleted rows from the group join.

'We'll use the existence of the group collection to determine if the

' row needs to be added to the database.

Dim customers = From xmlCustomer In sheet...<Row> _

                   Let Abbrev = xmlCustomer.<Cell>(0) _

                   Let Name = xmlCustomer.<Cell>(1) _

                   Let Phone = xmlCustomer.<Cell>(2) _

                   Let Country = xmlCustomer.<Cell>(3) _

                Where _

                   Abbrev IsNot Nothing AndAlso _

                   Name IsNot Nothing AndAlso _

                   Phone IsNot Nothing AndAlso _

                   Country IsNot Nothing _

                Group Join customer In _

                    (From row In Me.CustomersDataSet.Customers _

                       Where row.RowState <> DataRowState.Deleted _

                             AndAlso Not row.IsAbbrevNull _

                       Select row) _

                    On customer.Abbrev Equals Abbrev _

                    Into Group _

                Select _

                    Abbrev = Abbrev.Value, _

                    Name = Name.Value, _

                    Phone = Phone.Value, _

                    Country = Country.Value, _

                    Group _

                Skip 1

 

 

'Add or update the customer row depending on if the

' group collection exists.

For Each customer In customers

    Dim r As CustomersDataSet.CustomersRow

    If customer.Group.Count = 0 Then

        'Add new row ands set the customer abbreviation.

        ' This field is used to determine uniqueness of the data

        r = Me.CustomersDataSet.Customers.NewCustomersRow()

        r.Abbrev = customer.Abbrev

    Else

        r = customer.Group(0)

    End If

 

    r.Name = customer.Name

    r.Phone = customer.Phone

    r.Country = customer.Country

 

    If r.RowState = DataRowState.Detached Then

        Me.CustomersDataSet.Customers.AddCustomersRow(r)

    End If

Next

Now we can make changes to the spreadsheet, import the data into our local database, or make changes to our local database and export those changes to Excel. I've attached the complete example built with Visual Basic 2008. You'll also need Excel 2003 or higher to view the spreadsheet. It's worth reiterating that you don't actually need Excel to create the spreadsheet in Visual Basic via code as long as you have the XML format, you just need it if you want to open it in Excel later. Remember we're not using automation, we're just creating XML in a specific format.

Enjoy!

Attachment: ImportExportExcel.zip
Leave a Comment
  • Please add 6 and 2 and type the answer here:
  • Post
  • Hi Raymond,

    If you need to select data from a set of data tables and insert them into another you can take a look at LINQ to DataSet to greate a join between the three tables:

    http://msdn.microsoft.com/en-us/vbasic/bb738025.aspx

    You can also use the dataset functions themselves to select data out, take a look at the ADO.NET documentation in the MSDN library for that.

    HTH,

    -B

  • Hi Jenn,

    I'm not quite sure what you're asking. This example creates a spreadsheet using pure XML so it sould be in almost any type of application including Windows Forms.

    -B

  • Thanks Beth for your kind reply.

  • HELP!! AARRGGHH!! I'm new to VB 2008 or whatever its called. I need to import from an Excel spredsheet that has about twenty tabs and LOADS of info on each tab and I need to import all of it into a database where they will input each record one at a time after that. but the import is killing me! what do i do? :(

  • The thing is, datasets and adapters and that stuff is too complicated to work with. the xml is a good thing but the datasets aren't. and ... ... i just want to import. that's all. after that it'll be pretty easy.

  • Hi Beth,

    Good stuff.

    If using your given example and there duplicated abbrev in xml files.

    How do i import to dataset make abbrev unique?

    Tx Alot

    Cheers

  • Hi Beth,

    Dim sheet = <?xml version="1.0"?>

    This is a phrase in the Private Sub btnExport_Click.

    When I copied this in my source it gives me the remark: 'Variable declaration without an As clause....'

    How can I make this being accepted without creating the declaration?

    By the way, your solution is very helpfull.

    Ivo.

  • Hi Ivo,

    You need to make sure Option Strict is Off and Option Infer is ON. See msdn.microsoft.com/.../bb384665.aspx

    HTH,

    -B

  • Hi Beth,

    Thanks for your reply. i have my solution running.

    Carl Kelly did a nice job. Now i have a guestion about If statements in XML.

    My columns have different widths. Now, while looping through de columns I want to set the widths in XML language.

    Here you have my source for now:

    <%= From t In source.Tables _

       Let excelColumnDataType = GetColumnDataTypes(t.columns) _

       Select <Worksheet ss:Name=<%= t.TableName %>>

                  <Table ss:ExpandedColumnCount=<%= t.Columns.Count %> ss:ExpandedRowCount=<% = t.Rows.Count + 1 %> x:FullColumns="1" x:FullRows="1">

                                              <%= From c In Enumerable.Range(0, t.Columns.Count) _

                                                  Select <Column ss:Index=<%= (c + 1) %> ss:AutoFitWidth="1">

                                                             <ss:if test=<%= c = 1 %>>>

                                                                 <column ss:Index=<%= c %> ss:AutoFitWidth="1" ss:Width="120"/>

                                                             </ss:if>

                                                         </Column> %>

                                              <Row>

                                                  <%= From c In Enumerable.Range(0, t.Columns.Count) _

                                                      Select <Cell ss:StyleID="s62"><Data ss:Type="String"><%= t.Columns(c).Caption %></Data></Cell> %>

                                              </Row>

                                          </Table>

                                      </Worksheet> %>

    Maybe, you have a solution for the if statement.

    Thanks in advance,

    Ivo.

  • Great job Beth.

    But I'm having this same problem:

    "The problem I am experiencing is when a cell in the spreadsheet for a given field is blank, the data in the columns next to it are all shifted over to the left and end up basically in the wrong columns."

    Could you suggest a solution ? Thanks, Marzio

  • The exporting part worked excellently but i have been trying to import the data but I couldn't

  • This example is good enough for simple exporting to Excel if you aren't trying to generate reports. It is fraught with a host of problems especially related to formatting and indexing of rows if you attempt reports. Better stick with your ExcelWriter or SSRS if you are trying that.

  • @Swami - correct this is meant for simple export/import of data. If you need to format it more like a report you can also take a look at Open XML.

  • How can i get rid of the namespaces on each tag?

    <Row xmlns="urn:schemas-microsoft-com:office:spreadsheet">

           <Cell ss:StyleID="DefaultCellStyle" xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet" xmlns="urn:schemas-microsoft-com:office:spreadsheet">

             <Data ss:Type="String"></Data>

           </Cell>

    </Row>

    i followed ur example, changed a few to functions that generate XElements (like getWorksheetXML, etc..)

    But when i export it, it puts all these namespaces on each tag..

    Heres what my Code looks like

    Dim xmlExport = <?xml version="1.0"?>

                           <?mso-application progid="Excel.Sheet"?>

                           <Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"

                               xmlns:o="urn:schemas-microsoft-com:office:office"

                               xmlns:x="urn:schemas-microsoft-com:office:excel"

                               xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"

                               xmlns:html="www.w3.org/.../REC-html40">

                               <DocumentProperties xmlns="urn:schemas-microsoft-com:office:office">

                                   <Author>Nard</Author>

                                   <LastAuthor>Nard</LastAuthor>

                                   <Created>2011-09-13T00:53:42Z</Created>

                                   <Version>14.00</Version>

                               </DocumentProperties>

                               <OfficeDocumentSettings xmlns="urn:schemas-microsoft-com:office:office">

                                   <AllowPNG/>

                               </OfficeDocumentSettings>

                               <ExcelWorkbook xmlns="urn:schemas-microsoft-com:office:excel">

                                   <WindowHeight>7995</WindowHeight>

                                   <WindowWidth>17235</WindowWidth>

                                   <WindowTopX>120</WindowTopX>

                                   <WindowTopY>75</WindowTopY>

                                   <ProtectStructure>False</ProtectStructure>

                                   <ProtectWindows>False</ProtectWindows>

                               </ExcelWorkbook>

                               <%= getStyles() %>

                               <%= Me.ActiveWorksheet.ToXML %>

                           </Workbook>

    ... where getStyles() produces the Styles Tags and Me.ActiveWorkSheet.ToXML produces the Worksheet xml Tag. How can i get rid of the namespaces on each tag? Thanks

  • Hi Beth - followed your video to export to Excel, but am trying to adapt to using an existing XML file.  The XML is structured like this - note the element DIMS:

    <TRUCKS>

     <TRUCK Counter="0">

       <CARRIER>A. BLAIR ENTERPRISES</CARRIER>

       <CARRIERPHONE>(502)326-0500</CARRIERPHONE>

       <UNITNO>421</UNITNO>

       <UNITTYPE>SMALL STRAIGHT                </UNITTYPE>

       <DISTANCE>0</DISTANCE>

       <AVAILABLE>06/26/2012 17:00</AVAILABLE>

       <STATUS>On A Load</STATUS>

       <LOCATION>JACKSONVILLE, FL</LOCATION>

       <NOTE> TEAM REAL TIME GPS TRACKED</NOTE>

       <PAYLOAD>3300</PAYLOAD>

       <DIMS>

         <BOXLENGTH>0</BOXLENGTH>

         <BOXWIDTH>0</BOXWIDTH>

         <BOXHEIGHT>0</BOXHEIGHT>

       </DIMS>

       <DOMICILE>US</DOMICILE>

       <SATELLITE>N</SATELLITE>

     </TRUCK>

      ...

      ...

      ...

    <TRUCKS>

    My code looks like this:

    Imports Excel stuff....

    Imports System.Xml.XPath

    Public Class ExportToExcel

      Shared Sub ExportData(ByVal xmlBuffer As String)

         Dim xDoc = XDocument.Parse(xmlBuffer)

         Dim trucks = From s In xDoc.Descendants("TRUCKS")

            Select <Row>

                      <Cell><Data ss:Type="String"><%= s.XPathSelectElement("CARRIER") %></Data></Cell>

                      <Cell><Data ss:Type="String"><%= s.XPathSelectElement("CARRIERPHONE") %></Data></Cell>

                      <Cell><Data ss:Type="Number"><%= s.XPathSelectElement("UNITNO") %></Data></Cell>

                      <Cell><Data ss:Type="String"><%= s.XPathSelectElement("UNITTYPE") %></Data></Cell>

                      <Cell><Data ss:Type="Number"><%= s.XPathSelectElement("DISTANCE") %></Data></Cell>

                      <Cell ss:StyleID="s70"><Data ss:Type="DateTime"><%= s.XPathSelectElement("AVAILABLE") %></Data></Cell>

                      <Cell><Data ss:Type="String"><%= s.XPathSelectElement("STATUS") %></Data></Cell>

                      <Cell><Data ss:Type="String"><%= s.XPathSelectElement("LOCATION") %></Data></Cell>

                      <Cell><Data ss:Type="String"><%= s.XPathSelectElement("NOTE") %></Data></Cell>

                      <Cell><Data ss:Type="Number"><%= s.XPathSelectElement("PAYLOAD") %></Data></Cell>

                      <Cell><Data ss:Type="Number"><%= s.XPathSelectElement("LENGTH") %></Data></Cell>

                      <Cell><Data ss:Type="Number"><%= s.XPathSelectElement("WIDTH") %></Data></Cell>

                      <Cell><Data ss:Type="String"><%= s.XPathSelectElement("HEIGHT") %></Data></Cell>

                      <Cell><Data ss:Type="Number"><%= s.XPathSelectElement("DOMICILE") %></Data></Cell>

                      <Cell><Data ss:Type="String"><%= s.XPathSelectElement("SATELLITE") %></Data></Cell>

                   </Row>

    I'm getting the spreadsheet, but not the values - any idea what I'm doing wrong?

Page 6 of 7 (94 items) «34567