Quickly Import and Export Excel Data with LINQ to XML

Quickly Import and Export Excel Data with LINQ to XML

Rate This
  • Comments 95

One really handy thing to do with relational data is to import and export it to Excel. This can be a total snap using LINQ to XML in Visual Basic 2008. For example, let's say I want to import data from an Excel spreadsheet into a local database file like SQL Compact Edition. I also want to be able to export that data back into an Excel spreadsheet, basically allowing me to edit the data from either my application or Excel.

The trick really is getting the format of the Excel spreadsheet as XML and using that sheet as a template to populate your data. First open Excel (for this example I'll use the Excel 2003 format so you can use Excel 2007 or 2003 to follow along). Next let's enter some column headers, for my example I have a list of customers I want to import/export so the fields I'll want are Abbrev, Name, Phone, Country. It's also helpful to enter one line of dummy data (you'll see why in a second). So here' what we've got:

Now what I want to do is get all the customers stored in my little SQL CE local database into this spreadsheet.

(As a side note, you can easily add a SQL CE local database file to your project by selecting the "Local Database" new item template in Visual Studio 2008. You can easily access the data in SQL CE sdf files by adding a new datasource (Data --> Add New Datasource), adding a new connection, and change the data source to "Microsoft Compact SQL Server 3.5" then locate or create the .sdf file. Then you can create typed Datasets like normal.)

So I have a form where I just dragged my CustomersDataSet that I created from my local database file onto it as a DataGridView. This also sets up the BindingSource and BindingNavigator toolstrip like we'd expect. I added a couple buttons to the toolstrip one called "Import" and one called "Export".

Now in order to get the Excel spreadsheet into an XML format you can simply "Save As.." and select "XML Spreadsheet 2003 (.xml)" as the file type. Now navigate to that file and open it with notepad, select and copy all the text into your clipboard. Open up your Visual Basic program, type "Dim sheet =" and then paste the contents of your clipboard directly into the editor. It should look something like this (I collapsed some of the sections we're going to ignore):

Next we need to import the namespaces in the spreadsheet. This is necessary so that when we create and query our XML, Visual Basic will know the correct namespaces we're working with. So at the top of code file for our form we'll add:

Imports <xmlns="urn:schemas-microsoft-com:office:spreadsheet">

Imports <xmlns:o="urn:schemas-microsoft-com:office:office">

Imports <xmlns:x="urn:schemas-microsoft-com:office:excel">

Imports <xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
Imports <xmlns:html="http://www.w3.org/TR/REC-html40">

Now we can write a query that will take all of the customers in our SQL CE database file (technically the typed DataSet that was created for us) and populate the spreadsheet. You should see your dummy data in the XML above. Cut that out (ctrl-X) because we're going to replace that with all our customers using embedded expressions like so:

Dim customers = _

    From customer In Me.CustomersDataSet.Customers _

    Where customer.RowState <> DataRowState.Deleted _

        AndAlso Not customer.IsAbbrevNull _

    Order By customer.Abbrev _

    Select <Row>

               <Cell><Data ss:Type="String"><%= customer.Abbrev %></Data></Cell>

               <Cell><Data ss:Type="String"><%= If(customer.IsNameNull, "", customer.Name)%> </Data></Cell>

               <Cell><Data ss:Type="String"><%= If(customer.IsPhoneNull, "", customer.Phone)%> </Data></Cell>

               <Cell><Data ss:Type="String"><%= If(customer.IsCountryNull, "",customer.Country)%> </Data></Cell>

           </Row>

I'm just selecting the data inside my typed Dataset which is populated in the load of the form, making sure not to grab any deleted rows or rows where the abbreviation is NULL. The customers variable above becomes a collection of XElements (IEnumerable(Of XElement)) and we simply pop that back into the sheet, replacing the dummy line we removed:

<Table ss:ExpandedColumnCount="4" ss:ExpandedRowCount=<%= customers.Count + 1 %> x:FullColumns="1" x:FullRows="1" ss:DefaultRowHeight="15">

<Row ss:StyleID="s62">

               <Cell><Data ss:Type="String">Abbrev</Data></Cell>

               <Cell><Data ss:Type="String">Name</Data></Cell>

               <Cell><Data ss:Type="String">Phone</Data></Cell>

               <Cell><Data ss:Type="String">Country</Data></Cell>

       </Row>

       <%= customers %>

</Table>

Also notice that I had to adjust the ss:ExpandedRowCount attribute as well to include the count of our customers plus one for the header row. Now we can write the results out to disk and start Excel:

sheet.Save("C:\Customers.xml")

Process.Start("Excel.exe", "C:\Customers.xml")

Cool!

Okay so now we want to be able to import this data as well. If any rows were added or edited we want that data back in our database. I'm going to use the Abbrev field as the unique identifier of our data. The query we need to write will group join (outer join) the current (non-deleted) rows in our CustomerDataSet to the rows here in the spreadsheet. That way if the resulting items in the collection contain a group (of one CustomerDataRow in this case), then we know that the CustomerDataRow needs to be updated, otherwise we know it's a new customer that needs to be added. We also need to skip the first row because that's our header row in the spreadsheet, and I'm doing an extra check for empty cells.

Dim sheet = XDocument.Load("C:\Customers.xml")

 

'Skip the first row because that is the header row in the spreadsheet.

'Notice also that we're eliminating deleted rows from the group join.

'We'll use the existence of the group collection to determine if the

' row needs to be added to the database.

Dim customers = From xmlCustomer In sheet...<Row> _

                   Let Abbrev = xmlCustomer.<Cell>(0) _

                   Let Name = xmlCustomer.<Cell>(1) _

                   Let Phone = xmlCustomer.<Cell>(2) _

                   Let Country = xmlCustomer.<Cell>(3) _

                Where _

                   Abbrev IsNot Nothing AndAlso _

                   Name IsNot Nothing AndAlso _

                   Phone IsNot Nothing AndAlso _

                   Country IsNot Nothing _

                Group Join customer In _

                    (From row In Me.CustomersDataSet.Customers _

                       Where row.RowState <> DataRowState.Deleted _

                             AndAlso Not row.IsAbbrevNull _

                       Select row) _

                    On customer.Abbrev Equals Abbrev _

                    Into Group _

                Select _

                    Abbrev = Abbrev.Value, _

                    Name = Name.Value, _

                    Phone = Phone.Value, _

                    Country = Country.Value, _

                    Group _

                Skip 1

 

 

'Add or update the customer row depending on if the

' group collection exists.

For Each customer In customers

    Dim r As CustomersDataSet.CustomersRow

    If customer.Group.Count = 0 Then

        'Add new row ands set the customer abbreviation.

        ' This field is used to determine uniqueness of the data

        r = Me.CustomersDataSet.Customers.NewCustomersRow()

        r.Abbrev = customer.Abbrev

    Else

        r = customer.Group(0)

    End If

 

    r.Name = customer.Name

    r.Phone = customer.Phone

    r.Country = customer.Country

 

    If r.RowState = DataRowState.Detached Then

        Me.CustomersDataSet.Customers.AddCustomersRow(r)

    End If

Next

Now we can make changes to the spreadsheet, import the data into our local database, or make changes to our local database and export those changes to Excel. I've attached the complete example built with Visual Basic 2008. You'll also need Excel 2003 or higher to view the spreadsheet. It's worth reiterating that you don't actually need Excel to create the spreadsheet in Visual Basic via code as long as you have the XML format, you just need it if you want to open it in Excel later. Remember we're not using automation, we're just creating XML in a specific format.

Enjoy!

Attachment: ImportExportExcel.zip
Leave a Comment
  • Please add 5 and 3 and type the answer here:
  • Post
  • Hai Everyone

    Please suggest how to export datas from Excel file to txt using miscrosoft office vb.

    Regards

    Ganesh

  • Hi Beth,

    I am writing an app that needs to take data from excel and put into a database - so the code you have given is a great starting point.

    Unfortunately if I edit the spreadsheet or have a column with a null value the LINQ query is set to nothing.  If I create a new SS it will work, however only if all columns contain a value.  

    I would be great to get this to work reliably, otherwise it's back to exporting the excel data to a txt file for the sake of reliability.

    If anyone responds - Thanks

  • Hi Coleman,

    You just need to change the query to check for null and provide a default value instead. Here I modified the query to return the empty string "" if the fields are not filled out. Note that Abbrev still must not be null since that is the key we're using.

    Dim customers = From xmlCustomer In sheet...<Row> _

              Let Abbrev = xmlCustomer.<Cell>(0) _

              Let Name = xmlCustomer.<Cell>(1) _

              Let Phone = xmlCustomer.<Cell>(2) _

              Let Country = xmlCustomer.<Cell>(3) _

           Where _

              Abbrev IsNot Nothing _

           Group Join customer In _

               (From row In Me.CustomersDataSet.Customers _

                  Where row.RowState <> DataRowState.Deleted _

                        AndAlso Not row.IsAbbrevNull _

                  Select row) _

               On customer.Abbrev Equals Abbrev _

               Into Group _

           Select _

               Abbrev = Abbrev.Value, _

               Name = If(Name Is Nothing, "", Name.Value), _

               Phone = If(Phone Is Nothing, "", Phone.Value), _

               Country = If(Country Is Nothing, "", Country.Value), _

               Group _

           Skip 1

    HTH,

    -Beth

  • Hi Beth,

    Thank you so for much for the response, works a like a treat.

    I had tried try to process the null, but was miles away on the correct syntax.

    Thanks again

  • Hi Beth,

    Thank you for this post, it was very helpful.

    I have a comment/question on it.  Your approach works fine assuming that you have the structure of the Customers Excel sheet as static.

    What I'm trying to ask is: Is it possible to have this work dynamically.  Where the user would navigate to an Excel document to choose, then import that document (hence the document can have more columns than the original Customers Excel document), and after the document is imported, the data is displayed in the DataGridView following the same rules that you have in your original code except for the data source.

    So the user would be able to import a Products table or a Locations table or any table they choose, and the DataTable in your Dataset would be created during runtime based on the source data.

    Best Regards,

    Alan

  • Hi Alan,

    You could probably do this dynamically using an untyped DataSet and just mapping the position of the fields to the Excel columns but it may be brittle. It's better to know at least part of the schema. And you'd still have to know the unique key column to do the group. I'd probably look at the Excel XML mapping or see if there was a way to infer the schema.

    HTH,

    -B

  • Hi Beth,

    Thanks for your response.

    I was wondering if it is possible to pass to a LINQ query a table name and a compare value as input parameters of a function, something like this:

    public Sub MyLinq(ByVal MyTable as DataTable, _

                              ByVal MyValue as String)

      Dim customers = _

               From _

                      Data In MyTable _

               Where _

                      Data.Field(Of String)(1) = MyValue _

               Select _

                       Data.Field(Of Integer)(0) _

               Distinct

      ' some logic here...

    End Sub

    Thanks,

    Alan

  • Hi Alan,

    Yep, you could do that, you'll end up with a collection of Integers in your example. Just put a try/catch around the method to catch any runtime errors (i.e. casting or missing fields) and remember to import System.Data

    HTH,

    -B

  • Hi Beth,

    Thank you for your response.

    I also want to thank you for the videos on LINQ usages and the other tutorials you made.  You have definitely simplified the LINQ learning experience.

    Best Regards,

    Alan

  • I am new to programming and this information has been very helpful for a project I have been working on. I do have one question, how can I export data from my database based on current date? I would like to export just records for the current day only.

  • Thank you Beth for your example.

    I have an instance where I have a set number of columns, as per your example - works great.

    However, in addition, there can extra, unknown, number of columns after that

    If there a way to check the column - ie to see if it has a header value, and if so, then it will have a value in the row (that is my situation) ?

    ie

    FirstName LastName Dynamic1 Dynamic2

    Beth          Massi         500.00     300.00

  • Hi Steve,

    If you take a close look I'm already assuming that the columns (except for Abbrev) may not exist in the spreadsheet, however they have to exist in the DataSet. So to capture the extra columns you'll need to modify the database table to to add these fields. Then you can either modify your typed dataset or you can dynamically add untyped columns to the DataTable. Then your query can just check for Nothing (null) value on the extra columns like I'm doing already for the other fields. However you may want to supply a different default value for empty fields than the empty string.

    Dim customers = From xmlCustomer In sheet...<Row> _

              Let Abbrev = xmlCustomer.<Cell>(0) _

              Let Name = xmlCustomer.<Cell>(1) _

              Let Phone = xmlCustomer.<Cell>(2) _

              Let Country = xmlCustomer.<Cell>(3) _

              Let DynamicField1 = xmlCustomer.<Cell>(4) _

           Where _

              Abbrev IsNot Nothing _

           Group Join customer In _

               (From row In Me.CustomersDataSet.Customers _

                  Where row.RowState <> DataRowState.Deleted _

                        AndAlso Not row.IsAbbrevNull _

                  Select row) _

               On customer.Abbrev Equals Abbrev _

               Into Group _

           Select _

               Abbrev = Abbrev.Value, _

               Name = If(Name Is Nothing, "", Name.Value), _

               Phone = If(Phone Is Nothing, "", Phone.Value), _

               Country = If(Country Is Nothing, "", Country.Value), _

               DynamicField1 = If(DynamicField1 Is Nothing, "", DynamicField1.Value), _

               Group _

           Skip 1

    HTH,

    -B

  • Hi Beth,

    Great example!

    I have a List(Of List(String)) that stores rows and columns so I don't have the ability to specify which column should be put where. You were able to put <Cell><Data ss:Type="String"><%= customer.Abbrev %></Data></Cell> as you knew that customer contained the Abbrev field. However my "customer" is just a list of cells with any number of elements. Is there a way of dealing with this too. I guess some kind of nested structure is required but I a not too sure how that would work.

    Thanks

    David

  • Beth,

    many thanks for the code, yet it is a bit too advanced for a mere novice like me.

    I am currently trying to edit your code to populate a 2 dimentional array from the excel sheet.  I want to update an old app that loads the array from 15 seperate txt files.

    Any pointers will be greatly appreciated.

    Rgds

    Bob

  • I was able to get it to work with the dataset to excel.

    Great code - thanks.

    Not sure if it's just a bug in Excel, but even though the type is getting set as a 'Number' (ie. a Double), it's not showing up as as double in the spreadsheet.

    Dates work, not numbers ?

Page 4 of 7 (95 items) «23456»