Welcome to MSDN Blogs Sign in | Join | Help

You had me at "Hello World!"

XAML, WPF, Silverlight, .NET, Office 2007, Windows code samples and other interesting things

Syndication

Retrieving Information From Excel 2007 Files

Ever needed a sample code to programmatically extract data from Excel 2007 table?

 

Given an excel table like this (in book1.xlsx):

 

Name

ID

Color

Ball

1

White

Phone

2

Green

Cup

3

Yellow

 

 

Here’s a function to retrieve the contents of the table:

 

/// <summary>

/// Retrieves an excel table from a given sheet in an xlsx file

/// </summary>

/// <param name="fileName">xlsx file name</param>

/// <param name="sheetName">Sheet name (e.g. "Sheet1"</param>

/// <returns>table[rows][cols] containing all the table information for the given sheet in the given file

/// Cells with no information are returned as null. The table starts at the first element containing data in the sheet.</returns>

public List<List<string>> XLGetTable(string fileName, string sheetName);

 

 

And a sample usage of the above function:

 

List<List<string>> table = XLGetTable(@"book1.xlsx", "Sheet1");

 

int rows, cols;

if (table.Count == 0) {

    rows = 0;

    cols = 0;

}

else {

    rows = table.Count;

    cols = table[0].Count;

}

 

Debug.WriteLine(String.Format("rows = {0}, cols = {1}", rows, cols));

 

for (int row = 0; row < rows; row++) {

    for (int col = 0; col < cols; col++) {

        Debug.Write(String.Format("{0} ({1}, {2})\t\t", table[row][col], row, col));

    }

    Debug.WriteLine(String.Empty);

}

 

 

Output for the above sample in the VS Debug Window:

 

rows = 4, cols = 3

Name (0, 0)       ID (0, 1)         Color (0, 2)           

Ball (1, 0)       1 (1, 1)          White (1, 2)           

Phone (2, 0)      2 (2, 1)          Green (2, 2)           

Cup (3, 0)        3 (3, 1)          Yellow (3, 2)    

 

See the attached ExcelHelper.cs for the full code of the above sample.

 

The sample is based on 2007 Office System Sample: Open XML File Format Code Snippets for Visual Studio 2005.

The Office 2007 files are using the Open XML file format. They are zip files with embedded data.

For more information check out the Ecma Office Open XML Formats architecture guide.

If you look inside an xlsx file (rename it to Zip and then open it) or uncomment the debugging code in the sample; you will see that the Excel data sheet is partially represented with XML similar to this one:

 

<?

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>

- <worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">

  <dimension ref="B3:D6" />

- <sheetViews>

- <sheetView tabSelected="1" workbookViewId="0">

  <selection activeCell="B3" sqref="B3:D3" />

  </sheetView>

  </sheetViews>

  <sheetFormatPr defaultRowHeight="15" />

- <sheetData>

- <row r="3" spans="2:4">

- <c r="B3" t="s">

  <v>0</v>

  </c>

- <c r="C3" t="s">

  <v>3</v>

  </c>

- <c r="D3" t="s">

  <v>2</v>

  </c>

  </row>

- <row r="4" spans="2:4">

- <c r="B4" t="s">

  <v>1</v>

  </c>

- <c r="C4">

  <v>1</v>

  </c>

- <c r="D4" t="s">

  <v>4</v>

  </c>

  </row>

- <row r="5" spans="2:4">

- <c r="B5" t="s">

  <v>5</v>

  </c>

- <c r="C5">

  <v>2</v>

  </c>

- <c r="D5" t="s">

  <v>6</v>

  </c>

  </row>

- <row r="6" spans="2:4">

- <c r="B6" s="1" t="s">

  <v>7</v>

  </c>

- <c r="C6" s="1">

  <v>3</v>

  </c>

- <c r="D6" s="1" t="s">

  <v>8</v>

  </c>

  </row>

  </sheetData>

  <pageMargins left="0.7" right="0.7" top="0.75" bottom="0.75" header="0.3" footer="0.3" />

  <pageSetup orientation="portrait" horizontalDpi="4294967293" verticalDpi="0" r:id="rId1" />

- <tableParts count="1">

  <tablePart r:id="rId2" />

  </tableParts>

  </worksheet>

 

 

You probably notice that there are no strings like “Ball”, “Phone” or “Cup” in the above XML file. That is because the strings are stored in a separate file to avoid redundancy. The sheet XML contains references to the actual strings.

 

Xlsx files can be read using functions from the System.IO.Packaging namespace. The attached code opens the xlsx file, retrieves the “main document part” - worksheet, looks for the given sheet name, and extracts the XML to construct the table information.

 

Published Sunday, February 18, 2007 5:43 PM by nikola


Attachment(s): ExcelHelper.cs

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

# re: Retrieving Information From Excel 2007 Files @ Friday, May 09, 2008 7:05 AM

Hi,

Here you can find article that shows how to read/write to XLSX files without System.IO.Packaging namespace (framework < 3.5): http://www.gemboxsoftware.com/Excel2007/DemoApp.htm.

Mario

www.GemBoxSoftware.com

Mario

# re: Retrieving Information From Excel 2007 Files @ Wednesday, July 08, 2009 9:16 PM

Your page does not work in IE8 (without the compatibility button being pushed) :-(

SimonC

# re: Retrieving Information From Excel 2007 Files @ Sunday, July 12, 2009 11:05 PM

Hi Mario, thank you for the link!

nikola

# re: Retrieving Information From Excel 2007 Files @ Sunday, July 12, 2009 11:06 PM

Hi Simon, thank you for the note! I think the issue is fixed now, because the page renders fine in the IE8 that I have here (Win 7 RC)...Please let me know if it's still not rendering correctly for you without compatibility mode! Thanks!

nikola

Leave a Comment

(required) 
required 
(required) 

  
Enter Code Here: Required
Page view tracker