In a previous post, I gave you an overview of the functionality added to the Open XML SDK 2.0 August 2009 CTP. Today, I want to deep dive into the schema and semantic level validation support within the SDK. Specifically, I am going to show you guys the Open XML SDK code needed to actually validate your Open XML files.
If you've played around with manipulating Open XML files, there is a good chance that at some point your resulting document was considered invalid or corrupt by the applications. You've probably even seen one of these dialogs:
What do you do when you get into this state? A lot of the time the application error dialogs don't really help you debug the issue. Well, that's where the Open XML SDK can help you out. With just a few lines of code you can identify key pieces of information that tell you what the error is and where to find it within the package.
Validation with the Open XML SDK 2.0 is accomplished via the OpenXmlValidator class. This class allows you to enumerate all the errors within a file, where each error is represented via the ValidationErrorInfo class. The ValidationErrorInfo class stores the following information: a description of the error (Description), the type of error (ErrorType), an identifier for the error (Id), the node that caused the error (Node), the part that contains it (Part), the XPath to the node (Path), and, where applicable, a related node and part (RelatedNode and RelatedPart).
Here is a code snippet you can reuse to validate Word documents:
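A minimal sketch of such a snippet follows. It assumes a reference to the Open XML SDK assembly (DocumentFormat.OpenXml); "InvalidFile.docx" is a placeholder path, and the Describe helper is my own name for illustration, not part of the SDK. Path and Part are checked for null before use, since I would not assume every error carries both.

```csharp
using System;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Validation;

static class ValidateWordDocument
{
    // Hypothetical helper: formats one validation error for console output.
    public static string Describe(int index, string description, string xpath, string partUri)
    {
        return "Error " + index + Environment.NewLine
             + "Description: " + description + Environment.NewLine
             + "Path: " + (xpath ?? "(none)") + Environment.NewLine
             + "Part: " + (partUri ?? "(none)");
    }

    static void Main()
    {
        OpenXmlValidator validator = new OpenXmlValidator();
        int count = 0;

        // Placeholder file name; the using block closes the package when we are done.
        using (WordprocessingDocument doc = WordprocessingDocument.Open("InvalidFile.docx", true))
        {
            // Validate returns the errors lazily, so enumerate them while the package is open.
            foreach (ValidationErrorInfo error in validator.Validate(doc))
            {
                count++;
                string xpath = error.Path != null ? error.Path.XPath : null;
                string partUri = error.Part != null ? error.Part.Uri.ToString() : null;
                Console.WriteLine(Describe(count, error.Description, xpath, partUri));
                Console.WriteLine("-------------------------------------------");
            }
        }

        Console.WriteLine(count + " error(s) found.");
    }
}
```

The description says what is wrong, the path gives an XPath to the offending node, and the part URI says which part of the package to open.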
The same code can be used to validate Excel and PowerPoint documents. All you need to do is change the Open method to be one of the following:
foreach (ValidationErrorInfo error in validator.Validate(PresentationDocument.Open("InvalidFile.pptx", true)))
foreach (ValidationErrorInfo error in validator.Validate(SpreadsheetDocument.Open("InvalidFile.xlsx", true)))
Pretty simple stuff! If you want to jump straight into the code, feel free to download this solution here.
Let's walk through validating and fixing a corrupt Word document. Given this corrupt document, the Open XML SDK detects the following errors:
Let's look at each of these errors.
Let's take a look at the xml within the main document part:
The error indicates that the length of the value for rsidR is not correct. We can fix this issue by changing the value to 00006B4C.
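If you want to catch this kind of problem before writing the value, a quick check along these lines can help. It is only a sketch: it assumes rsid attributes are four-byte hex values (so exactly eight hex digits), the IsValidRsid name is mine, and "6B4C" is a made-up too-short value used for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

static class RsidCheck
{
    // Assumption: an rsid is a four-byte hex value, i.e. exactly eight hex digits.
    public static bool IsValidRsid(string value)
    {
        return value != null && Regex.IsMatch(value, @"\A[0-9A-Fa-f]{8}\z");
    }

    static void Main()
    {
        Console.WriteLine(IsValidRsid("00006B4C")); // True: the fixed value from above
        Console.WriteLine(IsValidRsid("6B4C"));     // False: hypothetical too-short value
    }
}
```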
Let's take a look at the xml within the footnotes part:
The error indicates that there is a reference to a footnote using the value "3", but no such value exists in the footnotes part. Let's go ahead and change the footnoteReference to have a value of "2".
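The same kind of check can be done by hand with LINQ to XML. This is a rough sketch, not what the SDK does internally: the XML fragments are made-up stand-ins for the real parts, and FindDanglingReferences is a hypothetical helper name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

static class FootnoteReferenceCheck
{
    const string WNs = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    static readonly XNamespace W = WNs;

    // Returns the ids used by footnoteReference elements that no footnote defines.
    public static List<string> FindDanglingReferences(XElement body, XElement footnotes)
    {
        var defined = new HashSet<string>(
            footnotes.Elements(W + "footnote").Select(f => (string)f.Attribute(W + "id")));

        return body.Descendants(W + "footnoteReference")
                   .Select(r => (string)r.Attribute(W + "id"))
                   .Where(id => !defined.Contains(id))
                   .ToList();
    }

    static void Main()
    {
        // Made-up fragments: footnotes 1 and 2 exist, but the body references 3.
        XElement footnotes = XElement.Parse(
            "<w:footnotes xmlns:w='" + WNs + "'><w:footnote w:id='1'/><w:footnote w:id='2'/></w:footnotes>");
        XElement body = XElement.Parse(
            "<w:body xmlns:w='" + WNs + "'><w:p><w:r><w:footnoteReference w:id='3'/></w:r></w:p></w:body>");

        foreach (string id in FindDanglingReferences(body, footnotes))
            Console.WriteLine("No footnote with id " + id);
    }
}
```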
Let's take a look at the xml within the endnotes part:
The error indicates that more than one endnote specifies the same id value. Let's go ahead and change the values to be unique.
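Spotting duplicate ids yourself is a simple grouping exercise. Here is a small sketch with LINQ to XML, again using a made-up endnotes fragment and a hypothetical FindDuplicateIds helper rather than anything from the SDK.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

static class EndnoteIdCheck
{
    const string WNs = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    static readonly XNamespace W = WNs;

    // Returns every id value that is used by more than one endnote element.
    public static List<string> FindDuplicateIds(XElement endnotes)
    {
        return endnotes.Elements(W + "endnote")
                       .Select(e => (string)e.Attribute(W + "id"))
                       .GroupBy(id => id)
                       .Where(g => g.Count() > 1)
                       .Select(g => g.Key)
                       .ToList();
    }

    static void Main()
    {
        // Made-up fragment in which two endnotes share the id "1".
        XElement endnotes = XElement.Parse(
            "<w:endnotes xmlns:w='" + WNs + "'>" +
            "<w:endnote w:id='1'/><w:endnote w:id='1'/><w:endnote w:id='2'/></w:endnotes>");

        foreach (string id in FindDuplicateIds(endnotes))
            Console.WriteLine("Duplicate endnote id " + id);
    }
}
```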
After making these fixes we should be able to open the fixed document with no issues as shown below:
Try out the validation functionality and let us know what you think.