[Table of Contents] [Next Topic] [Blog Map] This blog is inactive. New blog: EricWhite.com/blog
There is a problem in the example presented in the previous topic: it sets the Style property of the anonymous type to Nothing when a paragraph has no style. This is incorrect. A paragraph without an explicit style uses the document's default paragraph style, so we should use another query to find that default style in the styles part.
Here is the query to retrieve the default style:
Dim defaultStyle As String = _
    CStr(styleDoc.Root _
        .Elements(w + "style") _
        .Where(Function(style) _
            CStr(style.Attribute(w + "type")) = "paragraph" AndAlso _
            CStr(style.Attribute(w + "default")) = "1") _
        .First() _
        .Attribute(w + "styleId"))
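To try the query in isolation, here is a minimal, self-contained sketch that runs the same Where filter against a small, hand-written styles part parsed from a string. The sample XML, the module name DefaultStyleSketch and the function FindDefaultParagraphStyle are hypothetical, not part of the Open XML SDK. The sample also shows why the type filter matters: a character style can carry w:default="1" too. One deliberate deviation from the query above: the sketch calls FirstOrDefault instead of First, because First throws an InvalidOperationException when no element matches, and it assumes you would rather fall back to a caller-supplied style ID.

```vb
Imports System
Imports System.Linq
Imports System.Xml.Linq

Module DefaultStyleSketch
    ' WordprocessingML main namespace, the same URI used in this topic.
    Private ReadOnly w As XNamespace = _
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

    ' Returns the styleId of the default paragraph style. Unlike the topic's
    ' query, it uses FirstOrDefault and returns fallback when no style matches.
    Public Function FindDefaultParagraphStyle(ByVal styleDoc As XDocument, _
                                              ByVal fallback As String) As String
        Dim defaultStyleElement As XElement = styleDoc.Root _
            .Elements(w + "style") _
            .Where(Function(style) _
                CStr(style.Attribute(w + "type")) = "paragraph" AndAlso _
                CStr(style.Attribute(w + "default")) = "1") _
            .FirstOrDefault()
        If defaultStyleElement Is Nothing Then
            Return fallback
        End If
        Return CStr(defaultStyleElement.Attribute(w + "styleId"))
    End Function

    Sub Main()
        ' Hypothetical, hand-trimmed styles part. The character style is also
        ' marked as a default, but the type filter excludes it.
        Dim styleDoc As XDocument = XDocument.Parse( _
            "<w:styles xmlns:w=""http://schemas.openxmlformats.org/wordprocessingml/2006/main"">" & _
            "<w:style w:type=""character"" w:default=""1"" w:styleId=""DefaultParagraphFont""/>" & _
            "<w:style w:type=""paragraph"" w:styleId=""Heading1""/>" & _
            "<w:style w:type=""paragraph"" w:default=""1"" w:styleId=""Normal""/>" & _
            "</w:styles>")
        Console.WriteLine(FindDefaultParagraphStyle(styleDoc, "(none)"))
    End Sub
End Module
```

Running Main prints Normal: the DefaultParagraphFont element is skipped because its type is character, and Heading1 is skipped because it is not marked as the default.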
We can then pass this variable to GetParagraphStyle, so that if there is no style specified for the paragraph, the function returns the default style.
Public Function GetParagraphStyle(ByVal para As XElement, _
                                  ByVal defaultStyle As String) As String
    Dim w As XNamespace = _
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
    Dim paraStyle = CStr(para.Elements(w + "pPr") _
                         .Elements(w + "pStyle") _
                         .Attributes(w + "val") _
                         .FirstOrDefault())
    If (paraStyle Is Nothing) Then
        Return defaultStyle
    Else
        Return paraStyle
    End If
End Function
We can now modify the query to pass the defaultStyle to GetParagraphStyle:
Dim paragraphs = _
    mainPartDoc.Root _
        .Element(w + "body") _
        .Descendants(w + "p") _
        .Select(Function(p) _
            New With { _
                .ParagraphNode = p, _
                .Style = GetParagraphStyle(p, defaultStyle) _
            } _
        )
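The following sketch exercises GetParagraphStyle and the same projection without opening a .docx file, assuming a tiny hand-written main document part parsed from a string. The module name ParagraphStyleSketch and the sample XML are hypothetical. GetParagraphStyle keeps the logic shown above, but uses VB's If(a, b) operator, which returns b when a is Nothing, in place of the If...Else block.

```vb
Imports System
Imports System.Linq
Imports System.Xml.Linq

Module ParagraphStyleSketch
    Private ReadOnly w As XNamespace = _
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

    ' Same logic as the topic's GetParagraphStyle: read w:pPr/w:pStyle/@w:val,
    ' or return defaultStyle when the paragraph has no pStyle.
    Public Function GetParagraphStyle(ByVal para As XElement, _
                                      ByVal defaultStyle As String) As String
        Dim paraStyle As String = CStr(para.Elements(w + "pPr") _
                                           .Elements(w + "pStyle") _
                                           .Attributes(w + "val") _
                                           .FirstOrDefault())
        ' If(a, b) returns b when a is Nothing.
        Return If(paraStyle, defaultStyle)
    End Function

    Sub Main()
        ' Hypothetical main document part: one styled paragraph, one without w:pPr.
        Dim mainPartDoc As XDocument = XDocument.Parse( _
            "<w:document xmlns:w=""http://schemas.openxmlformats.org/wordprocessingml/2006/main""><w:body>" & _
            "<w:p><w:pPr><w:pStyle w:val=""Heading1""/></w:pPr></w:p>" & _
            "<w:p/>" & _
            "</w:body></w:document>")

        ' The same projection as the topic's query, with "Normal" as the default.
        Dim paragraphs = mainPartDoc.Root _
            .Element(w + "body") _
            .Descendants(w + "p") _
            .Select(Function(p) New With { _
                .ParagraphNode = p, _
                .Style = GetParagraphStyle(p, "Normal") _
            })

        For Each p In paragraphs
            Console.WriteLine(p.Style)
        Next
    End Sub
End Module
```

Main prints Heading1 for the first paragraph and Normal for the second, because the second paragraph has no w:pStyle and falls back to the default.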
We can also write the part of the query that retrieves the default style as a query expression. However, there is no way to express the First call in a query expression, so we must surround the query expression with parentheses and then call the First method on the result:
Dim defaultStyle As String = _
    CStr( _
        ( _
            From style In styleDoc.Root _
                .Elements(w + "style") _
            Where CStr(style.Attribute(w + "type")) = "paragraph" AndAlso _
                  CStr(style.Attribute(w + "default")) = "1" _
        ) _
        .First() _
        .Attribute(w + "styleId") _
    )
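If you want to confirm that the two forms are interchangeable, this sketch puts the method-syntax and query-expression versions side by side and runs both against the same hypothetical sample styles part. The module and function names are made up for illustration. Only the syntax differs; in both versions, First is an ordinary method call on the filtered sequence.

```vb
Imports System
Imports System.Linq
Imports System.Xml.Linq

Module QuerySyntaxSketch
    Private ReadOnly w As XNamespace = _
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

    ' Method syntax: Where and First are chained directly.
    Public Function DefaultStyleMethodSyntax(ByVal styleDoc As XDocument) As String
        Return CStr(styleDoc.Root _
            .Elements(w + "style") _
            .Where(Function(style) _
                CStr(style.Attribute(w + "type")) = "paragraph" AndAlso _
                CStr(style.Attribute(w + "default")) = "1") _
            .First() _
            .Attribute(w + "styleId"))
    End Function

    ' Query expression: there is no First clause, so the expression is
    ' wrapped in parentheses and First is called on the result.
    Public Function DefaultStyleQuerySyntax(ByVal styleDoc As XDocument) As String
        Return CStr( _
            (From style In styleDoc.Root.Elements(w + "style") _
             Where CStr(style.Attribute(w + "type")) = "paragraph" AndAlso _
                   CStr(style.Attribute(w + "default")) = "1") _
            .First() _
            .Attribute(w + "styleId"))
    End Function

    Sub Main()
        ' Hypothetical sample styles part.
        Dim styleDoc As XDocument = XDocument.Parse( _
            "<w:styles xmlns:w=""http://schemas.openxmlformats.org/wordprocessingml/2006/main"">" & _
            "<w:style w:type=""paragraph"" w:styleId=""Heading1""/>" & _
            "<w:style w:type=""paragraph"" w:default=""1"" w:styleId=""Normal""/>" & _
            "</w:styles>")
        Console.WriteLine(DefaultStyleMethodSyntax(styleDoc))
        Console.WriteLine(DefaultStyleQuerySyntax(styleDoc))
    End Sub
End Module
```

Both lines print Normal.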
In this situation, I personally prefer method syntax.
One more point about this assignment: the First extension method executes the query immediately. It iterates the source until it finds the first matching style element, so the string variable defaultStyle is set right away. By contrast, the query that finds the paragraphs uses deferred execution: it does nothing until we iterate through it with a For Each statement.
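The difference between deferred and immediate execution is easy to observe with a counter. This sketch is not Open XML code: it uses a plain List(Of Integer) and a hypothetical Project function that counts how many times the Select projection runs, and a hypothetical RunDemo function that records the count after each step.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Linq

Module DeferredExecutionSketch
    ' Counts how many times the Select projection has run.
    Private projectionCalls As Integer = 0

    ' Hypothetical projection that records each call.
    Private Function Project(ByVal n As Integer) As Integer
        projectionCalls += 1
        Return n * 10
    End Function

    ' Returns the call count after each of three steps.
    Public Function RunDemo() As Integer()
        projectionCalls = 0
        Dim source As New List(Of Integer) From {1, 2, 3}

        ' Step 1: defining the query runs nothing (deferred execution).
        Dim query = source.Select(Function(n) Project(n))
        Dim afterDefine As Integer = projectionCalls

        ' Step 2: First executes the query immediately and stops at the first element.
        Dim firstValue As Integer = query.First()
        Console.WriteLine("First value: {0}", firstValue)
        Dim afterFirst As Integer = projectionCalls

        ' Step 3: For Each enumerates the query again, from the start.
        For Each value In query
            Console.WriteLine(value)
        Next
        Dim afterForEach As Integer = projectionCalls

        Return New Integer() {afterDefine, afterFirst, afterForEach}
    End Function

    Sub Main()
        Dim counts As Integer() = RunDemo()
        Console.WriteLine("Calls after define: {0}, after First: {1}, after For Each: {2}", _
                          counts(0), counts(1), counts(2))
    End Sub
End Module
```

RunDemo returns 0, 1 and 4: defining the query runs nothing, First runs the projection once and stops, and the For Each runs it for all three elements. Enumerating a deferred query again re-executes it, which also applies to the paragraphs query.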
Now, when we run the program, we see:
Heading1     /document/body/p
Normal       /document/body/p
Code         /document/body/p
Code         /document/body/p
Code         /document/body/p
Code         /document/body/p
Code         /document/body/p
Code         /document/body/p
Code         /document/body/p
Code         /document/body/p
Normal       /document/body/p
Code         /document/body/p
This is what we wanted from this transformation.
The entire listing follows. Note that we had to read the styles part into an XDocument.
Imports System.IO
Imports System.Linq
Imports System.Xml
Imports System.Xml.Linq
Imports DocumentFormat.OpenXml.Packaging

Module Module1
    ' Returns a path such as /document/body/p for an element.
    <System.Runtime.CompilerServices.Extension()> _
    Public Function GetPath(ByVal el As XElement) As String
        Return el _
            .AncestorsAndSelf _
            .InDocumentOrder _
            .Aggregate("", Function(seed, i) seed & "/" & i.Name.LocalName)
    End Function

    ' Loads the XML of an Open XML part into an XDocument.
    Public Function LoadXDocument(ByVal part As OpenXmlPart) _
            As XDocument
        Using partReader As StreamReader = New StreamReader(part.GetStream())
            Using reader As XmlReader = XmlReader.Create(partReader)
                Return XDocument.Load(reader)
            End Using
        End Using
    End Function

    ' Returns the paragraph's style ID, or defaultStyle if it has none.
    Public Function GetParagraphStyle(ByVal para As XElement, _
                                      ByVal defaultStyle As String) As String
        Dim w As XNamespace = _
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
        Dim paraStyle = CStr(para.Elements(w + "pPr") _
                             .Elements(w + "pStyle") _
                             .Attributes(w + "val") _
                             .FirstOrDefault())
        If (paraStyle Is Nothing) Then
            Return defaultStyle
        Else
            Return paraStyle
        End If
    End Function

    Sub Main()
        Dim w As XNamespace = _
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
        Dim filename As String = "SampleDoc.docx"
        ' Open read-only; this program only reads the document.
        Using wordDoc As WordprocessingDocument = _
                WordprocessingDocument.Open(filename, False)
            Dim mainPart As MainDocumentPart = _
                wordDoc.MainDocumentPart
            Dim styleDefinitionPart As StyleDefinitionsPart = _
                mainPart.StyleDefinitionsPart
            ' Not used in this topic. It is Nothing if the document has no
            ' comments, in which case LoadXDocument would throw.
            Dim commentsPart As WordprocessingCommentsPart = _
                mainPart.WordprocessingCommentsPart
            Dim mainPartDoc As XDocument = LoadXDocument(mainPart)
            Dim styleDoc As XDocument = LoadXDocument(styleDefinitionPart)
            Dim commentsDoc As XDocument = LoadXDocument(commentsPart)

            ' First executes this query immediately.
            Dim defaultStyle As String = _
                CStr( _
                    ( _
                        From style In styleDoc.Root _
                            .Elements(w + "style") _
                        Where CStr(style.Attribute(w + "type")) = "paragraph" AndAlso _
                              CStr(style.Attribute(w + "default")) = "1" _
                    ) _
                    .First() _
                    .Attribute(w + "styleId") _
                )

            ' Deferred: nothing runs until the For Each below.
            Dim paragraphs = _
                mainPartDoc.Root _
                    .Element(w + "body") _
                    .Descendants(w + "p") _
                    .Select(Function(p) _
                        New With { _
                            .ParagraphNode = p, _
                            .Style = GetParagraphStyle(p, defaultStyle) _
                        } _
                    )

            For Each p In paragraphs
                Console.WriteLine("{0} {1}", p.Style.PadRight(12), _
                    p.ParagraphNode.GetPath())
            Next
        End Using
    End Sub
End Module