Retrieving the Default Style Name from the Styles Part - VB

The example presented in the previous topic has a problem: it sets the Style property of the anonymous type to Nothing (null) if there is no style on the paragraph.  This is incorrect.  A paragraph without an explicit style uses the document's default paragraph style, so we should use another query to find that default style in the styles part.

Here is the query to retrieve the default style:

Dim defaultStyle As String = _
    CStr(styleDoc.Root _
        .Elements(w + "style") _
        .Where(Function(style) _
            CStr(style.Attribute(w + "type")) = "paragraph" AndAlso _
            CStr(style.Attribute(w + "default")) = "1") _
        .First() _
        .Attribute(w + "styleId"))
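
To try this query without opening a document, here is a minimal sketch that runs it against a small in-memory stand-in for the styles part.  The names DefaultStyleDemo, FindDefaultParagraphStyle, and BuildSampleStyles are hypothetical and are not part of the listing in this topic.  The sketch is also a variant rather than the exact query above: it uses Attributes and FirstOrDefault, so it returns Nothing instead of throwing if no style is marked as the default.

```vb
Imports System
Imports System.Linq
Imports System.Xml.Linq

Module DefaultStyleDemo
    Private ReadOnly w As XNamespace = _
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

    ' Hypothetical helper (not part of the original listing): returns the
    ' styleId of the default paragraph style, or Nothing if none is marked.
    Public Function FindDefaultParagraphStyle(ByVal styleDoc As XDocument) As String
        Return CStr(styleDoc.Root _
            .Elements(w + "style") _
            .Where(Function(style) _
                CStr(style.Attribute(w + "type")) = "paragraph" AndAlso _
                CStr(style.Attribute(w + "default")) = "1") _
            .Attributes(w + "styleId") _
            .FirstOrDefault())
    End Function

    ' Builds a tiny stand-in for the styles part (assumed sample data).
    Public Function BuildSampleStyles() As XDocument
        Return New XDocument( _
            New XElement(w + "styles", _
                New XElement(w + "style", _
                    New XAttribute(w + "type", "paragraph"), _
                    New XAttribute(w + "styleId", "Heading1")), _
                New XElement(w + "style", _
                    New XAttribute(w + "type", "paragraph"), _
                    New XAttribute(w + "default", "1"), _
                    New XAttribute(w + "styleId", "Normal"))))
    End Function

    Sub Main()
        Console.WriteLine(FindDefaultParagraphStyle(BuildSampleStyles()))  ' Normal
    End Sub
End Module
```

Whether to throw (First) or return Nothing (FirstOrDefault) when no default style is found is a design choice; the query in this topic uses First.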
 

We can then pass this variable to GetParagraphStyle, so that if there is no style specified for the paragraph, the function returns the default style.

Public Function GetParagraphStyle(ByVal para As XElement, _
                                  ByVal defaultStyle As String) As String
    Dim w As XNamespace = _
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
    Dim paraStyle = CStr(para.Elements(w + "pPr") _
                   .Elements(w + "pStyle") _
                   .Attributes(w + "val") _
                   .FirstOrDefault())
    If (paraStyle Is Nothing) Then
        Return defaultStyle
    Else
        Return paraStyle
    End If
End Function
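
As a quick check of the fallback behavior, the following sketch calls an equivalent function on two hand-built paragraphs, one with a pStyle element and one without.  The module and function names (ParagraphStyleDemo, GetParagraphStyleOrDefault) are hypothetical, and the function is a shorter variant that uses the binary If operator in place of the If...Else block; it is intended to behave the same way as GetParagraphStyle.

```vb
Imports System
Imports System.Linq
Imports System.Xml.Linq

Module ParagraphStyleDemo
    Private ReadOnly w As XNamespace = _
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

    ' Hypothetical variant of GetParagraphStyle: If(a, b) returns a
    ' unless a is Nothing, in which case it returns b.
    Public Function GetParagraphStyleOrDefault(ByVal para As XElement, _
                                               ByVal defaultStyle As String) As String
        Dim paraStyle As String = CStr(para.Elements(w + "pPr") _
            .Elements(w + "pStyle") _
            .Attributes(w + "val") _
            .FirstOrDefault())
        Return If(paraStyle, defaultStyle)
    End Function

    Sub Main()
        ' A paragraph that names its style explicitly.
        Dim styled As New XElement(w + "p", _
            New XElement(w + "pPr", _
                New XElement(w + "pStyle", New XAttribute(w + "val", "Code"))))
        ' A paragraph with no paragraph properties at all.
        Dim unstyled As New XElement(w + "p")

        Console.WriteLine(GetParagraphStyleOrDefault(styled, "Normal"))    ' Code
        Console.WriteLine(GetParagraphStyleOrDefault(unstyled, "Normal"))  ' Normal
    End Sub
End Module
```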
 

We can now modify the query to pass the defaultStyle to GetParagraphStyle:

Dim paragraphs = _
    mainPartDoc.Root _
        .Element(w + "body") _
        .Descendants(w + "p") _
        .Select(Function(p) _
            New With { _
                .ParagraphNode = p, _
                .Style = GetParagraphStyle(p, defaultStyle) _
            } _
        )
 

We can write the filtering part of the default-style query as a query expression.  However, Visual Basic query expressions have no clause that corresponds to First, so we must surround the query expression with parentheses, and then call the First method on the result:

Dim defaultStyle As String = _
    CStr( _
            ( _
                From style In styleDoc.Root _
                    .Elements(w + "style") _
                Where( _
                    CStr(style.Attribute(w + "type")) = "paragraph" AndAlso _
                    CStr(style.Attribute(w + "default")) = "1") _
            ) _
            .First() _
            .Attribute(w + "styleId") _
        )
 

I prefer method syntax in this situation.

One more point about this assignment:  because First is not a deferred operator, it iterates the source as soon as the statement runs, and the value of defaultStyle is set immediately.  By contrast, the query that finds the paragraphs does nothing until we iterate through it using a For Each statement.
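
The difference is easy to observe with a predicate that counts its own calls.  This is a minimal sketch with made-up data; DeferredExecutionDemo, PredicateCalls, and IsParagraphStyle are hypothetical names used only for illustration.

```vb
Imports System
Imports System.Linq

Module DeferredExecutionDemo
    ' Counts how many times the Where predicate has been evaluated.
    Public PredicateCalls As Integer = 0

    Public Function IsParagraphStyle(ByVal styleType As String) As Boolean
        PredicateCalls += 1
        Return styleType = "paragraph"
    End Function

    Sub Main()
        Dim styleTypes As String() = {"character", "paragraph", "table"}

        ' Deferred: building the query does not evaluate the predicate.
        Dim query = styleTypes.Where(Function(t) IsParagraphStyle(t))
        Console.WriteLine(PredicateCalls)   ' 0

        ' Immediate: First iterates the source until it finds a match.
        Dim firstMatch As String = query.First()
        Console.WriteLine(PredicateCalls)   ' 2
        Console.WriteLine(firstMatch)       ' paragraph
    End Sub
End Module
```

The counter stays at zero until First runs, and First stops at the first match, so the third item is never examined.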

Now, when we run the program, we see:

Heading1     /document/body/p
Normal       /document/body/p
Code         /document/body/p
Code         /document/body/p
Code         /document/body/p
Code         /document/body/p
Code         /document/body/p
Code         /document/body/p
Code         /document/body/p
Code         /document/body/p
Normal       /document/body/p
Code         /document/body/p
 

This is what we wanted from this transformation.

The entire listing follows.  Note that we had to read the styles part into an XDocument.

Imports System
Imports System.IO
Imports System.Linq
Imports System.Xml
Imports System.Xml.Linq
Imports DocumentFormat.OpenXml.Packaging
 
Module Module1
    <System.Runtime.CompilerServices.Extension()> _
    Public Function GetPath(ByVal el As XElement) As String
        Return el _
            .AncestorsAndSelf _
            .InDocumentOrder _
            .Aggregate("", Function(seed, i) seed & "/" & i.Name.LocalName)
    End Function
 
    Public Function LoadXDocument(ByVal part As OpenXmlPart) _
            As XDocument
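        Using streamReader As StreamReader = New StreamReader(part.GetStream())
            ' Name the variable differently from the XmlReader type so that
            ' XmlReader.Create clearly refers to the shared factory method.
            Using reader As XmlReader = XmlReader.Create(streamReader)
                Return XDocument.Load(reader)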
            End Using
        End Using
    End Function
 
    Public Function GetParagraphStyle(ByVal para As XElement, _
                                      ByVal defaultStyle As String) As String
        Dim w As XNamespace = _
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
        Dim paraStyle = CStr(para.Elements(w + "pPr") _
                       .Elements(w + "pStyle") _
                       .Attributes(w + "val") _
                       .FirstOrDefault())
        If (paraStyle Is Nothing) Then
            Return defaultStyle
        Else
            Return paraStyle
        End If
    End Function
 
    Sub Main()
        Dim w As XNamespace = _
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
        Dim filename As String = "SampleDoc.docx"
        ' Open read-only; this program never modifies the document.
        Using wordDoc As WordprocessingDocument = _
            WordprocessingDocument.Open(filename, False)
            Dim mainPart As MainDocumentPart = _
                wordDoc.MainDocumentPart
            Dim styleDefinitionPart As StyleDefinitionsPart = _
                mainPart.StyleDefinitionsPart
            Dim commentsPart As WordprocessingCommentsPart = _
                mainPart.WordprocessingCommentsPart
            Dim mainPartDoc As XDocument = LoadXDocument(mainPart)
            Dim styleDoc As XDocument = LoadXDocument(styleDefinitionPart)
            Dim commentsDoc As XDocument = LoadXDocument(commentsPart)
 
            Dim defaultStyle As String = _
                CStr( _
                        ( _
                            From style In styleDoc.Root _
                                .Elements(w + "style") _
                            Where( _
                                CStr(style.Attribute(w + "type")) = "paragraph" AndAlso _
                                CStr(style.Attribute(w + "default")) = "1") _
                        ) _
                        .First() _
                        .Attribute(w + "styleId") _
                    )
 
            Dim paragraphs = _
                mainPartDoc.Root _
                    .Element(w + "body") _
                    .Descendants(w + "p") _
                    .Select(Function(p) _
                        New With { _
                            .ParagraphNode = p, _
                            .Style = GetParagraphStyle(p, defaultStyle) _
                        } _
                    )
 
            For Each p In paragraphs
                Console.WriteLine("{0} {1}", p.Style.PadRight(12), _
                    p.ParagraphNode.GetPath())
            Next
        End Using
    End Sub
End Module
 
