Retrieving the Two Code Groups - VB

There are two groups of paragraphs in our document that are styled as "Code".  The first group contains the C# code that we want to test.  The second group contains a single paragraph that is the output of the code in the first group.  Next in the process of formulating our query, we want to retrieve each block of code as a separate group.

The problem is that the GroupBy extension method doesn't do what we want.  It groups all items that share a key together, regardless of whether they are separated by other items in the collection.  It would join our two groups of code, which we want to keep separate.

For instance, if we amend the code to group the paragraphs, adding one more query to the bottom of our string of queries, as follows:

Dim defaultStyle As String = _
    CStr( _
            ( _
                From style in styleDoc.Root _
                    .Elements(w + "style") _
                Where( _
                    CStr(style.Attribute(w + "type")) = "paragraph" And _
                    CStr(style.Attribute(w + "default")) = "1") _
            ) _
            .First() _
            .Attribute(w + "styleId") _
        )
 
Dim paragraphs = _
    mainPartDoc.Root _
        .Element(w + "body") _
        .Descendants(w + "p") _
        .Select(Function(p) _
            New With { _
                .ParagraphNode = p, _
                .Style = GetParagraphStyle(p, defaultStyle) _
            } _
        )
 
Dim r As XName = w + "r"
Dim ins As XName = w + "ins"
 
Dim paragraphsWithText = _
    paragraphs.Select(Function(p) _
        New With { _
            .ParagraphNode = p.ParagraphNode, _
            .Style = p.Style, _
            .Text = p.ParagraphNode _
                .Elements() _
                .Where(Function(z) z.Name = r or z.Name = ins) _
                .Descendants(w + "t") _
                .StringConcatenate(Function(s) CStr(s)) _
        } _
    )
 
Dim groupedCodeParagraphs = _
    paragraphsWithText.GroupBy(Function(p) p.Style)
 
For Each g In groupedCodeParagraphs
    Console.WriteLine("Group of paragraphs styled {0}", g.Key)
    For Each p In g
        Console.WriteLine("{0} {1}", _
                    p.Style.PadRight(12), _
                    p.Text)
    Next
    Console.WriteLine()
Next
 

Then we see:

Group of paragraphs styled Heading1
Heading1     Parsing WordprocessingML with LINQ to XML
 
Group of paragraphs styled Normal
Normal       The following example prints to the console.
Normal       This example produces the following output:
 
Group of paragraphs styled Code
Code         using System;
Code
Code         class Program {
Code             public static void Main(string[] args) {
Code                 Console.WriteLine("Hello World");
Code             }
Code         }
Code
Code         Hello World
 

This grouped the "Hello World" with the code, which is not what we want.
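
The same behavior can be reproduced without the document.  The following is a minimal sketch that uses a hypothetical array of style names in place of the real paragraphs (the module name and the sample data are assumptions, not part of the original example).  GroupBy returns one group per distinct key, so the two separate runs of "Code" collapse into a single group:

```vb
Imports System
Imports System.Linq

Module GroupByDemo
    Sub Main()
        ' Hypothetical style names standing in for the paragraph styles.
        Dim styles = {"Normal", "Code", "Code", "Normal", "Code"}

        ' GroupBy gathers every item with the same key, adjacent or not.
        For Each g In styles.GroupBy(Function(s) s)
            Console.WriteLine("{0}: {1} item(s)", g.Key, g.Count())
        Next
        ' Prints:
        ' Normal: 2 item(s)
        ' Code: 3 item(s)
    End Sub
End Module
```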

As it turns out, there isn't a standard query operator that does exactly what we want.  We want an operator that groups only adjacent items with a common key.  So let's write one.  In addition to the GroupAdjacent extension method, we need a GroupOfAdjacent class that we can iterate through for each grouping.  It only takes a couple dozen lines of code to implement this.

Unlike the C# version, the GroupAdjacent implementation for Visual Basic is not lazy.  But this really doesn’t impact performance in any noticeable way, even for large documents.

Before this version of GroupAdjacent returns the first group, it iterates through the entire collection, creating a list of lists.
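
For comparison, later versions of Visual Basic (11 and up, which added Iterator functions and Yield) make a lazy variant possible.  The following is only a sketch under that assumption, not the implementation used in this post: the name GroupAdjacentLazy and the choice to return each group as a KeyValuePair of key and item list are hypothetical.  Each group is yielded as soon as the run of equal keys ends, so the caller can start consuming groups before the whole source has been read:

```vb
Imports System
Imports System.Collections.Generic
Imports System.Linq

Module LazyGroupAdjacentSketch
    ' Hypothetical lazy variant: yields (key, items) once each run of equal keys ends.
    <System.Runtime.CompilerServices.Extension()> _
    Public Iterator Function GroupAdjacentLazy(Of TElement, TKey)( _
            ByVal source As IEnumerable(Of TElement), _
            ByVal keySelector As Func(Of TElement, TKey)) _
            As IEnumerable(Of KeyValuePair(Of TKey, List(Of TElement)))
        Dim comparer As EqualityComparer(Of TKey) = EqualityComparer(Of TKey).Default
        Dim currentKey As TKey = Nothing
        Dim currentItems As List(Of TElement) = Nothing
        For Each item In source
            Dim thisKey As TKey = keySelector(item)
            ' The key changed: hand the finished group to the caller.
            If currentItems IsNot Nothing AndAlso Not comparer.Equals(thisKey, currentKey) Then
                Yield New KeyValuePair(Of TKey, List(Of TElement))(currentKey, currentItems)
                currentItems = Nothing
            End If
            ' Start a new group at the first item and after every key change.
            If currentItems Is Nothing Then
                currentKey = thisKey
                currentItems = New List(Of TElement)()
            End If
            currentItems.Add(item)
        Next
        ' Emit the final group, if the source was not empty.
        If currentItems IsNot Nothing Then
            Yield New KeyValuePair(Of TKey, List(Of TElement))(currentKey, currentItems)
        End If
    End Function

    Sub Main()
        ' Hypothetical style names standing in for the paragraph styles.
        Dim styles = {"Normal", "Code", "Code", "Normal", "Code"}
        For Each g In styles.GroupAdjacentLazy(Function(s) s)
            Console.WriteLine("{0}: {1}", g.Key, g.Value.Count)
        Next
        ' Prints:
        ' Normal: 1
        ' Code: 2
        ' Normal: 1
        ' Code: 1
    End Sub
End Module
```

Within a single group the items are still buffered in a list; only the sequence of groups is produced on demand.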

To use GroupAdjacent, we pass it a lambda that selects a key for each item.  Whenever that key changes from one item to the next, the operator starts a new group.  GroupAdjacent returns a list of groups, each of which exposes its Key and contains a sequence of items of type TElement.

Here is the listing:

Imports System.Collections.Generic
Imports System.IO
Imports System.Linq
Imports System.Text
Imports System.Xml
Imports System.Xml.Linq
Imports DocumentFormat.OpenXml.Packaging
 
Public Class GroupOfAdjacent(Of TElement, TKey)
    Implements IEnumerable(Of TElement)
 
    Private _key As TKey
    Private _groupList As List(Of TElement)
 
    Public Property GroupList() As List(Of TElement)
        Get
            Return _groupList
        End Get
        Set(ByVal value As List(Of TElement))
            _groupList = value
        End Set
    End Property
 
    Public ReadOnly Property Key() As TKey
        Get
            Return _key
        End Get
    End Property
 
    Public Function GetEnumerator() As System.Collections.Generic.IEnumerator(Of TElement) _
            Implements System.Collections.Generic.IEnumerable(Of TElement).GetEnumerator
        Return _groupList.GetEnumerator
    End Function
 
    Public Function GetEnumerator1() As System.Collections.IEnumerator _
            Implements System.Collections.IEnumerable.GetEnumerator
        Return _groupList.GetEnumerator
    End Function
 
    Public Sub New(ByVal key As TKey)
        _key = key
        _groupList = New List(Of TElement)
    End Sub
End Class
 
Module Module1
    <System.Runtime.CompilerServices.Extension()> _
    Public Function GroupAdjacent(Of TElement, TKey)(ByVal source As IEnumerable(Of TElement), _
            ByVal keySelector As Func(Of TElement, TKey)) As List(Of GroupOfAdjacent(Of TElement, TKey))
        Dim comparer As EqualityComparer(Of TKey) = EqualityComparer(Of TKey).Default
        Dim currentGroup As GroupOfAdjacent(Of TElement, TKey) = Nothing
        Dim allGroups As New List(Of GroupOfAdjacent(Of TElement, TKey))()
        For Each item In source
            Dim thisKey As TKey = keySelector(item)
            ' Start a new group at the first item and whenever the key changes.
            ' Using currentGroup (not the key) as the "no group yet" marker keeps
            ' this correct for value-type keys and for keys that are Nothing.
            If currentGroup Is Nothing OrElse Not comparer.Equals(thisKey, currentGroup.Key) Then
                currentGroup = New GroupOfAdjacent(Of TElement, TKey)(thisKey)
                allGroups.Add(currentGroup)
            End If
            currentGroup.GroupList.Add(item)
        Next
        Return allGroups
    End Function
 
    <System.Runtime.CompilerServices.Extension()> _
    Public Function GetPath(ByVal el As XElement) As String
        Return el _
            .AncestorsAndSelf _
            .InDocumentOrder _
            .Aggregate("", Function(seed, i) seed & "/" & i.Name.LocalName)
    End Function
 
    <System.Runtime.CompilerServices.Extension()> _
    Function StringConcatenate(Of T) _
            (ByVal source As IEnumerable(Of T), ByVal projectionFunc As Func(Of T, String)) _
            As String
        Return source.Aggregate(New StringBuilder, _
            Function(sb, i) sb.Append(projectionFunc(i)), _
            Function(sb) sb.ToString)
    End Function
 
    Public Function LoadXDocument(ByVal part As OpenXmlPart) _
            As XDocument
        Using streamReader As StreamReader = New StreamReader(part.GetStream())
            Using reader As XmlReader = XmlReader.Create(streamReader)
                Return XDocument.Load(reader)
            End Using
        End Using
    End Function
 
    Public Function GetParagraphStyle(ByVal para As XElement, _
                                      ByVal defaultStyle As String) As String
        Dim w As XNamespace = _
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
        Dim paraStyle = CStr(para.Elements(w + "pPr") _
                       .Elements(w + "pStyle") _
                       .Attributes(w + "val") _
                       .FirstOrDefault())
        If (paraStyle Is Nothing) Then
            Return defaultStyle
        Else
            Return paraStyle
        End If
    End Function
 
    Sub Main()
        Dim w As XNamespace = _
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
        Dim filename As String = "SampleDoc.docx"
        Using wordDoc As WordprocessingDocument = _
            WordprocessingDocument.Open(filename, False) ' read-only: the document is not modified
            Dim mainPart As MainDocumentPart = _
                wordDoc.MainDocumentPart
            Dim styleDefinitionPart As StyleDefinitionsPart = _
                mainPart.StyleDefinitionsPart
            Dim commentsPart As WordprocessingCommentsPart = _
                mainPart.WordprocessingCommentsPart
            Dim mainPartDoc As XDocument = LoadXDocument(mainPart)
            Dim styleDoc As XDocument = LoadXDocument(styleDefinitionPart)
            Dim commentsDoc As XDocument = LoadXDocument(commentsPart)
            Dim defaultStyle As String = _
                CStr( _
                        ( _
                            From style In styleDoc.Root _
                                .Elements(w + "style") _
                            Where ( _
                                CStr(style.Attribute(w + "type")) = "paragraph" And _
                                CStr(style.Attribute(w + "default")) = "1") _
                        ) _
                        .First() _
                        .Attribute(w + "styleId") _
                    )
            Dim paragraphs = _
                mainPartDoc.Root _
                    .Element(w + "body") _
                    .Descendants(w + "p") _
                    .Select(Function(p) _
                        New With { _
                            .ParagraphNode = p, _
                            .Style = GetParagraphStyle(p, defaultStyle) _
                        } _
                    )
            Dim r As XName = w + "r"
            Dim ins As XName = w + "ins"
            Dim paragraphsWithText = _
                paragraphs.Select(Function(p) _
                    New With { _
                        .ParagraphNode = p.ParagraphNode, _
                        .Style = p.Style, _
                        .Text = p.ParagraphNode _
                            .Elements() _
                            .Where(Function(z) z.Name = r Or z.Name = ins) _
                            .Descendants(w + "t") _
                            .StringConcatenate(Function(s) CStr(s)) _
                    } _
                )
 
            Dim groupedCodeParagraphs = _
                paragraphsWithText.GroupAdjacent(Function(p) p.Style)
 
            For Each g In groupedCodeParagraphs
                Console.WriteLine("Group of paragraphs styled {0}", g.Key)
                For Each p In g
                    Console.WriteLine("{0} {1}", _
                                p.Style.PadRight(12), _
                                p.Text)
                Next
                Console.WriteLine()
            Next
        End Using
    End Sub
End Module

When we run this example, we see:
 
Group of paragraphs styled Heading1
Heading1     Parsing WordprocessingML with LINQ to XML
 
Group of paragraphs styled Normal
Normal       The following example prints to the console.
 
Group of paragraphs styled Code
Code         using System;
Code
Code         class Program {
Code             public static void Main(string[] args) {
Code                 Console.WriteLine("Hello World");
Code             }
Code         }
Code
 
Group of paragraphs styled Normal
Normal       This example produces the following output:
 
Group of paragraphs styled Code
Code         Hello World
 

This is what we want.
