Blog Map
[Blog Map] This blog is inactive. New blog: EricWhite.com/blog
This is a fun, geeky post for those interested in functional programming.
Sometimes you have flat data where there is hierarchy expressed in the data, but the form of the data is flat. You may need to transform this data into a hierarchy, such as an XML tree.
For instance, you may have the following data:
Heading[] headings = new[] {
new Heading { Level = 1, Text = "Intro" },
new Heading { Level = 2, Text = "Hello" },
new Heading { Level = 2, Text = "Summary" },
new Heading { Level = 1, Text = "Technical Info" },
new Heading { Level = 2, Text = "Class API" },
new Heading { Level = 3, Text = "Class1" },
new Heading { Level = 3, Text = "Class2" },
new Heading { Level = 2, Text = "Interoperability" },
new Heading { Level = 3, Text = "Scenario 1" },
new Heading { Level = 3, Text = "Scenario 2" },
new Heading { Level = 1, Text = "In Conclusion" }
};
And you want to transform this data into the following XML:
<Root>
<Heading1 Name="Intro">
<Heading2 Name="Hello" />
<Heading2 Name="Summary" />
</Heading1>
<Heading1 Name="Technical Info">
<Heading2 Name="Class API">
<Heading3 Name="Class1" />
<Heading3 Name="Class2" />
</Heading2>
<Heading2 Name="Interoperability">
<Heading3 Name="Scenario 1" />
<Heading3 Name="Scenario 2" />
<Heading1 Name="In Conclusion" />
</Root>
This is easy to do – you create a recursive function that returns the headings for the next level down. This function can use the SkipWhile and TakeWhile extension methods to return the next level down headings. This is certainly not the most efficient way to do it, but I was interested in coding this as fast as possible – I’m only going to run this once, and don’t care very much how long it takes.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml.Linq;
namespace ConsoleApplication1
{
class Heading
public int Level { get; set; }
public string Text { get; set; }
}
class Program
public static IEnumerable<XElement> GetChildrenHeadings(
IEnumerable<Heading> headingList,
Heading parent)
return
headingList
.SkipWhile(h => h != parent)
.Skip(1)
.TakeWhile(h => h.Level > parent.Level)
.Where(h => h.Level == parent.Level + 1)
.Select(h =>
new XElement("Heading" + h.Level,
new XAttribute("Name", h.Text),
GetChildrenHeadings(headingList, h)
)
);
public static void Main(string[] args)
var toc = new XElement("Root",
headings
.Where(h => h.Level == 1)
.Select
(
h => new XElement("Heading" + h.Level,
GetChildrenHeadings(headings, h)
Console.WriteLine(toc);
Today, I had the need to take the Ecma Open XML spec part 4 and generate a TOC in XML. I used the code from this blog post to return a collection of the paragraphs, and the technique shown above to generate the TOC. Loading the doc takes 7 seconds on my machine, generating the TOC in this manner is less than a second, so even though it is not the most efficient, it is more than efficient enough for my purposes.
using Microsoft.Examples.LtxOpenXml;
using DocumentFormat.OpenXml.Packaging;
public string StyleName { get; set; }
public int HeadingLevel { get; set; }
IEnumerable<Heading> headingList, Heading parent)
.TakeWhile(h => h.HeadingLevel > parent.HeadingLevel)
.Where(h => h.HeadingLevel == parent.HeadingLevel + 1)
new XElement(h.StyleName,
using (WordprocessingDocument doc =
WordprocessingDocument.Open(
"Office Open XML Part 4 - Markup Language Reference.docx",
false))
// materialize to list for efficiency
List<Heading> headingList =
doc.MainDocumentPart
.Paragraphs()
.Where(p => p.StyleName.StartsWith("Heading"))
.Select(p =>
int headingLevel;
Int32.TryParse(p.StyleName.Substring(7),
out headingLevel);
return new Heading
StyleName = p.StyleName,
Text = p.Text,
HeadingLevel = headingLevel
.ToList();
.Where(p => p.HeadingLevel == 1)
p => new XElement(p.StyleName,
new XAttribute("Name", p.Text),
GetChildrenHeadings(headingList, p)
toc.Save("toc.xml");
Console.WriteLine(headingList.Count());
Console.WriteLine(toc.Descendants().Count());
When run, it produces:
<?xml version="1.0" encoding="utf-8"?>
<Heading1 Name="Part Overview">
<Heading2 Name="WordprocessingML Part Summary" />
<Heading2 Name="SpreadsheetML Part Summary" />
<Heading2 Name="PresentationML Part Summary" />
<Heading2 Name="DrawingML Part Summary" />
<Heading2 Name="Shared Part Summary" />
<Heading1 Name="WordprocessingML Reference Material">
<Heading2 Name="Table of Contents" />
<Heading2 Name="Main Document Story">
<Heading3 Name="background (Document Background)" />
<Heading3 Name="body (Document Body)" />
<Heading3 Name="document (Document)" />
<Heading2 Name="Paragraphs and Rich Formatting">
<Heading3 Name="Paragraphs">
<Heading4 Name="adjustRightInd (Automatically Adjust Right Indent When Using Document Grid)" />
<Heading4 Name="autoSpaceDE (Automatically Adjust Spacing of Latin and East Asian Text)" />
<Heading4 Name="autoSpaceDN (Automatically Adjust Spacing of East Asian Text and Numbers)" />
<Heading4 Name="bar (Paragraph Border Between Facing Pages)" />
...
Code is attached.
Eric White on Linq and Open XML. Eric White has been posting some great code samples lately, including
Ce post n’a pas voulu partir ni jeudi ni vendredi, le voici donc ! Des mise à jours à n’en plus finir
Vous générer des documents Open XML, Office ne veut pas les ouvrir ? Voici quelques moyens d’en découdre