Hex Dump using LINQ (in 7 Lines of Code)

At one point while debugging the HtmlConverter class, I ran into certain situations in the XML where I wanted to dump it in binary and see the actual hex values of the characters being used.  I got tired of stopping and examining the values in the debugger.  I did a couple of searches and found some sample C# code that implements a simple hex dump.  Noticing that it was about 30 lines of code, I thought that I could rewrite it using LINQ and that the result would be cleaner and smaller.

Following is a sample that dumps a byte array in hex:

byte[] ba = File.ReadAllBytes("test.xml");
int bytesPerLine = 16;
string hexDump = ba.Select((c, i) => new { Char = c, Chunk = i / bytesPerLine })
    .GroupBy(c => c.Chunk)
    .Select(g => g.Select(c => String.Format("{0:X2} ", c.Char))
        .Aggregate((s, i) => s + i))
    .Select((s, i) => String.Format("{0:d6}: {1}", i * bytesPerLine, s))
    .Aggregate("", (s, i) => s + i + Environment.NewLine);
Console.WriteLine(hexDump);

To break up the binary data into groups of bytes for each line, this example uses the idiom that I discussed in Chunking a Collection into Groups of Three.  Because this is quick-and-dirty code that I didn’t plan on leaving in the delivered code, I used the idiom from Ad-Hoc String Concatenation using LINQ.
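The chunking idiom can be looked at on its own. The following is a minimal sketch, not the code from the referenced post: the class name `ChunkingSketch` and the method name `SplitIntoChunks` are hypothetical, chosen only for illustration. It pairs each item with its index, groups by the integer quotient of index and chunk size, and then discards the index.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ChunkingSketch
{
    // Hypothetical helper: splits a sequence into consecutive runs of
    // chunkSize items. The last run may be shorter than chunkSize.
    public static List<List<T>> SplitIntoChunks<T>(IEnumerable<T> source, int chunkSize)
    {
        return source
            // Pair each item with the number of the chunk it belongs to.
            .Select((item, index) => new { Item = item, Chunk = index / chunkSize })
            // GroupBy keeps groups, and the items inside them, in source order.
            .GroupBy(x => x.Chunk)
            // Drop the chunk number and keep only the items.
            .Select(g => g.Select(x => x.Item).ToList())
            .ToList();
    }
}
```

Called with the seven integers 1 through 7 and a chunk size of 3, this yields three lists: the first two hold three items each and the last holds the single remaining item.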

The example converts the binary data into a string that you can then dump to the console or whatever.  The resulting string looks something like this:

000000: FF FE 3C 00 3F 00 78 00 6D 00 6C 00 20 00 76 00
000016: 65 00 72 00 73 00 69 00 6F 00 6E 00 3D 00 22 00
000032: 31 00 2E 00 30 00 22 00 20 00 65 00 6E 00 63 00
000048: 6F 00 64 00 69 00 6E 00 67 00 3D 00 22 00 75 00
000064: 74 00 66 00 2D 00 31 00 36 00 22 00 20 00 73 00
000080: 74 00 61 00 6E 00 64 00 61 00 6C 00 6F 00 6E 00
000096: 65 00 3D 00 22 00 79 00 65 00 73 00 22 00 3F 00
000112: 3E 00 0D 00 0A 00 3C 00 52 00 6F 00 6F 00 74 00
000128: 3E 00 31 00 3C 00 2F 00 52 00 6F 00 6F 00 74 00
000144: 3E 00

This ratio of imperative code to declarative code (30 lines vs. 7 lines) is what I typically see when writing functional code using LINQ.  The declarative code is approximately 20% of the size of the imperative code.

  • It would probably be more efficient to replace:

    c => String.Format("{0:X2} ", c.Char)

    with:

    c => c.Char.ToString("X2") + " "

    Even better, with a suitable extension method:

    public static string JoinString(this IEnumerable<string> value, string separator)
    {
       if (null == value) return string.Empty;
       return string.Join(separator, value.ToArray());
    }

    you can replace:

    g => g.Select(c => String.Format("{0:X2} ", c.Char)).Aggregate((s, i) => s + i)

    with:

    g => g.Select(c => c.Char.ToString("X2")).JoinString(" ")

    and:

    .Aggregate("", (s, i) => s + i + Environment.NewLine)

    with:

    .JoinString(Environment.NewLine)
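
    Put together, the replacements above might look like the following sketch. The names `JoinStringSketch` and `HexDump` are hypothetical, chosen only for illustration. Note that, unlike the query in the post, this version leaves no trailing space on each line and no trailing newline at the end of the string.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;

    public static class JoinStringSketch
    {
        // The extension method suggested in this comment.
        public static string JoinString(this IEnumerable<string> value, string separator)
        {
            if (null == value) return string.Empty;
            return string.Join(separator, value.ToArray());
        }

        // Hypothetical wrapper around the query from the post, with the
        // String.Format and Aggregate calls swapped for ToString and JoinString.
        public static string HexDump(byte[] ba, int bytesPerLine)
        {
            return ba.Select((c, i) => new { Char = c, Chunk = i / bytesPerLine })
                .GroupBy(c => c.Chunk)
                .Select(g => g.Select(c => c.Char.ToString("X2")).JoinString(" "))
                .Select((s, i) => String.Format("{0:d6}: {1}", i * bytesPerLine, s))
                .JoinString(Environment.NewLine);
        }
    }
    ```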

    Of course, with a simple implementation of IGrouping<TKey, TValue>, you could write your own ReadFile method, which would be much more efficient:

    public static IEnumerable<IGrouping<int, byte>> ReadFile(string path, int bytesPerLine)
    {
       ...
    }

  • Like everyone else, I like to use LINQ for everything too, but in this case I just don't see the benefit compared to a 'classic' while loop. I suspect few people would find this version easier to understand than a 'procedural' equivalent either:

    (untested)

    byte[] ba = File.ReadAllBytes("test.xml");
    int bytesPerLine = 16;
    StringBuilder output = new StringBuilder();
    int index = 0;
    while (index < ba.Length)
    {
       if (index % bytesPerLine == 0)
           output.AppendFormat(
               "{0}{1:d6}: ",
               (index != 0 ? Environment.NewLine : ""), // output new line only after the first
               index                                    // output address (index is already the byte offset)
           );
       output.AppendFormat("{0:X2} ", ba[index]);       // output each formatted byte
       index++;
    }
    Console.WriteLine(output.ToString());

  • Hi Ben,

    One of the aspects of LINQ and FP that I enjoy is how the code is a direct parallel to the way that I describe the projection.  For example, in the projection in the post, it reads like this:

    • First, break the byte array into chunks of n bytes.
    • Then, for each item in the collection of groups, project the short array of bytes to a string that contains the hex representation.  The result of this operation is a collection of strings.
    • Then, for each string in the collection, project a new string with the byte count at the beginning of the line.
    • Finally, aggregate the collection of strings into a single string, with newlines separating each string.
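
    The four steps above can be written out one at a time, each with its own named intermediate result. This is only a sketch of the same query split apart for reading; the class name `StepwiseHexDump` and the variable names are hypothetical and do not appear in the post.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;

    public static class StepwiseHexDump
    {
        public static string Dump(byte[] ba, int bytesPerLine)
        {
            // Step 1: break the byte array into chunks of bytesPerLine bytes.
            IEnumerable<IEnumerable<byte>> chunks = ba
                .Select((b, i) => new { Byte = b, Chunk = i / bytesPerLine })
                .GroupBy(x => x.Chunk)
                .Select(g => g.Select(x => x.Byte));

            // Step 2: project each chunk to a string holding its hex representation.
            IEnumerable<string> hexLines = chunks
                .Select(chunk => chunk
                    .Select(b => String.Format("{0:X2} ", b))
                    .Aggregate((s, t) => s + t));

            // Step 3: project each string to a new string that starts with the byte offset.
            IEnumerable<string> numberedLines = hexLines
                .Select((line, i) => String.Format("{0:d6}: {1}", i * bytesPerLine, line));

            // Step 4: aggregate the lines into one string, a newline after each.
            return numberedLines
                .Aggregate("", (s, line) => s + line + Environment.NewLine);
        }
    }
    ```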

    This is what the LINQ architects are referring to when they talk about 'the code describes what I want to do', instead of 'the code tells the computer how I want to do what I want to do'.

    When we first transitioned from procedural programming to object oriented programming, developers sometimes would have trouble looking at the class definition and identifying the abstraction represented by the class.  We have a similar transition in programming culture taking place now, where some developers can naturally read LINQ code, and others are still transitioning.  At this point, I really prefer to read queries.  In my code to convert Open XML to XHtml (about 3500 lines of code), my experience is that when I have to dig into code that I wrote some months ago, it is pretty easy to look at queries and projections and remember my intent when I originally wrote them.  Not to say that this isn't true for procedural code, but in general, my experience is that it is easier to revisit FP code that I wrote some time ago.

    -Eric

  • @Ben and Richard,

    It is certainly true that this code isn't optimal.  But it exemplifies what I do when I need to write some ad-hoc code in the course of performing some other task.  In these situations, I really spend very little time examining the queries/projections after I accomplish my task at hand.

    -Eric

  • Why aren't the offsets in hex?

    ;)

  • While I understand the concepts and the rationale behind doing something like this in LINQ (the whole "here's what I want to do vs here's how to do what I want to do"), I'm still not convinced that that kind of approach is generically applicable.

    Visualizing an object based on the properties and methods described in its code takes a little getting used to, granted, but that's a far cry from visualizing what's happening in that Linq query. Not sure that's a good analogy to use.

    At any rate, this was an interesting post, if for nothing more than being a brain bender. Plus, I did not know VB2008 has anonymous types. Very cool.

  • I also have to wonder about the utility of LINQ in this case. I can hardly imagine how you'd end up with 30 lines of code unless it actually did quite a bit more than what you have here. Using C as the quintessential procedural language, I wrote up what seemed like the most obvious implementation:

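    #include <stdio.h>

    int main(void) {
       FILE *f = fopen("test.xml", "rb");
       int ch, offset = 0;

       if (f == NULL) {           /* bail out if the file cannot be opened */
           perror("test.xml");
           return 1;
       }
       while ((ch = getc(f)) != EOF) {
           if (offset % 16 == 0)  /* start a new line with the offset every 16 bytes */
               printf("\n%6.6d: ", offset);
           ++offset;
           printf("%2.2x ", ch);
       }
       fclose(f);
       return 0;
    }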

    We're left with a few choices: maybe you picked an example in a spectacularly verbose language. Maybe you're comparing apples to oranges, and the 30 lines of code really did a lot that yours doesn't. Maybe the 30 lines of code was just a whole lot longer than necessary.

    IMO, you owe all LINQ users an apology. Publishing a comparison that's so grossly and obviously distorted and misleading makes it easy for others to brand LINQ advocates in general as sloppy, negligent, ignorant, and quite possibly dishonest.

  • Ouch, dude!  My intent here on my blog is to share what I learn as I learn it.  I have certainly published posts where I subsequently discovered better ways to do something, and in those cases, I always go back to the original post and add a note pointing to my new approach to that subject.  In those cases where I sent someone down the wrong path, I apologize.

    -Eric

  • My apologies -- I probably shouldn't have been posting anything that early in the morning, before I had any caffeine.

  • Would this be 15% of the 20%, then? Here is a hex dump in one line of imperative Perl.

    I am sure that Linq is a great tool but the conclusion is just not right.

    code:

    while($s = sysread(STDIN, $d, 16)) {printf "%016x " . "%02x " x $s . "\n", $o, unpack("C$s", $d) ; $o += 16}

    or

    command line and shorter:

    perl -ne 'BEGIN { $/ = \16 } printf "%07x0: @{[ unpack q{(H2)*} ]}\n", $. - 1' your_file_here

    Cheers.
