Arrays of arrays

Arrays of arrays

Rate This
  • Comments 48

Most people understand that there’s a difference between a “rectangular” and a “ragged” two-dimensional array.

int[,] rectangle = {
  {10, 20},
  {30, 40},
  {50, 60} };
int[][] ragged = {
  new[] {10},
  new[] {20, 30},
  new[] {40, 50, 60} };

Here we have a two-dimensional array with six elements, arranged in three rows of two elements each. And we have a one-dimensional array with three elements, where each element is itself a one-dimensional array with one, two or three elements.

That’s all very straightforward. Where things get brain-destroying is when you try to make a ragged array of two-dimensional arrays.  Quick, don’t think, just answer.

int[,][] crazy;

What is the type of crazy?

Option one: It’s a one-dimensional array, each element is a two-dimensional array of ints.
Option two: It’s a two-dimensional array, each element is a one-dimensional array of ints.

OK, now that you have your snap answer, think about it carefully. Does your answer change?

I’m not going to tell you the answer just yet. Instead let’s explore the consequences of each possibility.

Consequence One

Surely the way you make any type into an array is to append [] to the type specification, right? But in option two, you stick the [,] into the middle.

Option two is weird. Option one is sensible. 

But wait. If [,][] means "a 1-d of 2-d's", then the order you read it off the page opposes the order you say it -- it looks like two-d-thing-followed-by-one-d-thing, so why shouldn't it read "a 2-d of 1-d's"?

But then why does the "int" come first? By that logic it should come last.

Argh. Maybe option one isn't entirely sensible. Clearly something is not quite perfect with both options. Oh well. Let's move on.

Consequence Two

Now suppose that you wanted to obtain a value in that array, assuming that it had been initialized correctly to have plenty of elements everywhere we need them. How would you do it?

Suppose we’re in option one. It’s a one-d array. Therefore crazy[10] is a two-d array. Therefore crazy[10][1, 2] is an int.

Suppose we’re in option two. It’s is a two-d array. Therefore crazy[1,2] is a one-d array. Therefore crazy[1,2][10] is an int.

Option one is weird -- crazy is of type int[,][] but you dereference it as [10][1,2]! Whereas option two is sensible; the dereferencing syntax exactly matches the ordering of the type name syntax.

Consequence Three

Now suppose you want to initialize the “outer” array but are going to fill in the “ragged” interior arrays later. You’ll just keep them set to null at first. What’s the appropriate syntax to initialize the outer array?

Suppose we’re in option one. It’s a one-d array. Therefore it should be initialized as crazy = new int[,][20]; right?

Suppose we’re in option two. It’s a two-d array. Therefore it should be initialized as crazy = new int[][4,5]; right?

Option two is weird. We said int[,][] but initialized it as [][4,5]. Option one is sensible.

What C# actually does

It’s a mess. No matter which option we choose, something ends up not matching our intuition. Here’s what we actually chose in C#.

First off: option two is correct. We force you to live with the weirdness entailed by Consequence One; you do not actually make an element type into an array of that type by appending an array specifier. You make it into an array type by prepending the specifier to the list of existing array specifiers. Crazy but true.

This prevents you from having to live with any weirdness from Consequence Two; in this option, the dereferencing happens with the same lexical structure as the declaration.

What about Consequence Three? This one is the real mind-bender. Neither choice I offered you is correct; apparently I’m a sneaky guy. The correct way to initialize such an array in C# is:

crazy = new int[4,5][];

This is very surprising to people!

The design principle here is that users expect the lexical structure of the brackets and commas to be consistent across declaration, initialization and dereferencing. Option two is the best way to ensure that if declaration has the shape [,][] then the initialization also has that shape, and so does the dereferencing.

That all said, multidimensional ragged arrays are almost certainly a bad code smell. Hopefully you will never, ever have to use your new knowledge of these rules in a production environment.

Life is much better if you can instead use generic collections. It is completely clear what List<int[,]> means – that’s a list of two-dimensional arrays. Whereas List<int>[,] means a two-d array of lists of int.

  • Talking about driving crazy,I love playing with arrays:

           int[,][][][,,] arr = new int[10,12][][][,,];

           arr[3, 5] = new int[6][][,,];

           arr[3, 5][1] = new int[3][,,];

           arr[3, 5][1][0] = new int[3, 2, 4];

           arr[3, 5][1][0][0, 1,0] = 2;

           System.Console.WriteLine(arr[3, 5][1][0][0, 1,0]);

    :)

    But on second though, thank god for generics.

  • > crazy = new int[4,5][];

    > This is very surprising to people!

    Funny; that was my first guess. :)

    That means, you did your designing job right [again]. :)

  • My C++ brain was horrified at what option 2 would mean for code like this:

    #define someType int[,]

    someType[] crazy;

    But I suppose that's part of the reason C# won't let you use #define like that.

  • My C++ brain is horrified at the thought that someone would even consider using #define (rather than typedef) for type aliases!

  • This would not pass our code reviews since arrays beyond 2 dimensions or jagged arrays should be used only in very special circumstances (e.g., representing 3d data or a series of frames for 2d data).

    Eric,

    An interesting exmple of non-uniform array dimensions in how SQL Server 2000 stored variable length rows in a single data block.  Throw in inter-page data compression with SQL 2008 and it gets more interesting.

  • @Greg,

    I am somewhat suprised at your "policy" in this matter.

    Of course there is always the design decision between an array and an indexable (or keyed) collection. When using collections, it is not uncommon to have significantly more than 3 dimensions, and for multiple levels to be sparse (ie not all keys are valid, or indices have different maxiumums based on context.

    If one is to allow those conditions, then why (especially if bounds checking is in effect) the difference for arrays???

  • Perhaps something to consider is that if you're getting to the stage of defining variables like this that you aren't "separating your concerns". Either you want an array of two-dimensional arrays or it is a two-dimensional array of arrays. Whichever one is correct, define yourself a class that represents the second part of the description. Then define a class defining the whole type. You now have a class representing your complicated object, allowing you to use things like indexers and composite patterns. There's no confusion, and (wow!) you're actually taking advantage of using an object-oriented language.

  • Sometimes I wonder why do you guys arm developers with guns. For instance, I don't see why fields can be public and properties can be private.

  • @Marius <wave>

    There are good reasons. Private properties allow a class to internally make use of things like assignment validation and lazy evaluation. And there are times where direct access to fields is required.

    Besides, playing with guns can be exciting and fun <wink>

  • This nonsense is exactly why those of us who grew up on languages other than C cringe at Ritchie's array notation.  Every other language of C's age uses something like "int A(1,2,3)" and treats multi-dimensional arrays as first class citizens.  Even FORTRAN got it right!

  • @Ross,

    I "grew up" on languages that well predate "C" [I started programming in 1972], and actually found "C"'s approach to arrays to be quite good....

    Just goes to show that generalizations always have exceptions [even this one...circuit overload....]

  • @Ross, it's not quite true. Treating a multidimensional array as array-of-arrays is perfectly reasonable, and in fact both Wirth's Pascal and Modula-2 did it that way. I.e. they let you write:

      var a : array [1..2, 1..3] of integer;

    but it was really just syntactic sugar for:

     var a: array [1..2] of array [1..3] of integer;

    and, correspondingly, array access:

     a[1, 2] := 123;

    was syntactic sugar for:

     a[1][2] := 123;

    C simply disposed with the sugar. The real problem isn't the notion of array-of-arrays; it's C's inverted type modifier order.

    BASIC and Ada had true multidimensional arrays, however.

  • I only use 1 dimesnional arrays. Even the language I wrote only allows 1 dimensional arrays

  • @Dylan, thqat may be fine for a specializxed language, but how would you handle time corellated dimensional mapping. Basically a 4 dimensional array that you can slice along any access (or intersection plane) to get a 3 dimensional array which you can slice...2 dimensional ...slice...1 dimensional..slive value...

  • "brain-destroying"  --- I Couldn't have picked a better word :)

Page 2 of 4 (48 items) 1234