Arrays of arrays

Most people understand that there’s a difference between a “rectangular” and a “ragged” two-dimensional array.

int[,] rectangle = {
  {10, 20},
  {30, 40},
  {50, 60} };
int[][] ragged = {
  new[] {10},
  new[] {20, 30},
  new[] {40, 50, 60} };

Here we have a two-dimensional array with six elements, arranged in three rows of two elements each. And we have a one-dimensional array with three elements, where each element is itself a one-dimensional array with one, two or three elements.
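
As a quick check of that description, here is a small sketch that inspects both arrays at runtime. The class and method names (ArrayShapes, MakeRectangle, MakeRagged) are made up for illustration; Rank, Length and GetLength are the standard System.Array members.

```csharp
using System;

static class ArrayShapes
{
    // Builds the rectangular array from the text: three rows of two ints.
    public static int[,] MakeRectangle() =>
        new int[,] { { 10, 20 }, { 30, 40 }, { 50, 60 } };

    // Builds the ragged array from the text: rows of length one, two and three.
    public static int[][] MakeRagged() =>
        new[] { new[] { 10 }, new[] { 20, 30 }, new[] { 40, 50, 60 } };

    static void Main()
    {
        int[,] rectangle = MakeRectangle();
        int[][] ragged = MakeRagged();

        // A rectangular array is a single object with two dimensions.
        Console.WriteLine(rectangle.Rank);         // number of dimensions
        Console.WriteLine(rectangle.Length);       // total element count
        Console.WriteLine(rectangle.GetLength(0)); // rows
        Console.WriteLine(rectangle.GetLength(1)); // columns

        // A ragged array is a one-dimensional array whose elements are arrays.
        Console.WriteLine(ragged.Rank);
        Console.WriteLine(ragged.Length);          // counts the inner arrays, not the ints
        foreach (int[] row in ragged)
            Console.WriteLine(row.Length);         // each row has its own length
    }
}
```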

That’s all very straightforward. Where things get brain-destroying is when you try to make a ragged array of two-dimensional arrays.  Quick, don’t think, just answer.

int[,][] crazy;

What is the type of crazy?

Option one: It’s a one-dimensional array, each element is a two-dimensional array of ints.
Option two: It’s a two-dimensional array, each element is a one-dimensional array of ints.

OK, now that you have your snap answer, think about it carefully. Does your answer change?

I’m not going to tell you the answer just yet. Instead let’s explore the consequences of each possibility.

Consequence One

Surely the way you make any type into an array is to append [] to the type specification, right? But in option two, you stick the [,] into the middle.

Option two is weird. Option one is sensible. 

But wait. If [,][] means "a 1-d of 2-d's", then the order you read it off the page opposes the order you say it -- it looks like two-d-thing-followed-by-one-d-thing, so why shouldn't it read "a 2-d of 1-d's"?

But then why does the "int" come first? By that logic it should come last.

Argh. Maybe option one isn't entirely sensible. Clearly something is not quite perfect with both options. Oh well. Let's move on.

Consequence Two

Now suppose that you wanted to obtain a value in that array, assuming that it had been initialized correctly to have plenty of elements everywhere we need them. How would you do it?

Suppose we’re in option one. It’s a one-d array. Therefore crazy[10] is a two-d array. Therefore crazy[10][1, 2] is an int.

Suppose we’re in option two. It’s a two-d array. Therefore crazy[1,2] is a one-d array. Therefore crazy[1,2][10] is an int.

Option one is weird -- crazy is of type int[,][] but you dereference it as [10][1,2]! Whereas option two is sensible; the dereferencing syntax exactly matches the ordering of the type name syntax.

Consequence Three

Now suppose you want to initialize the “outer” array but are going to fill in the “ragged” interior arrays later. You’ll just keep them set to null at first. What’s the appropriate syntax to initialize the outer array?

Suppose we’re in option one. It’s a one-d array. Therefore it should be initialized as crazy = new int[,][20]; right?

Suppose we’re in option two. It’s a two-d array. Therefore it should be initialized as crazy = new int[][4,5]; right?

Option two is weird. We said int[,][] but initialized it as [][4,5]. Option one is sensible.

What C# actually does

It’s a mess. No matter which option we choose, something ends up not matching our intuition. Here’s what we actually chose in C#.

First off: option two is correct. We force you to live with the weirdness entailed by Consequence One; you do not actually make an element type into an array of that type by appending an array specifier. You make it into an array type by prepending the specifier to the list of existing array specifiers. Crazy but true.

This prevents you from having to live with any weirdness from Consequence Two; in this option, the dereferencing happens with the same lexical structure as the declaration.

What about Consequence Three? This one is the real mind-bender. Neither choice I offered you is correct; apparently I’m a sneaky guy. The correct way to initialize such an array in C# is:

crazy = new int[4,5][];

This is very surprising to people!

The design principle here is that users expect the lexical structure of the brackets and commas to be consistent across declaration, initialization and dereferencing. Option two is the best way to ensure that if declaration has the shape [,][] then the initialization also has that shape, and so does the dereferencing.
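
A minimal sketch of that consistency may help. The sizes and the names CrazyDemo and MakeCrazy are assumptions made up for this example. The point is only that the declaration, the allocation and the element access all put the two-dimensional brackets first.

```csharp
using System;

static class CrazyDemo
{
    // Allocates the outer 4-by-5 array; every element starts out as null.
    public static int[,][] MakeCrazy() => new int[4, 5][];

    static void Main()
    {
        int[,][] crazy = MakeCrazy();           // declaration: [,] then []

        Console.WriteLine(crazy.Rank);          // the outer array is two-dimensional
        Console.WriteLine(crazy[1, 2] == null); // inner arrays are not allocated yet

        crazy[1, 2] = new int[11];              // fill in one ragged inner array
        crazy[1, 2][10] = 42;                   // dereference: [,] then []

        Console.WriteLine(crazy[1, 2][10]);
    }
}
```

The inner arrays can each be given a different length later, which is what makes the structure ragged.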

That all said, multidimensional ragged arrays are almost certainly a bad code smell. Hopefully you will never, ever have to use your new knowledge of these rules in a production environment.

Life is much better if you can instead use generic collections. It is completely clear what List<int[,]> means – that’s a list of two-dimensional arrays. Whereas List<int>[,] means a two-d array of lists of int.
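
To make the contrast concrete, here is a small sketch of both generic shapes. The names GenericShapes, MakeListOfGrids and MakeGridOfLists, and all the sizes, are hypothetical and chosen only for illustration.

```csharp
using System;
using System.Collections.Generic;

static class GenericShapes
{
    // A list whose items are two-dimensional arrays of int.
    public static List<int[,]> MakeListOfGrids() =>
        new List<int[,]> { new int[2, 3], new int[4, 4] };

    // A two-dimensional array whose elements are lists of int.
    public static List<int>[,] MakeGridOfLists()
    {
        var grid = new List<int>[2, 3];
        for (int row = 0; row < grid.GetLength(0); row++)
            for (int col = 0; col < grid.GetLength(1); col++)
                grid[row, col] = new List<int>(); // elements start as null, so fill them in
        return grid;
    }

    static void Main()
    {
        List<int[,]> listOfGrids = MakeListOfGrids();
        List<int>[,] gridOfLists = MakeGridOfLists();

        listOfGrids[1][3, 3] = 7;   // list index first, then the two-d index
        gridOfLists[1, 2].Add(7);   // two-d index first, then a list operation

        Console.WriteLine(listOfGrids.Count);
        Console.WriteLine(gridOfLists.Length);
    }
}
```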

  • I second Alex O. - there is nothing special about this and no surprises.

    Option 2 seemed to me very logical both to define and to use - so at least from my point of view this was a good design decision.

    Of course I'm not saying it's a good practice to use this style of coding - IMHO generics are the way to go ;)

  • I chose option 2 because I read from left to right...

    int[,][] crazy;

    To me this says:

    this is integer data

    this is a two dimensional array

    each box of that array contains a one dimensional array

    it's called crazy

    simple.

  • 1-D

    //Array of 10

    uint social_security[10];

    uint area_code[10];

    uint zip_code[10];

    //Access item i

    social_security[i];

    area_code[i];

    zip_code[i];

    deck[];

  • @Pavel: I had some contact with the .NET performance team due to a speed issue, and it turned out that 2D array access was one of several problems accumulating there. Besides the workaround of using jagged arrays, I was told that .NET 4.0 Beta 2 will likely have some improvements in the area of 2D arrays.

  • Shouldn't RAGGED be JAGGED array?

    Either term is acceptable. "Jagged" appears to be the more common term in use on the Internet; I was taught "ragged". -- Eric

  • @Markus

    Hooray! Maybe they'll add multi-dimensional array serialization too??

  • Playing with arrays has always been a challenging task for "C" programmers, especially when it comes to working with pointers to such arrays. Remembering the *() notation in pure C has always helped me understand arrays better, although at first look it seemed a bit erratic.  *(array_pointer + offset) = 10;

    gives a better understanding of array pointers. But when it comes to .... int **two_dim_pointer;

    it's not that easy.

    Applying the same logic, here... int [ , ] [] crazy =  new int [10, 10] [10] should give something like following

    crazy[0,0][0] = 0;

    crazy[0,0][1] = 0;

    Why not try int three_d_array[] [] []  = new int [10] [10] [10];

    Anyways, good blog... Eric, this will definitely help understand C# from the core-data structure point of view and help appreciate the generic containers. Also, a great help for using C# micro-framework.

  • It works out better to use a decently designed data structure to store data instead of 3 dimensional or jagged arrays.

    3 dimensional or jagged arrays are a red flag in our code reviews as they indicate a high likelihood of poorly designed, over designed, non-functional or hard to support code.

  • The design principle does not hold true for me. I did not expect the order to be consistent, I expected the grammar of the language to be consistent. I expected an int[][] to be initialised as "crazy = new int[][10]" and I expected this 10 to refer to the first index in a dereference (i.e. "a" in "crazy[a][b]"). Maybe this is because I think about these things like a parser - I want to deconstruct things into atomic, fundamental pieces. I expected there to be a simple grammar rule that says that array initialisations are of the form "'new' <element-type> '[' <integer-expression> ']'", no matter what the element-type is; but C# complicates it because if that element-type is itself an array type, then this simple rule has a complex exception. This trips me up every time and I cannot easily get used to it.

    Naturally, because I think this way, I am initially inclined to say that the design principle was a bad idea. To me, a simple/consistent grammar would have been a more desirable design goal. However, realistically I don't know the proportion of people to whom the design principle actually applies. Some of the commenters on this entry seem to imply that C#'s solution is in line with their expectation. Have any studies been conducted to determine how many programmers expect the order of the brackets to be consistent across the three contexts, at the expense of having a more complex/inconsistent grammar?

  • For me personally, the "inconsistent" rules would have been preferable. Somehow those are always the rules I think of when thinking of a particular situation (e.g., creating a new array). Possibly because they are actually more consistent with a concept I find important - that "int[]" is an inseparable definition of the type "array of ints". It's a pity that this is not true.

    When I don't remember a rule exactly I try to guess what it might be. So, wrongly assuming "int[]" is inseparable I always guess that you create a new array of 25 of these by appending "[25]". It could well be that others guess that the order would be the same as in indexing and type "int[25][]", but to me this feels completely alien.

    Like Timwi said, it would be interesting to see if you decided the consistency was important among potential C# users by running a study / poll of some sort.

  • Both the int[,][] and List<int>[,] seem very natural to me; I can't see or understand any conflict.

    Even List<int[,]>[,][] is kinda obvious. It's a two-dimensional array, each element of the array is a one-dimensional array of List<int[,]>. Each element of the list is a rectangular array of integers.

  • It all seemed so easy in Algol68:

    ref [,][] int crazy; // would give you the reference variable which you could then initialise with loc or heap generators, or just:

    [3,2] ref [] int crazy; // gives you a 2-d array instance into which:

    crazy := ( ( loc [5] int := (1,2,3,4,5), loc [2] int := (1,2) ), (  ,  ),  ( , ) .... etc... );

    When c# allows you to perform arbitrary slices of multi-dimensional arrays, then I might actually applaud, ie:

    crazy[1:2,2]

    crazy[1,1][3:5]

  • ...D=  My C++ brain is not accustomed to this...I'll leave C# to C# people. Bye now.

  • I'm with Timwi and Roman, I don't even find consequence 2 weird for option one.

    It's kinda like accessing members/methods from a class.

  • I'm with Timwi, Roman and Dominique - I find that design decision too "special casy", so to speak. It can never change though, so we'll have to live with it.
