Fabulous Adventures In Coding
Eric Lippert is a principal developer on the C# compiler team.
Wow, lots of good "hmm" moments in the comments to yesterday's post. Keep them coming!
Many of these resonated strongly with me. One in particular had me thinking back to the day when I finally internalized the reality that in C/C++, the declaration "int * px;" does not break down into "int *" and "px", but rather into "int" and "*px".
Perhaps you all find this perfectly straightforward, but it takes me a little longer to figure all this stuff out. (And hence my motto: "Eric Lippert figures it out... eventually".) The conceptual leap that I had to make was this:
When you say "int x;" that means "any time you see x, it refers to a storage location that can contain an int". That's straightforward. When you see "int * px;" the tempting thing to do is to reason "any time you see px, it refers to a storage location that can contain an int*", but that is the wrong way to look at it. The insight is to shift your focus so that the conceptualization follows the same pattern as the "int x;" case. That is "any time you see *px, it refers to a storage location that can contain an int".
Everything then follows from that insight. What can go into px? Anything that you can dereference and get an integer variable out of as a result!
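One way to check this reading is to ask the compiler what the expression *px actually is. The following is a minimal C++ sketch, not anything from the original post; the function name read_through_pointer and the values are made up for illustration.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical helper: "int *px;" read as "*px refers to int storage".
int read_through_pointer() {
    int x = 42;
    int *px = &x;  // px may hold anything that dereferences to an int variable

    // The expression *px designates an int storage location (an int lvalue).
    static_assert(std::is_same<decltype(*px), int&>::value,
                  "*px refers to int storage");

    *px = 7;       // writing through *px writes to x
    return x;      // x is now 7
}
```

Calling read_through_pointer() returns 7, because *px and x name the same storage.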
Once you start taking the declaration apart correctly, other confusions drop away. Like, I was always confused by const:
"const int y = 123;" means that y refers to a read-only storage location for an int. What does "const int * py = whatever;" mean? Does it mean that py is a pointer to an integer and the pointer does not change, or that the integer pointed to does not change? When you take apart the declaration correctly it becomes clear. It means "*py refers to a read-only storage location for an int".
The same logic works for arrays. "int rx[];" means "rx[something] refers to integer storage". "const int *p[];" means "*p[something] refers to const integer storage", and so on.
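The const and array readings can be checked the same way. This is a small C++ sketch under my own assumptions: the array size of 3, the values, and the function name const_and_arrays are all invented for the example.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical helper: checks "*py is read-only int storage" and
// "*p[something] is const int storage".
int const_and_arrays() {
    const int y = 123;
    int rx[3] = {1, 2, 3};  // size 3 is an assumption for the example

    const int *py = &y;
    // *py is read-only int storage...
    static_assert(std::is_same<decltype(*py), const int&>::value,
                  "*py is read-only int storage");
    // ...but py itself is not const, so it can be pointed elsewhere.
    static_assert(!std::is_const<decltype(py)>::value,
                  "py itself may be reassigned");
    py = &rx[0];
    // *py = 5;  // would not compile: *py is read-only

    // rx[something] refers to int storage.
    static_assert(std::is_same<decltype(rx[0]), int&>::value,
                  "rx[i] is int storage");

    // *p[something] refers to const int storage.
    const int *p[3] = {&rx[0], &rx[1], &rx[2]};
    static_assert(std::is_same<decltype(*p[0]), const int&>::value,
                  "*p[i] is const int storage");

    int total = *py;  // reads rx[0]
    for (int i = 0; i < 3; ++i) {
        total += *p[i];
    }
    return total;
}
```

Note that the const applies to what *py refers to, not to py, which is why the reassignment compiles and the commented-out write would not.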
This also explains why "int *px, x" gives you a pointer to an integer and an integer, not two pointers to integers. This says "*px and x both refer to integer variables".
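A quick C++ sketch of that declaration, with the compiler confirming the two types. The function name one_pointer_one_int and the initial values are assumptions for illustration only.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical helper: "int *px, x" declares one pointer and one int.
bool one_pointer_one_int() {
    int *px = nullptr, x = 0;

    static_assert(std::is_same<decltype(px), int*>::value,
                  "px is a pointer to int");
    static_assert(std::is_same<decltype(x), int>::value,
                  "x is a plain int, not a pointer");

    px = &x;
    *px = 5;        // *px and x both refer to int storage (here, the same one)
    return x == 5;
}
```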
Now, perhaps I am not the only person to find this rather confusing, which motivates the completely different way we do it in C#. In C# the type of the variable is on the left hand side of the declaration and the name of the variable is on the right hand side, period. In C# you say "int* px;" and that means "px is a variable of type int*". Types can always be decomposed unambiguously in C#; for example, "int*[]" is an array of pointers to int.
Of course, it helps that in C# you cannot take a pointer to an array, and array types (being reference types) cannot be made nullable, so the number of possible combinations of type-modifying suffixes is very limited.
OK, I thought of another C# one: finalizers are called if the constructor throws an exception.
I like the way C# does it. I view "int*" as a type, so I would type "int* a, b" and read that as "I'm declaring two variables of type int*", whereas the compiler reads it as "he's declaring storage for two ints, one anonymous with a pointer and one onymous with a variable".
The point about "once you get the model right, lots of other things become easy too" is *really* important. This is why I make such a big fuss about distinguishing between "objects are passed by reference" (incorrect) and "references are passed by value" (correct). A lot of the time the person claiming the first actually knows what's going on, but by expressing it imprecisely they're encouraging the model that the value of the parameter *is* an object.
Once you understand that the value of a reference type variable is just a reference, you don't need to distinguish between reference types and value types when it comes to parameter passing: by default, the argument is passed by value; with "ref" the argument is passed by reference. You then need to keep distinguishing between the *value of the parameter* (just a reference) and the object it refers to.
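The same distinction can be sketched in C++ by letting a raw pointer stand in for a C# reference. This is only an analogy, not C# semantics, and the Box type and function names are hypothetical: passing the pointer by value roughly models the default, and passing it as Box *& roughly models "ref".

```cpp
#include <cassert>

struct Box { int value; };  // hypothetical stand-in for a reference-type object

// The pointer is copied. Mutating the object through it is visible to the
// caller, but reassigning the parameter is not.
void by_value(Box *b, Box *replacement) {
    b->value = 10;
    b = replacement;  // only changes the local copy of the pointer
    (void)b;
}

// The caller's pointer variable itself is aliased, roughly like C# "ref".
void by_reference(Box *&b, Box *replacement) {
    b = replacement;  // changes the caller's variable
}

int demo_by_value() {
    Box first{1}, second{2};
    Box *p = &first;
    by_value(p, &second);
    return p->value;  // p still points at first, whose value was mutated
}

int demo_by_reference() {
    Box first{1}, second{2};
    Box *p = &first;
    by_reference(p, &second);
    return p->value;  // p now points at second
}
```

In both cases the thing being passed is the pointer; the only question is whether the callee gets a copy of it or an alias to the caller's variable.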
This is one of the biggest stumbling blocks for new developers IMO - once the mental model is right, so many things just make sense automatically.
Rant over... for now.
This works well. In C and C++ if you have:
You'd think that's an array of M int[N]. But it's not. The ith element, a[i], is not int[N].
The type of a[i] is, as you say, something that refers to an int when [j] is attached, where 0 <= j < M; that is, int[M].
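The declaration this comment refers to did not survive in the text. Assuming it was something like "int a[N][M];" (a guess based on the bounds mentioned, with N = 2 and M = 3 picked arbitrarily), a C++ sketch can confirm that a[i] is int[M] and a[i][j] is an int:

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Assumed sizes, chosen only for illustration.
constexpr std::size_t N = 2;
constexpr std::size_t M = 3;

// Assumed declaration: the comment's original code is missing.
int a[N][M];

// a[i] is an int[M]; attaching [j] to it yields int storage.
static_assert(std::is_same<std::remove_reference<decltype(a[0])>::type,
                           int[M]>::value,
              "a[i] is int[M]");
static_assert(std::is_same<decltype(a[0][0]), int&>::value,
              "a[i][j] is int storage");

// Number of a[i] elements and number of ints in each a[i].
std::size_t row_count() { return sizeof(a) / sizeof(a[0]); }
std::size_t column_count() { return sizeof(a[0]) / sizeof(a[0][0]); }
```

Under that assumption the dimensions read left to right in the order they are used: a[i][j] with i < N and j < M.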
Your perspective is unintuitive but works.
I agree with you completely about C++'s absolutely confusing syntax for referring to pointer types. My mental model was always something like "I have an array of pointers to ints", but I'd always want to write it like this:
(int*)[] x = new (int*)[];
I like the way C# handles it much better. Although admittedly the fact that the value/reference type decision is a part of the type definition always seemed a bit odd to me coming from a C++ background. But at the end of the day it probably makes sense.
I never quite understood C/C++ declarations until I read Peter van der Linden's treatise on declaration unscrambling in his excellent "Expert C Programming" (Prentice Hall 1994 -- not sure if it's still in print, but a must-read for C/C++ programmers if you can get it).
C/C++ features like that are why I love C#.
Steve Friedl has an excellent explanation of how to read C type declarations:
For "const int * py", you'd basically read the declaration right-to-left, giving you "py is a pointer to an int that is const."
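The right-to-left reading also distinguishes where the const sits. Here is a short C++ sketch of the three common combinations; the variable names cp and cpc and the function name right_to_left are made up for the example.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical helper contrasting const placements, each read right to left.
int right_to_left() {
    int a = 1, b = 2;

    const int *py = &a;         // py is a pointer to an int that is const
    int *const cp = &a;         // cp is a const pointer to an int
    const int *const cpc = &a;  // cpc is a const pointer to an int that is const

    static_assert(!std::is_const<decltype(py)>::value, "py can be reassigned");
    static_assert(std::is_const<decltype(cp)>::value, "cp cannot be reassigned");
    static_assert(std::is_const<decltype(cpc)>::value, "cpc cannot be reassigned");

    py = &b;     // fine: the pointer is not const
    *cp = 10;    // fine: the int it points to is not const, so a becomes 10
    // *py = 3;  // would not compile: the pointed-to int is const
    // cp = &b;  // would not compile: the pointer is const

    return a + *py + *cpc;  // 10 + 2 + 10
}
```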
"Of course, it helps that in C# you cannot take a pointer to an array"
It also helps that the CLR/C# doesn't have const type modifiers. You still have an exponential problem, but it now becomes 1^n instead of 2^n. :-)
I do like your point here though: "Once you start taking the declaration apart correctly, other confusions drop away." C++ is a bigger language than C# (although C# is rapidly catching up with versions 3.0 and 4.0), so it's reasonable to expect a bigger learning curve.
It absolutely does make sense to do the declaration syntax the way C# does it. If it's not obvious, then one might want to consider why Java and D did the same (and, in general, all C-style languages that don't have a goal of being C/C++ compatible go that route).
That said, I still much prefer the way Pascal and JS.NET do declarations:
var p : int
and for pointers that would be something like:
var p : *int
This can be read much more naturally as "p is a pointer to integer", in the way it's written. It also makes array-of-array types clearer. Even in C#, we have this backwards. You'd think that, for any given type T, you can just add [] to the end to get "array of T", but nope:
int x; // x is an int - ok
int[] y; // y is an array of int - ok
int[,][] z; // z is NOT a 1D array of 2D arrays of int - it's a 2D array of 1D arrays of int!
This is a mess. So type is read right to left, but array ranks are read left to right wherever they occur. As you say, it sure is good that we don't have pointers to arrays, else it'd get real messy real fast! As for the preferable syntax, here's what actually makes sense:
var x : int;
var y : [] of int;
var z : [] of [,] of int;
It also makes sense since generic types (other than arrays) are written left-to-right - we don't say "<int>List" in C#, we say "List<int>". Coincidentally, ML/F# allows the former, but at least it's consistent and does it both for arrays and for other types ("int list" etc), and never reverses the order in the middle.
Anyone interested in function pointers?
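For what it's worth, the "declaration mirrors use" reading from the post seems to carry over to function pointers too. A minimal C++ sketch, with the names twice and call_through_pointer invented for illustration: "int (*pf)(int);" can be read as "the expression (*pf)(some int) is an int".

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical function to point at.
int twice(int n) { return 2 * n; }

int call_through_pointer() {
    // Read as: (*pf)(some int) is an int.
    int (*pf)(int) = &twice;

    static_assert(std::is_same<decltype((*pf)(1)), int>::value,
                  "(*pf)(n) is an int");
    static_assert(std::is_same<decltype(pf), int (*)(int)>::value,
                  "pf is a pointer to a function taking int and returning int");

    return (*pf)(21);  // calls twice(21)
}
```

Without the parentheses, "int *pf(int);" would instead declare a function returning a pointer to int, since *pf(some int) would be the int.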
Peter van der Linden's book Deep C Secrets provides a very good way of breaking down a declaration in C/C++, with a simple, easy-to-remember algorithm that can be used to unscramble a complicated C/C++ declaration.
The C# way of declaring "things" is definitely much more intuitive than in the C/C++ world, imho.