Fabulous Adventures In Coding
Eric Lippert is a principal developer on the C# compiler team. Learn more about Eric.
There were a number of interesting comments to yesterday's posting about premature optimization.
Several readers pointed out that there are various options we could take:
These all have various pros and cons, and even for a relatively trivial spec incompatibility such as this one, we think very hard about what the right thing to do is.
Option #2 sounded good, and I actually was an advocate of it for a few hours yesterday, right up until I dug into the optimization and discovered exactly what it was doing. If we were to change the specification to exactly match the behaviour, then the specification would read like this:
Implicit Enumeration Conversions
There is an implicit conversion from any "magic zero" to any enumerated type. "Magic zero" is recursively defined as follows:
This means for example, that (8*(0+(7-7))) is a magic zero, but (8*((7-7)+0) is not.
I think we can all agree that we don't want this nonsense in the specification. Changing the spec to exactly match the buggy behaviour leads to a crazy spec.
We could change both the specification and the implementation. We could, for instance, say that any compile-time constant integer zero is implicitly convertible to any enumerated type. This means that nonsense like (8*(0+(7-7))) is assignable to enum, but at least it also means that (8*((7-7)+0) is too.
Adding a new warning is a tempting idea, and we might end up doing it. Doing so means that it is easier to turn it into an error in a future release. New warnings, however, are expensive. They add new code paths to test, we have to test all the scenarios to make sure the warning always appears when it should, warnings have to be translated into a dozen languages, they have to be documented, the documentation has to be translated... it's not cheap.
Adding a compiler switch is also a possibility, but also expensive. The more switches there are, again, more code paths, more tests, bigger testing matrix, the switch has to be documented, and so on.
So now you see why premature optimization is the root of all evil. I've spent the better part of two days wrestling with this thing. Why? Because suppose one of these guys is in a lambda expression that needs to be translated into an expression tree. I want to make for darn sure that the translation is correct, and the more "optimizations" that happen before that transformation, the more the transformation is likely to be not as the user intended! But I can't remove the optimizations without breaking the type system!
UPDATE: The thrilling conclusion is that we have made all constant-according-to-the-spec zero integers convertible to any enum. This is a spec violation, but at least it is a consistent spec violation. In doing so I also introduced a bug, which I believe did ship to customers but which we have since fixed, whereby some "non-integer zeros" (like default instances of structs, or null references) were convertible to any enum. I apologize for the error.
I also challenged you folks to come up with a situation where this premature optimization can screw up definite assignment checking. Reader Carlos found two.
int aa; int bb = 12; int cc; int dd = (aa * 0) + 12;if (bb * 0 == 0) cc = 123;Console.WriteLine(cc);
According to the spec, aa should be flagged as "used before definitely assigned" (even though really it doesn't matter), but because the premature optimization throws away the aa, the flow checker never sees it. Also according to the spec, cc should be flagged as "used before definitely assigned" (even though it always is assigned), but because the optimization throws away the bb, the flow checker does not realize that the conditional is not a compile-time-constant.
The important thing here is not that the code gets it wrong. Indeed, one could make the argument that it gets it right. The problem is that it gets it right for the wrong reasons – it gets it right by accident, in violation of the specification. The definite assignment specification isn't supposed to be perfect, it's supposed to be practical and understandable and predictable and implementable. Undocumented extensions to it make it hard to know whether any two implementations of the spec will agree.
You never answered Kurtiss' question. I thought the same thing while I was reading the post; doesn't that actually make sense?
The choice we eventually went with was to violate the specification in a sensible manner. All constant zeroes now go to any enum.
This is a breaking change, since overload resolution on void M(int x) {} vs void M(MyEnum x) {} is now ambiguous on M(ConstantZero) when it did not used to be. This breaking change has caused at least one customer's code to break, which caused them to complain bitterly to Microsoft because we made them lose money. (And who knows how many people are broken who haven't complained?)
Changing the rule to "any integer constant goes to any enum" would be a MASSIVE breaking change that would affect thousands of customers, so we will not do that.
If the question is why the rule is like this in the first place -- well, the fact that enumerated types are just fancy integers is in my opinion deeply unfortunate. It's a leaky abstraction. I would rather have seen two kinds of types -- enumerated types, which are strictly restricted to their set of values and have no relationship whatsoever to integers, and flags, which are fancy dress for integers with some added type safety. Clearly the only reason to make zero go to any enum is so that a flag enum can always be initialized to empty -- there's no reason why zero ought to go to a non-flag enum.
That makes sense. I wasn't clear on what the final resolution was from the post itself; thanks for clearing that up.
And I absolutely agree with you about enumerated types. In a static type system it makes no sense that I have to check whether a variable of a particular type is actually belongs in that type! "Deeply unfortunate" is an understatement. Unfortunately I suppose there's no going back now.
It seems that something wrong with the parenthesis in the (8*((7-7)+0) expression.