The Root Of All Evil, Part One

The Root Of All Evil, Part One

Rate This
  • Comments 27

People often quote Knuth's famous statement that premature optimization is the root of all evil. Boy, has that ever been the theme of my life these last few weeks as I've been banging my head against the compiler trying to figure out how we're going to make it work with LINQ without breaking backwards compatibility. Some seemingly harmless premature optimizations in the compiler are eating up precious time dealing with how to work around them.

Before I can explain what exactly the premature optimization is that's drivin' me nuts, a short quiz. Suppose you've got an enumerated type E with value X.

E e1 = E.X;
E e2 = 0 | E.X;
E e3 = E.X | 0;
E e4 = E.X | 0 | 0;
E e5 = 0 | E.X | 0;
E e6 = 0 | 0 | E.X;

All of these are legal according to the compiler. One of these is illegal according to the specification. Before you read on, take a guess as to which one, and why.

Now, producing sensible behaviour when we should be producing an error is the least bad of all possible spec violations, but still, it's irksome. It means that we have to ensure that we never comply with the spec, because if we suddenly turn this into an error, we might break an existing program.

Which of the above should be illegal? The first one is obviously good. The second and third are legal because section 13.1.3 says that literal zero is implicitly convertible to any enum, and section 14.10.2 says that you can binary-or two enumerated type values. We'll therefore convert the zero to the enumerated type and use the enum binary-or operator.

Since binary-or groups on the left, the fourth and fifth are legal. The left-hand part of the expression is evaluated first, and the right-hand zero is converted to the enumerated type.

The sixth case is illegal according to the spec, because this is the same as (0|0)|E.X(0|0) is not a literal zero, it's a compile-time-constant expression that happens to equal zero, and there's no implicit conversion from that thing to any enumerated type!

And there's the premature optimization. After we've got a complete parse tree we walk through the parse tree making sure that all the types work out. Unfortunately the initial type binding pass bizarrely enough does arithmetic optimizations. It detects the 0|something and aggressively replaces it with just something , so as far as the compiler is concerned, the sixth case is the same as the second, which is legal. Argh!

This kind of optimization is way, way premature. It's the kind of thing you want to do very late in the game, possibly even during the code generation stage. Someone tried to do too much in one pass – the type annotation stage is no place for arithmetic optimizations, particularly when they screw up your type analysis in subtle ways like this!

Next time I'll talk about how these same premature optimizations screw up definite assignment checking. There is a much related situation where there is a variable which the compiler treats as definitely assigned, and it IS definitely assigned, but the definite assignment specification requires us to produce an error. Bonus points to anyone who can suss it out from that vague description!

  • When I try to create unions using explicit struct layout, (for example, to union a double and a long) the compiler complains that I need to provide assignments to both the double and long.

    The following compilers and runs without producing an exception, because of aggressive optimization. The second line doesn't produce an error because x is a dead store.

    string s = null;
    int x = s.Length;

    My guess for the situation where the compiler treats a variable as definitely assigned against the spec is the following.

    string s ;
    if (true)
       s = null;
    int x = s.Length;

  • Personally, I think you should change the spec to match the existing behavior here. Constant folding should have absolutely no bearing on whether the program matches the spec or not, and if the spec says it does, the spec should be changed.
  • The simplest example I could think of:

    int a;
    int b = (a * 0) + 1; // Should be an error: (a * 0) is not a constant expression and a is not definitely assigned

    P.S. Welcome back!
  • Wes: send me a repro of your bug, I'll have a look at it.

    Anthony:  The spec specifies precisely what constant foldings must be performed so that folded constants can be used for switch labels, constant fields, etc.  So it's a nonsequiter to say that "constant folding should have no bearing on whether the program matches the spec".  The spec is all about constant folding.

    The problem with changing the spec to match the behaviour is that the existing behaviour is a complicated and weird interaction between an optimizer (which is NOT specified by the specification) and the constant folding code (which is specified.)  For example, (0-0) | E.X  also fails to produce an error, but (7-7)|E.X does correctly produce an error.  Basically we have a few choices:

    1) Do nothing -- document the spec violations internally and ensure that we maintain them
    2) Rewrite the spec so that it is a description of the current complex and bizarre behaviour
    3) Extend the spec even further, so that, say, any integer constant could be implicitly converted to enum.  

    Option (2) is gross, option (3) would make unfortunate expressions like (6 + 8 - 2 * 7) | E.X legal, so (1) is what we're stuck with.  

    Hence, root of all evil.  We wouldn't have this evil choice to make if someone had made the optimization pass the LAST thing that runs rather than the FIRST.

  • Carlos: This is in fact a bug that you've found, so, good work.  However, it's not the bug I asked for.  I asked for a bug where the compiler treats a var as definitely assigned, it IS definitely assigned, but the definite assignment spec says that it should be an error.

    Your bug is one where the compiler treats a var as definitely assigned, it is NOT definitately assigned, and it should be an error.

    But thanks for pointing this out, because this means I'll need to ensure that this case is maintained as well!

  • Wes: the compiler is smart enough to treat if(true) as something which always executes the consequence, so s will be definitely assigned and this is not in violation of the spec.
  • How about:

    int y = 2;
    int x;
    if ((y * 0) == 0)
     x = 1;
    int z = x + 1; // Should be an error: (y * 0) is not a constant expression so the compiler shouldn't consider the "if" as always taken.
  • Ding!  Give that man a cigar!

  • Stupid question, but...why not break non-compliant programs? They're non-compliant. I admit that it's cold comfort to the broken people when you tell them that they were wrong, but otherwise it rather degrades the relevance of a spec to say that, well, we're never going to comply with it.
  • Look at it from the point of view of the customer.  You have a codebase of hundreds of thousands of lines of working code.  Microsoft comes along and says "hey, we've got a new compiler that has new language features that will enable you to do even better work."  

    New language features raise productivity, and thereby increase the profitability of companies which provide .NET solutions.  Those companies are better able to provide software to their customers, which encourages those customers to see the .NET platform as attractive, which means that they buy Windows, which makes me happy as a MSFT shareholder.

    Now we come along and say "but there's a catch.  We were bozos and allowed you to write a perfectly sensible, straightforward program which for obscure reasons just happened to be noncompliant with the holy specification in this one little way.  If you try to compile your old code against our new compiler, it might break.  Which means that you have a choice of either (a) don't use the new compiler, and forego all the new productivity benefits, or (b) take your developers off of whatever they're doing now so that they can insert the right parentheses in the right places to bring the code up to compliance with the specification.  

    Both of those are costs to the ISV, costs which are associated with ZERO increase in code quality.  It makes upgrading less attractive, it makes people less productive, blah blah blah.

    Remember, the Holy Specification is a means to an end, not an end in itself.  It's there so that the language can be understood by the people who are using it.  But "stable from version to version" is for most real-world applications, about a million times more important than "word-for-word compliant to the spec".  Would we like to be totally compliant to the spec, all other things being equal?  Of course.  But given that we've screwed up in the past, it is way more important to keep those mistakes stable and documented and predicatble than to get people to upgrade and discover that they have to "fix" previously working code.  

  • Incidentally, the way that I discovered this issue today was that I moved the premature optimization to run after constant folding, not before.  When I went to recompile the .NET Base Class Library with the new compiler, sure enough, somewhere in there someone had 0 | 0 | E.X, and it broke right away.  This is in real-world production code, and if I were to check in a compiler that broke the BCL build, I would be in so much trouble it would make your head spin. So many _thousands_ of people depend on being able to make a fresh build of the BCL every day that we have to show that there is a HUGE benefit that outweighs the massive cost of make breaking changes. Merely coming into line with the spec is nowhere near important enough to justify stopping thousands of people from being able to compile for a day.
  • In the past VS has always seemed to flag discovered violations as warnings for a version, then promote them to errors, giving at least some lead time. Heck, VS2005 broke badly a number of apps that relied too much on VS6 and VS2003's bad scoping rules. (Alternately, a special subparser designed to scan imported code for specific patterns like this could correct them, since you'll be documenting them anyway. It won't work for everything, but it's something. I know you already have a lot of code in upgrading project files, but as far as I know, not for actual code.)
  • Very simple:

    a) Some validation thing, possibly upon first import of "legacy" code will detect situations where there is possibility for a break to occur and warn about them

    b) a compiler switch could be used or a define added that specifies the compiler to use non-conforming, legacy behaviour on these kind of special cases. A validation/import tool could then automatically add this to old projects/files.
  • Indeed, one of the things we are researching for the next release of the compiler is what breaking changes are appropriate for this kind of "switched compiler" treatment.   Again, though, just having a switch is not a magic panacea.  Additional switches greatly complicate the test matrix, which means that we have less confidence that we're giving you a quality product.

    Adding warnings is a good technique.  I'll soon be talking in this space about some of the warnings I've added to the compiler in the last few weeks.
  • "IS definitely assigned, but the definite assignment specification requires us to produce an error."
    E e6 = 0 | 0 | E.X;
Page 1 of 2 (27 items) 12