Danger, Will Robinson!

As long-time readers of this blog know, I am often asked why a particular hunk of bad-smelling code does not produce a compiler warning.

"Why not?" questions are inherently hard to answer because they turn causation on its head; normally we ask what caused a particular thing to happen, not what caused a particular thing to not happen. Therefore, rather than attack that question directly I like to rephrase the question into questions about the proposed feature. (A warning is of course a feature like any other.) By answering these questions we can see whether the compiler team is likely to spend its limited budget on the feature rather than prioritizing something else. Some questions I think about when considering "why did you not implement this warning?" are:

Did someone think of it?

If no one on the compiler team ever thinks of the possibility of a particular warning then obviously it won't happen. As an example of that, I was recently asked why:

using(null) statement;

does not produce a warning. That code is legal; it is equivalent to

IDisposable temp = null;
try
{
  statement;
}
finally
{
  if (temp != null) temp.Dispose();
}

Of course the code is completely pointless; all it does is introduce a try-finally block that adds nothing useful to the program. Why don't we give a warning for this useless code? Well, the primary reason is that (to my knowledge) no one on the compiler team ever thought that a customer would do such a thing, and therefore, never thought to design, implement, test and document a compiler feature to deal with the situation.

Now that we have thought of it, we can see if it meets any of our other criteria, like, is it plausible that a user would type this accidentally thinking that it does something sensible? But I'm getting ahead of myself.

Is the proposed warning even possible? Is it really expensive?

Sometimes people propose warnings that would require solving the Halting Problem (which is unsolvable in general) or solving an NP-hard problem (which is in practice unsolvable in a reasonable amount of time). Proposed warnings like "warn me if there is no possible way for this overload to be chosen" or "warn me if there is an input which guarantees that this recursive method has unbounded recursion" or "warn me if this argument can ever be null in this program" require a level of analysis that is either in principle impossible, or possible but better performed by tools custom made for that purpose, like the Code Contracts engine. Warning scenarios should be cheap for the compiler to disambiguate from non-warning scenarios.

Is the code being warned about both plausible and likely to be wrong? Is the potentially "wrong" code actually sensible in some scenario?

There are an infinite number of ways to write completely crazy code; the point of warnings is not to identify all possible crazy code, but rather to identify code that is plausible but likely to be wrong. We don't want to waste time and effort implementing features warning about ridiculous scenarios that in practice never arise.

Sometimes code seems to be both plausible and wrong, but is intentional for a reason that is not immediately obvious. The best example of this is our suppression of warnings about write-only locals; a local might be read by the developer during debugging even if it is never read in the program.
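
For example, a fragment along these lines (the names here are mine, purely for illustration) compiles without any warning even though the local is never read by the program:

void TraceExample()
{
    // Written once and never read again in code; no warning is given, because
    // the developer may still want to inspect the value in the debugger.
    DateTime startTime = DateTime.Now;
    DoExpensiveWork();
}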

Machine-generated code is also often crazy-seeming but correct and intentional. It can be hard enough to write code generators that generate legal code; writing code generators that change their output when they determine that they're generating code that could produce warnings is burdensome. (For example, consider the machine-generated code that you get when you create a new project in Visual Studio. It had better compile without warnings!) However, we are much more concerned with catching errors in code written by humans than we are about making machines happy.

Obviously we only want to warn about code that has a high likelihood of actually being wrong. If we warn about correct code then we are encouraging users to change correct code, probably by changing it into incorrect code. A warning should be a bug-preventing feature, not a bug-encouraging feature. This leads me to my next point:

Is there a clear way to rewrite correct code that gets an unwanted warning into correct code that does not get a warning?

Warnings ideally ought to be easily turn-off-able. For example:

bool x = N();
...
if (x == null) Q();

That gives a warning that == null on a non-nullable value type is always false. Clearly you can rewrite the code to eliminate the warning; if you intended it to always be false then get rid of the whole statement, or you can turn the variable into a nullable bool, or whatever.
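
For example, if the null test really was intended, one sketch of a fix (assuming the hypothetical N() can be made to return a nullable value) is:

bool? x = N();
...
if (x == null) Q();

Now the comparison is meaningful and the warning disappears.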

A warning where there is no way to write the code so that the warning disappears is very irritating to people who actually do mean the seemingly-wrong thing. You can always turn a warning off with a #pragma of course, but that's a horribly ugly thing to force someone to do.
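
For what it's worth, the #pragma escape hatch looks something like this; I believe 0472 is the "expression is always false" warning number, but treat the exact number as illustrative:

#pragma warning disable 0472
if (x == null) Q();
#pragma warning restore 0472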

Will a new warning turn large amounts of existing code into errors?

Lots of people compile with "warnings as errors" turned on. Every time we add a warning that causes a lot of existing code to produce warnings, we potentially break a lot of people. Even if the warnings are good ones, breaking a lot of people is points against doing the feature. Such warnings are perhaps best handled in another tool, which brings me to:

Does the compiler team have to do this work? Or can another tool produce the warning?

Microsoft provides tools like FxCop, StyleCop and Code Contracts to do far more extensive analysis of code than the C# compiler performs. Similarly with third-party tools like ReSharper. If a warning can be implemented by one of those tools just as easily or more easily than it can be implemented in the compiler then that's less work for me, and that's awesome. Also, since FxCop examines compiled state, a warning in FxCop can catch problems in C# and VB, without having to change either compiler.

Will the user discover the error immediately upon testing anyway?

The earlier a bug is discovered, the cheaper it is to fix. The opportunities to find bugs are, in order of exponentially increasing cost of fixing the bug:

* When you type in the code
* When you compile the code
* When you test the code
* When your customer runs the code

Warnings about potential bugs that pop up while you are typing the code or compiling the code are bugs that never make it to the customer. However, a potential warning that points out a bug that would always be caught by testing is less compelling than a warning for a bug that could make it through testing to the customer.

For example, I was recently asked why the compiler does not provide a warning for this common typing error:

private decimal cost;
public decimal Cost { get { return this.Cost; } }

Whoops, that's a stack overflow exception (or, on machines with tail recursion optimizations, an infinite loop) waiting to happen right there. The compiler can in principle determine cheaply that the property getter does nothing other than call itself, so why doesn't the compiler warn about it? We could do all the work of identifying code that has this pattern, but why bother warning about a bug that will be caught anyway? The instant you test this code you will immediately discover the problem and it will be clear how to fix it. This is unlike our earlier example of "if (x == null)", where the fact that there is unreachable code might go completely unnoticed during testing, particularly if the common scenario is for x to be false.
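
For the record, the fix for the property above is just a matter of casing; the getter should return the backing field rather than the property:

private decimal cost;
public decimal Cost { get { return this.cost; } }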

Summing up:

Since that is quite the gauntlet for a feature to run, we find that usually the right thing to do for a proposed warning is "don't add the warning". The most likely situation for lots of new warnings to be added is when the warnings are surrounding a new feature that has new ways of being misused; we added quite a few warnings in C# 4.0 for misuses of "dynamic", "no PIA", and "named and optional arguments". Warnings for new features have no existing code using them that would break.

That said, we can always improve; if you have ideas for warnings do keep them coming.

  • @M.Weiser:

    Allowing duplicate values in an enum is a feature, not a bug. There are scenarios where it's useful to allow duplicate values. One I ran into just recently was declaring enum types to represent Unicode character classes and scripts. It's nice for the enum to support all the names defined in Unicode, and Unicode defines multiple names for many of the same values; those aliases need to be mirrored in the enum for it to provide maximum utility.

    I don't see how allowing duplicate values would "totally break a program". Sounds like programmer error to me.

  • Many native enums (for P/Invoke) also have multiple names for the same value.

  • I agree with Jon on the self-referential property overflow being non-obvious in some cases. If you're using ScriptSharp, the end result runs in the browser, where a stack overflow just disappears and gives no warning. It took me quite some time to finally track the problem down, and I was a little surprised that the C# compiler hadn't warned me about it.

  • @Pete - I still think it should yield a warning if the same value is repeated. The problem for me is that it is not obvious in the case where default values and automatic values are mixed accidentally - if you are really sure that you want this, then you may suppress the warning. See my example program for the side effect. One would never assume that it will show Test.A twice.

  • @M. Weiser: "I still think it should yield a warning if the same value is repeated."

    Well, for better or worse, it seems my view is closer to that held by the C# compiler team than yours is. I believe that the lack of a warning in this case is consistent with the general philosophy stated by Eric here and in previous posts. Paraphrasing: only emit a warning if the code is valid but almost certainly incorrect.

    There are too many reasonable scenarios where duplicated enum values are not only a correct design, but also a desirable one. It doesn't even come close to being something that is "almost certainly incorrect". Forcing a programmer to suppress a warning when, practically all the time, the duplicated values are in fact intended would be most annoying.

    Perhaps you are still following the C++ compiler's mindset, whose strictest warning levels tend to warn about all sorts of things that are actually just fine. I've always found that behavior annoying, and one of the reasons my move to C# has been such a pleasure (but hardly the only one :) ) is that the compiler is much more pragmatic about warnings. It is IMHO actually much better than the C++ compiler on both sides, producing fewer false positives _and_ fewer false negatives with respect to warnings. I spend a lot less time jumping through hoops to comply with C#'s warnings than I ever did dealing with C++'s warnings.

    As far as your specific example goes, it is IMHO very bad practice to mix automatic and explicit literal values (as opposed to explicit values that simply equate one enum value to another). You should not need a compiler to tell you not to do that. Just don't do that! Rather than blaming the compiler for catching your mistake, you should just take the lesson learned and use better programming practices to avoid that particular bug. There are just too many valid reasons someone might have duplicate enum values for us to have to deal with a compiler that complains every time we declare an enum that does.

    enum { Foo, Bar, DuhWhee } is fine

    enum { Foo = 0, Bar = 0, DuhWhee = 1 } is fine

    enum { Foo = 1, Bar = 2, DuhWhee = 3 } is fine

    enum { Foo, Bar = Foo, DuhWhee } is fine

    enum { Foo, Bar = 0, DuhWhee } is a) bad form, and b) should be obvious to any practicing programmer that there's a duplicate value in there anyway.

    Now, I suppose you could ask for the compiler to warn only on the last form. But that's yet even more analysis the compiler would have to do, and for something that is still not obviously an error. I would prefer my C# compiler not waste its time dealing with such scenarios.

  • One point about enum repeated values - it wouldn't be hard to write a unit test to go through all the enums in a given assembly and check that they didn't have any repeated values.
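
    A minimal sketch of what such a test helper might look like (the type and method names here are illustrative):

    using System.Linq;
    using System.Reflection;

    static class EnumChecks
    {
        // Returns the full names of enum types in the assembly in which two or
        // more members share the same underlying value.
        public static string[] FindEnumsWithDuplicateValues(Assembly assembly)
        {
            return assembly.GetTypes()
                .Where(t => t.IsEnum)
                .Where(t => t.GetFields(BindingFlags.Public | BindingFlags.Static)
                             .Select(f => f.GetRawConstantValue())
                             .GroupBy(v => v)
                             .Any(g => g.Count() > 1))
                .Select(t => t.FullName)
                .ToArray();
        }
    }

    A unit test would then simply assert that this returns an empty array for the assemblies under test.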

  • Add as much static code analysis as possible. Its value is well proven by the embedded C/C++ tools dating back to ~1980.

  • @Ted: "Add as much static code analysis as possible"

    I think this is demonstrably false. False positives train programmers to ignore warnings. It's important that when warnings are emitted, they have an extremely high likelihood of being relevant.

    I agree that static code analysis is a useful and desirable thing. But "as much as possible" without any qualification goes too far. And I agree with Eric that a lot of static analysis simply doesn't belong in the compiler itself, especially with an intermediate-language platform like .NET.

  • Don't want to be disrespectful, but "I was misremembering; serves me right for not running the code through the compiler" is something that a "developer who is so unprofessional as to deploy code without running it" would say :)

  • It's trivially easy to create code that you can't debug. Simply create a custom control with a self-referential property. As soon as you attempt to drag the control into the designer, VS will try to enumerate the property values for the Property box, run into the recursive property, overflow the stack, and immediately crash. You can't debug that crash because it doesn't break into your code (even if you attach a debugger to VS). You can't even run your code to test it because to do that you must first get it onto the design surface, and VS crashes before you can even finish that!

    If you've just created a property and now it crashes, it's not hard to figure out that the change you most recently made caused the problem. If, on the other hand, you created a whole bunch of properties and made many other changes, you would have no reason to believe that a crash in VS is related to a property you wrote 500 lines of code ago.

    Considering how easy it is to run into this problem (just a single missing/extra keystroke) and how hard it is to fix (it appears to be a bug in VS), it seems worth a warning. Alternatively, make it possible to catch a stack overflow so that the designer can point to the property with the bug.

  • @Gabe: "It's trivially easy to create code that you can't debug. Simply create a custom control with a self-referential property. As soon as you attempt to drag the control into the designer, VS will try to enumerate the property values for the Property box, run into the recursive property, overflow the stack, and immediately crash."

    Except that VS shouldn't crash. It's absurd that it does. It should be running the custom control code in a defensive way, protecting the VS process from a crash and simply displaying an error, as it does with other kinds of exceptions.

    This issue doesn't in any way suggest that the compiler needs another warning. It simply points to a flaw in the IDE that needs fixing.

    Again: the compiler should not be the end-all, be-all for detecting and reporting all possible problems in a program. Programming involves a large number of tools, used at various steps along the way, and while there are things that the compiler _could_ catch besides simple "invalid code" errors, it is simply not true that the compiler _should_ catch all those things. And the closer to compilation time that an error would be evident, the _less_ important it is for the compiler to have code in it to catch those errors.

  • pete.d: If it makes you feel any better, I already suggested that one of the options is to make VS catch the stack overflow. That's beside my point, though. I was just responding to Eric's comment that he would have to take Jon Skeet's word that one could write code that couldn't be debugged. My point was only to demonstrate how trivial it is. In fact, not only can it not be debugged, but you can't even test the code because the IDE crashes before you can get your control onto your form.

    Of course you could argue (correctly) that VS shouldn't crash, but apparently you've never attempted to recover from a StackOverflowException. If you had, you would know why VS just crashes.

    By the same token, though, you could argue that the compiler should warn about such problems. Considering the ease with which the situation can be caught by the compiler, the lack of false positives (code which either throws an uncatchable exception *or* goes into an infinite loop is never plausibly correct), the difficulty in debugging or even testing in such situations as I've described, and the frequency of it happening, it seems like an easy decision to me. Sure, the compiler doesn't *need* the warning, but having it would have saved me (and Jon Skeet, apparently) a significant amount of time.

  • @Gabe: "Of course you could argue (correctly) that VS shouldn't crash, but apparently you've never attempted to recover from a StackOverflowException. If you had, you would know why VS just crashes."

    You clearly have not really thought that one through. In particular: just because StackOverflowException is fatal for the AppDomain, that does not mean it's impossible for VS to be implemented in a way such that a StackOverflowException doesn't take down the entire process.

    Rather than making assumptions about what I have and have not done, try digging a little deeper into what is and is not possible. Yes, it's harder for VS to survive than to not survive. But that doesn't mean VS should be given a pass, and it _definitely_ does not mean that the compiler should be the component that takes up the slack.

    The self-referential property bug is just one example of many similar ones, most of which the compiler cannot be reasonably expected to catch. Why bog the compiler down for a single niche scenario, when all the other similar ones will necessarily remain, and when the real problem is not in the compiler, but rather in the other tools? It would be much better if VS were simply more robust in the face of _all_ such scenarios.

    I have plenty of sympathy for those who run into these kinds of bugs and who spend a significant amount of time trying to figure out exactly what is wrong with their program, only to find that it's a silly programmer error. But it's a knee-jerk reaction to think that just because the compiler _could_ be written to protect against such bugs that it _should_. Once one gets over the initial embarrassment of having made the mistake and the frustration of tracking the mistake down, a more rational view should develop and one should acknowledge that the compiler has a very specific purpose, and that its primary job is _not_ to identify those mistakes that will become apparent the moment the code is executed.

  • FYI, the VC++ compiler team thought that the self-recursive case was worth a warning:

    C4717 'function' : recursive on all control paths, function will cause runtime stack overflow

  • "VC++ compiler team thought that the self-recursive case was worth a warning:"

    Of course they did. That's one of the reasons I dislike the C++ compiler so much. :p And even compiling without optimizations, the C++ compiler takes forever compared to compiling similar code in C#. I can't help but wonder how much of that time is spent doing analysis for those kinds of warnings.

    I have no first-hand knowledge of any C++ compiler internals, but I strongly doubt that it is the warnings causing a perceived performance issue. But more generally, you are comparing apples with oranges when you compare a C# compiler to a typical "native" C++ compiler for performance. They have very different tasks to perform. The C# compiler must generate complete metadata and IL for the entire program; a C++ compiler must generate object code for each individual source file and then link them later. Those two tasks have potentially very different performance characteristics. C# relies upon a JIT compiler doing all the heavy lifting of translating intermediate code to optimized machine code; most C++ compilers do not have the luxury of being able to skip the optimizations. C++ compilers are written to assume that each object file can be compiled independently by re-parsing the headers, and the link step that resolves inter-file dependencies happens later; C# was designed so that all the "top level" metadata is generated *first* and then each method body compiled can make efficient use of that in-memory metadata. That these compilation models are fundamentally different seems to me to be a much more likely cause of performance differences than trivialities like what warnings are detected. -- Eric
