Absence of evidence is not evidence of absence

Today, two more subtly incorrect myths about C#.

As you probably know, C# requires all local variables to be explicitly assigned before they are read, but assumes that all class instance field variables are initially assigned to default values. An explanation I sometimes hear for this difference is "the compiler can easily prove that a local variable is not assigned, but it is much harder to prove that an instance field is not assigned. And since the class's default constructor automatically assigns all instance fields to default values, you don't need to do the analysis for fields."

Both statements are subtly incorrect.

The first statement is incorrect because the compiler in fact cannot and does not prove that a local variable is not assigned. Proving that (1) is impossible in general, and (2) would not give us any useful information we could act upon. It's impossible because proving that a given variable is assigned a value is equivalent to solving the Halting Problem:

int x;
if (/*condition requiring solution of the halting problem here*/) x = 10;
print(x);

If what we wanted to do was prove that x was unassigned then we would have to at compile time prove that the condition was false. Our compiler is not that sophisticated!

But the deeper point here is that we're not interested in proving for certain that x is unassigned. We're interested in proving for certain that x is assigned! If we can prove that for certain, then x is "definitely assigned". If we cannot prove that for certain then x is "not definitely assigned". We're only interested in "definitely unassigned" insofar as "definitely unassigned" is a stronger version of "not definitely assigned". If x is read from when it is "not definitely assigned", that's a bug.

That is, we're attempting to prove that x is assigned, and our failure to prove that at every point where it is read is what motivates the error. That failure could be because of a bona fide bug in your program, or it could be because our flow analyzer is extremely conservative. For example:

int x, y = 0;
if (0 * y == 0) x = 10;
print(x);

You and I know that x is definitely assigned, but in C# 3 the compiler is deliberately not smart enough to prove that. (Interestingly enough, it was smart enough in C# 2. I broke that to bring the compiler into line with the spec; being smarter but in violation of the spec is not necessarily a good thing.) 

This example again shows that we do not prove that x is unassigned; if we did prove that, then clearly our prover would contain an error, since you and I both know that x is definitely assigned. Rather, we fail to prove that x is assigned.
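To make the distinction concrete, here is a minimal sketch of what the flow analyzer accepts and what it rejects. The class and method names are made up for illustration, and the behaviour described in the comments is what I would expect from a compiler that follows the specification's definite assignment rules; treat it as an illustration rather than a statement about any particular compiler version.

```csharp
using System;

static class DefiniteAssignmentSketch
{
    // x is assigned on every path through the if/else, so the analyzer
    // can prove that it is definitely assigned at the read.
    public static int BothBranches(bool flag)
    {
        int x;
        if (flag) x = 10;
        else x = 20;
        return x;
    }

    // "true" is a constant expression, which the definite assignment
    // rules do take into account, so this read is accepted as well.
    public static int ConstantCondition()
    {
        int x;
        if (true) x = 10;
        return x;
    }

    // "0 * y == 0" is always true but is not a constant expression, because
    // y is an ordinary local. Written as in the post,
    //     int x, y = 0;
    //     if (0 * y == 0) x = 10;
    //     return x;   // error CS0165: use of unassigned local variable 'x'
    // the read is rejected. Giving x an initial value is the usual workaround.
    public static int ConservativeWorkaround()
    {
        int y = 0;
        int x = 0;
        if (0 * y == 0) x = 10;
        return x;
    }

    static void Main()
    {
        Console.WriteLine(BothBranches(false));       // 20
        Console.WriteLine(ConstantCondition());       // 10
        Console.WriteLine(ConservativeWorkaround());  // 10
    }
}
```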

This is an interesting twist on the believers vs skeptics argument that goes like this: the skeptic says "there's no reliable evidence that bigfoot exists, therefore, bigfoot does not exist". The believer says "absence of reliable evidence is not itself evidence of absence; and yes, bigfoot does exist". In both cases, reasoning from a position of lacking reliable evidence is seldom good reasoning! But in our case, it is precisely because we lack reliable evidence that we are coming to the conclusion that we do not know enough to allow you to read from x.

(The relevant principle for tentatively concluding that bigfoot is mythical based on a lack of reliable evidence is "extraordinary claims require extraordinary evidence". It is reasonable to assume that an extraordinary claim is false until reliable evidence is produced. When overwhelmingly reliable evidence is produced of an extraordinary claim -- say, the extraordinary claim that time itself slows down when you move faster -- then it makes sense to believe the extraordinary claim. Overwhelming evidence has been provided for the theory of relativity, but not for the theory of bigfoot.)

The second myth is that the default constructor of a class initializes the fields to their default values. This can be shown to be false by several arguments.

First, a class need not have a default constructor, and yet its fields are always observed to be initially assigned. If there is no default constructor, then something else must be initializing the fields.

Second, even if a class does have a default constructor, there's no guarantee that it will be called. Some other constructor could be called.

Third, the field initializers of a class run before any constructor body runs, therefore it cannot be the constructor body that does the initialization; that would be wiping out the results of the field initializers.

Fourth, constructors can call other constructors; if each of those constructors were initializing the fields to zero, that would be wasteful; we'd be unnecessarily re-initializing already-wiped-out fields.

What actually happens is that the CLI memory allocator guarantees that the memory allocated for a given class instance will be initialized to all zeros before the constructor is called. By the time the constructors run the object is already freshly zeroed out and ready to go.
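A small sketch makes the ordering observable. The Widget class and its members are hypothetical names invented for this illustration; the constructor body records what the fields look like before it has assigned any of them.

```csharp
using System;

class Widget
{
    public int Count;          // no initializer: left at the zeroed default
    public string Name;        // no initializer: left as null
    public int Seeded = 42;    // field initializer: runs before the constructor body

    public readonly string SeenByConstructor;

    public Widget()
    {
        // Nothing in this constructor has assigned Count, Name or Seeded yet,
        // so whatever we observe here was put in place before the body ran.
        SeenByConstructor = $"{Count},{Name ?? "null"},{Seeded}";
    }
}

static class ZeroedFieldsSketch
{
    static void Main()
    {
        var w = new Widget();
        Console.WriteLine(w.SeenByConstructor); // 0,null,42
    }
}
```

If the constructor body were the thing zeroing the fields, Seeded could not still be 42 when the body observes it.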

 

  • Fifth, local variables are also always initialized by the CLI to default values, so nothing in particular follows from the statement "since the [something] automatically assigns all instance fields to default values".

    I seem to recall that the local initialization behaviour is configurable; you can turn it off if you don't want the perf hit. -- Eric

  • @Random832

    I've always wondered: when you write .locals init in MSIL, the CLI guarantees that the locals have been initialized upon method entry, so why does C# still require users to definitely assign a value to locals? More often than not, you might want to omit the initialization because the default values are exactly what you want.

  • @raven

    I seem to recall reading that this was a decision made because the use of an uninitialized local variable is most often associated with a logic error, rather than a desire to use the default value.

    I wonder though if that means the same case can be made about fields in a class?

  • I think so, Greg. But field initialization is so much more complicated that I'm not sure it would be reasonably possible for the compiler to determine that a field is definitely assigned in all of the common usage cases. The most common cases of local variable initialization are understood by the compiler, but fields are assigned by field initializers and constructors, and the constructors may call other constructors in the same class or in other classes. It's way simpler just to rely on the CLR requirement that heap memory is zeroed before use.

  • @Eric:

    > I seem to recall that the local initialization behaviour is configurable; you can turn it off if you don't want the perf hit

    If you drop "init" from ".locals init", then locals won't be initialized, but then the resulting code will be non-verifiable (or at least ECMA CLI spec says so - I'm not sure if .NET will actually treat it as such, since it relaxes a few other overly stringent ECMA rules when it comes to verification).

  • Now, that's interesting: the following fragment prints out 0, but...

    using System;

    namespace SomeProgram
    {
       enum TEST { One = 1, Two = 2, Three = 3 };

       class Program
       {
           // Static field with no initializer: starts out as the zero value.
           static TEST test;

           static void Main(string[] args)
           {
               Console.WriteLine(test);
           }
       }
    }

    ...but 0 is an invalid value for the enum TEST, because each of its members is assigned a value: 1, 2, and 3. Obviously, some behind-the-scenes conversion to int is taking place to facilitate the zeroing out of the class member variables mentioned in the post...

    Would it not be possible to initialize the enum member to one of its valid values here, say, to the minimal one? Every enum is, in turn, an instance of the Enum class: isn't it possible to modify its default constructor to either make a decision based on the integer equivalent of the values, or to throw an exception and/or generate a compile-time error when no default value is defined?

    And, quite apart from that, there is another point to explain why constructors are not the primary member initializers: what about the static members? They may be accessed before any instance of the class is created, so something other than the class' constructor(s) must be used to initialize them.

  • @Denis: See e.g. http://msdn.microsoft.com/en-us/library/cc138362.aspx

    "It is possible to assign any arbitrary integer value to meetingDay. For example, this line of code does not produce an error: meetingDay = (Days) 42. However, you should not do this because the implicit expectation is that an enum variable will only hold one of the values defined by the enum. To assign an arbitrary value to a variable of an enumeration type is to introduce a high risk for errors."

  • @Denis - additionally, zero is *always* a valid enum value: `TEST t = 0;` - compiles fine without any cast.

  • Re the ctor topic - additionally (although it is outside of the language) it is possible for *no* constructor to be invoked, for example via `FormatterServices.GetUninitializedObject` (which is used by `DataContractSerializer` in WCF, among other things).

  • @Eric

    > I seem to recall that the local initialization behaviour is configurable; you can turn it off if you don't want the perf hit.

    But currently the C# compiler always produces ".locals init" doesn't it? I couldn't find any switch in csc to configure the behavior for ignoring init.

    It won't cause much of any overhead to init the locals anyway, because CLR's JIT would treat zeroing out a local as dead code if the local is assigned with a new value before its first use; the new definition "kills" the old one. Dead code gets eliminated.

  • > Re the ctor topic - additionally (although it is outside of the language) it is possible for *no* constructor to be invoked, for example via `FormatterServices.GetUninitializedObject` (which is used by `DataContractSerializer` in WCF, among other things).

    Not to forget about structs, which do not have a default constructor at all (at least ones defined in C#), and for which no ctor is called when you do "new Foo()" or "default(Foo)".

  • Combine the C tradition of declaring locals at the top of the function with outrageously long functions and blocks of code being moved around and you're asking for (and will receive) bugs.  Either the conditional assignment gets moved so that it follows a read (yippee! Especially when it's a float) or code gets moved in between the declaration and the first assignment that prevents execution of the first assignment.

    I guess sometimes you've gotta make things dumber (e.g., the C# compiler) to prevent undue cleverness from wreaking havoc.  The code that evaluated the condition must have been quite a branch?  Would

    int y;
    if ( StaticFuncThatAlwaysReturnsTrue() ) y = 0;

    have compiled?  Time to crack open another scratch project to see...

  • Personally, I care much more for the behavior to be well-defined; otherwise it may be as complex or as simple as is reasonable (which is of course a matter for discussion, but that's another discussion). The reason is obvious: if I write some code, I want to be able to validate it against the spec and know that it compiles on any C# compiler out there.

    When the compiler is allowed to be arbitrarily clever with no restrictions nor a definite spec, you run into a situation where the only code that's guaranteed to compile everywhere is the one that assumes that compiler is as dumb as possible (in our example, it would be requiring all local variables to be initialized, period). Which is quite useless.

  • Just a small point, but "absence of evidence is not evidence of absence" is usually the claim the believer makes to the skeptic! Skeptics make the point that absence of evidence _IS_ a form of evidence of absence, at least when active attempts to find evidence have been made. Your argument as used is correct, but that's not exactly what the phrase is usually used to mean. Usually the believer is arguing "well, just because we don't have evidence doesn't mean we are wrong", to which the response is of course "well, of course not, but it does mean you are more likely to be wrong!"

    Sorry about being picky there.

  • What I am wondering is why local variables are not initialized automatically with their default value, once the compiler determines that they are not definitely assigned.  Would there be a performance penalty for such automatic assignment?

    The reason we require definite assignment is because failure to definitely assign a local is probably a bug. We do not want to detect and then silently ignore your bug! We tell you so that you can fix it. -- Eric
