null is not false

The way you typically represent a "missing" or "invalid" value in C# is to use the "null" value of the type. Every reference type has a "null" value; that is, the reference that does not actually refer to anything. And every "normal" value type has a corresponding "nullable" value type which has a null value.

The way these concepts are implemented is completely different. A reference is typically implemented behind the scenes as a 32- or 64-bit number. As we've discussed previously in this space, that number should logically be treated as an "opaque" handle that only the garbage collector knows about, but in practice that number is the address, within the virtual memory space of the process, at which the referred-to object lives inside the managed heap. The number zero is reserved as the representation of null because the operating system reserves the first few pages of virtual memory as invalid, always. There is no chance that by some accident, the zero address is going to be a valid address in the heap.

By contrast, a nullable value type is simply an instance of the value type plus a Boolean that indicates whether the value is to be treated as a value, or as null. It's just syntactic sugar for passing around a flag. This is because value types need not have any "special" value that has no other meaning; a byte has 256 possible values and every one of them is valid, so a nullable byte has to have some additional storage.
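To make the "value plus a flag" description concrete, here is a minimal sketch of such a wrapper. The name MaybeValue<T> and its members are hypothetical, chosen only for illustration; this is not the actual source of System.Nullable<T>, which also gets special support from the compiler and the runtime.

```csharp
using System;

// Hypothetical illustration only; not the real System.Nullable<T>.
public struct MaybeValue<T> where T : struct
{
    private readonly bool hasValue; // false means "treat this as null"
    private readonly T value;       // meaningful only when hasValue is true

    public MaybeValue(T value)
    {
        this.hasValue = true;
        this.value = value;
    }

    public bool HasValue { get { return hasValue; } }

    public T Value
    {
        get
        {
            if (!hasValue)
                throw new InvalidOperationException("No value present.");
            return value;
        }
    }
}

public static class MaybeValueDemo
{
    public static void Main()
    {
        // The default instance has the flag cleared, so it plays the role of null.
        MaybeValue<byte> missing = default(MaybeValue<byte>);
        // Zero is a perfectly valid byte, so the flag is what distinguishes it from "missing".
        MaybeValue<byte> zero = new MaybeValue<byte>(0);

        Console.WriteLine(missing.HasValue); // False
        Console.WriteLine(zero.HasValue);    // True
        Console.WriteLine(zero.Value);       // 0
    }
}
```

The point of the sketch is that "missing" and "zero" hold the same byte; only the extra flag tells them apart.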

Some languages allow null values of value types or reference types, or both, to be implicitly treated as Booleans. In C, you can say:

int* x = whatever();
if (x) ...

and that is treated as if you'd said "if (x != NULL)". And similarly for nullable value types; in some languages a null value of such a type is implicitly treated as "false".

The designers of C# considered those features and rejected them. First, because treating references or nullable value types as Booleans is a confusing idiom and a potential rich source of bugs. And second, because semantically it seems presumptuous to automatically translate null -- which should mean "this value is missing" or "this value is unknown" -- to "this value is logically false".
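As a small sketch of what that decision means in practice: in C# the test has to be spelled out, both for references and for nullable bools. The class and method names here (ExplicitNullChecks, Describe) are made up for the example.

```csharp
using System;

public static class ExplicitNullChecks
{
    // A reference must be compared to null explicitly; "if (s)" does not compile.
    public static string Describe(string s)
    {
        if (s != null)
            return "has a value";
        return "missing";
    }

    // A nullable bool cannot be used directly as a condition either.
    // The code has to say which of the three states it cares about.
    public static string Describe(bool? flag)
    {
        if (flag == true)
            return "true";
        if (flag == false)
            return "false";
        return "unknown";
    }

    public static void Main()
    {
        Console.WriteLine(Describe((string)null)); // missing
        Console.WriteLine(Describe("hello"));      // has a value
        Console.WriteLine(Describe((bool?)null));  // unknown
        Console.WriteLine(Describe((bool?)true));  // true

        // If "missing" really should count as false, the code says so explicitly:
        bool? answer = null;
        bool treatedAsFalse = answer ?? false;
        Console.WriteLine(treatedAsFalse);         // False
    }
}
```

The conversion from null to false is still available, but it is something the programmer opts into rather than something the language assumes.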

In particular, we want to treat nullable bools as having three states: true, false and null, and not as having three states: true, false and different-kind-of-false. Treating null nullable Booleans as false leads to a number of oddities. Suppose we did, and suppose x is a nullable bool that is equal to null:

if (x)
  Foo();
if (!x)
  Bar();

Neither Foo nor Bar is executed because "not null" is of course also null. (The answer to "what is the opposite of this unknown value?" is "an unknown value".) Does it not seem strange that x and !x are both treated as false? Similarly, if (x | !x) would also be treated as false, which also seems bizarre.
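For comparison, here is a short sketch of how the operators on bool? behave in C# as it actually shipped, where null stays null instead of collapsing to false. The helper Show is a hypothetical name used only to make null visible in the output.

```csharp
using System;

public static class ThreeValuedLogic
{
    // Render a nullable bool so that null is visible in the output.
    public static string Show(bool? b)
    {
        return b.HasValue ? b.Value.ToString() : "null";
    }

    public static void Main()
    {
        bool? x = null;

        Console.WriteLine(Show(!x));        // null: the opposite of unknown is unknown
        Console.WriteLine(Show(x | !x));    // null
        Console.WriteLine(Show(x & !x));    // null
        Console.WriteLine(Show(x | true));  // True: the answer is true whatever x is
        Console.WriteLine(Show(x & false)); // False: the answer is false whatever x is
    }
}
```

Because the results are bool? rather than bool, none of these expressions can be used directly as the condition of an "if"; the three states remain distinct until the code decides what to do with null.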

The solution to the problem of these oddities is to avoid the problem in the first place, and not make nulls behave as though they were false.

Next time we'll look at a different aspect of truth-determining: just what is up with those "true" and "false" user-defined operators?

  • I posted two comments, and only the second one appeared. I also posted comments to earlier recent entries which also didn’t appear. It seems to me that your blog swallows comments wholesale, Eric... which is not nice :(

    I’ll try again. You wrote:

       The number zero is reserved as the representation of null because the operating system reserves the first few pages of virtual memory as invalid, always.

    Surely the causality is the other way around? Surely the OS reserves that part of the address space *because* programming languages tend to use 0 for the null pointer?

  • @Timwi That's a general MS blog bug, which has existed for... a long, long time*. Eric can't do much more about it than we can.

    *generally caused by taking too long to post the message. General procedure is to copy your post before hitting "post" ;)

  • @Deduplicator:

    ECMA-372 (C++/CLI Language Specification) §12.3.3 stipulates: "The representation of a handle with value nullptr shall be all-bits-zero."  I think this requirement would make it difficult to use a nonzero representation for C# null references, on CLI implementations that support C++/CLI.

  • Since you brought up Nullables, I would like to ask why they are implemented as a special struct, instead of the much simpler approach of using boxed structures. I understand that C++/CLI allows you to access members of a boxed struct. It seems to me that it would have been much simpler if "int?" simply meant "boxed int, possibly null". It would have made the compiler simpler, it would have made the type system more harmonious (fewer special cases), and it would have allowed generic methods like the following (with no constraints):

    void f<T>(T? x) { ... } // accepts any reference type, including boxed structs.

    But, on the topic of this post, I find the arguments entirely unpersuasive. If the question is whether pointers should be implicitly convertible to boolean, well, maybe the answer is no. But I don't see how "if (p)" is a source of bugs, and allowing this form does not actually require that pointers are convertible to boolean (although that is the simplest approach). Allowing if (!p) does imply the existence of an operator! that returns bool (but did you consider the alternative--the "unless"/"if not" and "until" statements?).

    For me the arguments against "if (p)" are clearly outweighed by the single argument in favor: it saves time. I have written "!= null" about 1300 times in my current solution, with 1000 of those cases in "if" statements. I have written "== null" 800 times. I'm just plain tired of typing it.

  • Qwertie: If you box a struct, it is no longer a value type and no longer has value semantics. Instead it will have reference semantics along with all the overhead of a reference type. What you're asking for is a Box<T> type rather than a different implementation of Nullable<T>.

  • Eric, I would love to see a blog about any magic the CLR / compiler does to support Nullable<T>. It seems to me that there must be something special done to make Nullable<T> box as T when non-null, so that I can say:

    int? x = 1;

    object y = x;

    y.GetType() == typeof(int) // true

    I can't see how this behaviour would come, well, out of the box.

  • @ Kalle Olavi Niemitalo:

    Thanks for the quote.

    Actually, I see one and only one good reason for not providing a standard-conversion from (null, nonnull) to (false, true), for everything but nullable<bool>.

    It's a bit curious that nobody mentioned that you can create custom conversions in C#. Because C# refers to all objects by managed pointer, there's no way to decide whether you wanted to convert the pointer or the referenced object. C/C++/others don't have that problem, because either they don't support custom conversions using standard syntax or they provide references and pointers with different syntax, making that trivial.

  • If null means Unknown then why is it legal to use the equality operator to check for null? If a reference is an Unknown value how can you check if that is equal to another Unknown value? Shouldn't the answer to such a comparison be Unknown?
