Null Is Not Empty


Back when I started this blog in 2003, one of the first topics I posted on was the difference between Null, Empty and Nothing in VBScript. An excerpt:

Suppose you have a database of sales reports, and you ask the database "what was the total of all sales in August?" but one of the sales staff has not reported their sales for August yet. What's the correct answer? You could design the database to ignore the fact that data is missing and give the sum of the known sales, but that would be answering a different question. The question was not "what was the total of all known sales in August, excluding any missing data?" The question was "what was the total of all sales in August?" The answer to that question is "I don't know -- there is data missing", so the database returns Null.

This principle underlies the design of nullable value types in C#. The reason we have nullable value types at all is that there is a semantic difference between the null integer/decimal/double/whatever and the zeroes of those types. A zero means “I know that the quantity is zero”; a null means “I don’t know what the quantity is”.

This also explains why nulls propagate; if you add two nullable ints and one of them is null then the answer is null. Clearly ten plus “I don’t know” equals “I don’t know”, not ten.
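As a minimal sketch of that propagation (the class and method names are made up for illustration), lifted addition on nullable ints yields null as soon as either operand is null:

```csharp
using System;

public static class NullPropagationSketch
{
    // Lifted addition: the result is null if either operand is null.
    public static int? Add(int? x, int? y)
    {
        return x + y;
    }

    public static void Main()
    {
        int? ten = 10;
        int? unknown = null;

        Console.WriteLine(Add(ten, 5).HasValue);       // True
        Console.WriteLine(Add(ten, unknown).HasValue); // False
    }
}
```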

The concept of “null as missing information” also applies to reference types, which are of course always nullable. I am occasionally asked why C# does not simply treat null references passed to “foreach” as empty collections, or treat null strings as empty strings (*). It’s for the same reason that we don’t treat null integers as zeroes. There is a semantic difference between “the collection of results is known to be empty” and “the collection of results could not even be determined in the first place”, and we want to allow you to preserve that distinction, not blur the line between them. By treating null as empty, we would diminish the value of being able to strongly distinguish between a missing or invalid collection and a present, valid, empty collection.

Now, if for some odd reason you do wish to treat null collections the same as empty collections, that’s easy enough to do. You can simply use the null coalescing operator; that’s what it’s for:

foreach(Customer customer in customers ?? Enumerable.Empty<Customer>())

The ?? operator means “use the left-hand side unless it is null, in which case use the right-hand side.” Handy, that.
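Here is a slightly fuller sketch of the same idea. CountOrZero is a hypothetical helper, not something from the framework; it deliberately treats a null sequence as empty, so use it only where that blurring is really what you want:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CoalesceSketch
{
    // Counts the items, treating a null sequence as an empty one.
    public static int CountOrZero(IEnumerable<string> names)
    {
        int count = 0;
        foreach (string name in names ?? Enumerable.Empty<string>())
        {
            count++;
        }
        return count;
    }

    public static void Main()
    {
        Console.WriteLine(CountOrZero(null));               // 0
        Console.WriteLine(CountOrZero(new[] { "a", "b" })); // 2
    }
}
```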

**************

(*) C# does treat null strings as empty strings when concatenating them. See the comments for a discussion of this fact.

  • You shouldn't really talk about database null as compared to C# null. Database nulls (in MS SQL at least) are very annoying, and they don't mean "data is missing"; they more accurately mean "I have no idea what this is".

    I don't understand the distinction you're drawing between "missing" and "unknown". -- Eric

    As you probably know, in SQL "where a = null" and "where a <> null" return exactly the same set of rows. You need to use "is not null". God help us all if C# used _that_ approach.

    There was considerable debate over that when nullable value types were added to C# and VB. C# chose the approach you approve of -- which makes the semantics of the equality operators inconsistent and broken, but easier to read in the common case. VB chose the approach you disapprove of -- to lift equality to nullable and be consistent about comparisons of null values. Personally I prefer VB's approach; it is less intuitive but more accurate and consistent. (Which is funny, because normally C# is the less intuitive but more precise language and VB is the more intuitive but less precise language.) -- Eric
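    A small sketch of the C# behaviour under discussion (the helper names are illustrative, not from the post): two null values compare as equal, yet the lifted relational operators return false whenever either operand is null. That mismatch may be the kind of inconsistency meant here.

    ```csharp
    using System;

    public static class NullableComparisonSketch
    {
        public static bool AreEqual(int? x, int? y) { return x == y; }
        public static bool LessOrEqual(int? x, int? y) { return x <= y; }
        public static bool GreaterOrEqual(int? x, int? y) { return x >= y; }

        public static void Main()
        {
            int? a = null;
            int? b = null;

            Console.WriteLine(AreEqual(a, b));       // True
            Console.WriteLine(LessOrEqual(a, b));    // False
            Console.WriteLine(GreaterOrEqual(a, b)); // False
        }
    }
    ```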

  • Eric I hope you can answer my question below.. tia,

    @Mark in your post about thread safety your final solution is equivalent to :

    """

    private volatile OrderCollection orders;

    public OrderCollection Orders
    {
      get
      {
        return this.orders ?? Interlocked.CompareExchange(ref this.orders, new OrderCollection(), null);
      }
    }

    """

    My question is whether that Interlocked.CompareExchange is really even necessary?

    Since you are using the volatile modifier on the orders field, the compiler will see that the backing field orders is a volatile field.

    I am not sure, but I think that will make the whole statement expression extending all the way to the end point of the statement, i.e. to the end of the return, have locked semantics..

    As I said, I am not clear on this, but would not these two statements be equivalent with regard to thread safety...

    stmt 1:  return this.orders ?? Interlocked.CompareExchange(ref this.orders, new OrderCollection(), null);

    stmt 2:  return this.orders ?? new OrderCollection();

  • oops, final statements should have been

    stmt 1:  return this.orders ?? ( this.orders = Interlocked.CompareExchange(ref this.orders, new OrderCollection(), null) );

    stmt 2:  return this.orders ?? ( this.orders = new OrderCollection() );
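    For what it is worth, here is a hedged sketch of one common lock-free lazy-initialization shape, for comparison with the statements above. It rests on two points: volatile affects the ordering and visibility of individual reads and writes, but does not make a read-then-write sequence atomic, so with stmt 2 two threads can each create and return a different collection; and Interlocked.CompareExchange returns the value the field held before the call, so its result is null when the exchange succeeds. The Customer and OrderCollection types below are placeholders, and Volatile.Read is an API from later versions of .NET than this thread.

    ```csharp
    using System;
    using System.Threading;

    public sealed class OrderCollection { }

    public sealed class Customer
    {
        private OrderCollection orders;

        public OrderCollection Orders
        {
            get
            {
                // Fast path: the field has already been published.
                OrderCollection current = Volatile.Read(ref this.orders);
                if (current != null)
                {
                    return current;
                }

                // Publish a new instance only if the field is still null.
                // The return value (the previous contents) is ignored;
                // the field is re-read so every caller sees the winning instance.
                Interlocked.CompareExchange(ref this.orders, new OrderCollection(), null);
                return this.orders;
            }
        }
    }
    ```

    A losing thread may construct an OrderCollection that is then discarded, which is only acceptable when construction is cheap and free of side effects.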

  • This is a bit unrelated to the topic, but I started with this (to see your topic "in action")

    String a = null;
    var b = a + null;
    Console.WriteLine(b.Length);

    Then I tried using different objects, like:

    Form f1 = new Form();
    var f2 = f1 + null;
    Console.WriteLine(f2.Length);

    I was expecting compilation errors ("adding" null to a Form? "Length" of a Form?), but instead it compiles and runs just fine. The output is:

    System.Windows.Forms.Form, Text:

    So, it turns out that in "var f2 = f1 + null;" var becomes a string, and calls ToString() on f1 to concat (my guess). Is it so? And if yes, why? Why am I able to add a Form to a null, and get a String? I'm probably missing something in how "var" works...

    Though I applaud your experimental approach, rather than guessing at the semantics you might consider reading the spec, which states:

    The binary + operator performs string concatenation when one or both operands are of type string. If an operand of string concatenation is null, an empty string is substituted. Otherwise, any non-string argument is converted to its string representation by invoking the virtual ToString method inherited from type object. If ToString returns null, an empty string is substituted. 

    Now, this bit is not perfectly accurate. Clearly in your "form" case neither operand is of type string. This bit really should say "when one or both operands can be implicitly converted to string and operator overload resolution chooses one of the built-in string concatenation operators".

    As I noted in Mike's comment above, I had momentarily forgotten about this unfortunate fact about string concatenation. This is not how I would have done things, but this choice was imposed upon the language by the implementation of String.Concat. It would be awfully weird to have a language where + did one thing and String.Concat did another. -- Eric
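    A minimal sketch of the rule quoted from the spec (the class and method names are illustrative only): a null operand of string concatenation behaves like the empty string, and a non-string operand is converted with ToString.

    ```csharp
    using System;

    public static class ConcatSketch
    {
        // Both operands are strings; null is treated as "".
        public static string Join(string left, string right)
        {
            return left + right;
        }

        // Only one operand is a string; the other goes through ToString().
        public static string Describe(object value)
        {
            return value + (string)null;
        }

        public static void Main()
        {
            Console.WriteLine(Join(null, null).Length); // 0
            Console.WriteLine(Join("ten", null));       // ten
            Console.WriteLine(Describe(42));            // 42
        }
    }
    ```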

     

  • Is there a vb.net equivalent for ...

    foreach(Customer customer in customers ?? Enumerable.Empty<Customer>())

    ?

    S

  • I have a question about the construct:

     foreach(Customer customer in customers ?? Enumerable.Empty<Customer>())

    I haven't used the ?? operator before but the above syntax looks a little klunky to me. Take this specific example:

     int[] data = new int[] { 1, 2 };

     foreach ( int i in data ?? Enumerable.Empty<int>() ) {}

    In the above case, if I include just System and not System.Linq, I get a compiler error. Why should I have to include Linq-specific code for this? The foreach is "Linq independent" so it seems to me that maybe there should be a new keyword or contextual keyword to make this a little cleaner? We have foreach. We have default. Maybe DefaultEmptyCollection<int> which uses an array :)

    The "Enumerable" class is in the System.Linq namespace. That's where all the rest of the LINQ sequence operators are, so it's a sensible place. -- Eric

    Paul.
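    For the array case specifically, one way to avoid the System.Linq dependency is to coalesce to an empty array instead. This is only a sketch, and Sum is a made-up helper:

    ```csharp
    using System;

    public static class NoLinqSketch
    {
        // Sums the elements, treating a null array as an empty one.
        public static int Sum(int[] data)
        {
            int total = 0;
            foreach (int i in data ?? new int[0])
            {
                total += i;
            }
            return total;
        }

        public static void Main()
        {
            Console.WriteLine(Sum(null));             // 0
            Console.WriteLine(Sum(new[] { 1, 2 }));   // 3
        }
    }
    ```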

  • "The pleasant fact that value types are of known size, and need not be garbage collected makes it difficult to make strings value types. Also, the fact that strings can be cheaply copied by reference instead of copying all their bits, as we do with value types, is a big perf win."

    Eric, I don't agree with you. It's pretty easy to make a string a value type (but the value must internally always contain a reference to a char array). The value would then always have the same size as the size of a reference (32 bits on x86) and copying is safe and just as fast as copying an integer or a reference. I believe the real reason not to implement it as value type is because this would lead to large amounts of boxing, especially in CLR 1.0 applications, where there were no generics.

    Well, sure, I suppose. But I don't understand the point. I mean, we could cut out all the character array rigamarole and just say that struct MyString { public String theRealString } is a "value typed string". What does that buy us?

    Taking a storage location which can contain a 32 bit managed reference to a string and reinterpreting it as storage of MyString doesn't change anything germane, it just makes it harder to take advantage of the underlying ref type. We can take _any_ reference type and explicitly wrap a value type around the reference; that just makes it slightly harder to compare things by reference.

    It doesn't change the fundamental fact that the _data storage_ is ultimately implemented using reference semantics. Essentially what this example highlights is that references are themselves values. References are already treated as value types; the interesting thing about them is that they refer to something, not that they're copied around by value.

    Making string, or any type, a "shallow" value type is trivial -- so trivial that it's not very interesting. Such a beast still has the fundamental property of reference types: that it refers to something else. Making it deeply a value type, the way, say, int is, so that it refers to nothing, that's what I meant by it being a lot more difficult. -- Eric

     

    Just for fun, here is a non-nullable string implementation (named 'vstring') as value type :-)

    using System;
    using System.Collections;
    using System.Collections.Generic;

    public struct vstring : IComparable<vstring>, IEnumerable<char>, IEnumerable, IEquatable<vstring>
    {
       private readonly string value;

       public vstring(string value)
       {
           this.value = value;
       }

       public vstring(char[] value)
       {
           this.value = new string(value);
       }

       // Never returns null.
       public override string ToString()
       {
           return value ?? string.Empty;
       }

       public override int GetHashCode()
       {
           return this.ToString().GetHashCode();
       }

       // Keeps Equals(object) consistent with == and GetHashCode.
       public override bool Equals(object obj)
       {
           return obj is vstring && this.Equals((vstring)obj);
       }

       public int CompareTo(vstring other)
       {
           return this.ToString().CompareTo(other.value ?? string.Empty);
       }

       public IEnumerator<char> GetEnumerator()
       {
           return this.ToString().GetEnumerator();
       }

       IEnumerator IEnumerable.GetEnumerator()
       {
           return this.GetEnumerator();
       }

       public bool Equals(vstring other)
       {
           return this.ToString().Equals(other.value ?? string.Empty);
       }

       public vstring ToLower()
       {
           return new vstring(this.ToString().ToLower());
       }

       public vstring ToUpper()
       {
           return new vstring(this.ToString().ToUpper());
       }

       public static bool Equals(vstring a, vstring b)
       {
           return a.ToString() == b.ToString();
       }

       public static bool operator ==(vstring a, vstring b)
       {
           return Equals(a, b);
       }

       public static bool operator !=(vstring a, vstring b)
       {
           return !Equals(a, b);
       }

       // A null operand concatenates as an empty string.
       public static vstring operator +(vstring a, vstring b)
       {
           return new vstring(a.value + b.value);
       }
    }

  • >> Would you rather abandon these benefits in exchange for making strings value types?

    Eric,

    Don't get me wrong. I'm not saying that they should be *implemented internally* as value types (ie being allocated on stack, copied bit by bit every time the wind blows, etc.) but they should *appear to the programmer* as value types (ie never be null, and instead initialized by default to the empty string, etc...) That's one of the too rare things Borland Delphi does right :-)

    Basically that's almost just syntactic sugar: every time you see a string declaration, initialize it to String.Empty instead of null, and catch every assignment of null to a string (including return statements.)

  • >> Well, sure, I suppose. But I don't understand the point.

    There is no point in doing that and it won't buy us anything, except that programmers would see strings as value types, which is the point Stephan Leclercq tried to make. While I understand Stephan’s point, I’m against doing this. While strings then would really represent a logical value, this might give developers the wrong impression that strings would be copied completely by value, which could never be the case, because this, as you said, would be disastrous for performance.

    So I didn’t disagree on string being a reference type, I only disagreed on the arguments you gave against string not being a value type.

    ps. I saw my code example 'exploded' on your blog. It now takes a lot of space, sorry for that.

  • >> But as a practical matter, I'm afraid strings are reference types, and that there are good reasons for that. The pleasant fact that value types are of known size, and need not be garbage collected makes it difficult to make strings value types. Also, the fact that strings can be cheaply copied by reference instead of copying all their bits, as we do with value types, is a big perf win.

    This does not preclude String from being a value type. It just has to be a value type that encapsulates a single reference to an internal "StringData" reference type, with the latter working exactly as System.String works today. So the user sees a value type, with no null value, and the implementation still gets all the benefits of a reference type.

    >> While strings then would really represent a logical value, this might give developers the wrong impression that strings would be copied completely by value

    It wouldn't matter in the slightest. Since strings are immutable, there are no observable effects between a copying implementation, and a sharing implementation (well, except for Object.ReferenceEquals, but why would you care about that one?). So it doesn't really matter what impression developers get - it will be consistent with behavior either way.
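    A small sketch of that point (names are illustrative): a shared reference and a freshly copied string are indistinguishable under value equality, and only Object.ReferenceEquals can tell them apart.

    ```csharp
    using System;

    public static class StringIdentitySketch
    {
        // Builds a new string instance with the same characters.
        // Assumes a non-empty input for the purposes of this sketch.
        public static string Copy(string s)
        {
            return new string(s.ToCharArray());
        }

        public static void Main()
        {
            string original = "hello";
            string shared = original;       // copies the reference
            string copied = Copy(original); // copies the characters

            Console.WriteLine(shared == original);                       // True
            Console.WriteLine(copied == original);                       // True
            Console.WriteLine(object.ReferenceEquals(shared, original)); // True
            Console.WriteLine(object.ReferenceEquals(copied, original)); // False
        }
    }
    ```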

  • "The designers of String.Concat chose to treat null concatenation as empty string concatenation.

    Which means that (string)null + (string)null gives you an empty string in C#, bizarrely enough"

    MS SQL has the same behavior if you SET CONCAT_NULL_YIELDS_NULL OFF  which happened to have burned me last week because two different apps set it differently.

  • "Explain the difference between Null, Empty, and Nothing" has been one of my favorite interview questions for years. I've always enjoyed the null/empty/nothing stares after I ask that question.

    Does the answer to that question really tell you much about the candidate? If they've claimed to be a VB expert then that will certainly tell you whether they are or not I suppose. But I try to ask interview questions that allow the candidate to demonstrate skills, intelligence or passion rather than testing domain-specific knowledge. I assume that anyone who is smart, skilled and gets stuff done can learn the domain. -- Eric

     

  • "The designers of String.Concat chose to treat null concatenation as empty string concatenation."

    Hopefully these designers have been re-assigned.  :^)

    Seriously, not a good design decision in my humble opinion. It only masks problems and adds to the newbie confusion between null and the empty string. I guess it's too late to turn back though.
