To box or not to box, that is the question

Suppose you have an immutable value type that is also disposable. Perhaps it represents some sort of handle.

struct MyHandle : IDisposable
{
    public MyHandle(int handle) : this() { this.Handle = handle; }
    public int Handle { get; private set; }
    public void Dispose()
    {
        Somehow.Close(this.Handle);
    }
}

You might think, "Hey, you know, I'll decrease my probability of closing the same handle twice by making the struct mutate itself on disposal!"

public void Dispose()
{
    if (this.Handle != 0)
        Somehow.Close(this.Handle);
    this.Handle = 0;
}
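To make the guard concrete, here is a minimal compilable sketch of the struct with the mutating Dispose. The post leaves `Somehow.Close` unspecified, so this assumes a hypothetical `Somehow` class that merely counts how many times a handle is closed; `SameVariableDemo` is likewise an illustrative name. Disposing the same variable twice closes the handle only once.

```csharp
using System;

// Hypothetical stand-in for the post's Somehow.Close: it only counts closes.
public static class Somehow
{
    public static int CloseCount;

    public static void Close(int handle)
    {
        CloseCount++;
    }
}

public struct MyHandle : IDisposable
{
    public MyHandle(int handle) : this() { this.Handle = handle; }
    public int Handle { get; private set; }

    // The mutating Dispose from the post: close once, then zero the handle.
    public void Dispose()
    {
        if (this.Handle != 0)
            Somehow.Close(this.Handle);
        this.Handle = 0;
    }
}

public static class SameVariableDemo
{
    // Disposes one variable twice; returns how many times Close ran.
    public static int DisposeTwice()
    {
        Somehow.CloseCount = 0;
        var m = new MyHandle(123);
        m.Dispose();
        m.Dispose(); // m.Handle is already 0, so Close is skipped
        return Somehow.CloseCount; // 1
    }
}
```

The guard works here only because both calls operate on the very same variable, m.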

This should already be raising red flags in your mind. We're mutating a value type, which we know is dangerous because value types are copied by value; you're mutating a variable, and different variables can hold copies of the same value. Mutating one variable does not mutate the others, any more than changing one variable that contains 12 changes every variable in the program that also contains 12. But let's go with it for now.
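The same point shows up with the mutating handle itself: a copy carries its own Handle, so the double-close guard only protects the variable that was actually disposed. A sketch, assuming a hypothetical `Somehow` stub that counts closes and an illustrative `CopyDemo` class (the struct is repeated so the block compiles on its own):

```csharp
using System;

// Hypothetical stand-in for Somehow.Close: it only counts closes.
public static class Somehow
{
    public static int CloseCount;

    public static void Close(int handle)
    {
        CloseCount++;
    }
}

public struct MyHandle : IDisposable
{
    public MyHandle(int handle) : this() { this.Handle = handle; }
    public int Handle { get; private set; }

    public void Dispose()
    {
        if (this.Handle != 0)
            Somehow.Close(this.Handle);
        this.Handle = 0;
    }
}

public static class CopyDemo
{
    // Disposes two copies of the same handle value; returns how many times Close ran.
    public static int DisposeTwoCopies()
    {
        Somehow.CloseCount = 0;
        var original = new MyHandle(123);
        var copy = original;  // copies the value: two independent variables
        original.Dispose();   // zeroes original.Handle only
        copy.Dispose();       // copy.Handle is still 123, so Close runs again
        return Somehow.CloseCount; // 2
    }
}
```

The handle is closed twice, which is exactly what the mutation was supposed to prevent.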

What does this do?

var m1 = new MyHandle(123);
try
{
  // do something
}
finally
{
    m1.Dispose();
}
// Sanity check
Debug.Assert(m1.Handle == 0);

Everything work out there?

Yep, we're good. m1 begins its life with Handle set to 123, and after the dispose it is set to zero.

How about this?

var m2 = new MyHandle(123);
try
{
  // do something
}
finally
{
    ((IDisposable)m2).Dispose();
}
// Sanity check
Debug.Assert(m2.Handle == 0);

Does that do the same thing? Surely casting an object to an interface it implements does nothing untoward, right?

.

.

.

.

.

.

.

.

Wrong. This boxes m2. Boxing makes a copy, and it is the copy which is disposed, and therefore the copy which is mutated. m2.Handle stays set to 123.
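One way to watch this happen is to keep a reference to the box instead of discarding it. This sketch assumes a hypothetical counting `Somehow` stub and an illustrative `BoxDemo` helper; the interface call zeroes the boxed copy while m2 keeps 123.

```csharp
using System;

// Hypothetical stand-in for Somehow.Close: it only counts closes.
public static class Somehow
{
    public static int CloseCount;

    public static void Close(int handle)
    {
        CloseCount++;
    }
}

public struct MyHandle : IDisposable
{
    public MyHandle(int handle) : this() { this.Handle = handle; }
    public int Handle { get; private set; }

    public void Dispose()
    {
        if (this.Handle != 0)
            Somehow.Close(this.Handle);
        this.Handle = 0;
    }
}

public static class BoxDemo
{
    // Disposes m2 through the interface, but keeps the box so the
    // mutated copy can be inspected afterwards.
    public static (int Local, int Boxed) DisposeThroughInterface()
    {
        var m2 = new MyHandle(123);
        IDisposable box = m2;                // boxing copies m2 to the heap
        box.Dispose();                       // mutates the boxed copy, not m2
        int boxed = ((MyHandle)box).Handle;  // unbox the copy to read it
        return (m2.Handle, boxed);           // (123, 0)
    }
}
```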

So what does this do, and why?

var m3 = new MyHandle(123);
using(m3)
{
  // Do something
}
// Sanity check
Debug.Assert(m3.Handle == 0);

.

.

.

.

.

.

.

.

.

.

 

Based on the previous example you probably think that this boxes m3, mutates the box, and therefore the assertion fires, right?

Right?

Is that what you thought?

You'd be perfectly justified in thinking that there is a boxing performed in the finally because that's what the spec says. The spec says that the "using" statement's expansion when the expression is a non-nullable value type is

finally
{
  ((IDisposable)resource).Dispose();
}

However, I'm here today to tell you that the disposed resource is in fact not boxed in our implementation of C#. The compiler has an optimization: if it detects that the Dispose method is exposed directly on the value type then it effectively generates a call to

finally
{
  resource.Dispose();
}

without the cast, and therefore without boxing.

Now that you know that, would you like to change your answer? Does the assertion fire? Why or why not?

Give it some careful thought.

.

.

.

.

.

.

.

.

.

.

The assertion still fires, even though there is no boxing. The relevant line of the spec is not the one that says that there's a boxing cast; that's a red herring. The relevant bit of the spec is:

A using statement of the form "using (ResourceType resource = expression) statement" corresponds to one of three possible expansions. [...] A using statement of the form "using (expression) statement" has the same three possible expansions, but in this case ResourceType is implicitly the compile-time type of the expression, and the resource variable is inaccessible in, and invisible to, the embedded statement.

That is to say, our program fragment is equivalent to:

var m3 = new MyHandle(123);
using(MyHandle invisible = m3)
{
  // Do something
}
// Sanity check
Debug.Assert(m3.Handle == 0);

which is equivalent to

var m3 = new MyHandle(123);
{
  MyHandle invisible = m3;
  try
  {
    // Do something
  }
  finally
  {
    invisible.Dispose(); // No boxing, due to optimization
  }
}
// Sanity check
Debug.Assert(m3.Handle == 0);

It is the invisible copy which is disposed and mutated, not m3.

And that's why the compiler can get away with not boxing in the finally. The thing that it is not boxing is invisible and inaccessible and therefore there is no way to observe that the boxing was skipped.
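The behaviour can be checked directly. This sketch assumes a hypothetical `Somehow` stub that counts closes and an illustrative `UsingDemo` helper; the hidden copy is disposed exactly once, while m3 itself is never touched.

```csharp
using System;

// Hypothetical stand-in for Somehow.Close: it only counts closes.
public static class Somehow
{
    public static int CloseCount;

    public static void Close(int handle)
    {
        CloseCount++;
    }
}

public struct MyHandle : IDisposable
{
    public MyHandle(int handle) : this() { this.Handle = handle; }
    public int Handle { get; private set; }

    public void Dispose()
    {
        if (this.Handle != 0)
            Somehow.Close(this.Handle);
        this.Handle = 0;
    }
}

public static class UsingDemo
{
    public static (int Handle, int Closes) UsingExistingVariable()
    {
        Somehow.CloseCount = 0;
        var m3 = new MyHandle(123);
        using (m3) // the compiler copies m3 into an invisible temporary
        {
            // Do something
        }
        // The temporary was disposed (one close); m3 still holds 123.
        return (m3.Handle, Somehow.CloseCount); // (123, 1)
    }
}
```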

Once again the moral of the story is: mutable value types are enough pure evil to turn you all into hermit crabs, and therefore should be avoided.

  • @Simon:

    "Re: Nullable<T>: I'm not sure what you mean by work perfectly as an object type: to my understanding, that was not its purpose. "new Nullable<T>()" being equal, and convertible, to null is odd, but in keeping with the purpose of Nullable<T>: to enable storing null in a value type. Certainly being able to store null should not be the requirement of being a reference type, given the effort people put into ensuring their reference types are not null!"

    I mean, if Nullable<T> were a class, it would follow reference type behavior (same as object) naturally, rather than having various special behaviors that differ from both value types and reference types even though it is a value type:

               int? i = new int?(); // i is null
               Console.WriteLine(i.ToString()); // works
               Console.WriteLine(i.GetType()); // NullReferenceException
               i = 10;
               object obj = i; // boxed int rather than boxed Nullable<int>
               var ic = (IComparable<int>)i; // i is boxed and then queried for the interface rather than queried directly on the struct

    But anyway, I agree with many of your points, especially that value types should not implement IDisposable. My core point is just that mutable value types are _not_ always evil.

  • @SWeko: "Also, if I implement IDisposable both implicitly and explicitly, and use the struct in a using, the Explicit implementation is always called, not the implicit, as I would infer from the article, and from the text of the spec (the expansion actually uses the explicit form)?"

    How can you implement IDisposable both implicitly and explicitly? If both are in the same class, the implicit one does not happen at all, because the interface is already implemented explicitly. If the implicit implementation is in the base class, the derived class's explicit reimplementation overrides the interface mapping. If the explicit implementation is in the base class, the derived class's implicit reimplementation overrides it.

  • @Ivan:

    What if I want assignment and argument passing to make a copy, e.g. to have value semantics? What does that have to do with immutability or performance? And value types' behaviour does imply certain performance characteristics, regardless of implementation: namely, that assignment is linear in time to value size. I should note that the stack isn't even cheaper than the heap in the CLR anyway; large and complex reference topologies slow the GC, not heap allocations, even rapid ones.

    @lidudu:

    Re: Nullable<T>: Yes, those behaviours you listed are all strange, and confusing, but none seem to stand out as reference type behaviour, other than the fact nullable is involved. Would you expect this to return 3?

    int Foo()
    {
       int? a = 1;
       int? b = a;
       a = 3;
       return b;
    }

    Re: @SWeko: Look at the final expansion in the original post: as written, it would call what would be the "implicit" method named "Dispose()" - however the actual compiler calls the true IDisposable.Dispose() method on the instance regardless of actual C# syntax rules.

  • For those wondering why mutable value types (with public fields, no less!) are even allowed if they're so evil, see Rico Mariani's posts about when you actually need them: blogs.msdn.com/.../733887.aspx and blogs.msdn.com/.../745085.aspx

  • Third time trying to post this. If it was held in a queue, sorry, but I've not had good results from the MSDN blog comment system in the past, so I'm retrying.

    /off topic

    Nowhere did I say that value types should be treated as if they were the same as reference types. You seem to be putting words in my mouth. Nonetheless, if you *do*, then a certain class of potential bugs becomes impossible if the value type is immutable. It really doesn't get much simpler than that. Mutable implies more chance of bugs.

    As to readonly, I can only assume you didn't read the previously linked post by Eric: stackoverflow.com/.../1144489. It is not about readonly applied to the fields within the struct, but about readonly applied to a struct that is a member of another class. Most people I have asked were not aware that a silent copy occurred in those scenarios (by silent I mean that no additional variable is visible in the code).

    People screw this up, it's a real problem.

    Examples:

    stackoverflow.com/.../foreach-struct-weird-compile-error-in-c

    The issue was sufficiently important that the C# compiler team decided to detect obvious cases of pointless mutation of a silent copy and treat them as an *error*, not even a warning. Even then people still don't get it...

    www.eggheadcafe.com/.../direct-access-to-struct-variables-in-liststruct-compiler-error-.aspx

    Mutability is absolutely, positively not as useful for a struct as for a reference type; I cannot fathom how you could think that. There are certain very specific interoperability scenarios where being able to write into chunks of memory with some level of type safety is useful; likewise, if you can use them in some very hidden manner that involves no copying (like the foreach scenarios), then there is utility to the technique. I personally think it would have been better if the compiler could detect the foreach cases and only use the mutable structs when that occurs. Calling GetEnumerator on a (compile-time known) List<T> and then doing anything that could trigger a copy will produce the most marvellous bugs that surprisingly few people will understand (I have seen some very good and smart programmers bemused by this, and it took some fun with Reflector the first time I experienced it).

  • @Simon, @lidudu: Here's an example of what I was talking about:

           public struct MyHandle : IDisposable
           {
               public int Handle { get; private set; }
               public MyHandle(int handle) : this() { this.Handle = handle; }
               public void Dispose()
               {
                   Console.WriteLine("Disposing Implicit");
                   this.Handle = 0;
               }
               void IDisposable.Dispose()
               {
                   Console.WriteLine("Disposing Explicit");
                   this.Handle = 100;
               }
           }

           public static void Test()
           {
               var m1 = new MyHandle(1);
               m1.Dispose(); // <-- "Disposing Implicit"
               var m2 = new MyHandle(2);
               ((IDisposable)m2).Dispose(); // <-- "Disposing Explicit"
               using (var m3 = new MyHandle(3))
               {
                   // according to the article, should be "Disposing Implicit"
                   // actual output is "Disposing Explicit"
               }
           }

    So, we can implement both an implicit and an explicit version of the interface in the same class, and they will both be called in the appropriate context.

    However, the article stated that for this case, the compiler avoids boxing and calls directly, thus calling the implicit implementation, but in the example the explicit implementation is called.

  • @SWeko: That's one way to say it, but the runtime and CLR don't see it that way - you merely have a type that explicitly implements Dispose(), and also has an unrelated public method called Dispose(). It's sort of like "public new void Method()" without the inheritance - reusing the name without any polymorphism or virtual slots being involved.

  • @Simon: Yes, there's always a misunderstanding possible when treating CIL code as if it were C#, and an explicit implementation always trumps the implicit one when used via the interface. However, the compiler is quite content if I do not provide an explicit implementation; moreover, I'll speculate that there are lots of C# users who do not know about explicit interface implementations.

    The implicit call ( m1.Dispose(); ) is compiled to

     call       instance void TestAppConsole.Structs/MyHandle::Dispose()

    The explicit call ( ((IDisposable)m2).Dispose(); ) is compiled to

     box        TestAppConsole.Structs/MyHandle

     callvirt   instance void [mscorlib]System.IDisposable::Dispose()

    So there is actual boxing included if I invoke the explicit implementation - even if no explicit implementation is provided in the struct code.

    The using statement's dispose is indeed compiled to

     constrained. TestAppConsole.Structs/MyHandle

     callvirt   instance void [mscorlib]System.IDisposable::Dispose()

    which calls a virtual method on a value type, so there's no boxing or copying involved.

    In C# terms, this is neither m.Dispose() (it's a callvirt, not a call) nor ((IDisposable)m).Dispose() (there's no boxing involved), but a completely third construct, so yes, my C#-restricted musings are a bit moot.

  • As a side note, there is actually StrongBox<T> in the Framework:

    msdn.microsoft.com/.../bb549038.aspx

    No syntactic sugar, though.

  • @SWeko: Yeah, the complaint was merely about calling it an implicit implementation when it's not actually implementing anything. :) Thanks for the IL - now I need to go look up what "constrained." does again.

    @Pavel: I was kinda hoping that "(StrongBox<int>)(object)obj;" would work. I'm trying to think what other compiler support would be useful; none of the operators seem to make much sense (mostly because you can't pick between assigning to the variable or to the value). Thanks for the pointer, though!

  • @Pavel Minaev, from the page you linked to, "This API supports the .NET Framework infrastructure and is not intended to be used directly from your code."

  • @snarfblam: Everything in the "System.Runtime.CompilerServices" namespace gets that - it's a little overblown, I think it's just trying to say "This stuff is designed to be easy for compilers, not humans. Please don't complain about usability."

  • @Simon: I don't see a problem in

    int Foo()
    {
      int? a = 1;
      int? b = a;
      a = 3;
      return (int)b; // you missed a cast here
    }

    If Nullable were a reference type, the code would still produce 1, because a = 3 translates to a = (int?)3, never to a.Value = 3.

  • I feel the issue is more one of improper use of value types for types that need to manage ownership, etc. I don't think that's what value types are designed for, per the .NET library design guidelines. In those cases, a reference type should be used instead of a value type. If, for whatever reason, the coder really wants it to be a value type, then the type should be designed as immutable.

    @SWeko: it is interesting that it generates

    constrained. TestAppConsole.Structs/MyHandle

    But I kind of think it is consistent with generic case like:

    void Func<T>(T s) where T: struct, IDisposable
    {
     s.Dispose();
    }

    With T being your MyHandle struct, this function also calls IDisposable.Dispose() without boxing, because Dispose() is called on the struct variable directly rather than after a conversion to the IDisposable interface type.

    If you do ((IDisposable)s).Dispose(), then you are explicitly creating a temporary variable of type IDisposable, which is a reference type, then it will surely produce IL code for boxing.

  • @lidudu: Good catch on the cast (though I prefer .Value), but you are describing value semantics. Unless you want "a = 3" to mean value semantics and "a.Value = 3" to be either a null error or reference semantics, which is sane (and also System.Runtime.CompilerServices.StrongBox<T>). But I believe the value-type behaviour of Nullable<T> is entirely by design.
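The constrained call that SWeko found in the IL, and that lidudu relates to the generic case, can be made visible by passing the struct by reference to a generic method. This is a sketch, assuming a hypothetical counting `Somehow` stub and an illustrative `ConstrainedDemo` helper: because `T` is constrained to `struct, IDisposable`, `value.Dispose()` is emitted as a constrained call rather than a box, so the caller's own variable ends up with Handle == 0.

```csharp
using System;

// Hypothetical stand-in for Somehow.Close: it only counts closes.
public static class Somehow
{
    public static int CloseCount;

    public static void Close(int handle)
    {
        CloseCount++;
    }
}

public struct MyHandle : IDisposable
{
    public MyHandle(int handle) : this() { this.Handle = handle; }
    public int Handle { get; private set; }

    public void Dispose()
    {
        if (this.Handle != 0)
            Somehow.Close(this.Handle);
        this.Handle = 0;
    }
}

public static class ConstrainedDemo
{
    // No box is created: Dispose runs on the caller's variable through the ref.
    public static void DisposeInPlace<T>(ref T value) where T : struct, IDisposable
    {
        value.Dispose();
    }

    public static int Run()
    {
        var m = new MyHandle(123);
        DisposeInPlace(ref m);
        return m.Handle; // 0: the caller's variable itself was mutated
    }
}
```

Passing by value instead (as in lidudu's Func<T>) still avoids boxing, but then it is the parameter copy that gets mutated, not the caller's variable.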
