To box or not to box, that is the question

Suppose you have an immutable value type that is also disposable. Perhaps it represents some sort of handle.

struct MyHandle : IDisposable
{
    public MyHandle(int handle) : this() { this.Handle = handle; }
    public int Handle { get; private set; }
    public void Dispose()
    {
        Somehow.Close(this.Handle);
    }
}

You might think hey, you know, I'll decrease my probability of closing the same handle twice by making the struct mutate itself on disposal!

public void Dispose()
{
    if (this.Handle != 0)
        Somehow.Close(this.Handle);
    this.Handle = 0;
}

This should already be raising red flags in your mind. We're mutating a value type, which we know is dangerous because value types are copied by value; you're mutating a variable, and different variables can hold copies of the same value. Mutating one variable does not mutate the others, any more than changing one variable that contains 12 changes every variable in the program that also contains 12. But let's go with it for now.

What does this do?

var m1 = new MyHandle(123);
try
{
  // do something
}
finally
{
    m1.Dispose();
}
// Sanity check
Debug.Assert(m1.Handle == 0);

Everything work out there?

Yep, we're good. m1 begins its life with Handle set to 123, and after the dispose it is set to zero.

How about this?

var m2 = new MyHandle(123);
try
{
  // do something
}
finally
{
    ((IDisposable)m2).Dispose();
}
// Sanity check
Debug.Assert(m2.Handle == 0);

Does that do the same thing? Surely casting an object to an interface it implements does nothing untoward, right?

.

.

.

.

.

.

.

.

Wrong. This boxes m2. Boxing makes a copy, and it is the copy which is disposed, and therefore the copy which is mutated. m2.Handle stays set to 123.
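
Here is a minimal sketch of that behaviour. FakeHandle is a hypothetical stand-in for MyHandle (there is no real handle to close, so Dispose only zeroes the field), and the box is kept in a variable so that the boxed copy can be inspected afterwards.

```csharp
using System;

// Hypothetical stand-in for MyHandle; Dispose only zeroes the field.
struct FakeHandle : IDisposable
{
    public FakeHandle(int handle) : this() { Handle = handle; }
    public int Handle { get; private set; }
    public void Dispose() { Handle = 0; }
}

static class BoxingDemo
{
    // Returns the handle as seen through the original variable and through the box.
    public static (int Original, int Boxed) DisposeThroughInterface()
    {
        var m2 = new FakeHandle(123);
        IDisposable box = m2;   // boxing: the value of m2 is copied into a new object
        box.Dispose();          // mutates the boxed copy, not m2
        return (m2.Handle, ((FakeHandle)box).Handle);
    }

    static void Main()
    {
        var (original, boxed) = DisposeThroughInterface();
        Console.WriteLine($"original = {original}, boxed = {boxed}");
        // prints "original = 123, boxed = 0"
    }
}
```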

So what does this do, and why?

var m3 = new MyHandle(123);
using(m3)
{
  // Do something
}
// Sanity check
Debug.Assert(m3.Handle == 0);

.

.

.

.

.

.

.

.

.

.

 

Based on the previous example you probably think that this boxes m3, mutates the box, and therefore the assertion fires, right?

Right?

Is that what you thought?

You'd be perfectly justified in thinking that there is a boxing performed in the finally because that's what the spec says. The spec says that the "using" statement's expansion when the expression is a non-nullable value type is

finally
{
  ((IDisposable)resource).Dispose();
}

However, I'm here today to tell you that the disposed resource is in fact not boxed in our implementation of C#. The compiler has an optimization: if it detects that the Dispose method is exposed directly on the value type then it effectively generates a call to

finally
{
  resource.Dispose();
}

without the cast, and therefore without boxing.

Now that you know that, would you like to change your answer? Does the assertion fire? Why or why not?

Give it some careful thought.

.

.

.

.

.

.

.

.

.

.

The assertion still fires, even though there is no boxing. The relevant line of the spec is not the one that says that there's a boxing cast; that's a red herring. The relevant bit of the spec is:

A using statement of the form "using (ResourceType resource = expression) statement" corresponds to one of three possible expansions. [...] A using statement of the form "using (expression) statement" has the same three possible expansions, but in this case ResourceType is implicitly the compile-time type of the expression, and the resource variable is inaccessible in, and invisible to, the embedded statement.

That is to say, our program fragment is equivalent to:

var m3 = new MyHandle(123);
using(MyHandle invisible = m3)
{
  // Do something
}
// Sanity check
Debug.Assert(m3.Handle == 0);

which is equivalent to

var m3 = new MyHandle(123);
{
  MyHandle invisible = m3;
  try
  {
    // Do something
  }
  finally
  {
    invisible.Dispose(); // No boxing, due to optimization
  }
}
// Sanity check
Debug.Assert(m3.Handle == 0);

It is the invisible copy which is disposed and mutated, not m3.

And that's why the compiler can get away with not boxing in the finally. The thing that it is not boxing is invisible and inaccessible and therefore there is no way to observe that the boxing was skipped.
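
To see the difference side by side, here is a small sketch that compares the using form with a direct call to Dispose. DemoHandle is a hypothetical stand-in for MyHandle whose Dispose only zeroes the field; the comments state what the expansion described above predicts.

```csharp
using System;

// Hypothetical stand-in for MyHandle; Dispose only zeroes the field.
struct DemoHandle : IDisposable
{
    public DemoHandle(int handle) : this() { Handle = handle; }
    public int Handle { get; private set; }
    public void Dispose() { Handle = 0; }
}

static class UsingDemo
{
    // using (expression): an invisible copy is disposed, so m3 keeps its handle.
    public static int HandleAfterUsingExpression()
    {
        var m3 = new DemoHandle(123);
        using (m3)
        {
            // Do something
        }
        return m3.Handle;
    }

    // Direct call: the variable itself is mutated.
    public static int HandleAfterDirectDispose()
    {
        var m1 = new DemoHandle(123);
        m1.Dispose();
        return m1.Handle;
    }

    static void Main()
    {
        Console.WriteLine(HandleAfterUsingExpression()); // 123
        Console.WriteLine(HandleAfterDirectDispose());   // 0
    }
}
```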

Once again the moral of the story is: mutable value types are enough pure evil to turn you all into hermit crabs, and therefore should be avoided.

  • @Simon

    "Also, wouldn't you like a way to call unexposed interface members without boxing?"

    Sorry, I don't see what you mean here; interface-related methods are always exposed at the same accessibility as the interface. Are you referring to an explicit interface implementation on a struct? I assume such a thing is possible, but it would be somewhat perverse; certainly I can't see a need for it.

    (note using he but no idea if that's valid)

    he certainly wasn't talking philosophically when he wrote:

    "When we need to treat a value like an object, it gets boxed. It gets put in the same kind of “package” as every normal reference-type object. In most cases, we can’t use the boxed value until we un-box it by casting it back to a value type. This means that while it’s parading as an object, it never actually has the chance to act like an object. Interfaces are the exception, because they box a value type, and they declare behavior for that boxed object."

    That's two assertions that are plain wrong.

    He spends a great deal of time saying "because you can change a variable, that means things that are immutable actually aren't". This is patently ludicrous (try his "thought experiment" on a string).

  • @Shuggy:

    Yes, explicit interface implementation is possible - it's useful in exactly the same situations it's useful on reference types minus solving inheritance clashes.

    Oh whoops, I thought you were still referring to his reply, not the post.

    I might be missing something, but I don't see any invalid statements in that quote? I'd state it differently, but "treat[ing] a value like an object" is one way to say using a value without knowing its full type, which of course does require boxing, either to Object, ValueType, or a declared interface.

    The English gets confused around "parading as an object, [...] act like an object", but I think he's driving at something like: values boxed as Object are useless until you cast them to either the value type or an interface.

    re: his blog post - I think he knows exactly what he means, he just is not exacting in his English. His eventual point is that since value type values don't have identity, mutation is indistinguishable from replacement - an int is merely 32 bits, changing the value is either mutating the bits or replacing the bits, there's no difference. Mutating Point.X is equivalent to replacing Point with the new X and the same Y (excluding perf. and atomicity).

    He then goes on to state that the issues that arise with mutable structs are to do with their misuse as "fast objects" - when they clearly don't behave anything like that.

    All in all, I wish for a BCL "class Box<T> where T: struct { public T Value; }" with the Nullable<T>-like C# sugar: something that would make value types *actually* work like objects so people who want to use them like that can.

  • I kind of agree that this is more of an issue of treating value types like objects. But my opinion is that it is a compromise we should just accept.

    The main argument for adding value types to C# instead of using classes for everything, if I'm not wrong, is performance. Then we should accept the issues introduced by that choice. Mutating value types is definitely useful for performance and it is common in C/C++ programs. If we are not able to harness it, we should go and find an easier job.

    Nullable<T> is a similar thing. To make it work perfectly as an object type, it should have been a class. But for performance reasons, it is designed as a struct. So then we have to accept issues like:

    T varOfT = new T()

    may cause varOfT to have value null if T is a nullable type.

  • @lidudu: Performance is only sometimes a benefit if structs are immutable - in fact, the reason String is a reference type and not a value type is performance - but you can hardly tell the difference between immutable reference types and immutable value types. There are certain types that are easier to implement and use as value types, however, especially those types we tend to think of as "values" rather than "things" - mathematical stuff like Pair, Point, Matrix, Color especially.

    A nitpick: it doesn't really make sense to talk about value types for C or C++, since they have no concept of the thing. To C, all variables hold values, just some values are references.

    Re: Nullable<T>: I'm not sure what you mean by work perfectly as an object type: to my understanding, that was not its purpose. "new Nullable<T>()" being equal, and convertible, to null is odd, but in keeping with the purpose of Nullable<T> - to enable storing null in a value type. Certainly being able to store null should not be the requirement of being a reference type, given the effort people put into ensuring their reference types are not null!

  • @Simon

    Then I would assume calling an interface method on a value type where it is done by an explicit interface implementation without boxing might be pretty simple, if messy and repetitive

    For any method Frob on interface IFrobber

    public static void Frob<T>(this T t) where T : IFrobber
    {
       t.Frob();
    }

    I haven't verified this doesn't box yet (though I will do)

    As to his ideas on identity and mutability, they again ignore the fundamental aspect, which is the copy semantics. If you operate on a value type it is entirely possible you are operating on a copy. If you operate on a (non-ref) copy of a reference you get no difference in behaviour at all, mutable or not. If you do so on a mutable value type, you do. If it is immutable then it doesn't matter, since you cannot perceive a difference. This is a fundamental difference, so to say there is no difference between Point and int is wrong.

  • Oh, and strings are reference types for many reasons, but chief will be because they are variable size, so taking a copy every time you passed a multi-megabyte string would be madness!

  • So, actually the current C# compiler does not adhere to the C# spec :)

    Anyway, it's relatively hard to run into this, as the usual usage is to declare the variable right there in the using statement, and I can't remember ever using it otherwise.

    The spec says that the used type should be "a type that can be implicitly converted to System.IDisposable"; however, since user-defined conversions to an interface are not allowed, isn't this equivalent to "a type that implements System.IDisposable"?

    Also, if I implement IDisposable both implicitly and explicitly, and use the struct in a using, the Explicit implementation is always called, not the implicit, as I would infer from the article, and from the text of the spec (the expansion actually uses the explicit form)?

  • @Shuggy: Sure, but it copies. You need "void Frob<T>(ref T t) where T : IFrobber", and that's just ugly. And yes, like I said, strings are reference types because of performance, but what other reason do they have? They try pretty hard to behave like value types in every other way (immutable so the copy semantics don't matter, and using value equality).

    @SWeko: Good catch on the call by name/call by interface slot on Dispose - I just checked and it looks like the C# compiler cheats and calls the explicit implementation on both forms, without boxing on the initialising form.

  • You're using structs, you shouldn't care if it copies since that's what they do (or if you are doing some serious low level optimisation knowing you can use ref and accept the ugly is fine)

    The whole reason why immutability is a very real concern that shouldn't be brushed aside as "well, you can change the variable so everything is mutable really" is that it makes precisely the scenarios we are discussing here *not matter*.

  • Okay, looks like it avoids boxing (disclaimer: I've not looked at the resulting assembly).

    void Main()
    {
       var nasty = new FrobValue(0);
       Frob(nasty);
    }

    public static void Frob<T>(T t) where T : UserQuery.IFrobber
    {
       t.Frob();
    }

    public static void FrobRef<T>(ref T t) where T : UserQuery.IFrobber
    {
       t.Frob();
    }

    public interface IFrobber
    {
       void Frob();
    }

    public struct FrobValue : IFrobber
    {
       public FrobValue(int _) {}
       void IFrobber.Frob() { }
    }

    produces (in LinqPad)

    IL_0000:  ldloca.s    00
    IL_0002:  ldc.i4.0
    IL_0003:  call        UserQuery+FrobValue..ctor
    IL_0008:  ldloc.0
    IL_0009:  call        UserQuery.Frob

    Frob:
    IL_0000:  ldarga.s    00
    IL_0002:  constrained. 01 00 00 1B
    IL_0008:  callvirt    UserQuery+IFrobber.Frob
    IL_000D:  ret

    FrobRef:
    IL_0000:  ldarg.0
    IL_0001:  constrained. 01 00 00 1B
    IL_0007:  callvirt    UserQuery+IFrobber.Frob
    IL_000C:  ret

    IFrobber.Frob:

    FrobValue.UserQuery.IFrobber.Frob:
    IL_0000:  ret

  • @Shuggy:

    Well, the reason you would want to avoid boxing is so you can mutate through an interface - if copying was OK, ((IDisposable)t).Dispose() would be equivalent, if potentially an iota more work for the GC. And no one's "brushing aside" the problems with mutability; the objection is to the idea that it is somehow OK for reference types, but horrible and should never be done for value types. If anything, I would say immutability is more useful for reference types than value types, given the increased chance of unexpected aliasing and the far higher chance of cross-thread access, not to mention simplifying the ambiguity about modifying the reference versus the object when "identity" is not clearly defined for a type.

  • @Simon

    "just that it is somehow OK for reference types, but horrible and should never be done for value types."

    Okay you and I simply disagree at a very fundamental level.

    Anything which has value copy semantics is an immensely bad choice for mutability. The copy operations are silent and subtle (there are many occasions when copies are taken), any refactoring which pulled a value type between them could change the semantics, various aspects of construction would be hideous, and readonly instance fields containing them get very messy.

    Not to mention the concept of boxing changing the lifetime of the value, how on earth is that supposed to work...

    The List<T>.Enumerator gets away with it in normal scenarios because using foreach means you can't even see the enumerator to mutate it.

  • @Shuggy:

    But that thinking is exactly what @snarfblam was referring to as the real cause of the problems people have with value types - value types are value types, not objects, of course they behave differently. Treat them as values, and you're fine. I would argue IDisposable is (in general) not what you would want for a value type, not because it requires mutability, but because it implies *ownership* - which is the exact opposite of a plain value. Sure, you can, and interop style handles are a good candidate, just remember that there *will* be more than one value floating around.

    The copy operations are not *that* subtle; in the majority of use cases it is extremely obvious when they are copied - assignment operator? Copy. Passing an argument? Copy. I'm not sure what you're referring to with construction - the rules about value type constructors require good design here - no ownership, no magic on copy, no type invariants. I do agree about readonly instance fields to a certain degree, but consider that anything you can do with a non-readonly immutable value type you can do with a readonly mutable value type and vice-versa. "readonly" is nonsensical for all value types, not just mutable value types. (And the readonly immutable combo is useful for both reference and value types).

    And both boxing and unboxing are each another copy, not that complex.

    To sum up: value types are not reference types and you should not think of them like that. They have different rules for design, different semantics when you use them, and different (not better or worse!) performance. If you treat them right, mutability is just as useful for them as for reference types.

    Whoops - I've gone into total wall of text mode - sorry about that :)

  • There's no point in using value types unless you want them to be copied and immutable, so your style is just bad. There's no performance improvement either: the spec doesn't say value types are allocated on the stack, it just happens to be like that in current versions of .NET.

  • public static void Frob<T>(T t) where T : UserQuery.IFrobber
    {
      t.Frob();
    }

    This should not involve boxing by definition. At run time it is calling t as a concrete type rather than through an interface, because generic functions are instantiated for each value type during JIT compilation.
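
The three ways of calling an explicitly implemented interface method discussed in the comments above can be compared with a small sketch. FrobCounter is a hypothetical mutable struct invented for illustration; its counter makes it visible which calls operate on a copy and which operate on the caller's variable.

```csharp
using System;

interface IFrobber { void Frob(); }

// Hypothetical mutable struct with an explicit interface implementation.
struct FrobCounter : IFrobber
{
    public int Count;
    void IFrobber.Frob() { Count++; }
}

static class GenericFrobDemo
{
    // The parameter is a copy of the argument, so the caller's variable is untouched.
    static void FrobByValue<T>(T t) where T : IFrobber { t.Frob(); }

    // The constrained call operates on the caller's variable in place.
    static void FrobByRef<T>(ref T t) where T : IFrobber { t.Frob(); }

    public static (int AfterCast, int AfterByValue, int AfterByRef) Run()
    {
        var c = new FrobCounter();

        ((IFrobber)c).Frob();       // boxes: the boxed copy is incremented
        int afterCast = c.Count;

        FrobByValue(c);             // no box, but the parameter copy is incremented
        int afterByValue = c.Count;

        FrobByRef(ref c);           // increments c itself
        int afterByRef = c.Count;

        return (afterCast, afterByValue, afterByRef);
    }

    static void Main()
    {
        Console.WriteLine(Run()); // prints "(0, 0, 1)"
    }
}
```

Only the ref version lets the caller observe the mutation, which is the trade-off between ugliness and correctness that the thread is arguing about.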
