To box or not to box, that is the question


Suppose you have an immutable value type that is also disposable. Perhaps it represents some sort of handle.

struct MyHandle : IDisposable
{
    public MyHandle(int handle) : this() { this.Handle = handle; }
    public int Handle { get; private set; }
    public void Dispose()
    {
        Somehow.Close(this.Handle);
    }
}

You might think hey, you know, I'll decrease my probability of closing the same handle twice by making the struct mutate itself on disposal!

public void Dispose()
{
    if (this.Handle != 0)
      Somehow.Close(this.Handle);
    this.Handle = 0;
}

This should already be raising red flags in your mind. We're mutating a value type, which we know is dangerous because value types are copied by value; you're mutating a variable, and different variables can hold copies of the same value. Mutating one variable does not mutate the others, any more than changing one variable that contains 12 changes every variable in the program that also contains 12. But let's go with it for now.
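
The copy semantics described above can be seen in isolation with a small sketch. The `Counter` struct and `CopyDemo` class are hypothetical names introduced only for this illustration; they are not part of the handle example.

```csharp
using System;

// Hypothetical mutable value type, used only for illustration.
struct Counter
{
    public int Value;
    public void Increment() { Value++; }
}

static class CopyDemo
{
    // Returns both values after mutating only the copy.
    public static (int Original, int Copy) Run()
    {
        var original = new Counter { Value = 12 };
        var copy = original;   // copies the value, not a reference
        copy.Increment();      // mutates the variable 'copy' only
        return (original.Value, copy.Value);
    }
}
```

`CopyDemo.Run()` returns `(12, 13)`: the two variables held the same value for a moment, but mutating one leaves the other alone.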

What does this do?

var m1 = new MyHandle(123);
try
{
  // do something
}
finally
{
    m1.Dispose();
}
// Sanity check
Debug.Assert(m1.Handle == 0);

Everything work out there?

Yep, we're good. m1 begins its life with Handle set to 123, and after the dispose it is set to zero.

How about this?

var m2 = new MyHandle(123);
try
{
  // do something
}
finally
{
    ((IDisposable)m2).Dispose();
}
// Sanity check
Debug.Assert(m2.Handle == 0);

Does that do the same thing? Surely casting an object to an interface it implements does nothing untoward, right?

.

.

.

.

.

.

.

.

Wrong. This boxes m2. Boxing makes a copy, and it is the copy which is disposed, and therefore the copy which is mutated. m2.Handle stays set to 123.
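
For contrast, here is a minimal sketch of one way to call an interface method on a struct without boxing: a generic method with an interface constraint that takes the variable by `ref`. The names `DemoHandle`, `BoxingDemo`, and `DisposeInPlace` are made up for this sketch, and the call to `Somehow.Close` is left out so the code stands alone.

```csharp
using System;

// Stand-in for the article's MyHandle; the Somehow.Close call is omitted.
struct DemoHandle : IDisposable
{
    public DemoHandle(int handle) : this() { Handle = handle; }
    public int Handle { get; private set; }
    public void Dispose() { Handle = 0; }
}

static class BoxingDemo
{
    // The cast boxes a copy of 'h'; only the box is mutated.
    public static int DisposeViaCast()
    {
        var h = new DemoHandle(123);
        ((IDisposable)h).Dispose();
        return h.Handle;
    }

    // A constrained generic call on a ref parameter acts on the caller's variable.
    static void DisposeInPlace<T>(ref T resource) where T : IDisposable
    {
        resource.Dispose();
    }

    public static int DisposeViaGeneric()
    {
        var h = new DemoHandle(123);
        DisposeInPlace(ref h);
        return h.Handle;
    }
}
```

`DisposeViaCast()` returns 123 because the box was mutated, while `DisposeViaGeneric()` returns 0 because no copy was made. This is offered only as an illustration of where the copy comes from, not as a recommendation to keep the struct mutable.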

So what does this do, and why?

var m3 = new MyHandle(123);
using(m3)
{
  // Do something
}
// Sanity check
Debug.Assert(m3.Handle == 0);

.

.

.

.

.

.

.

.

.

.

 

Based on the previous example you probably think that this boxes m3, mutates the box, and therefore the assertion fires, right?

Right?

Is that what you thought?

You'd be perfectly justified in thinking that there is a boxing performed in the finally because that's what the spec says. The spec says that the "using" statement's expansion when the expression is a non-nullable value type is

finally
{
  ((IDisposable)resource).Dispose();
}

However, I'm here today to tell you that the disposed resource is in fact not boxed in our implementation of C#. The compiler has an optimization: if it detects that the Dispose method is exposed directly on the value type then it effectively generates a call to

finally
{
  resource.Dispose();
}

without the cast, and therefore without boxing.

Now that you know that, would you like to change your answer? Does the assertion fire? Why or why not?

Give it some careful thought.

.

.

.

.

.

.

.

.

.

.

The assertion still fires, even though there is no boxing. The relevant line of the spec is not the one that says that there's a boxing cast; that's a red herring. The relevant bit of the spec is:

A using statement of the form "using (ResourceType resource = expression) statement" corresponds to one of three possible expansions. [...] A using statement of the form "using (expression) statement" has the same three possible expansions, but in this case ResourceType is implicitly the compile-time type of the expression, and the resource variable is inaccessible in, and invisible to, the embedded statement.

That is to say, our program fragment is equivalent to:

var m3 = new MyHandle(123);
using(MyHandle invisible = m3)
{
  // Do something
}
// Sanity check
Debug.Assert(m3.Handle == 0);

which is equivalent to

var m3 = new MyHandle(123);
{
  MyHandle invisible = m3;
  try
  {
    // Do something
  }
  finally
  {
    invisible.Dispose(); // No boxing, due to optimization
  }
}
// Sanity check
Debug.Assert(m3.Handle == 0);

It is the invisible copy which is disposed and mutated, not m3.

And that's why the compiler can get away with not boxing in the finally. The thing that it is not boxing is invisible and inaccessible and therefore there is no way to observe that the boxing was skipped.
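
A self-contained sketch of the consequence, assuming a hypothetical `TrackedHandle` that records each close in a static list instead of calling `Somehow.Close`: because `using(m3)` disposes a hidden copy, the zero check in `Dispose` does not stop a later `m3.Dispose()` from closing the same handle again.

```csharp
using System;
using System.Collections.Generic;

struct TrackedHandle : IDisposable
{
    // Records every handle value that gets "closed"; stands in for Somehow.Close.
    public static readonly List<int> Closed = new List<int>();

    public TrackedHandle(int handle) : this() { Handle = handle; }
    public int Handle { get; private set; }

    public void Dispose()
    {
        if (Handle != 0)
            Closed.Add(Handle);
        Handle = 0;
    }
}

static class UsingDemo
{
    // Returns how many times handle 123 was closed.
    public static int Run()
    {
        TrackedHandle.Closed.Clear();
        var m3 = new TrackedHandle(123);
        using (m3)
        {
            // Do something
        }
        // m3 still holds 123 here: the hidden copy was disposed, not m3.
        m3.Dispose();
        return TrackedHandle.Closed.Count;
    }
}
```

`UsingDemo.Run()` returns 2: the handle is closed once through the invisible copy and once more through `m3`, which is exactly the double close the mutation was meant to prevent.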

Once again the moral of the story is: mutable value types are enough pure evil to turn you all into hermit crabs, and therefore should be avoided.
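
One possible way to act on that moral, sketched under the assumption that the handle really does need "already closed" state: make it a reference type, so every variable refers to the same object. `HandleOwner` and its `CloseCount` property are hypothetical; `CloseCount` stands in for the call to `Somehow.Close`.

```csharp
using System;

// Hypothetical reference-type version of the handle.
sealed class HandleOwner : IDisposable
{
    public HandleOwner(int handle) { Handle = handle; }
    public int Handle { get; private set; }
    public int CloseCount { get; private set; }  // stand-in for Somehow.Close

    public void Dispose()
    {
        if (Handle != 0)
            CloseCount++;
        Handle = 0;
    }
}

static class ClassDemo
{
    public static (int Handle, int CloseCount) Run()
    {
        var owner = new HandleOwner(123);
        using (owner) { }
        ((IDisposable)owner).Dispose();  // reference conversion, no copy
        owner.Dispose();
        return (owner.Handle, owner.CloseCount);
    }
}
```

`ClassDemo.Run()` returns `(0, 1)`: all three disposal paths see the same object, so the handle is closed exactly once.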

  • @Simon: Nullable<T>.Value is a read-only property and the type is immutable. Similarly, string a = "a" does not alter a character of an existing object; it assigns a new object reference to the variable a.

    If it were a reference type and the assignment translated to a.Value = 3, then int? a = null; a = 3; would throw a NullReferenceException.

  • 1. In the language spec I read "In either expansion, the resource variable is read-only in the embedded statement, and the d variable is inaccessible in, and invisible to, the embedded statement.". However, the "d variable" exists only in the expansion for dynamic types. There is an explicit expansion for value types, so the dynamic case surely does not apply here. Thus I would not have expected a shadow variable to be created.

    2. The expansion for value types states: "((IDisposable)resource).Dispose();". Also we can read: "An implementation is permitted to implement a given using-statement differently, e.g. for performance reasons, as long as the behavior is consistent with the above expansion.". But I think, optimizing the cast away, offends the requirement of a consistent behavior, since casting introduces a boxing, which leads to a different behavior.

  • >> But I think, optimizing the cast away, offends the requirement of a consistent behavior, since casting introduces a boxing, which leads to a different behavior.

    It's not different if you can't observe it (the "as if" rule). Speaking of which, can anyone come up with some way of observing boxing (or lack thereof) during the call to Dispose in "using" (while remaining within the realm of language spec, obviously)?

  • Pavel: The usual way to observe boxing is to run a function under a memory profiler. If you see allocations for a value type, it's a box (or an array). That's how you can tell that comparing an unconstrained generic value to null is boxing when the generic type is Nullable<>, for example.

  • @Gabe: that is precisely why I said "remaining within the realm of language spec". The language spec does not define the effect a program will have on a profiler - thus, whether you see boxing there or not, it does not affect the conformity of a particular C# implementation. To prove that this is non-conforming behavior, you'd have to devise some scheme to observe the boxing behavior from within the program itself.

  • @Pavel: An edge case, and I'm sure the spec does not ensure fixed (this) is stable for local variables (that are not in an iterator or async method and are not captured by a lambda and therefore on the stack), so it may not be "within the language spec" to the letter, but I think this pattern is somewhat useful, for eking out some more perf:

    unsafe struct PtrLock : IDisposable
    {
       static HashSet<PtrLock*> locked = new HashSet<PtrLock*>();
       public static DoSomethingToLocked() { ... }
       bool isLocked;
       public void Lock() { if (!isLocked) { isLocked = true; fixed (PtrLock* ptr = this) locked.Add(ptr); } }
       void IDisposable.Dispose() { if (isLocked) { isLocked = false; fixed (PtrLock* ptr = this) locked.Remove(ptr); } }
    }

  • >> I'm sure the spec does not ensure fixed (this) is stable for local variables

    Actually it does (assuming that you meant to write "fixed(&this)"). Quote:

    "Fixed variables reside in storage locations that are unaffected by operation of the garbage collector. (Examples of fixed variables include local variables, value parameters, and variables created by dereferencing pointers.) On the other hand, moveable variables reside in storage locations that are subject to relocation or disposal by the garbage collector. (Examples of moveable variables include fields in objects and elements of arrays.)."

    so your code would indeed show the effect of boxing or lack of boxing.

    The reason why I thought it still wouldn't work is because the using-variable is readonly according to the spec. I always assumed that this carries with it the same set of restrictions as you have for a readonly field. Sure enough, you can't assign to a using-variable, you can't take its address with unary &, and you can't pass it as ref/out.

    But there is one big difference. For a readonly field, when a method is called on it, a temporary copy is created for that call, so that any mutating method would mutate that copy and not touch the original field. If that were the case for using-variable also, then &this inside the method would give you the address of the temporary, which would be expected to be different from what you get as &this inside Dispose(). But that isn't what happens - when a method is called on a using-variable, despite it being "readonly", the method is called directly and can mutate it! This also means that &this is guaranteed to be the address of the actual variable, and not the copy.

    On the other hand, one could argue that lifetime of using-variable is already over by the time Dispose() is called (it's over immediately after the boxing cast to IDisposable that the spec mandates), and so its memory location could be "reused" for boxing by the implementation. In other words, getting the same address inside Dispose() as in other methods is not by itself enough to say that implementation did not box, as the spec does not require the box to have a distinct address (it would require that if the box coexisted with the variable in an observable way, but it doesn't).
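
The difference between a readonly field and a using-variable described in the comment above can be sketched without any unsafe code. `Tally`, `Holder`, and `ReadonlyDemo` are hypothetical names for this illustration, and the sketch shows only the copy-versus-direct-call behavior, not the address comparison.

```csharp
using System;

// Hypothetical mutable struct with a mutating method.
struct Tally : IDisposable
{
    public int Count;
    public void Increment() { Count++; }
    public void Dispose() { }
}

class Holder
{
    public readonly Tally Frozen = new Tally();
    public Tally Plain = new Tally();
}

static class ReadonlyDemo
{
    public static (int ReadonlyField, int PlainField, int UsingVariable) Run()
    {
        var holder = new Holder();
        holder.Frozen.Increment();  // runs on a temporary copy of the readonly field
        holder.Plain.Increment();   // mutates the field itself

        int usingCount;
        using (var t = new Tally())
        {
            t.Increment();          // called directly on the using-variable
            usingCount = t.Count;
        }
        return (holder.Frozen.Count, holder.Plain.Count, usingCount);
    }
}
```

`ReadonlyDemo.Run()` returns `(0, 1, 1)`: the readonly field is protected by a copy, while the using-variable is mutated in place. The same direct-call behavior is what lets a mutable struct enumerator advance when it is held in a using-variable.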

  • @Bill P. Godfrey "mutable value types could be an argument for adding support for C++ style copy constructors and out-of-scope destructors for C# value types."

    AAAIEEEE! Run for the hills! :)

    It's the other way round, surely. The problems caused by mutable value types form a very strong argument for not creating mutable value types, and would not be solved by those features anyway.

    The question is, why have structs at all? Obviously we're stuck with them now, but if they didn't exist would they need to be invented? The answer is no.

    It's common for C++ users to assume that GC-ed (heap allocated) reference types will perform much more poorly than stack-allocated value types, but this is based on their experience with C++ native heaps, which are non-compacting and so perform very poorly. This may be the reason why custom value types were added to the CLR in the first place.

    But the reality in the present-day CLR is quite different - the CLR GC heap is so blazingly fast, performance is hardly ever a reason to choose between 'struct' and 'class'. See: msdn.microsoft.com/.../dd942829.aspx

    So you don't get a performance boost from struct over class, and yet you have to give up so much in terms of language features. It's a very poor trade.

    And are copy constructors really something to pine over? If something is immutable, there is never a reason to make an exact copy of it. The original will do just fine - it's never going to change.

    In CLR/Java, if you find yourself thinking "Dang, I wish I could write a copy constructor for this class", try making it immutable instead, and watch your troubles evaporate! You will be able to treat references to it as if they were values (as long as you override ==/!= appropriately).

    Strings are the really telling example here. By making them immutable GC-ed objects, runtimes like the CLR and the JVM actually provide measurably better performance than Standard C++ programs using appropriately-designed classes with copy constructors, i.e. the std:: stuff. See the famous showdown: blogs.msdn.com/.../416151.aspx

  • Daniel: I find it ironic that you used a Rico Mariani blog post to support your point that structs are unnecessary, while I used one (blogs.msdn.com/.../733887.aspx) to support my point that they *are* necessary.

    In fact, if they didn't exist, somebody would have probably had to invent them. That's why NumPy exists -- Python doesn't have low-overhead types like struct so somebody had to create them in a C module.

    If you look at Rico's MeshSection example, you'll see that Point3d is 24 bytes, Point2d is 16 bytes, Vector3d is 24 bytes, Vertex is 64 bytes, and Quad is 16 bytes. Allocating a MeshSection allocates exactly 3 objects (the MeshSection, the Vertex array, the Quad array, and ignoring the TextureMap). Creating a MeshSection with 1k vertices and quads will use 64k bytes for the Vertex[], 16k bytes for the Quad[], and maybe 64 bytes of overhead, for a total of 80k. Accessing any element of either array (even vertices[123].normal.dx) requires finding the start of the array, multiplying the size of each element by the index, and adding in the offset to the field. GC overhead is negligible because there are only 3 objects.

    If this had to be done with only reference types (on a 32-bit machine where each object has 8 bytes overhead), Point3d is 32 bytes, Point2d is 24 bytes, Vector3d is 32 bytes, Vertex is 20 bytes, and Quad is 24 bytes. Allocating a MeshSection with N vertices and quads allocates 3 + 5N allocations (5k in this case). The memory used is 64 +4N (array of references to Vertex) + 4N (array of references to Quad) + 20N (instances of Vertex) + 32N (instances of Point3d) + 24N (instances of Point2d) + 32N (instances of Vector3d) + 24N (instances of Quad), for a total of 140k. Now to access vertices[123].normal.dx you have to find the offset into the Vertex[], dereference the Vertex, dereference the Vector3d, and find the offset to the field.

    So for this example, using only reference types nearly doubles the amount of memory used, makes accessing fields several times more work, and turns the memory management (allocation and GC) overhead from being constant to being linear. I would argue that certainly the combination of all of these is reason enough to implement value types in .Net.
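
The arithmetic in the comment above can be checked with a short sketch. The byte sizes are the figures quoted in the comment (32-bit runtime, 8 bytes of per-object overhead, 4-byte references), not measurements; `MeshMemory` is a hypothetical helper.

```csharp
using System;

static class MeshMemory
{
    // Value-type layout: two arrays of inline structs plus fixed overhead.
    public static int StructBytes(int n)
    {
        const int vertex = 64, quad = 16, overhead = 64;
        return vertex * n + quad * n + overhead;
    }

    // Reference-type layout, using the per-object sizes quoted in the comment.
    public static int ClassBytes(int n)
    {
        const int reference = 4, overhead = 64;
        const int vertex = 20, point3d = 32, point2d = 24, vector3d = 32, quad = 24;
        int perElement = 2 * reference + vertex + point3d + point2d + vector3d + quad;
        return overhead + perElement * n;
    }

    // Three fixed objects, plus five heap objects per vertex/quad pair.
    public static int ClassAllocations(int n) { return 3 + 5 * n; }
}
```

For n = 1000 this gives 80,064 bytes for the struct layout against 140,064 bytes and 5,003 allocations for the all-reference layout, which matches the 80k and 140k totals in the comment (a ratio of about 1.75).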

  • Implementing certain high-performance, latency-sensitive applications in Java is a nightmare. Allocations might be cheap, but collections can be expensive; the less you allocate, the less frequently you have to collect. Gen0 isn't too bad, but you get a steady drain into gen1 as a result, and if that gets fast enough you can trigger gen2 every few seconds. Lest you think this is a premature optimisation, and that things that are stack bound in terms of lifespan shouldn't be too bad: a missed logging statement concatenating a string and an int on the hot path is enough to cause a runaway inability to keep up.

    Using structs here becomes a necessity; even the escape analysis in the current Java runtimes isn't enough in many cases, because it's just too conservative.

    Certainly many people need not care about these things, but there are people using .NET because they get the safety of managed code with the freedom to take responsibility for value types, pointers (very useful for fast serialisation), union types, stackalloc, et al when they really need to. That there are people who use those when they don't need to doesn't bother me one whit so long as they don't go near my code base.

    This all goes double for the compact framework or XNA on the Xbox, where the generational GC disappears and collections become something to either avoid, or force your design to allocate slugs of easily traversable low reference plain old data objects so that collection is cheap.

  • I would write a reply to @Daniel, but @Gabe and @Shuggy covered everything. Well, I should add that the object header also adds some number of bytes close to the size of those structs (8 or 16 bytes, on 32-bit I think?), which makes the memory usage much worse. Not to mention that if the allocation of all those objects is not all together (unlikely in the context of a model editor, say), any processing will be all over the place in memory access, which will be the real killer, as your 30000 vertices need 3000 page accesses rather than 60 (and each extra page pushes another page out, remember!).

  • I think those are all great examples of how micro-optimisation can make a huge difference in specific cases, so they're perfectly valid examples, but... they could have been solved in a library written in C or C++. Exactly as they are in Python! (I find it ironic that you'd invoke a successful *library* solution as indicating the need for a *language* feature).

    An alternative-history C# that never had structs would not have been significantly worse as a language for developing real-world applications, and would still have been great at *consuming* high-performance native libraries.

  • "and would still have been great at *consuming* high-performance native libraries."

    No, it wouldn't. Structs with explicit memory layout are pretty much a requirement for several of the libraries I use.

    Also the moment I introduce a C++ boundary I am no longer able to use all those nice things like closures and the like across that boundary. Also if the value type usage is pervasive within the unmanaged API then it may force an overly chatty API (or force far more of the design down into the unmanaged code losing me all those nice features I want to have access to).

    Wanting to create a composite container of plain old data without a heap allocation is a reasonable desire for anyone writing high performance code in an environment where you cannot simply change the memory management routines out underneath it. Sometimes humans really can beat the machines.
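
As a rough illustration of what "explicit memory layout" means in the comment above, here is a minimal sketch of a union-style struct using `StructLayout` and `FieldOffset`. `FloatBits` and `LayoutDemo` are hypothetical names; the libraries the commenter refers to are not shown.

```csharp
using System;
using System.Runtime.InteropServices;

// Both fields occupy the same four bytes, like a C union.
[StructLayout(LayoutKind.Explicit)]
struct FloatBits
{
    [FieldOffset(0)] public float AsFloat;
    [FieldOffset(0)] public uint AsUInt32;
}

static class LayoutDemo
{
    // Reinterprets the bits of a float as an unsigned integer.
    public static uint BitsOf(float value)
    {
        var u = new FloatBits();
        u.AsFloat = value;
        return u.AsUInt32;
    }
}
```

`LayoutDemo.BitsOf(1.0f)` returns `0x3F800000`, the IEEE 754 single-precision encoding of 1.0. The overlapping fields live inline in the struct, with no heap allocation involved.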

  • Consuming high-performance native libraries is great work, if you can get it. Of course, one of the reasons to use managed code is that it runs in sandboxed environments like the web browser, mobile phones, XBox, shared hosting (SQL Server, ASP.NET), and Internet ClickOnce apps. Since native libraries are of no use to authors of apps in those environments, it's a good thing that C# can be used to write high-performance libraries for them.

    The more you can do within a language, the more benefits you get from composition. That's why LINQ is so great: instead of falling off a cliff into SQL-land any time you want to write a query, or being stuck writing loops manually, you can just write all of your queries in C# and let the environment handle the rest. C# makes it easy to compose queries with graphics libraries and math libraries. For example, let's say I want to know what gamma value produces the most evenly-distributed histogram for each of a set of images. I can easily write a LINQ query to take a table of gamma values, pass them to a graphics library to do gamma correction, pass the resulting image to a math package to make a histogram, and then write my own function to analyze the histogram.

    In the alternative-history C# where queries can only be run against DB libraries, graphics can only be done with graphics libraries, and math can only be done with math libraries, composition is impossible and you're stuck writing it all yourself.
