The Stack Is An Implementation Detail, Part One

The Stack Is An Implementation Detail, Part One

Rate This
  • Comments 36

Stack<Stone>I blogged a while back about how “references” are often described as “addresses” when describing the semantics of the C# memory model. Though that’s arguably correct, it’s also arguably an implementation detail rather than an important eternal truth. Another memory-model implementation detail I often see presented as a fact is “value types are allocated on the stack”. I often see it because of course, that’s what our documentation says.

Almost every article I see that describes the difference between value types and reference types explains in (frequently incorrect) detail about what “the stack” is and how the major difference between value types and reference types is that value types go on the stack. I’m sure you can find dozens of examples by searching the web.

I find this characterization of a value type based on its implementation details rather than its observable characteristics to be both confusing and unfortunate. Surely the most relevant fact about value types is not the implementation detail of how they are allocated, but rather the by-design semantic meaning of “value type”, namely that they are always copied “by value”. If the relevant thing was their allocation details then we’d have called them “heap types” and “stack types”. But that’s not relevant most of the time. Most of the time the relevant thing is their copying and identity semantics.

I regret that the documentation does not focus on what is most relevant; by focusing on a largely irrelevant implementation detail, we enlarge the importance of that implementation detail and obscure the importance of what makes a value type semantically useful. I dearly wish that all those articles explaining what “the stack” is would instead spend time explaining what exactly “copied by value” means and how misunderstanding or misusing “copy by value” can cause bugs.

Of course, the simplistic statement I described is not even true. As the MSDN documentation correctly notes, value types are allocated on the stack sometimes. For example, the memory for an integer field in a class type is part of the class instance’s memory, which is allocated on the heap. A local variable is hoisted to be implemented as a field of a hidden class if the local is an outer variable used by an anonymous method(*) so again, the storage associated with that local variable will be on the heap if it is of value type.

But more generally, again we have an explanation that doesn’t actually explain anything. Leaving performance considerations aside, what possible difference does it make to the developer whether the CLR’s jitter happens to allocate memory for a particular local variable by adding some integer to the pointer that we call “the stack pointer” or adding the same integer to the pointer that we call “the top of the GC heap”? As long as the implementation maintains the semantics guaranteed by the specification, it can choose any strategy it likes for generating efficient code.

Heck, there’s no requirement that the operating system that the CLI is implemented on top of provide a per-thread one-meg array called “the stack”. That Windows typically does so, and that this one-meg array is an efficient place to store small amounts of short-lived data is great, but it’s not a requirement that an operating system provide such a structure, or that the jitter use it. The jitter could choose to put every local “on the heap” and live with the performance cost of doing so, as long as the value type semantics were maintained.

Even worse though is the frequently-seen characterization that value types are “small and fast” and reference types are “big and slow”. Indeed, value types that can be jitted to code that allocates off the stack are extremely fast to both allocate and deallocate. Large structures heap-allocated structures like arrays of value type are also pretty fast, particularly if you need them initialized to the default state of the value type. And there is some memory overhead to ref types. And there are some high-profile cases where value types give a big perf win. But in the vast majority of programs out there, local variable allocations and deallocations are not going to be the performance bottleneck.

Making the nano-optimization of making a type that really should be a ref type into a value type for a few nanoseconds of perf gain is probably not worth it. I would only be making that choice if profiling data showed that there was a large, real-world-customer-impacting performance problem directly mitigated by using value types. Absent such data, I’d always make the choice of value type vs reference type based on whether the type is semantically representing a value or semantically a reference to something.

UPDATE: Part two is here

*******

(*) Or in an iterator block.

  • I like this article.  I have been seen interview questions like what the difference between a value type and a reference type and this stack vs heap comes up as an answer a lot. The "copy by value" is a much more valuable concept to understand.  Pardon the pun.

  • > Structs in the CLI have the pleasant-for-the-implementor property that they are always of a size known at compile time. Strings and URIs do not have that property. So we can either (1) abandon the nice property that structs are of known size, or (2) make strings an immutable reference type and overload equality so that they act like value types.

    There's also (3), implement as an immutable reference type, but expose to API clients wrapped into a value type. I.e.:

    internal class StringData { ... }

    public struct String {

     private StringData data;

     public static bool operator== (String a, String b) { ... }

     ...

    }

  • | Leaving performance considerations aside,

    Wait WHAT?

    I always thought that performance considerations was the only reason to include user-defined value types in the language. Copy semantics are totally useless since we don't have any control over them, we can't implement our own copy/assignment constructors, nor guaranteed destructors (and I can't comprehend why). It's always bit-by-bit copy. Why would you need that, _leaving performance considerations aside_?

    So the only thing value types are useful for is performance. Array of structs is layed out contiguosly in memory, which means that when performance is an issue an you can convert some of your objects to structs, ditch `List<T>` in favour of Array.Resize (wrapped in a ref-parametered extension method that hopefully would be inlined sometimes) and get your 10x speed increase for that part of the code thanks to the prefetch.

    Oh, well, there is this p/invoke stuff, too. Got it almost right in 2.0, IIRC, with embeddable arrays.

  • I agree with John Skeet, where is the upvote button ;-).  

    "But more generally, again we have an explanation that doesn’t actually explain anything. Leaving performance considerations aside, what possible difference does it make to the developer whether the CLR’s jitter happens to allocate memory for a particular local variable by adding some integer to the pointer"

    Maybe to help the developer to avoid overusing value types in functions called recursively, to avoid hitting the one meg limit?  I know that is not much or a reason.

  • I mean. it looks like you've mistaken "value type semantics" for the "immutable type semantics"

        i = 10

        j = i

        j += 1

        assert i != j

    Is that what you meant? Well, firstly, Python lacks value types (and that was valid Python), even ints are heap-allocated, yet they are immutable, because immutability is a property of operations on a type. Second, C# value types are not immutable (except when stored in a `List<T>`, and that's why one has to use custom wrappers, sadly). No, these two things are profoundly different, and the value of the value types lies entirely in the performance characteristics.

    Or am I wrong?

  • KristophU - the beauty of managed lanagues with 'perfect'[1] GC is that *it doesn't matter* whether new puts the memory on the stack or on the heap (or in some thread local store, or never does it at all) so long as the code consuming the data gets its contract fulfilled all is well. The contracts will be things like:

    * Modifications to it are/aren't visible to other parts of code

    * The life span of the data is respected

    * data is available to multiple threads if needed

    By using techniques like escape analysis and whole program analysis you can see the obvious cases where the object is a candidate for stack allocation but even better you can see where the object never need be constructed (say if it is a POD block and only parts of it are required).

    This is just the same concept as copy-on-write verses copy on read, it's an implementation detail and one or the other is not always optimal so it's quite possible that in most cases the compiler/runtime can work this out better than most humans can.

    By giving the compiler and runtime more *freedom* you provide more scope for safe optimisations. This is why languages with free pointers are a bitch sometimes, pointer aliasing is *really* hard to work out correctly so effective optimisation can be a real pain, more programmer freedom actually hurts performance sometimes when the compiler is better at converting code to real world machine instructions.

    [1] in the sense that iff an object is no longer reachable then it is eligible collection as opposed to simplistic ref counting implementations and the like

  • @Name Required - a usage you've ignored

    structs allow exact control of the memory layout. This is extremely useful for simple (and admittedly quick) binary serialisation, interop with other code, if you want to implement unions (though you may consider that solely a performance issue).

    They also allow creation 'en masse' in a default, usable, well defined state (as opposed to null) (again this is also quick but the semantics of it are the key no need to mess about filling an array or list with an explicit constructor call).

    They can't (Nullable<T> not withstanding) be null. At all, ever. This makes programming with them miles easier. Though I admit I would have preferred that the compiler and runtime prevented *any* variable from being null unless it was marked with ?

    They always have a well defined default non referential notion of equality (even if it's currently slow)

    That's it off the top of my head...

  • Great article!

    For reference (haha, a pun), complex numbers using a value type for storage is probably the single most compelling example for value types in the context of performance. For example, computing the FFT of an array of complex numbers is ~5.5x faster with a value type than a reference type.

    Also, value types can be essential for interop, e.g. in XNA.

  • Reference types have inherent object identity. Even in Python, you can write:

    a = 1

    b = 2 - 1

    print (a is b)

    That last line is legal, but rather meaningless, since you don't get any useful information whatsoever from the fact that two ints are represented by the same object, or different ones.

    There's no formal definition of a value type (or at least I'm not aware of one), but a giveaway trait is that the only meaningful definition of identity of a value type is structural comparison (for some or all of its components), and, on the other hand, any value type can be meaningfully compared in that way. Reference types, on the other hand, may not have a meaningful equality operator at all (and defining one via referential identity is really just a hack to simplify things such as generic hash based containers).

  • I think one reason the stack/heap trope lingers is that stack vs heap is often part of the bumper-sticker explanation of boxing.  To wit: reference objects are "in the heap", so to be treated as a reference object, the value type must be copied there, which highlights the fact that it wasn't there to begin with.

    Furthermore, on the spectrum of .NET knowledge, boxing shades toward the arcane and spooky.  Arcane in that 90% of developers don't knowingly encounter it during 90% of their workday; spooky in that, among those that do think about it, most of the encounters are negative, i.e., cleaning up a bug due to accidental boxing, or at least including boxing in The List of Things That Could Go Subtly Wrong.

    The brain often transmutes arcane and spooky knowledge into some proxied form.  Mental categories for things-that-might-hurt are pretty powerful.  When trying to account for bizarre behavior, it's at least an interesting place to start.  Interesting enough for Freud to upend the study of psychology, anyway.

  • This is discussed at length in Domain Driven Design:

    http://www.amazon.com/Domain-Driven-Design-Tackling-Complexity-Software/dp/0321125215

    Here is my view.  From a semantic point of view it is Person is a reference type, Details is a value type:

    firstPerson = Person(name = "artsrc")

    secondPerson = Person(name = "artsrc")

    firstPerson != secondPerson

    firstDetails = Details(name = "artsrc")

    secondDetails= Details(name = "artsrc")

    firstDetails == secondDetails

    Two people with the same name are not the same person.  However their details may be the same if name is all that is known about them.

    Values can't change, new values are created.  Reference types can acquire new values over time.

    It makes no sense to do the assignment:

    1 = 2 + 3

    1 is a value.  It does make sense to say:

    firstPerson.age = 2+3

    It might be handy to say:

    firstDetails.age = 2 + 3

    but it confuses the role of details as a value.

  • I think 'passing' has become a horrible and confusing notion. When I think of passing an object, there should be one and only one object that moves from one conceptual place to another. Making copies of the object, destroying the object, or modifying the object can never be part of this. The object I put in and the object I get out should be identical in every observable way. (I say observable because implementation details do not matter.)

    Since the object passed in is identical to the object retrieved (there is only one object, because passing may not create objects), all modifications done afterward are visible to all parties.

    If for some reason we desire that an object is immutable, nothing else changes.

    If we desire that a mutable object is passed to a method that modifies it, without losing/modifying the original object, we copy the object before passing it. Since there is a specific reason why we need both versions of the object, copying the object should be explicit.

    These are the semantics we need. Now for structs. There are two kinds of structs, immutable and mutable. Immutable structs are semantically equivalent to immutable classes. Both kinds of objects are passed in the way I described: you put an object in and you get the same (that is, observably identical) object out. All modifications allowed to the object are visible to all parties, because no modifications are allowed. The only difference between immutable structs and immutable classes is an implementation detail that's only good for performance optimization. Like most performance optimizations, the choice would preferably be left to a smart compiler, although programmers would probably like it if they could specify it manually.

    Mutable structs are often considered evil, but people have good performance reasons to use them. For example, if the ability to efficiently mutate objects is required and the overhead of using addresses is undersired, like when manipulating a large number of graphical primitives. Conceptually though, we can easily replace mutable structs by immutable classes. Whenever a struct would have been modified, instead we conceptually create a new immutable object with the modifications in place, and replace the original object with it. Whenever a struct would have been passed, we pass the immutable object. Semantically, this is what makes sense to do.

    If we do this (when modifying objects) we lose the performance that changing fields has over creating new objects and replacing. But for every modification we immediately discard the original object, so we can optimize this creation-and-replacement by using field modifications as before. If this optimization is too difficult for a compiler to consider, we again have the programmer specify it manually. Passing the immutable objects of course can again be done by copying the bits as for integers, if this is worth it. So now we have objects with immutable class conceptualization, but mutable struct performance and implementation.

    I think passing-by-value has been incorrectly raised to the conceptual level. Aside from implementation performance reasons, I don't see why it would ever be sensible to implicitly copy an object in stead of passing-by-reference. As for the performance considerations, I think they should be handled by using programmatic annotations or automatic compiler optimizations, not by contriving a new category in the type system.

  • What's lost in the stack allocation discussion is that higher level program structure changes will often have much better performance results.

    Low level optimizations are typically done last in the development cycle and require the use of a code instruction level profiler.  It would help if the instruction level profiler was included in the basic or profressional Visual Studio.  The much more expensive Team Foundation System is beyond the reach of many development shops.

  • I would say that it is important to define Value Types vs Object types both in terms of how they behave AND in how they are implemented.

    If you don't realize that the stack, by it's very nature limits the number of value types you can declare within a function, you risk a StackOverflowException without knowing why or how to fix it, as today's post on my blog points out.

    If you don't realize that values are copied while objects are pointed to by a reference, you risk thinking that you've copied an object when in fact all you've done is moved a pointer around, and then you wonder why the original value got changed when you changed the copy.

    WHAT these types are doing in memory is in fact WHY a Value Type or an Object Type behaves the way it does.  Not that it has to be that way, but understanding the WHAT will take you a long way to understanding the behavior you can expect.

    It's not an "either/or" issue.  It's a "both/and" issue.

  • Stack vs. heap is still important to understand, for the simple reason that there are other languages out there that do things differently (notably C++).

    Obviously, items on the stack are gone when the current function returns; objects on the heap may live on.  In C++, a class can be instantiated on the stack, and things like CriticalSection classes depend on that behavior.  People coming from that background need to understand the very important difference in .NET.

    When .NET came out I thought this change was one of the more profound things they did.  In C++ the developer chose the lifetime semantics, regardless of type (struct or class).  In C#, the type determines the lifetime semantics.  Essential knowledge, even if "the stack" isn't actually on the CPU stack,

Page 2 of 3 (36 items) 123