The Truth About Value Types

The Truth About Value Types

Rate This
  • Comments 68

As you know if you've read this blog for a while, I'm disturbed by the myth that "value types go on the stack". Unfortunately, there are plenty of examples in our own documentation and in many books that reinforce this myth, either subtly or overtly. I'm opposed to it because:

  1. It is usually stated incorrectly: the statement should be "value types can be stored on the stack", instead of the more common "value types are always stored on the stack".
  2. It is almost always irrelevant. We've worked hard to make a managed environment where the distinctions between different kinds of storage are hidden from the user. Unlike some languages, in which you must know whether a particular storage is on the stack or the heap for correctness reasons.
  3. It is incomplete. What about references? References are neither value types nor instances of reference types, but they are values. They've got to be stored somewhere. Do they go on the stack or the heap? Why does no one ever talk about them? Just because they don't have a type in the C# type system is no reason to ignore them.

The way in the past I've usually pushed back on this myth is to say that the real statement should be "in the Microsoft implementation of C# on the desktop CLR, value types are stored on the stack when the value is a local variable or temporary that is not a closed-over local variable of a lambda or anonymous method, and the method body is not an iterator block, and the jitter chooses to not enregister the value."

The sheer number of weasel words in there is astounding, but they're all necessary:

  • Versions of C# provided by other vendors may choose other allocation strategies for their temporary variables; there is no language requirement that a data structure called "the stack" be used to store locals of value type.
  • We have many versions of the CLI that run on embedded systems, in web browsers, and so on. Some may run on exotic hardware. I have no idea what the memory allocation strategies of those versions of the CLI are. The hardware might not even have the concept of "the stack" for all I know. Or there could be multiple stacks per thread. Or everything could go on the heap.
  • Lambdas and anonymous methods hoist local variables to become heap-allocated fields; those are not on the stack anymore.
  • Iterator blocks in today's implementation of C# on the desktop CLR also hoist locals to become heap-allocated fields. They do not have to! We could have chosen to implement iterator blocks as coroutines running on a fiber with a dedicated stack. In that case, the locals of value type could go on the stack of the fiber.
  • People always seem to forget that there is more to memory management than "the stack" and "the heap". Registers are neither on the stack or the heap, and it is perfectly legal for a value type to go in a register if there is one of the right size. If if is important to know when something goes on the stack, then why isn't it important to know when it goes in a register? Conversely, if the register scheduling algorithm of the jit compiler is unimportant for most users to understand, then why isn't the stack allocation strategy also unimportant?

Having made these points many times in the last few years, I've realized that the fundamental problem is in the mistaken belief that the type system has anything whatsoever to do with the storage allocation strategy. It is simply false that the choice of whether to use the stack or the heap has anything fundamentally to do with the type of the thing being stored. The truth is: the choice of allocation mechanism has to do only with the known required lifetime of the storage.

Once you look at it that way then everything suddenly starts making much more sense. Let's break it down into some simple declarative sentences.

  • There are three kinds of values: (1) instances of value types, (2) instances of reference types, and (3) references. (Code in C# cannot manipulate instances of reference types directly; it always does so via a reference. In unsafe code, pointer types are treated like value types for the purposes of determining the storage requirements of their values.)
  • There exist "storage locations" which can store values.
  • Every value manipulated by a program is stored in some storage location.
  • Every reference (except the null reference) refers to a storage location.
  • Every storage location has a "lifetime". That is, a period of time in which the storage location's contents are valid.
  • The time between a start of execution of a particular method and the method returning normally or throwing an exception is the "activation period" of that method execution.
  • Code in a method can require the use of a storage location. If the required lifetime of the storage location is longer than the activation period of the current method execution then the storage is said to be "long lived". Otherwise it is "short lived". (Note that when method M calls method N, the use of the storage locations for the parameters passed to N and the value returned by N is required by M.)

Now we come to implementation details. In the Microsoft implementation of C# on the CLR:

  • There are three kinds of storage locations: stack locations, heap locations, and registers.
  • Long-lived storage locations are always heap locations.
  • Short-lived storage locations are always stack locations or registers.
  • There are some situations in which it is difficult for the compiler or runtime to determine whether a particular storage location is short-lived or long-lived. In those cases, the prudent decision is to treat them as long-lived. In particular, the storage locations of instances of reference types are always treated as though they are long-lived, even if they are provably short-lived. Therefore they always go on the heap.

And now things follow very naturally:

  • We see that references and instances of value types are essentially the same thing as far as their storage is concerned; they go on either the stack, in registers, or the heap depending on whether the storage of the value needs to be short-lived or long-lived.
  • It is frequently the case that array elements, fields of reference types, locals in an iterator block and closed-over locals of a lambda or anonymous method must live longer than the activation period of the method that first required the use of their storage. And even in the rare cases where their lifetimes are shorter than that of the activation of the method, it is difficult or impossible to write a compiler that knows that. Therefore we must be conservative: all of these storage locations go on the heap.
  • It is frequently the case that local variables and temporary values can be shown via compile-time analysis to be unused after the activation period ends, and therefore can be treated short-lived, and therefore can go onto the stack or put into registers.

Once you abandon entirely the crazy idea that the type of a value has anything whatsoever to do with the storage, it becomes much easier to reason about it. Of course, my point above stands: you don't need to reason about it unless you are writing unsafe code or doing some sort of heavy interoperating with unmanaged code. Let the compiler and the runtime manage the lifetime of your storage locations; that's what its good at.

 

  • CunningDave: "then why do Value Types exist at all?  Why not just make everything a Class or a ensure that it's a long-lived heap variable, or better yet, let the compiler and runtime do what they're good at?"

    Because Value Types represent something semantically very different than Reference Types. Value Types do not have Identity. Reference Types do.

  • @Eric. I've reread your post and realised that you did define "short-lived" and "long-lived" quite precisely. I missed that initially - sorry. The term "dynamic extent" is not new of course. The site at www.memorymanagement.org contains some useful definitions. In particular:

    dynamic extent - An object has dynamic extent if its lifetime is bounded by the execution of a function or some other block construct.

    indefinite extent - An object has indefinite extent if its lifetime is independent of the block or function-call structure of the program.

    stack allocation - Stack allocation means run-time allocation and deallocation of storage in last-in/first-out order.

    heap allocation (also known as dynamic allocation) - Heap allocation or dynamic allocation means run-time allocation and deallocation of storage in arbitrary order.

  • Is it safe to say that reference types always go on the heap?

  • Vince: it is safe to say that, in current Microsoft .NET implementation of CLR, storage referenced by values of reference types is always allocated on the heap.

  • In case we didn't already have enough confusion between the x86 stack (PUSH and POP instructions and the ESP register), the stack of nested method activations (which as Eric points out doesn't actually have to use the x86 stack), the register file (locations in which become associated and dissociated with formal x86 registers such that even local variables which aren't assigned to registers by the JIT might in fact be held in the register file), and Stack<T> instances, let me remind you about this thing called an "MSIL operand stack" which IS part of the .NET specification.

    While there may be no requirement that a data structure called "the stack" be used to store local variables, in MSIL all method parameters are required to be placed onto "the operand stack", and of course that says absolutely nothing about the machine code finally generated by the JIT, which could even (during inlining) eliminate parameters which do not vary at runtime and *not store the parameters anywhere*.

  • Eric, It would be interesting to see your response to an "inversion" of the original question...."What would influence your choice of class vs. struct for various items in a high performance / low latency application which used massive amounts of data?"

  • I've been thinking for a long time about what it would take to make value types that I, personally, don't find confusing. For example, int, bool, and enum are value types and I have never had a problem understanding their semantics. I believe that what's necessary for me to not find them confusing is immutability.

    I realize there are good reasons in some performance-critical scenarios to have mutable value types, so of course I wouldn't want to get rid of the current capability to do that. But I think it'd be handy to have some syntactic sugar for immutable value types. I was reminded of this again today when I discovered the limitation that you can't implicitly reference "this" in a query expression in a value type, and realized that thanks to your blog I do understand why that's necessary but that it wouldn't be if you could declare that the struct is immutable.

    My proposal would be to allow "readonly" as a modifier on struct types. A readonly struct would have the following characteristics:

    The "readonly" modifier would be implied on all fields

    "private set" would be required on all automatic properties, and those properties could only be set from the constructor (or, alternatively, automatic properties would be forbidden)

    No other property setters would be permitted

    All fields (including automatic property backing fields) would be required to be of other readonly struct types (with bool, enum, char and all the integer types grandfathered in as being considered readonly, and Nullable<T> considered readonly if T is).

    Every instance method that references "this", explicitly or implicitly, would be translated under the hood by the compiler into a private static method that takes _this as a parameter (a value parameter, not a ref or out parameter). The explicit and implicit references to "this" within the method body would be translated to refer to _this instead. The instance method would simply call the static one.

    Any thoughts? Did I leave any loopholes that would permit mutation, other than reflection?

  • I wish I'd see this post a few days back...  someone should have sent petebu off to learn about Scheme's call/cc.   There's always a stack, and functions lifetimes are LIFO?   Pfft.

  • Jon, coming at this from the other direction -- how then would you describe the different between value and reference types?

  • Types may not mandate specific storage, but they do impose constraints on what storage strategies makes sense to implement, because type defines a set of operations on it's values and their semantics. Reference types in C# have 2 properties

    - Mutable values with changes visible via all variables pointing to the object

    - Reference equality operation (do these 2 variables point to the same object?)

    Without these operations one could consider some copying strategies -- e.g. allocate all objects on the stack, then migrate to heap objects references to which were stored in longer lived objects. (Also migrate heap objects to thread-local heaps, node-local heaps in distributed systems etc). "Migration" could be done by simple copying. The 2 constraints I mentioned above make such strategies either not efficient (need to update all references to point to the new copy -- mini-gc) or not usable in general case (JVMs do escape analysis and stack allocate objects which are provably not leaked outside).

    Semantics of value types in C# was carefully crafted to allow straightforward allocation of their values on the stack.

  • I had an interview question on this recently, some blah blah about where value types and reference types are stored.

    I thought that the answer "Who cares?" is probably not what they were looking for so give the usual, and aparently wrong, schpiel.

    I didn't get the job but I wish I had their email addresses to send them this link.

  • @Brian, imagine for a second they read this blog and therefore wanted you to say "Who cares? It's an implementation detail!"

  • Why have value types at all? What's the point?

  • JeffC: Imagine that you have a bitmap manipulation class. It keeps an array of Pixel instances, where each Pixel contains an Opacity instance and 3 Color instances:

    struct Opacity { byte Value; }

    struct Color { byte Value; }

    struct Pixel { Opacity Opacity; Color Red, Green, Blue; }

    Since those are all value types, the image from my 12 megapixel camera will take 12M * 4, or 48MB. Each pixel will be stored adjacent to its neighbor in memory and the values are easy to access.

    If those were reference types, creating the array of 12M of them would allocate 48MB of memory just for references to Pixels (96MB on a 64-bit machine). Then you would have to loop through all 12M of the array elements to create 12M Pixel instances, each with at least 20 bytes (4 references plus overhead, on a 32-bit machine), for an additional 240MB. Of course for each pixel, you need to create 3 Colors and an Opacity, each being at least 8 bytes, adding 384MB more. So now your pixel array takes up 670MB instead of 48MB, the pixels are nowhere near each other in RAM (causing lots of page faults and cache misses), and you have to follow two references to get each value.

    Now do you see why we have value types?

  • very good post.

    n thanx for awareness about it.

Page 3 of 5 (68 items) 12345