The Truth About Value Types


As you know if you've read this blog for a while, I'm disturbed by the myth that "value types go on the stack". Unfortunately, there are plenty of examples in our own documentation and in many books that reinforce this myth, either subtly or overtly. I'm opposed to it because:

  1. It is usually stated incorrectly: the statement should be "value types can be stored on the stack", instead of the more common "value types are always stored on the stack".
  2. It is almost always irrelevant. We've worked hard to make a managed environment where the distinctions between different kinds of storage are hidden from the user, unlike some languages, in which you must know whether particular storage is on the stack or the heap for correctness reasons.
  3. It is incomplete. What about references? References are neither value types nor instances of reference types, but they are values. They've got to be stored somewhere. Do they go on the stack or the heap? Why does no one ever talk about them? Just because they don't have a type in the C# type system is no reason to ignore them.

The way I've usually pushed back on this myth in the past is to say that the real statement should be "in the Microsoft implementation of C# on the desktop CLR, value types are stored on the stack when the value is a local variable or temporary that is not a closed-over local variable of a lambda or anonymous method, and the method body is not an iterator block, and the jitter chooses to not enregister the value."

The sheer number of weasel words in there is astounding, but they're all necessary:

  • Versions of C# provided by other vendors may choose other allocation strategies for their temporary variables; there is no language requirement that a data structure called "the stack" be used to store locals of value type.
  • We have many versions of the CLI that run on embedded systems, in web browsers, and so on. Some may run on exotic hardware. I have no idea what the memory allocation strategies of those versions of the CLI are. The hardware might not even have the concept of "the stack" for all I know. Or there could be multiple stacks per thread. Or everything could go on the heap.
  • Lambdas and anonymous methods hoist local variables to become heap-allocated fields; those are not on the stack anymore.
  • Iterator blocks in today's implementation of C# on the desktop CLR also hoist locals to become heap-allocated fields. They do not have to! We could have chosen to implement iterator blocks as coroutines running on a fiber with a dedicated stack. In that case, the locals of value type could go on the stack of the fiber.
  • People always seem to forget that there is more to memory management than "the stack" and "the heap". Registers are neither on the stack nor the heap, and it is perfectly legal for a value type to go in a register if there is one of the right size. If it is important to know when something goes on the stack, then why isn't it important to know when it goes in a register? Conversely, if the register scheduling algorithm of the jit compiler is unimportant for most users to understand, then why isn't the stack allocation strategy also unimportant?
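
The lambda case in the list above is easy to observe from ordinary code. Here is a minimal sketch, assuming a hypothetical helper named MakeCounter that is not part of any library: the local count is of value type, yet it has to survive after MakeCounter returns, so it cannot live in that activation's short-lived storage.

```csharp
using System;

static class ClosureDemo
{
    // Hypothetical helper: returns a delegate that closes over the local 'count'.
    public static Func<int> MakeCounter()
    {
        int count = 0;          // a local of value type...
        return () => ++count;   // ...whose storage must outlive this activation
    }

    static void Main()
    {
        Func<int> next = MakeCounter();  // MakeCounter has already returned
        Console.WriteLine(next());       // 1
        Console.WriteLine(next());       // 2
    }
}
```

Each call to MakeCounter gets its own count, and each delegate keeps its own copy alive for as long as the delegate is reachable. Where exactly that storage lives is the implementation's business; the required lifetime is what forces the decision.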

Having made these points many times in the last few years, I've realized that the fundamental problem is in the mistaken belief that the type system has anything whatsoever to do with the storage allocation strategy. It is simply false that the choice of whether to use the stack or the heap has anything fundamentally to do with the type of the thing being stored. The truth is: the choice of allocation mechanism has to do only with the known required lifetime of the storage.

Once you look at it that way then everything suddenly starts making much more sense. Let's break it down into some simple declarative sentences.

  • There are three kinds of values: (1) instances of value types, (2) instances of reference types, and (3) references. (Code in C# cannot manipulate instances of reference types directly; it always does so via a reference. In unsafe code, pointer types are treated like value types for the purposes of determining the storage requirements of their values.)
  • There exist "storage locations" which can store values.
  • Every value manipulated by a program is stored in some storage location.
  • Every reference (except the null reference) refers to a storage location.
  • Every storage location has a "lifetime". That is, a period of time in which the storage location's contents are valid.
  • The time between the start of execution of a particular method and the method returning normally or throwing an exception is the "activation period" of that method execution.
  • Code in a method can require the use of a storage location. If the required lifetime of the storage location is longer than the activation period of the current method execution then the storage is said to be "long lived". Otherwise it is "short lived". (Note that when method M calls method N, the use of the storage locations for the parameters passed to N and the value returned by N is required by M.)

Now we come to implementation details. In the Microsoft implementation of C# on the CLR:

  • There are three kinds of storage locations: stack locations, heap locations, and registers.
  • Long-lived storage locations are always heap locations.
  • Short-lived storage locations are always stack locations or registers.
  • There are some situations in which it is difficult for the compiler or runtime to determine whether a particular storage location is short-lived or long-lived. In those cases, the prudent decision is to treat them as long-lived. In particular, the storage locations of instances of reference types are always treated as though they are long-lived, even if they are provably short-lived. Therefore they always go on the heap.

And now things follow very naturally:

  • We see that references and instances of value types are essentially the same thing as far as their storage is concerned; they go on the stack, in registers, or on the heap depending on whether the storage of the value needs to be short-lived or long-lived.
  • It is frequently the case that array elements, fields of reference types, locals in an iterator block and closed-over locals of a lambda or anonymous method must live longer than the activation period of the method that first required the use of their storage. And even in the rare cases where their lifetimes are shorter than that of the activation of the method, it is difficult or impossible to write a compiler that knows that. Therefore we must be conservative: all of these storage locations go on the heap.
  • It is frequently the case that local variables and temporary values can be shown via compile-time analysis to be unused after the activation period ends, and therefore can be treated as short-lived, and therefore can go on the stack or be put into registers.
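
The iterator block case can be sketched the same way. This is only an illustration, assuming a hypothetical method named CountTo: the local i is of value type, but control leaves the method body at every yield return, and i must still hold its value when the caller asks for the next element.

```csharp
using System;
using System.Collections.Generic;

static class IteratorDemo
{
    // Hypothetical iterator block with a local of value type.
    public static IEnumerable<int> CountTo(int limit)
    {
        int i = 1;
        while (i <= limit)
        {
            yield return i;  // control goes back to the caller here
            i++;             // 'i' still has its value when execution resumes
        }
    }

    static void Main()
    {
        using (IEnumerator<int> e = CountTo(3).GetEnumerator())
        {
            e.MoveNext();
            Console.WriteLine(e.Current);  // 1
            e.MoveNext();
            Console.WriteLine(e.Current);  // 2
        }
    }
}
```

The required lifetime of i is as long as the enumerator is in use, which is longer than any single call to MoveNext, so it counts as long-lived storage in the terms used above.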

Once you abandon entirely the crazy idea that the type of a value has anything whatsoever to do with the storage, it becomes much easier to reason about it. Of course, my point above stands: you don't need to reason about it unless you are writing unsafe code or doing some sort of heavy interoperating with unmanaged code. Let the compiler and the runtime manage the lifetime of your storage locations; that's what they're good at.

 

  • I just wonder how important this (stack or no stack) could be for normal development?

  • what's with everyone's urge to be a 'lint' these days?  hey folks, 'stead of all the nit-picky comments why don't you go write your own abso-perfecto blog that no one will read?  or spend your days sending mail letters to the editors of your favourite newspapers correcting all the typos you find in the comic strips (or worse yet, in the obituaries)?  Jeez, get a life - read, absorb and move on, not ruminate, burp and regurgitate.

  • Excellent post and thread!

    @Marius Horak - "I just wonder how important this (stack or no stack) could be for normal development?"

    I thought it would be very important from a "performance" perspective - when I first looked at Java I was HORRIFIED ('Home Alone' pose) that I couldn't create user-defined types that would go on the stack, and felt C#'s struct feature was a major advantage.

    But I was totally wrong. The GC is so damn fast, it doesn't seem to make a lot of difference in real programs. I was being prejudiced by experience with C++, where the heap is much slower than the stack in all extant implementations.

    I think that prejudice is so strong, it infects a lot of discussion, "expert" opinion, training material etc. on the very subject discussed here, and so people obsess over a technical detail that is irrelevant 99% of the time.

    @Nick Aceves - "... Value Types represent something semantically very different than Reference Types. Value Types do not have Identity. Reference Types do."

    True, but the most basic and widely used "logical value type" in any platform is probably the string. Yet in the CLR platform, System.String is a class, not a struct.

    It uses the rich capabilities of classes to make every effort to ensure that the identity of its objects is irrelevant. The == operator is overloaded to compare the string content, and strings are strictly immutable so we aren't going to suffer from accidental aliasing bugs. It simply doesn't matter how many names refer to the same string. It's a "value" in every way except it doesn't technically base itself on the value type facility in the CLR. It's not perfect - you can detect the identity using ReferenceEquals, or by casting strings to object: (object)s1 == (object)s2. But it works great most of the time.

    It's the same with Tuple in CLR 4 - and that even has a fixed storage size, so it technically could have been a struct. But it turned out that in typical real programs, all the copying was slower than using the GC to clear up garbage.

    The fact that things like String and Tuple are classes, not structs, is a big clue that structs are not a general basis for defining types that lack identity and represent "pure values". Structs don't even define == as a memberwise comparison by default. They seem to be most suited to a few interop scenarios. They're really a technical niche thing I think.
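
The string behavior described in the comment above can be checked with a small sketch. The class and method names here (StringIdentityDemo, SameContent, SameObject) are made up for the illustration; the strings are built with the string(char, int) constructor so that they are two separate objects rather than one shared literal.

```csharp
using System;

static class StringIdentityDemo
{
    // Compares contents, using the == operator defined by System.String.
    public static bool SameContent(string a, string b)
    {
        return a == b;
    }

    // Compares identity: casting to object selects reference comparison.
    public static bool SameObject(string a, string b)
    {
        return (object)a == (object)b;
    }

    static void Main()
    {
        string s1 = new string('a', 3);
        string s2 = new string('a', 3);
        Console.WriteLine(SameContent(s1, s2));  // True
        Console.WriteLine(SameObject(s1, s2));   // False
    }
}
```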

  • but if I give the complete answer, there won't be time to ask any other questions in the interview, man.

  • So the myth was just a belief that people have held until today! Nice article.

    As the old saying goes, "Look and make sure of what you see, and don't believe rumors."

  • Awesome read! Thanks for sharing the thoughts

  • Eric, thanks for the interesting information :)

    The common interview question about where value types are stored now becomes very interesting. :)

    Could you also introduce a printer-friendly version of your blog?

  • Very useful and detailed analysis - thanks to the author for that!

    Although I wouldn't agree with the preceding comment, which claims that the change to "required by both M and N" is "just as correct": the actual scope of N's concern includes allocation of its own automatic variables, reference types, etc., which is obviously wider than the passed parameters and return value.

    Thus the original statement makes more sense to me...

  • So, just to clarify, when we allocate an array of value types, it is allocated on the heap, OK? But what does this array contain? Does it contain references to boxed values? Or does it contain value type values?

    Work it out from first principles. What is an array? An array is a collection of variables of a particular type, called the element type. What do we know about variables of value type? A variable of value type contains the value. (Unlike a variable of reference type, which contains a reference to the value.)

    Therefore there is no boxing; why would there be? An array of ints is a collection of variables of type int, not a collection of variables of type object.

    You seem to still be reasoning from the fallacy that "anything on the heap is always an object". That is completely false. What is true is that variables are storage locations, and that storage locations can be on the stack or the heap, depending on their known lifetimes. - Eric

    If it contains value type values, how does the runtime know what type of values reside in the array?

    Well, how does the runtime know that a field of a class is of type int? A class (or struct) is a collection of variables (called fields); an array is a collection of variables (called elements). The runtime can somehow get type information from the object about what the type of one of its variables is. How it does so is an implementation detail. - Eric
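
Both answers above can be observed from ordinary code. What follows is a minimal sketch, assuming a hypothetical struct Point and hypothetical helper names: an array element of value type is a variable that holds the value itself, and the array object can report its element type at runtime.

```csharp
using System;

// Hypothetical value type used only for this illustration.
struct Point
{
    public int X;
}

static class ArrayDemo
{
    public static int ElementAfterCopyIsEdited()
    {
        Point[] points = new Point[1];
        points[0].X = 42;        // writes straight into the element variable
        Point copy = points[0];  // copies the value out of the array
        copy.X = 7;              // changes only the copy
        return points[0].X;      // the element still holds 42
    }

    // The array object carries enough type information to answer this.
    public static Type ElementTypeOf(Array array)
    {
        return array.GetType().GetElementType();
    }

    static void Main()
    {
        Console.WriteLine(ElementAfterCopyIsEdited());  // 42
        Console.WriteLine(ElementTypeOf(new int[3]));   // System.Int32
    }
}
```

If the array held references to boxes, editing the copy would have been visible through the element; it is not, because the element is a variable of type Point, not of type object.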

  • @Dmitry

    The reason is that any reference to an element of that array must come from one of:

    int[] ints = new int[1000000];

    // compile time known
    // the compiler knows the types anyway
    var x = ints[20];

    // esoteric: pointers in unsafe context. again compiler knows the type
    fixed (int* p = &ints[20]) {}

    // runtime known
    Array a = ints;
    object o = a.GetValue(20); // here is the runtime checking
    int i = (int)o; // unboxing occurs, you must use int and not, say, long

    in that last example Array.GetValue requires an object return type but since it is an int array just 'grabbing' the value at offset IntPtr.Size from the start of the array's data section won't work.

    Instead it uses TypedReferences (which are basically two pointers, one to the value, one to its type) and the CLR supplies a function on Array to ensure that it gets the right pointer based on the type of the array, which is known because an array is an object, and just like all other objects it has a record in its object header that contains a pointer to its type (used for reflection, vtables and the like). This function (as of 2.0) is

    [MethodImpl(MethodImplOptions.InternalCall)]
    private extern unsafe void InternalGetReference(void* elemRef, int rank, int* pIndices);

    and it will have backing code which does the runtime type checking.

    The second function involved is on TypedReference:

    [MethodImpl(MethodImplOptions.InternalCall)]
    internal static extern unsafe object InternalToObject(void* value);

    This will do the job of boxing the resulting value as the right type (in this case a boxed int) rather than just passing it along as a reference if the array was, say, string[].

    Therefore you can always get the appropriate type based on the array itself.

  • Thanks for an excellent article and excellent comments, including the debates. I know that some people get turned off by the criticisms in the comments, but honestly, it helps me to learn more about this to read a really well-thought out debate.

  • Great perspective, and a very helpful explanation -- But I thought it would be good to share the message that I take away from reading this article:

    From a game programmer's point of view, focusing on creating highest performance code, I should not use C# to produce the most efficient code because I have no way of telling where my data is being stored and what impact it will have on garbage collection performance. Instead, I should simply stick with a non-managed language to ensure predictable performance derived from known memory management.

    Haha, so that statement was quite extreme: I'm mostly saying this because I would love to hear these types of perspectives, like what was shown in this article, balanced with a bit more detail on the performance impacts of taking this point of view of the language.

    In reality, if I was programming for the Xbox 360, I would actually study the specific implementation details in order to create code that would be "performance friendly" for that specific platform, even though this might be against the "nature" and goals of the managed C# language.

  • I agree with Allen. It's all well and good to wish to be ignorant of the system's implementation, but anyone who has developed for the Xbox with XNA knows that it requires specific tuning. The fact that its garbage collector sucks causes all sorts of grief. Can we get some more explicit examples of reference and value types in relation to high performance (read: real-time systems) code, perhaps including talk of value types alongside the 'ref' parameter modifier?

  • Excellent post, Eric. Can you please shed some light on the storage of static variables, static methods, and static classes?

  • There's one major difference between the semantics of ValueType and reference type instances which your article is missing: that (aside from ref/out arguments) when a value is moved from one expression to another, the value is copied. Thus manipulations of a ValueType instance through different expressions will not be visible to each other.

    Consider a class Foo with field int bar, and with a getter and setter of bar.  Foo a = new Foo(); Foo b = a; a.SetBar(5);  Console.Write(b.GetBar());  The setting of bar through a will be visible through b if and only if Foo is NOT a ValueType. This is an essential distinction, if not THE essential distinction of ValueTypes.

    Imagine implementing C# on a platform with no concept of ValueTypes (e.g. on a Java or Smalltalk VM). You could do it easily, but (aside from ref/out arguments), every assignment or return of a ValueType has a shallow-copy semantic, e.g. Foo a = b.MemberwiseClone();
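
The Foo example in the comment above can be written out as a short sketch. FooClass and FooStruct are hypothetical names standing in for the two ways Foo could be declared; everything else follows the comment's own code.

```csharp
using System;

// Foo declared as a reference type.
class FooClass
{
    private int bar;
    public void SetBar(int value) { bar = value; }
    public int GetBar() { return bar; }
}

// Foo declared as a value type, with the same members.
struct FooStruct
{
    private int bar;
    public void SetBar(int value) { bar = value; }
    public int GetBar() { return bar; }
}

static class CopyDemo
{
    public static int ViaClass()
    {
        FooClass a = new FooClass();
        FooClass b = a;    // copies the reference; a and b refer to one object
        a.SetBar(5);
        return b.GetBar(); // 5
    }

    public static int ViaStruct()
    {
        FooStruct a = new FooStruct();
        FooStruct b = a;   // copies the value; b is independent of a
        a.SetBar(5);
        return b.GetBar(); // 0
    }

    static void Main()
    {
        Console.WriteLine(ViaClass());   // 5
        Console.WriteLine(ViaStruct());  // 0
    }
}
```

Note that this difference is about copy semantics, not about where either variable is stored.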
