Fabulous Adventures In Coding
Eric Lippert is a principal developer on the C# compiler team.
As you know if you've read this blog for a while, I'm disturbed by the myth that "value types go on the stack". Unfortunately, there are plenty of examples in our own documentation and in many books that reinforce this myth, either subtly or overtly. I'm opposed to it because:
The way in the past I've usually pushed back on this myth is to say that the real statement should be "in the Microsoft implementation of C# on the desktop CLR, value types are stored on the stack when the value is a local variable or temporary that is not a closed-over local variable of a lambda or anonymous method, and the method body is not an iterator block, and the jitter chooses to not enregister the value."
The sheer number of weasel words in there is astounding, but they're all necessary:
Having made these points many times in the last few years, I've realized that the fundamental problem is in the mistaken belief that the type system has anything whatsoever to do with the storage allocation strategy. It is simply false that the choice of whether to use the stack or the heap has anything fundamentally to do with the type of the thing being stored. The truth is: the choice of allocation mechanism has to do only with the known required lifetime of the storage.
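To make the lifetime point concrete, here is a minimal sketch; the class and method names are hypothetical, chosen only for illustration. Both methods declare a local of value type int, yet in each case the storage must survive after the method's activation has ended: in the first because a lambda closes over the local, in the second because the method is an iterator block. How the compiler arranges that storage is an implementation detail; the sketch only shows the observable lifetime requirement.

```csharp
using System;
using System.Collections.Generic;

public static class LifetimeSketch
{
    // 'count' is an int, but the returned lambda closes over it,
    // so its storage must outlive this call to MakeCounter.
    public static Func<int> MakeCounter()
    {
        int count = 0;
        return () => ++count;
    }

    // 'i' is a local of an iterator block; its value must be preserved
    // between successive MoveNext calls made by the consumer.
    public static IEnumerable<int> CountTo(int limit)
    {
        for (int i = 1; i <= limit; i++)
        {
            yield return i;
        }
    }

    public static void Main()
    {
        Func<int> next = MakeCounter();
        Console.WriteLine(next()); // 1
        Console.WriteLine(next()); // 2

        foreach (int n in CountTo(3))
        {
            Console.WriteLine(n); // 1, then 2, then 3
        }
    }
}
```

The type of count and i is the same in both methods as it would be for an ordinary short-lived local; only the required lifetime differs.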
Once you look at it that way then everything suddenly starts making much more sense. Let's break it down into some simple declarative sentences.
Now we come to implementation details. In the Microsoft implementation of C# on the CLR:
And now things follow very naturally:
Once you abandon entirely the crazy idea that the type of a value has anything whatsoever to do with the storage, it becomes much easier to reason about it. Of course, my point above stands: you don't need to reason about it unless you are writing unsafe code or doing some sort of heavy interoperating with unmanaged code. Let the compiler and the runtime manage the lifetime of your storage locations; that's what its good at.
Very interesting read... thanks for this.
A really useful and solid post that I can see being the target of links from StackOverflow for years to come :)
Tiny typo: "its" in the last sentence wants to be "they're", I think?
"There are three kinds of storage locations: stack locations, heap locations, and registers."
Not trying to be a pedant, but just out of interest: do you consider compile-time-known strings (or indeed any other such 'baked in' reference types) to be "heap locations"?
I assume fixed buffers within structs despite looking superficially like an array would be treated as not being reference types but instead simply a pointer to the interior of the struct and thus inherit their storage rules by whatever happens to their parent. (stackalloc buffers follow from your statements on pointers without any special cases)
Would you consider thread statics to be (opaque) sugar around a stack location (even if just the thread ID) and (possibly several) heap location(s)?
"It is frequently the case that array elements, fields of reference types, locals in an iterator block and closed-over locals of a lambda or anonymous method must live longer than the activation period.....so must go to the heap"
Is the type the major driving factor in deciding what the lifetime of the value would be? How does the CLR decide what the required lifetime is? Based on the above statement, looks like this has been derived by observing how types are used. Is there a particular logic that the CLR follows to determine the lifetime along with just looking at the type?
As a followup to my question, can we game the system? Meaning, can I include something in my program to make the CLR think a particular value goes to the heap instead of the stack?
I do agree with what you've put, but one issue I've found when trying to explain value & reference types is trying to get across the basic concept of 'value type' to someone with minimal knowledge of the CLR and .NET. Most programmers have some knowledge of what 'the stack' and 'the heap' are, what their role is, and what they do, so although 'value types live on the stack', or 'value types you're manipulating go on the stack' are wrong, they are a variant of 'lies to children' - they explain the basic concept in a simplified way that is easy to understand. Only later on do they learn all the caveats to what you've said, once they've understood the basic concept.
One similar example is in (British school) GCSE chemistry (about 14-15 years old), where you learn that there are two separate types of molecular bond - 'ionic' and 'covalent', with molecules either using one or the other depending on some very simple properties. Only later on in A-level (16-18 years old) do you learn that this is actually wrong - there is a continuum between ionic and covalent bonds (and I'm sure, in university chemistry courses, you learn that that itself is a simplification). The difference between 'value types' and 'reference types' is similar.
So, although 'value types live on the stack' is wrong, it can be a useful first step to help someone fully understand what a value type is and how it behaves.
Why is Microsoft Press getting this wrong?
MCTS 70-536 from Microsoft Press says in the first chapter, second line:
"Value types are variables that contain their data directly instead of containing
a reference to the data stored elsewhere in memory. Instances of value types are
stored in an area of memory called the stack, where the runtime can create, read,
update, and remove them quickly with minimal overhead"
Well, what you're saying is perfectly correct. However, from a developer's perspective, the most important difference to know about value types and reference types is that value types get copied on the stack when they're passed as arguments. Knowing that, the developer must avoid creating big value types that are copied inefficiently.
So from that perspective it's practical for a developer to think that value types are stored on the stack.
That was really interesting and useful. Understanding storage allocation w.r.t. lifetime makes a lot more sense than mapping them to value/reference types, as the latter eventually leads to confusion.
Just wanted to clarify: an instance of a struct with a reference type as a field will be stored on the stack (in a typical situation, minus the exceptions) and the reference to the reference type will be stored on the stack too. Is that accurate?
(Because that is what I understood after reading the three rules in your previous blog post: blogs.msdn.com/.../the-stack-is-an-implementation-detail-part-two.aspx)
"It is frequently the case that array elements, fields of reference types ...<snip>... Therefore we must be conservative: all of these storage locations go on the heap."
Does that mean if I have an array of ints in a local method, the ints in the array go on the heap?
@DaRage: Surely what's important is that the value is copied - not where it's copied to and from. If the value were being copied to the heap instead of to the stack, would that make it okay to have huge value types?
Learning about the copying behaviour of arguments (and simple assignments) is obviously important, but I think the detail of heap/stack allocation is a distraction there.
Excellent post, Eric!
By the way, I found your blog while searching for this subject (your older post).
I think most of this "local variables are always stored on the stack" conviction comes from the unmanaged world. One time, in an interview, one guy asked me about this, knowing that my primary programming language is C#.
I never really cared about this until that situation, just because I chose a managed language and I can live with the idea that the CLR is there to choose a better way to JIT my code. When I saw your post about "stack/heap storage is an implementation detail", it sounded like music to me. It's nice to know implementation details and how you can use them to get more performance, but I really don't think you need to close your mind to one idea.
Recently I started working with C++ and an unmanaged environment, and I'm observing some cultural differences between the two worlds. C/C++ developers are usually more familiar with low-level programming: drivers, embedded systems, and applications closer to the operating system. They really worry about how the assembly code is generated and its performance impact. You always fall into discussions like "inline or not inline", "template or not template". No inheritance, because vtable function call indirections will cause performance trouble. All of these are good questions, but sometimes I really don't think the benefits pay the costs.
In other words, I think these philosophical questions about performance vs. portability vs. control are what create these cultural differences and the lingering resistance to managed environments. The "I always need full control" vs. "I like building blocks" questions.
Anyway, thanks for the precious information and the great content in your blog.
Regards,
Eric Lemes
I agree with DaRage - it's not just about doing pointer arithmetic, it's also about not doing anything dumb. If it really doesn't matter, then why do value types exist at all? Why not just make everything a class, or ensure that it's a long-lived heap variable, or better yet, let the compiler and runtime do what they're good at?
An interesting read, nonetheless.
I appreciate all the technical details and deep insight, I have learned a lot more about their behavior and relationship with the rest of the ecosystem, however, I'm still left with the question "what IS a value type", as in a single statement that begins with "a value type is"...
Very good post. I'd make a small change though. In the sentence:
(Note that when method M calls method N, the use of the storage locations for the parameters passed to N and the value returned by N is required by M.)
I'd change "required by M" to "required by both M and N". It's just as correct, but makes it just a little bit clearer because you don't have to think "Which one was M again?"