The Stack Is An Implementation Detail, Part One


I blogged a while back about how “references” are often described as “addresses” when describing the semantics of the C# memory model. Though that’s arguably correct, it’s also arguably an implementation detail rather than an important eternal truth. Another memory-model implementation detail I often see presented as a fact is “value types are allocated on the stack”. I often see it because of course, that’s what our documentation says.

Almost every article I see that describes the difference between value types and reference types explains in (frequently incorrect) detail what “the stack” is and how the major difference between value types and reference types is that value types go on the stack. I’m sure you can find dozens of examples by searching the web.

I find this characterization of a value type based on its implementation details rather than its observable characteristics to be both confusing and unfortunate. Surely the most relevant fact about value types is not the implementation detail of how they are allocated, but rather the by-design semantic meaning of “value type”, namely that they are always copied “by value”. If the relevant thing was their allocation details then we’d have called them “heap types” and “stack types”. But that’s not relevant most of the time. Most of the time the relevant thing is their copying and identity semantics.
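
To make “copied by value” concrete, here is a minimal sketch. The type and method names (PointValue, PointRef, CopySemanticsDemo) are invented for illustration and are not from any library; the point is only that assigning a struct copies its data, while assigning a class instance copies a reference to the same object.

```csharp
using System;

// Hypothetical illustration types; the names are assumptions, not framework types.
struct PointValue { public int X; }   // value type: assignment copies the data
class PointRef { public int X; }      // reference type: assignment copies the reference

static class CopySemanticsDemo
{
    // Mutates a copy of a struct and reports what the original holds afterwards.
    public static int MutateStructCopy()
    {
        var a = new PointValue { X = 1 };
        var b = a;      // b is an independent copy of a
        b.X = 99;       // changes only the copy
        return a.X;     // the original is untouched
    }

    // Mutates an object through a second reference and reports what the first one sees.
    public static int MutateClassAlias()
    {
        var a = new PointRef { X = 1 };
        var b = a;      // b refers to the same object as a
        b.X = 99;       // changes the one shared object
        return a.X;     // the change is visible through a
    }
}
```

MutateStructCopy returns 1 and MutateClassAlias returns 99. Nothing in that difference depends on where the storage for either variable happens to live.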

I regret that the documentation does not focus on what is most relevant; by focusing on a largely irrelevant implementation detail, we enlarge the importance of that implementation detail and obscure the importance of what makes a value type semantically useful. I dearly wish that all those articles explaining what “the stack” is would instead spend time explaining what exactly “copied by value” means and how misunderstanding or misusing “copy by value” can cause bugs.
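
As one example of the kind of bug I mean, consider a mutable struct stored in a List<T>. This is a sketch with made-up names (Counter, CopyByValueBugDemo), and it assumes the usual behaviour that a list indexer returns a copy of the element while an array element access denotes the element's own storage.

```csharp
using System.Collections.Generic;

// Hypothetical mutable struct, used only to show the pitfall.
struct Counter
{
    public int Count;
    public void Increment() { Count++; }
}

static class CopyByValueBugDemo
{
    // The list indexer hands back a copy, so the increment is applied to a temporary.
    public static int IncrementViaList()
    {
        var counters = new List<Counter> { new Counter() };
        counters[0].Increment();   // mutates a copy, which is then discarded
        return counters[0].Count;  // the stored element never changed
    }

    // An array element is a variable, so the increment is applied in place.
    public static int IncrementViaArray()
    {
        var counters = new[] { new Counter() };
        counters[0].Increment();   // mutates the element itself
        return counters[0].Count;
    }
}
```

IncrementViaList returns 0 and IncrementViaArray returns 1. Both pieces of code look identical at the call site; only an understanding of copy semantics explains the difference.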

Of course, the simplistic statement I described is not even true. As the MSDN documentation correctly notes, value types are allocated on the stack sometimes. For example, the memory for an integer field in a class type is part of the class instance’s memory, which is allocated on the heap. A local variable is hoisted to be implemented as a field of a hidden class if the local is an outer variable used by an anonymous method(*) so again, the storage associated with that local variable will be on the heap if it is of value type.
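
Both cases can be sketched in a few lines. The names here (Holder, HoistedLocalDemo, MakeCounter) are hypothetical, and a lambda stands in for the anonymous method; the sketch only shows the observable behaviour, not what any particular compiler or jitter does with the storage.

```csharp
using System;

// Hypothetical class: Value is of value type, but its storage is part of
// whatever storage the Holder instance occupies.
class Holder
{
    public int Value;
}

static class HoistedLocalDemo
{
    // Returns a delegate that keeps using the local 'count' after this method returns.
    public static Func<int> MakeCounter()
    {
        int count = 0;          // a local variable of value type...
        return () => ++count;   // ...captured by a lambda, so it must outlive this call
    }
}
```

Calling MakeCounter() and invoking the result twice yields 1 and then 2, and a second call to MakeCounter() starts again from its own fresh variable. The local clearly survives the activation of the method that declared it, so it cannot simply be a slot that vanishes when that method returns.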

But more generally, again we have an explanation that doesn’t actually explain anything. Leaving performance considerations aside, what possible difference does it make to the developer whether the CLR’s jitter happens to allocate memory for a particular local variable by adding some integer to the pointer that we call “the stack pointer” or adding the same integer to the pointer that we call “the top of the GC heap”? As long as the implementation maintains the semantics guaranteed by the specification, it can choose any strategy it likes for generating efficient code.

Heck, there’s no requirement that the operating system that the CLI is implemented on top of provide a per-thread one-meg array called “the stack”. That Windows typically does so, and that this one-meg array is an efficient place to store small amounts of short-lived data is great, but it’s not a requirement that an operating system provide such a structure, or that the jitter use it. The jitter could choose to put every local “on the heap” and live with the performance cost of doing so, as long as the value type semantics were maintained.

Even worse though is the frequently-seen characterization that value types are “small and fast” and reference types are “big and slow”. Indeed, value types that can be jitted to code that allocates off the stack are extremely fast to both allocate and deallocate. Large heap-allocated structures like arrays of value type are also pretty fast, particularly if you need them initialized to the default state of the value type. And there is some memory overhead to ref types. And there are some high-profile cases where value types give a big perf win. But in the vast majority of programs out there, local variable allocations and deallocations are not going to be the performance bottleneck.

Making the nano-optimization of making a type that really should be a ref type into a value type for a few nanoseconds of perf gain is probably not worth it. I would only be making that choice if profiling data showed that there was a large, real-world-customer-impacting performance problem directly mitigated by using value types. Absent such data, I’d always make the choice of value type vs reference type based on whether the type is semantically representing a value or semantically a reference to something.

UPDATE: Part two is here

*******

(*) Or in an iterator block.

  • Excellent article - you're totally correct - the emphasis on value types stored on the stack is bizarre and confusing.

  • So... an improvement to the .NET documentation and its explorer could be to separate the semantic usage from the implementation details (show just when needed).

  • Basically I think the stuff about allocating structs on the stack is about getting a simple mental model of the memory management of the CLR rather than a correct explanation of it:

    When you look at the memory management of C you will realize that it is quite simple to understand, and you can get an image of how this actually maps to the execution on a CPU (+ RAM). So it's quite convenient to map the primitives found in C (stack allocation, heap allocation) to C#, and while this doesn't necessarily explain that much, it still gives some intuitive image of what happens behind the scenes (which means behind the abstractions provided by C#/the CLR in terms of memory management).

    So, you are right that we get a model which doesn't explain that much and might even be wrong (e.g. your idea about allocating structs on the heap). But nonetheless there is no simple(*) mental model (except 'magic') for the memory management of the CLR, and a simple image which may be wrong under certain circumstances is often better than no image at all.

    (*) With simple I mean really simple: A compacting, mark-and-sweep, generational garbage collector is not simple.

  • "...a simple image which may be wrong under certain circumstances is often better than no image at all."

    Unless you're making technical decisions based on that image. And most people do; for many, that's the entire purpose of having a mental image: to make decisions easier.

    One of the purposes of managed code is to make implementation details a much lower priority for developers than they used to be. It's reasonable, in many cases, to simply say that what's going on behind the scenes just doesn't matter. In some cases, it's even a requirement; different implementations of the CLR do things differently, and moving code (or even algorithms) between runtimes can be a wake-up call if you're depending on some implementation detail. And there are more implementations out there than you might think.

    If knowing the implementation details is something of a hobby for you, then by all means enjoy. If you've identified an implementation-dependent bit of code as a major bottleneck in your application, then feel free to fit it precisely to a specific implementation. But for the vast majority of the managed code that most people write, they should keep the implementation details pretty far from their minds and stick to the behavioural and semantic details -- the ones that are essentially guaranteed by the specifications.

  • I'm sure I'm not the only reader looking for the upvote button at this point...

    Having said that, the stack/heap problem doesn't feel as bad to me as the "reference by value" vs "object by reference" issue, where developers *still* insist that "objects are passed by reference by default" (in C#). Avoiding using the word "reference" for two definitely-related-but-distinct concepts would have been handy - but arguably this is computer science's fault more than C#'s.

    Jon

  • - Optimize after you solve and implement the business solution.  

    - Pick the technology solution so that it fits the business problem (don't force the business solution to fit a particular technology, architecture, design pattern, class library, etc.).

    The two above are largely lost in recent large scale systems I've inherited.  The net effect of too early optimization or 'ohh it's soo cool patterns / new class library of the week' is that the cost of the software is greatly increased, the lifespan of the software is considerably shortened and the failure rate (software fails to meet ongoing business needs) is high.

    Systems that solve business problems and don't pick overly complicated architectures or lots of third party libraries last longer and are cheaper in the long run.

  • I often complain about a similar problem with the way people usually describe reorderings in memory models.  Everyone seems to start with a discussion of cache coherence, write buffers, memory hierarchies, etc., when what is really important is just that things can be reordered.  The student is often left with the impression that this stuff is a lot harder to understand than it really is.  (I'm not saying memory models are easy - just that you really don't need an EE degree to understand how to use them).  Describing things this way also makes people think it's just a hardware issue, when in fact the most surprising reorderings are done by software (compilers, runtimes, etc.).  The only time a programmer should be thinking about cache lines is when something isn't going fast enough.  And even then they should try thinking about everything else first. :)

  • "Making the nano-optimization of making a type that really should be a ref type into a value type for a few nanoseconds of perf gain is probably not worth it. I would only be making that choice if profiling data showed that there was a large, real-world-customer-impacting performance problem directly mitigated by using value types. Absent such data, I’d always make the choice of value type vs reference type based on whether the type is semantically representing a value or semantically a reference to something."

    I find that the problem more commonly goes the other way around: I'd like to make all my no-inherent-identity, value-semantic types value types, but it seems that doing so places an undue performance burden on large aggregates for no good reason. Hence the MSDN (and, if I recall correctly, Framework Design Guidelines) recommendations to avoid structs over 4 ints large, and make them into classes past that point instead. Which is a pity, because I really shouldn't have to be concerned about such things at all - as you yourself say, I should care about semantics, not about implementation details (and how they affect performance). Unfortunately, I'm forced to.

    It would be really nice if the CLR would indeed "do what's best" in any given case, and allocate some value types on the heap, cloning as needed when passing around copies (and possibly doing some mutability analysis to avoid copying at all!). As you say, it has a mandate to do just that already, so I'm surprised it's not used yet.

    On a side note, sticking to this definition of value types - don't you think it would be much less confusing to have the value-semantic types in FCL also be structs? I mean things like System.String and System.Uri. In fact, I've never understood the design decision of making String a reference type - it looks like one of those unfortunate things done following the Java path, like array covariance. IMO, the fact that C# overloads == and != for String is already a dead giveaway that it shouldn't be a reference type in the first place (and the same goes for any other type, by the way). And, of course, the pure evilness that is "null"...

    Structs in the CLI have the pleasant-for-the-implementor property that they are always of a size known at compile time. Strings and URIs do not have that property. So we can either (1) abandon the nice property that structs are of known size, or (2) make strings an immutable reference type and overload equality so that they act like value types. We chose (2). Design is the art of making reasonable compromises when faced with conflicting goals. -- Eric

  • The proper response, in my mind, to the 'explanation' of value types as "Value types live on the stack where reference types live on the heap" should be "And?"

    More explicitly, it should be "Why the @#$% should I care where they live? I asked what is the difference between them, I couldn't care less about where they live OR what they eat for dinner."

    When learning about struct vs class, I didn't know what heaps and stacks were used for yet. I did know that function locals live on the stack, but not to a level that reading 'live on the stack' made me think "Oh, like function locals". So I simply ignored any reference (no pun intended) to the heap or stack. I believe that is a major reason why I now know what the difference actually is - or at least why it came so naturally when I first learned it. Because I ignored the 'stack' and 'heap' references that were thrown in there like little confusion bombs. Luckily for me, I didn't understand them!

    @Jon: You're not alone, don't worry.

  • Even odder about the "light and fast" idea is that up until recently (and even now, I think), the JIT hasn't done all the optimization it could with value types. Before 3.5SP1, I think tail calls wouldn't be performed if the type was a value type. In running mini benchmarks here and there, sometimes passing around a struct results in far worse performance.

  • Since I work in C++ mainly, I can only comment from the C++ viewpoint.

    I try to use as much value semantics, with objects allocated on the stack, as possible. Especially when combined with templates.

    Littering your code with 'new' statements when it's not necessary and trivial to avoid, is basically a time sink.

    Stack allocation is an order of magnitude faster, and it does not need to lock anything.

    So advising people to not pay attention to this, will create a codebase that is slower than it should be.

  • But...C# is intended to run on CLI, right? Even though ECMA-334 states that C# doesn't have to rely on CLI, but any implementation of C# should still support minimal CLI features required by C# standard. And ECMA-335 CLI does talk about "the invocation stack" and "the heap", even though this stack doesn't have to be related to the C-stack. It's pretty vague to just say "the stack" without pointing out "which stack", for implementations of CLI can choose to put invocation stack frames on the heap, or whatsoever.

  • I agree that the storage location of a type is not the most important property to describe when explaining the difference between value and reference types. Neither is light and fast vs big and slow.

    But as an alternative, could you give a good explanation based on the observable characteristics, in such a way that developers will get a good understanding of the difference and be able to make the right choice?

  • I always follow the rule of "Measure Measure Measure" and tend not to optimize prematurely. This is the reason why I often use classes without worrying about "mmmmm maybe a structure is quicker". I have programmed a lot in C++ and Assembly, and I am familiar with the stack and heap concepts; moreover I like the C++ approach where it is the duty of the programmer to decide whether an object is to be allocated on the stack or on the heap.

    But .NET is a garbage-collected environment; this prevents the majority of memory leaks, and this is great. Moreover allocating on the heap is not so slow, because the heap gets compacted, thus allocating X bytes in a managed heap is only a matter of incrementing a counter.

    The drawback of using a lot of classes is that you have high memory pressure on the heap; you will end up with more garbage collections, thus a slower program. But as MichaelGG says, passing around a lot of structures will end in a lot of memory copying, while passing a class consists of only passing the class pointer.

    Moreover I have found web applications where the home page does 50 queries to the database........ in such a situation it is ridiculous to worry about the difference in performance between class and struct.

    The last consideration is: a great number of .NET developers do not fully understand how a struct behaves, and I have seen bugs deriving from forgetting that a type is a struct and not a class. My golden rule is: write only classes, and if your memory profiler of choice tells you that you have a slow heap because you have thousands of objects of class X, you can begin to think about whether it is worth changing it to a structure.

    alk.
