Fabulous Adventures In Coding
Eric Lippert is a principal developer on the C# compiler team. Learn more about Eric.
I started programming on x86 machines during a period of large and rapid change in the memory management strategies enabled by the Intel processors. The pain of having to know the difference between “extended memory” and “expanded memory” has faded with time, fortunately, along with my memory of the exact difference.
As a result of that early experience, I am occasionally surprised by the fact that many professional programmers seem to have ideas about memory management that haven’t been true since before the “80286 protected mode” days.
For example, I occasionally get the question “I got an ‘out of memory’ error but I checked and the machine has plenty of RAM, what’s up with that?”
Imagine, thinking that the amount of memory you have in your machine is relevant when you run out of it! How charming! :-)
The problem, I think, with most approaches to describing modern virtual memory management is that they start with assuming the DOS world – that “memory” equals RAM, aka “physical memory”, and that “virtual memory” is just a clever trick to make the physical memory seem bigger. Though historically that is how virtual memory evolved on Windows, and is a reasonable approach, that’s not how I personally conceptualize virtual memory management.
So, a quick sketch of my somewhat backwards conceptualization of virtual memory. But first a caveat. The modern Windows memory management system is far more complex and interesting than this brief sketch, which is intended to give the flavour of virtual memory management systems in general and some mental tools for thinking clearly about what the relationship between storage and addressing is. It is not by any means a tutorial on the real memory manager. (For more details on how it actually works, try this MSDN article.)
I’m going to start by assuming that you understand two concepts that need no additional explanation: the operating system manages processes, and the operating system manages files on disk.
Each process can have as much data storage as it wants. It asks the operating system to create for it a certain amount of data storage, and the operating system does so.
Now, already I am sure that myths and preconceptions are starting to crowd in. Surely the process cannot ask for “as much as it wants”. Surely the 32 bit process can only ask for 2 GB, tops. Or surely the 32 bit process can only ask for as much data storage as there is RAM. Neither of those assumptions are true. The amount of data storage reserved for a process is only limited by the amount of space that the operating system can get on the disk. (*)
This is the key point: the data storage that we call “process memory” is in my opinion best visualized as a massive file on disk.
So, suppose the 32 bit process requires huge amounts of storage, and it asks for storage many times. Perhaps it requires a total of 5 GB of storage. The operating system finds enough disk space for 5GB in files and tells the process that sure, the storage is available. How does the process then write to that storage? The process only has 32 bit pointers, but uniquely identifying every byte in 5GB worth of storage would require at least 33 bits.
Solving that problem is where things start to get a bit tricky.
The 5GB of storage is split up into chunks, typically 4KB each, called “pages”. The operating system gives the process a 4GB “virtual address space” – over a million pages - which can be addressed by a 32 bit pointer. The process then tells the operating system which pages from the 5GB of on-disk storage should be “mapped” into the 32 bit address space. (How? Here’s a page where Raymond Chen gives an example of how to allocate a 4GB chunk and map a portion of it.)
Once the mapping is done then the operating system knows that when the process #98 attempts to use pointer 0x12340000 in its address space, that this corresponds to, say, the byte at the beginning of page #2477, and the operating system knows where that page is stored on disk. When that pointer is read from or written to, the operating system can figure out what byte of the disk storage is referred to, and do the appropriate read or write operation.
An “out of memory” error almost never happens because there’s not enough storage available; as we’ve seen, storage is disk space, and disks are huge these days. Rather, an “out of memory” error happens because the process is unable to find a large enough section of contiguous unused pages in its virtual address space to do the requested mapping.
Half (or, in some cases, a quarter) of the 4GB address space is reserved for the operating system to store it’s process-specific data. Of the remaining “user” half of the address space, significant amounts of it are taken up by the EXE and DLL files that make up the application’s code. Even if there is enough space in total, there might not be an unmapped “hole” in the address space large enough to meet the process’s needs.
The process can deal with this situation by attempting to identify portions of the virtual address space that no longer need to be mapped, “unmap” them, and then map them to some other pages in the storage file. If the 32 bit process is designed to handle massive multi-GB data storages, obviously that’s what its got to do. Typically such programs are doing video processing or some such thing, and can safely and easily re-map big chunks of the address space to some other part of the “memory file”.
But what if it isn’t? What if the process is a much more normal, well-behaved process that just wants a few hundred million bytes of storage? If such a process is just ticking along normally, and it then tries to allocate some massive string, the operating system will almost certainly be able to provide the disk space. But how will the process map the massive string’s pages into address space?
If by chance there isn’t enough contiguous address space then the process will be unable to obtain a pointer to that data, and it is effectively useless. In that case the process issues an “out of memory” error. Which is a misnomer, these days. It really should be an “unable to find enough contiguous address space” error; there’s plenty of memory because memory equals disk space.
I haven’t yet mentioned RAM. RAM can be seen as merely a performance optimization. Accessing data in RAM, where the information is stored in electric fields that propagate at close to the speed of light is much faster than accessing data on disk, where information is stored in enormous, heavy ferrous metal molecules that move at close to the speed of my Miata. (**)
The operating system keeps track of what pages of storage from which processes are being accessed most frequently, and makes a copy of them in RAM, to get the speed increase. When a process accesses a pointer corresponding to a page that is not currently cached in RAM, the operating system does a “page fault”, goes out to the disk, and makes a copy of the page from disk to RAM, making the reasonable assumption that it’s about to be accessed again some time soon.
The operating system is also very smart about sharing read-only resources. If two processes both load the same page of code from the same DLL, then the operating system can share the RAM cache between the two processes. Since the code is presumably not going to be changed by either process, it's perfectly sensible to save the duplicate page of RAM by sharing it.
But even with clever sharing, eventually this caching system is going to run out of RAM. When that happens, the operating system makes a guess about which pages are least likely to be accessed again soon, writes them out to disk if they’ve changed, and frees up that RAM to read in something that is more likely to be accessed again soon.
When the operating system guesses incorrectly, or, more likely, when there simply is not enough RAM to store all the frequently-accessed pages in all the running processes, then the machine starts “thrashing”. The operating system spends all of its time writing and reading the expensive disk storage, the disk runs constantly, and you don’t get any work done.
This also means that "running out of RAM" seldom(***) results in an “out of memory” error. Instead of an error, it results in bad performance because the full cost of the fact that storage is actually on disk suddenly becomes relevant.
Another way of looking at this is that the total amount of virtual memory your program consumes is really not hugely relevant to its performance. What is relevant is not the total amount of virtual memory consumed, but rather, (1) how much of that memory is not shared with other processes, (2) how big the "working set" of commonly-used pages is, and (3) whether the working sets of all active processes are larger than available RAM.
By now it should be clear why “out of memory” errors usually have nothing to do with how much physical memory you have, or how even how much storage is available. It’s almost always about the address space, which on 32 bit Windows is relatively small and easily fragmented.
And of course, many of these problems effectively go away on 64 bit Windows, where the address space is billions of times larger and therefore much harder to fragment. (The problem of thrashing of course still occurs if physical memory is smaller than total working set, no matter how big the address space gets.)
This way of conceptualizing virtual memory is completely backwards from how it is usually conceived. Usually it’s conceived as storage being a chunk of physical memory, and that the contents of physical memory are swapped out to disk when physical memory gets too full. But I much prefer to think of storage as being a chunk of disk storage, and physical memory being a smart caching mechanism that makes the disk look faster. Maybe I’m crazy, but that helps me understand it better.
(*) OK, I lied. 32 bit Windows limits the total amount of process storage on disk to 16 TB, and 64 bit Windows limits it to 256 TB. But there is no reason why a single process could not allocate multiple GB of that if there’s enough disk space.
(**) Numerous electrical engineers pointed out to me that of course the individual electrons do not move fast at all; it's the field that moves so fast. I've updated the text; I hope you're all happy with it now.
(***) It is possible in some virtual memory systems to mark a page as “the performance of this page is so crucial that it must always remain in RAM”. If there are more such pages than there are pages of RAM available, then you could get an “out of memory” error from not having enough RAM. But this is a much more rare occurrence than running out of address space.
I'm sure this is a stupid question related to my not having a clear understanding of the differences between address space, virtual memory, etc, but then where does the 2GB memory allocation limit come into play?
There is no 2GB memory allocation limit. You can allocate as much memory as you want; you have 2GB of address space to map it into. If you want to have 5GB allocated, great, you do that, but you only can have up to 2GB of it mapped at any one time. -- Eric
For example, when many applications I use attempt to cross the ~2GB limit, they error or crash.
If your app author is unwilling to take on the pain of writing its own memory mapper/unmapper, then clearly it is stuck with not allocating more memory than it can map. -- Eric
There's also the Windows limit of 2GB for user-mode applications (the /3GB switch stuff). What you've said here makes sense, so should I assume that the only way to get access to more than 2GB of address space is to use alternative memory allocation techniques (LARGEADDRESSAWARE for example)?
Indeed. All the 3GB switch does is give you an extra GB of user address space by stealing it from the operating system's address space. That's the only way to get more address space in 32 bit windows. Again, you can allocate as much memory as you want, but address space is strictly limited to 2GB or 3GB in 32 bit windows. -- Eric
It seems like your article ends right before a giant "BUT this is how it usually ends up working in real world applications" caveat.
In most apps, you can only allocate as much memory as you have room to map. -- Eric
Great article, thanks! :)
I would like to see a 'back-in-the-day' type article regarding Extended & Expanded memory, just for a history lesson - I never could get the hang of it back then...
Excellent article, and I think your view of memory as large file on disk with the RAM being an optimization makes a lot more sense!
I agree that thinking of RAM as a performance optimization makes the whole concept more straightforward. The CPU cache(s), the physical memory, and the virtual memory swap file are really just different levels of a multi-level cache, each level begin larger but slower than the level before.
We don't think about the system cache as being a device that swaps out to RAM when it gets too full, so it does make a lot of sense to think about memory as being a disk file and RAM just an optimization detail. Great article.
The electrons in a circuit only travel a few miles per hour, it is their displacement wave that travels near the speed of light.
To follow up on Erik's wish (because I doubt it really has enough meat for a whole article from Eric) on Extended vs Expanded memory:
Extended Memory is what we think of as memory now, i.e. memory that is addressed further and further up from physical address 0 by using more and more of the address bus lines from the CPU. In the DOS days accessing this memory required magic, because DOS, and hence DOS apps, ran in "real mode" where the segment registers and offset registers were each 16 bits wide, and combined with a simple 4 bit shift and add to give a 20 bit physical address. Ignoring the HMA for a moment, to access beyond 2^20, i.e. 1MB, the CPU had to be switched to one of the protected modes, where the segment register treated as an index into a table of segment base addresses. This was the core of Win16's memory model.
Expanded Memory was, from the CPU's perspective, a device that happened to have RAM on it, not 'memory' at all. A portion of the reserved address space between 640K and 1024K, where other memory mapped devices like your VGA card, was used to map in pages of memory from that device. Once a page was mapped in, a real mode app could read or write it just like they'd read or write the VGA screen buffer's memory. An interrupt based API was used to change what page of expanded memory was mapped into that buffer. The closest we have to this more recently would be PAE, which like Expanded Memory is an nasty stopgap that's only used by a small fraction of apps.
For bonus points, that High Memory Area I mentioned was the first 64K past 1024K, that could be accessed from real mode. On a real 8086 (which real mode mimics) if the sum of the segment register shifted over 4 and the offset was greater than 2^20 it just wrapped, after all there were only 20 address lines. On a 286 or later you could choose whether it should wrap, or whether it should overflow into the 21's address line, giving you access to that extra little bit of memory. DOS quickly learned to tuck as much of itself as possible into that HMA, reducing the footprint it required in the bottom 640K where most apps were limited.
Okay, so it was a bit longer than I realized when I started rambling, even leaving out many details.
And about .NET development...
If I need to access many GBs of data... the CLR is intelligent enough to do it for me?
or... I must to write my own memory mapper to deal with the issue? and how?
On 64 bit CLR, you're all set. On 32 bit CLR, if you need to do custom mapping and unmapping of huge blocks of memory, then I personally would not want to handle that in managed code. There are ways to do it, of course, but it would not be pretty. -- Eric
This comment is by no means a tutorial on tutorials.
I find that posts that begin with such wording are usually the best tutorials available. This blog post is a great example of that. Previously, while I knew about these differences, I had to think to keep them in mind. Now, Eric has transformed that thought process into a state of mind. Yeah, memory is a file; it's cached in the RAM for performance. It truly makes perfect sense.
note on CLR, the CLR's memory constructs under the hood prevent any object (say an array or string) being more than 2GB in size so whilst you *could* write code to deal with greater than 2GB address space (which would be incredibly painful) you would still get an OOM exception i you tried to allocate new long[int.MaxValue] no matter what.
the nicer the box feels to be in the harder it is to jump out of it.
I like soft boxes iwht padding :)
A nitpick: considering RAM as a cache for the disk is not *quite* correct, since the manager can choose to never write a page in RAM to disk (although I believe it ensures there is enough storage in the page file just in case).
I just think of the combination of RAM and disk as the "physical memory" (or just "backing" to reduce confusion with RAM), where the manager tries to keep the often used stuff in the fast part of the memory.
@Blake Coverett - Thanks a bunch for the intro =) You're probably right about there not being enough meat for a full-blown article - but still, very interesting information nonetheless.
Simon Buchan: but that's just an optimisation applied to many caches. If the cached range never needs to touch the slow medium during its lifetime, there's not point doing so. Just as when you "allocate" a variable for a calcuation, it might get optimised away.
@Mark: Perhaps we have different definitions of "cache" then... I've always thought of it as being a copy of the authoritative value made for performance reasons. It's just the word though, this is getting super-nitpicky. (That is too a word, Safari!)
"There is no 2GB memory allocation limit. You can allocate as much memory as you want; you have 2GB of address space to map it into."
What's the use? I can't have more than 2GB allocated AT THE SAME TIME.
The use case of memory is typically that you want to remember something. Perhaps you have four billion bytes of video that you are processing, and you want to keep the whole video in memory. You could map the bits you're currently editing into virtual memory while leaving the rest unmapped until you're done with what you've got. -- Eric
Out of memory can be raised because there's not enough virtual space free. The limit is the memory you can address not the memory you can map. If I allocate three 1 GB byte arrays it will probably fail with an out of memory error because I simple do not have enough memory to address the virtual address of the last one.