• Ntdebugging Blog

    Too Much Cache?

    • 28 Comments

    Cache is used to reduce the performance impact when accessing data that resides on slower storage media.  Without it your PC would crawl along and become nearly unusable.  If data or code pages for a file reside on the hard disk, it can take the system 10 milliseconds to access the page.  If that same page resides in physical RAM, it can take the system 10 nanoseconds to access the page.  Access to physical RAM is about 1 million times faster than to a hard drive.  It would be great if we could load up all the contents of the hard drive into RAM, but that scenario is cost prohibitive and dangerous.  Hard disk space is far less costly and is non-volatile (the data is persistent even when disconnected from a power source). 

     

    Since we are limited with how much RAM we can stick in a box, we have to make the most of it.  We have to share this crucial physical resource with all running processes, the kernel and the file system cache.  You can read more about how this works here:

    http://blogs.msdn.com/ntdebugging/archive/2007/10/10/the-memory-shell-game.aspx

     

    The file system cache resides in kernel address space.  It is used to buffer access to the much slower hard drive.  The file system cache will map and unmap sections of files based on access patterns, application requests and I/O demand.  The file system cache operates like a process working set.  You can monitor the size of your file system cache's working set using the Memory\System Cache Resident Bytes performance monitor counter.  This value will only show you the system cache's current working set.  Once a page is removed from the cache's working set it is placed on the standby list.  You should consider the standby pages from the cache manager as a part of your file cache.  You can also consider these standby pages to be available pages.  This is what the pre-Vista Task Manager does.  Most of what you see as available pages is probably standby pages for the system cache.  Once again, you can read more about this in "The Memory Shell Game" post.

     

    Too Much Cache is a Bad Thing

    The memory manager works on a demand based algorithm.  Physical pages are given to where the current demand is.  If the demand isn't satisfied, the memory manager will start pulling pages from other areas, scrub them and send them to help meet the growing demand.  Just like any process, the system file cache can consume physical memory if there is sufficient demand. 

    Having a lot of cache is generally not a bad thing, but if it is at the expense of other processes it can be detrimental to system performance.  There are two different ways this can occur - read and write I/O.

     

    Excessive Cached Write I/O

    Applications and services can dump lots of write I/O to files through the system file cache.  The system cache's working set will grow as it buffers this write I/O.  System threads will start flushing these dirty pages to disk.  Typically the disk can't keep up with the I/O speed of an application, so the writes get buffered into the system cache.  At a certain point the cache manager will reach a dirty page threshold and start to throttle I/O into the cache manager.  It does this to prevent applications from overtaking physical RAM with write I/O.  There are however, some isolated scenarios where this throttle doesn't work as well as we would expect.  This could be due to bad applications or drivers or not having enough memory.  Fortunately, we can tune the amount of dirty pages allowed before the system starts throttling cached write I/O.  This is handled by the SystemCacheDirtyPageThreshold registry value as described in Knowledge Base article 920739: http://support.microsoft.com/default.aspx?scid=kb;EN-US;920739

     

    Excessive Cached Read I/O

    While the SystemCacheDirtyPageThreshold registry value can tune the number of write/dirty pages in physical memory, it does not affect the number of read pages in the system cache.  If an application or driver opens many files and actively reads from them continuously through the cache manager, then the memory manger will move more physical pages to the cache manager.  If this demand continues to grow, the cache manager can grow to consume physical memory and other process (with less memory demand) will get paged out to disk.  This read I/O demand may be legitimate or may be due to poor application scalability.  The memory manager doesn't know if the demand is due to bad behavior or not, so pages are moved simply because there is demand for it.  On a 32 bit system, the file system cache working set is essentially limited to 1 GB.  This is the maximum size that we blocked off in the kernel for the system cache working set.  Since most systems have more than 1 GB of physical RAM today, having the system cache working set consume physical RAM with read I/O is less likely. 

    This scenario; however, is more prevalent on 64 bit systems.  With the increase in pointer length, the kernel's address space is greatly expanded.  The system cache's working set limit can and typically does exceed how much memory is installed in the system.  It is much easier for applications and drivers to load up the system cache with read I/O.  If the demand is sustained, the system cache's working set can grow to consume physical memory.  This will push out other process and kernel resources out to the page file and can be very detrimental to system performance.

    Fortunately we can also tune the server for this scenario.  We have added two APIs to query and set the system file cache size - GetSystemFileCacheSize() and SetSystemFileCacheSize().  We chose to implement this tuning option via API calls to allow setting the cache working set size dynamically.  I’ve uploaded the source code and compiled binaries for a sample application that calls these APIs.  The source code can be compiled using the Windows DDK, or you can use the included binaries.  The 32 bit version is limited to setting the cache working set to a maximum of 4 GB.  The 64 bit version does not have this limitation.  The sample code and included binaries are completely unsupported.  It is just a quick and dirty implementation with little error handling.

Page 1 of 1 (1 items)