One of the big pushes for SSRS 2008 has been to reduce the occurrence of OutOfMemoryExceptions during report execution.  A lot of work has gone into making this happen throughout the report rendering stack, including significant changes in the report processing engine to move to a definition-based object model as opposed to an instance-based one.  I can't take credit for this massive undertaking, but I did work on some of the supporting infrastructure to ensure that components of the report processing and rendering engines can make more efficient use of the memory on the box while reducing the likelihood of OOM.  This is probably going to be a multi-part discussion, with the first part going over some of the internal infrastructure, and the next drilling more into specifics with examples of the memory monitoring component in action.

One of the upfront design decisions that we made was that the processing and rendering engines would continue to be managed code.  This decision essentially means that they continue to allocate memory from a shared pool -- the CLR managed heap.  Ultimately, it is the managed heap which makes the policy decisions about whether specific allocations succeed or fail.  From the perspective of our native hosting layer, the managed heap in SSRS 2008 appears as a single large pool of memory; there is no visibility into specific components, requests, or even appdomains.  Given the current hosting interfaces for the CLR, this just is not a tractable problem at this point.

Given the design decision to continue to use managed code and thus a shared pool of memory, we opted to go for a two-pronged approach:

  • Implement/leverage a process-wide memory notification infrastructure to detect when we are approaching/experiencing memory pressure.
  • In response to this global notification, determine the specific memory consumers which should be trimmed.

The first was actually already done for us.  SSRS 2008 builds its native hosting layer on top of shared infrastructure from the SQL Server engine which already has memory monitoring components in the form of Memory Broker and other resource monitoring technologies.  These provided us with a configurable and predictive memory management infrastructure.  This is the key first piece.

The second piece of the puzzle is attributing memory usage to specific consumers.  You will note that I didn't say "requests" here.  Internally, there is a managed infrastructure which specific components can leverage to report their memory usage and receive notifications as to how much memory they should try to free.  This infrastructure is heavily utilized by our report processing and rendering engine to manage the size of data structures which grow as the amount of data in the report grows.  This allows RS to strike a reasonable balance between memory utilization and stability. 
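Since the post doesn't show the internal API, here is a minimal Python sketch of what such a consumer-registration infrastructure might look like.  All names here (`MemoryConsumer`, `MemoryManager`, and their methods) are hypothetical stand-ins for illustration, not the actual SSRS types:

```python
from abc import ABC, abstractmethod

class MemoryConsumer(ABC):
    """Illustrative: a component that tracks its own memory use and can
    shrink on demand, e.g. a growing data structure in report processing."""

    @abstractmethod
    def memory_usage(self) -> int:
        """Current memory attributed to this consumer, in bytes."""

    @abstractmethod
    def shrink_cost(self) -> float:
        """Relative 'how difficult is it to shrink' hint (higher = costlier)."""

    @abstractmethod
    def shrink(self, target_bytes: int) -> int:
        """Try to free up to target_bytes; return bytes actually freed."""

class MemoryManager:
    """Illustrative central registry that consumers report to and that
    delivers shrink notifications under memory pressure."""

    def __init__(self):
        self._consumers = []

    def register(self, consumer: MemoryConsumer) -> None:
        self._consumers.append(consumer)

    def unregister(self, consumer: MemoryConsumer) -> None:
        self._consumers.remove(consumer)
```

The key idea is simply that each consumer self-reports its usage and its shrink cost, so the central manager can reason about all of them without the CLR heap itself providing any per-component attribution.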

Here is a rundown of the operations that occur when our native hosting layer detects memory pressure:

  1. Native layer informs managed code that there is pressure and specifies the total amount of memory that must be shrunk.
  2. Managed layer enumerates all memory consumers (generally 2-3 per report rendering request plus some global objects such as caches). 
  3. We determine the minimal subset of memory consumers that have to be trimmed in order to satisfy the shrink request.
    • Actually -- there is a bit of special sauce here.  Each component also gives the centralized memory manager information about "how difficult" it is to shrink its memory usage, and this is factored into determining whom to shrink.
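The selection in step (3) can be sketched as a greedy heuristic: rank consumers by how much memory they free per unit of shrink cost, then take consumers off the top of that list until the shrink request is satisfied.  This is an illustrative Python sketch with made-up names, not the actual SSRS algorithm:

```python
from dataclasses import dataclass

@dataclass
class Consumer:
    """Hypothetical stand-in for a memory consumer in a rendering request."""
    name: str
    usage: int   # bytes currently held
    cost: float  # relative difficulty of shrinking (the "special sauce")

def plan_shrink(consumers, bytes_needed):
    """Pick a minimal subset of consumers to shrink, preferring those that
    free the most memory per unit of shrink cost (greedy heuristic)."""
    ranked = sorted(consumers, key=lambda c: c.usage / c.cost, reverse=True)
    plan, planned = [], 0
    for c in ranked:
        if planned >= bytes_needed:
            break  # the shrink request is already covered
        plan.append(c)
        planned += c.usage
    return plan
```

For example, given a big cheap-to-shrink request, a small cheap one, and a big but expensive one, a 120-byte shrink request would be covered by the first two and leave the expensive consumer untouched.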

Step (3) is actually pretty interesting and is a key design decision for SSRS.  An alternative would be to take more of a "fair share" approach to trimming memory usage.  The decision to minimize the number of components which are shrunk is really driven by the fact that a memory shrink operation is somewhat heavyweight.  The vast majority of components which subscribe to these events store everything in memory up until the point they receive the notification.  In general, these consumers are things like lists and dictionaries which are required to keep track of everything that is put into them -- they can't just evict entries the way a cache can.  So in order to begin using less memory, these data structures have to fall back to a mixed disk/memory mode, which requires serializing a portion of their state to disk.  To keep things running as fast as possible when we are not under memory pressure, all of this serialization happens only when they receive the shrink notification.  So you can imagine what would happen if we went with a fair share approach where every request is asked to trim some memory: each request currently being processed would grind to a halt as they race to serialize their state to disk.  Instead, with our approach, a small fraction of requests (usually the big memory hogs) are temporarily suspended while they serialize some state to disk, and other requests are to a large degree unaffected.
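To make the "fall back to a mixed disk/memory mode" idea concrete, here is a minimal Python sketch of a list-like structure that keeps everything in memory until it receives a shrink notification, then serializes its in-memory entries to a temp file while still being able to enumerate everything it holds.  This is an illustrative toy, not the SSRS implementation:

```python
import pickle
import tempfile

class SpillableList:
    """Illustrative: a list that holds entries in memory until asked to
    shrink, then spills them to disk (mixed disk/memory mode)."""

    def __init__(self):
        self._memory = []        # entries not yet spilled
        self._spill_file = None  # lazily created on first shrink
        self._spilled_count = 0

    def append(self, item):
        self._memory.append(item)

    def shrink(self):
        """Shrink notification: serialize in-memory entries to disk.
        This is the heavyweight step the fair-share discussion refers to."""
        if self._spill_file is None:
            self._spill_file = tempfile.TemporaryFile()
        for item in self._memory:
            pickle.dump(item, self._spill_file)
        self._spilled_count += len(self._memory)
        self._memory.clear()

    def __iter__(self):
        # Read back spilled entries first, then the in-memory tail.
        if self._spill_file is not None:
            self._spill_file.seek(0)
            for _ in range(self._spilled_count):
                yield pickle.load(self._spill_file)
        yield from self._memory

    def __len__(self):
        return self._spilled_count + len(self._memory)
```

Because `shrink` walks and serializes every in-memory entry, it is easy to see why asking every request to do this simultaneously would stall the whole server, while shrinking only a few large consumers confines the cost to those requests.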