CSS SQL Server Engineers

This is the official team Web Log for Microsoft Customer Service and Support (CSS) SQL Support. Posts are provided by the CSS SQL Escalation Services

How It Works: SQL Server (NUMA Local, Foreign and Away Memory Blocks)

How It Works: SQL Server (NUMA Local, Foreign and Away Memory Blocks)

Rate This
  • Comments 4

Applies to: SQL 2005, 2008, 2008 R2, and SQL 2012 versions.

The NODE an operating system page, physically belongs to can be acquired using the QueryVirtualMemoryEx Windows API.  SQL Server uses this API to track locality of memory allocations.

This blog is a very high level view of SQL Server behavior but I think it provides a sufficient picture as to what is happening.

This tracking is important to performance because SQL Server makes reasonable attempts to use node, local memory whenever possible.   Access to memory on a remote node (remote memory/foreign memory) takes longer which can lead to, small, unwanted delays in query processing.

When a block of memory is allocated SQL Server looks at each operating system page and sorts it according the physical memory node assignments on the system.   There are a couple of states that the memory manager uses and in doing so the behavior varies.

Type Description
Local The memory is physically present on the same NUMA node as the SQL Server worker.
Away The memory is known to belong current node but allocated from a remote node and the memory manager is still in the GROWTH (often initial ramp-up phase.)   The memory is held on the away list and used as needed by SQL Server requires.
Taken Away The memory the workers on the local node allocated and placed on appropriate away buffers of remote nodes.
Foreign The memory is known to belong to a remote node and SQL Server has transitioned from the GROWTH phase because max server memory has been achieved.  The memory block is being used remotely.

There is a subtle distinction between Away and Foreign blocks related to ‘is SQL Server memory still growing’ or has ‘SQL Server memory reached the target memory level.’

The difference directly drives how and when SQL Server consumes the memory.   When SQL Server has not reached the max server memory the away buffers can be set aside on their appropriate nodes.  Let me try to explain better.   

Allocate Memory 

If(Memory is remote to the node) 
    Place on away list
    Loop back and try to allocate another block
else
   Use the local memory block

The reason to place blocks on an away list and not use them right away helps prevent a bad cycle with the operating system or lock into remote memory in the local node.   If SQL Server released the block it may go onto the operating system free list.   The very next call to allocate memory could return the same block to SQL Server, preventing forward process towards local memory allocation.

allocations on the local node can acquire away buffers from the nodes list and once SQL Server reaches the target memory level for the instance the memory manager transitions away buffers as well.  This transition signals that away buffers can be used by their assigned, physical nodes.   The memory continues to be balanced across all memory nodes as appropriate.   Only at this point are any pages, that can’t be directly returned to their local nodes, considered Foreign.

The foreign is memory known to belong to another node and being activity used outside its home node.

The following is a snippet of data from dbcc memorystatus showing the behavior.

Type Description
Away Committed The amount of memory allocated from ‘this’ physical memory node that is currently assigned to a remote memory node.
Taken Away Committed The amount of memory that ‘this’ node has set aside because it knows the memory block belongs to another node.

Knowing the Away Committed and Taken Away Committed values you can look across all memory nodes and understand the current remote vs local memory allocation pattern.

image

Once the memory manager transitions from GROWTH and the distribution of away blocks occur the output changes to show any foreign memory usage.   In this example only 32kb of memory remains foreign to node 1 after all balancing and memory block assignment is complete.

image

My reaction now is ‘BIG DEAL WHY DO I CARE?’

I pointed this out for a couple of reasons. 

First, the away buffer counts are not part of the performance counters, only foreign memory counters.   To view the away buffer counts you need to capture dbcc memorystatus details.  You could also look at sys.dm_os_memory_nodes or other locations to calculate the difference in free list size, committed and target on the node to determine possible foreign or away block sizes and counts.

Second, during the GROWTH phase, the free and target values for a node with a large ‘Taken Away Committed’ count will be much larger than 1/nth of your memory nodes.  A great indicator that this node has a larger than expected, remote memory offering.

Third, a single query with larger than expected RAM usage may appear.  In my lab testing (256GB RAM system) a query that only needed 30GB of RAM would cause my SQL Server to use ~100GB of total memory.  ~70GB of the memory was on my away lists and not actively used by SQL Server to support the query.   This is expected because I told SQL Server it could use 225GB of memory (max server memory.)  Since SQL Server has not reached that limit putting buffers on the away list is not a problem.   As soon as SQL Server transitions from the GROWTH phase these buffers are readily available, mostly on their home, physical node.

Bob Dorr - Principal SQL Server Escalation Engineer

Leave a Comment
  • Please add 4 and 3 and type the answer here:
  • Post
  • Extremely informative, thanks!

    Three questions:

    1. The away buffers seem to be retained even when not needed to prevent the OS from grabbing them under memory pressure.  At the same time, it seems that if min server memory = max server memory, and max server memory is achievable there might not be a need to maintain away buffers after local buffers have been allocated for a given query.  Is that the case, or are away buffers potentially reserved even if not used regardless of min server memory = max server memory?

    2.  If min server memory is not equal to max server memory, it seems that under memory pressure SQL Server may yield memory back to the OS and shrink the buffer pool size.  Would that mean that "inactive" away list reservations could grow again?  Large inactive away lists under memory pressure, when paging may be occurring, could be a second performance penalty to the memory shortage.  

    3.  In your large query memory grant example at the end, ~70gb of memory on the away lists is inactive.  "Since SQL Server has not reached that limit putting buffers on the away list is not a problem."  Even though the 70gb of inactive memory is not necessarily inconveniencing that individual query it seems that the large amount of inactive reserved memory could result in other query memory grants being pended.  Is that correct, and if so, is the use of trace flag 8015 to ignore NUMA and use a single buffer pool a reasonable workaround to evaluate?

  • Do you know how this plays with the Lock pages in memory option and corresponding trace flags? Would be interesting to hear.

  • Locked Pages does not have any impact on the away and foreign sorting behavior.   Same behavior just locked.

    Away is maintained until you reach the max server memory target.  You don't move out of the away state until you reach the target.   If you are on your way to the max target and overall physical memory pressure occurs they can be released back to the OS.  AWAY can't be used by a query (they are set aside) so they won't be associated with a query.

    Once we hit the max server memory the first time RAMP-UP phase is done so we don't have to mess with away and foreign, even during grow and shrink due to memory pressure (on most occasions.)

    The trace flag is the exact workaround we use with customers.

    We also uncovered a scenario where a VM is running on-top of a NUMA system by the VM reports NUMA=TRUE and Node Count = 1.  However, you can configure the VM to use more memory than a single nodes, local memory and you get the same behavior.   It is the same behavior you would see if you affinitized SQL to a specific set of nodes and left max server memory > the memory associated with those nodes.   Again, use of the trace flag is the suggested workaround.

  • i want to seek the work on sql and earn.

Page 1 of 1 (4 items)