Suresh brought to my attention that we have been getting questions as to why SQL Server starts on group 1 and then group 2 and it is not predictable?  Then Tejas brought up another issues and since I worked on this way back before we released SQL 2008 R2 I went back to my notes to pull up some details that I thought might be helpful.

The answer is that you need a SQL Server that is Group Aware to use more than one group worth of CPUs in Windows.  However, if you have an older version or a SKU that does not support enough CPUs to span groups the default is for Windows to start the service on any group.  

SQL Server 2008 R2 uses the group aware so it will use new APIs and establish proper use of the entire system. 

clip_image001

Legacy XPROC / COM Object / Linked Server using a CPU based scheme.  

Some designs of legacy components may not be group aware or safe.  In the following example, if originally loaded on Group 2 the initialization would see 60 CPUs and create 60 partitions for a local memory manager which might work perfectly with proper synchronization.  However, if the original initialization of the partitions occurs on Group 1 it will only create 40 partitions and access from CPUs 41 ... 60 on Group 2 may fail as they don't exist.   (SQL Server does not have any of these components.)


Great Legacy Reference: http://msdn.microsoft.com/en-us/windows/hardware/gg463349.aspx 

Windows allows you to configure node to group assignments

By configuring Windows to see the same number of CPUs and Nodes within each processor group the SQL Server, SQL Agent, … services can start on any processor group and they will see the same amount of resources.


Note: There is a hotfix for Windows 2008 R2 that automatically balances the processor assignments are corrects the issue for most systems: http://support.microsoft.com/kb/2510206

 

The following knowledge base article outlines how to assign nodes to the processor groups so you have control over the group assignments. How to manually configure K-Group assignment on multiprocessor machines: http://support.microsoft.com/kb/2506384

You can use Windows Task Manager to see and set the affinity for the process

Start with the Windows Task Manager | Process Tab.

Select a process, Right Mouse | Set Affinity -- the following dialog is presented showing you the Processor Groups (K-Group), Nodes and CPUs on the machine.

TaskManProcessAffinity

 

The Problem

 

Even after all this work there is an issue using older versions of SQL Server (SQL 2008 pre-SP3 and SQL 2008 R2 pre-SP2) which requires SQL Server to startup in the group that holds physical NUMA Node = 0 and at least 1 CPU assigned from physical node 0 to the instance, or lazywriter activity is not started property.   This can lead to buffer pool sizing, performance counters not getting updated and other stall and hang like behaviors (OOM, Latch Timeouts, …) of the SQL Server process.

 

As a general rule is best to start them in group 0 (usually contains physical NUMA node = 0 but you must confirm this.)

 

You can enforce this on the system by setting the ImageFileExecutionOptions - NodeOptions value and by making sure the SQL Server affinity mask contains at least 1 CPU in physical NUMA node 0.   However, this forces all instances of SQL Server to start on the same node and all instances have to share physical NUMA node = 0.

 

Excerpt from the Windows 2008 R2 Release Notes

"Allocation of child processes among the ideal nodes of Non-Uniform Memory Access (NUMA) nodes is not efficient, which results in performance degradation, increased latency, and cache misses, depending on the affected processes.

To correct this, edit the registry to change the inheritance of ideal NUMA nodes so that generated child processes are assigned the same IdealNode as their parent process. This setting is not system wide, but per-process.

To change the NUMA inheritance

  1. Open Registry Editor (Regedit.exe).

  2. Add the following registry key:

    HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Image File Execution Options\<process_name>.exe

    where process_name is the name of the process you want to change the inheritance behavior for.

    Assign a DWORD value to this key with the name NodeOptions and data 0x00001000.

    Close Registry Editor and restart the computer."

Windows does not currently provide a per service, instance option for starting the service in a specific group.   To avoid starting multiple instances of SQL Server on the same processor group use SQL Server 2008 R2 or newer version and establish the proper processor affinity.

 

SQL Server 2005
SQL Server 2008
Requires SQL Server Affinity Mask to use a CPU in physical NUMA Node = 0 as presented by the operating system. Use NodeOptions to start SQL Server instance in proper group that presents CPUs from physical NUMA Node = 0

- or -

Apply Windows SMC.exe QFE to assign startup group to a specific group containing physical NUMA node = 0
SQL Server 2008 SP3 No longer requires use of CPU in physical NUMA Node = 0 Requires Windows SMC.exe QFE to assign startup to a specific group or you can allow random group startup behavior.
SQL Server 2008 R2 No longer requires CPU in physical NUMA Node = 0 Is group aware so it does not need the NodeOptions or SCM.exe QFE.

 


Other Helpful References

Bob Dorr - Principal SQL Server Escalation Engineer