If after upgrading from SQL 2000 to SQL 2005 RTM and/or SP1, you notice decreased OLTP database application throughput, bloated plan cache or out of memory errors, you will need to gather machine configuration information and system performance data in order to perform a detailed analysis. Below, we will outline the steps to take to gather relevant data in order to determine if the problem is plan cache related.
1.0 Machine Configuration Information That Can Impact Plan Cache Size/Performance
Machine configuration and server settings can impact plan cache performance. This section covers a check list of these settings/options that need to be examined as a first step to diagnosing a performance regression.
1.1 Understanding the Machine Configuration:
(i) Processor Architecture: Recommendations for server settings for optimal plan cache behavior varies with the platform (x86/IA64/x64). The machine platform can be found executing 'set PROCESSOR_ARCHITECTURE' at the command prompt. Also check the platform of SQL Server to see if we are running under WOW. Note that on 64 bit this depends on whether you are running %systemroot%\system32\cmd.exe or %systemroot%\SysWOW64\cmd.exe. It will show x86 for the WOW version of the command console.
(ii) Total number of processors: The DMV sys.dm_os_sys_info has this information:
select cpu_count from sys.dm_os_sys_info
go
(iii) NUMA Architecture and Number of Nodes: The number of memory nodes on a machine can be found using:
select distinct top 1 instance_name + 1 as number_of_nodes
from sys.dm_os_performance_counters
where object_name = 'SQLServer:Buffer Node'
Also check the errorlog for the entries as below for the number of CPU nodes:
TimeStamp Server Multinode configuration: node 0: CPU mask: 0x00000000000000f0 Active CPU mask: 0x00000000000000f0. This message provides a description of the NUMA configuration for this computer. This is an informational message only. No user action is required.
This should give the number of CPU nodes as well as the CPU Mask set.
(iv) If NUMA Architecture, Number of processors per Node: If Soft NUMA configurations have not been used then the following query gives a mapping of CPU to nodes:
select parent_node_id, scheduler_id from sys.dm_os_schedulers
where scheduler_id < 255
order by scheduler_id desc
If soft NUMA has been used, the number of CPU nodes is greater than the number of hardware NUMA nodes. The above query then maps the CPUs to CPU nodes, not hardware NUMA nodes. To check if the machine has been configured with soft NUMA search the registry for keys under:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\90\NodeConfiguration\. If there are keys present here they define the soft NUMA mapping of processors to soft NUMA nodes. For more information see BOL article ‘How to: Configure SQL Server to Use Soft-NUMA’.
(i) Total Physical Memory:
select physical_memory_in_bytes/(1024.0*1024.0*1024.0) as physical_mem_in_gb
from sys.dm_os_sys_info
(ii) If NUMA Architecture, Memory allocated per Node (in 8K pages):
select cntr_value
where object_name = 'SQLServer:Buffer Node' and counter_name = 'Total pages'
The plan cache is logically on node 0 on a multi node machine. This however does not imply that all the memory for the plan cache comes from node 0 only. To better explain this, consider the following example: A multi-node machine has 2 NUMA nodes, the buffer pool has 2 memory nodes, nodes 0 and 1 respectively. Schedulers s1, s2, s3 are associated with NUMA node 0 and s4, s5, s6 with NUMA node 1. The plan cache/memory clerk is on memory node 0. When scheduler s4 requests 2 pages for a new PMO (plan), the buffer pool tries to allocate the pages locally through the NUMA node these schedulers are associated with i.e. through node 1. Even though these pages were allocated from NUMA node 1, the cachestore memory clerk which is logically on node 0, will show a total of 2 pages. Therefore allocating more memory to NUMA node 0 just because the plan cache is on node 0 will not help performance. It may actually land up negatively impacting performance due to large number of remote memory accesses if queries are affinitized uniformly among nodes. If the memory allocated per node is not uniform in general it is better to affinitize the workload to nodes that have sufficient memory. Ideally queries should be affinitized to nodes through specific ports (if possible). If SQL Server has been configured to run on a subset of the available NUMA nodes, the buffer pool will not automatically be limited to the memory on those nodes. In this case, use the max server memory option to limit the buffer pool.
(iii) Lock Pages In Memory Privilege: This is an operating system privilege that allows locking of physical memory preventing OS paging of the locked memory. This privilege needs to be granted to the service account of sql server. On 32 bit architecture systems with sql server standard, enterprise and developer editions, this privilege needs to be granted to the sql server service account in order to use the AWE mechanism. On 64 bit architecture systems with any sql server edition, granting this privilege to the sql server service account is recommended. Note that a system restart is required after granting this privilege local system accounts for the permission to take effect. For domain users a fresh login is required. A quick and easy way to check (though non conclusive) if this privilege has been granted correctly on 32 bit would be to try to turn on AWE enabled as follows:
sp_configure 'show advanced options', 1
reconfigure with override
sp_configure 'awe enabled', 1
If lock pages in memory privilege has not been granted to the service account, sp_configure will fail with the following error message:
Msg 5845, Level 16, State 1, Line 1
Address Windowing Extensions (AWE) requires the 'lock pages in memory' privilege which is not currently present in the access token of the process.
On 64 bit machines, execute 'dbcc memorystatus'. If the AWE Allocated memory is 0 then lock pages in memory privilege has not been granted to the service account or it has not taken effect. On 32 bit machines, if AWE Allocated is 0 in the data from dbcc memory status and the sp_configure option 'awe enabled' is set to 1, then lock pages in memory privilege has not been granted to the service account or it has not taken effect.
For more information on how to grant this privilege the sql server service account please refer to the MSDN article ‘How to: Enable the Lock Pages in Memory Option (Windows)’. Restarting the machine typically helps permissions granted to the sql server service account take effect.
(iv) /3GB Boot Parameter: This is an operating system boot parameter that enlarges the user-mode virtual address space to 3 GB on a 32 bit system with less than 16GB total physical memory. On a 32 bit system with greater than 16GB total physical memory adding this parameter limits the physical memory to 16GB. A 32 bit process running under WOW64 will have 4GB of user mode virtual address space.
(v) /UserVA Boot Parameter: Another switch with similar function as the /3GB switch is /UserVA=X where X is number in MB.
The following query will tell you if either the /3GB or the /UserVA boot parameter has been enabled:
select
case
when virtual_memory_in_bytes / 1024 / (2048*1024)
< 1 then 'No switch'
else '/3GB'
end
1.2 SQL Server Configuration Options
sp_configure 'max worker threads'
Each worker has a stack associated with it. The stack size can be found:
select stack_size_in_bytes/(1024.0 * 1024.0) as stack_size_in_MB from sys.dm_os_sys_info
The memory allocated for worker stacks needs to be factored out of the total physical memory before considering what the sql server max server memory is set. A large number of worker threads can actually reduce the memory available for plan cache growth.
sp_configure 'awe enabled'
On 32 bit systems it is recommended to set AWE Enabled to 1. This allows sql server to go beyond the process virtual address space limit on 32 bit platforms. It will also allow buffer pool to shrink virtual address space (VAS) usage under VAS pressure that may occur due to lots of multi page allocations or CLR. On 64 bit systems, this setting has no effect.
It is recommended that the CPU affinity mask be set such that all processors are used uniformly. Note though that by setting this hard affinity, threads cannot move around. In the case of a NUMA machine, if one node has less memory than other nodes, then it is better to affinitize to other nodes (CPU's). The affinity mask can be found using the sp_configure option:
sp_configure 'affinity mask'
sqlservr [-g memory_to_reserve]
Using this start up parameter, an integer number of megabytes (MB) of memory can be specified that SQL Server will leave available for memory allocations within the SQL Server process, but outside the SQL Server buffer pool. The memory outside of the memory pool is the area used by SQL Server for multipage allocations among other things like loading DLLs. The default is 256 MB on 32 bit systems and is reserved even if this start up option is not specified. Using a value lower than the default increases the amount of memory available to the buffer pool and thread stacks; this may, in turn, provide some performance benefit to memory-intensive workloads in systems that do not use many extended stored procedures, distributed queries, or automation objects or any SQLCLR. For applications that use SQLCLR heavily, limiting this memory may result in memory starvation. For such applications it is the memory_to_reserve is set beyond the default of 256 MB if a lot of memory is allocated in CLR. Specifying a higher than default value may prove beneficial to workloads that have many multipage allocations. Setting this start up parameter appropriately and tuning it based on the nature of the workload running on the server is recommended for optimal results.
For more information refer to the books online article at ‘Using the SQL Server Service Startup Options’
sp_configure 'clr enabled'
sp_configure 'max degree of parallelism'
1.3 Database Options
select name, is_parameterization_forced from sys.databases
1.4 Workload Characteristics
(i) Query Mix: The type of the workload whether purely adhoc or stored procedures or a mix of adhoc and stored procedures can analyzed using:
select usecounts, cacheobjtype, objtype, bucketid
from sys.dm_exec_cached_plans cp
where cacheobjtype = 'Compiled Plan'
order by objtype
The objtype of 'adhoc' or 'proc' indicates the number of adhoc queries and stored procedure executions. Note that this query gives a snapshot of the plan cache at the time the query is executed. Prior to execution of this query, entries may have been removed from plan cache due to memory pressure.
(ii) Parameterization of Queries: From the above query if rows with objtype 'prepared' indicate that the queries are getting parameterized. If there are a large number of queries differing only in parameter values, the parameterized plan will have a high value for the usecounts column. If there are no parameterized queries, but many rows with objtype 'adhoc' then it indicates that queries are not getting parameterized.
(iii) Degree of Reuse of Plans: The usecounts column in the above query will indicate the degree of reuse of plans. If there are adhoc queries (shell queries) with a large usecounts this indicates that the exact same adhoc query is submitted several times for execution. If on the other hand the usecounts is close to 1 for adhoc queries this indicates a workload that has adhoc queries with almost no reuse. If the usecounts of the parameterized queries is low this may indicate incorrect client side parameterization. Several parameterized queries same query, but difference in data type of variables may indicate buckets of parameterized queries.
It is useful to gather all of the above information on SQL Server 2005 and SQL Server 2000 if you are seeing a performance regression upon upgrading to SQL Server 2005 (Note though that the scripts for SQL Server 2000 are different).