Editor's Note: The following MVP Monday post is by SQL Server MVP Grant Fritchey
Because of the ways that SQL Server interacts with the Windows operating system, memory and memory management between the two can be pretty confusing. If you just look at the allocated memory within the operating system on a standard SQL Server configuration you’ll probably see all, or most, of the memory is used up by SQL Server. Within SQL Server if you look at resource counters like total server memory, these too don’t provide much information to help you determine whether or not you are experiencing pressure due to inadequate memory on your system. You need to be able to quickly and easily determine if you are experiencing memory issues. Taking advantage of the dynamic management object (DMO) sys.dm_os_ring_buffers can get you that quick and accurate determination if you have memory problems and if they’re coming from the Windows operating system or within SQL Server.
Before I show you how to query the information within sys.dm_os_ring_buffers, let’s be sure we’ve got agreement on how memory is managed between SQL Server and Windows. Thinking of this strictly from the context of SQL Server, memory, and memory problems can be broken down as being internal, that is to say a part of SQL Server and it’s memory management, or external, a part of the Windows operating system and it’s memory management. But if you reverse the view and look at things from the Windows operating system you’ll be looking at physical memory, the memory managed by Windows, or a Virtual Address Space (VAS) which is memory that has been allocated to SQL Server, but it may or may not be actual physical memory. Within the VAS, SQL Server gets to allocate memory in whatever manner it requires without having to refer back to the Windows operating system. However, the operating system is still managing memory, moving it into physical memory or to the disk through the page file.
When you first set up SQL Server, you may have set a maximum limit for memory. As SQL Server works, it will request memory from the operating system up to this maximum. It will seldom return this memory to the operating system and that’s why you so frequently see memory maxed out on the OS when SQL Server is running. It may seem like a problem, but it isn’t. However, it does add to the load placed on memory management within the operating system since less and less memory is available for things that the operating system might need. This is especially problematic if you’re running processes other than SQL Server on that system.
Because of this split in memory, you have two places where you can “run out” of memory due to different processes. You can need more memory for processes within the operating system, in which case, you’re looking at a physical, or external, memory issue. You can also require more memory within the VAS for SQL Server in which case you’re looking at a virtual, or internal, memory issue.
But the question still comes down to, how do you determine if you have memory issues and where are those issues coming from. Let’s take a look at sys.dm_os_ring_buffers to understand what it can do for us to answer this question.
This DMO is not documented in the Books Online (BOL). Because of this, it is subject to change without notice from Microsoft.
The ring buffers are simply a record of alerts fired within the system that are recorded for internal diagnostics. There are a number of different alerts that can be viewed through the DMO. Running this query against the system:
FROM sys.dm_os_ring_buffers AS dorb;
Results in this set of results (which will look different on your machine):
There are only four columns, but the final column is XML and that’s where the good information can be found. There are a number of different ring buffers but we’re only interested in the type called RING_BUFFER_RESOURCE_MONITOR. This records changes to memory allocations, which is a great way to see when you’re running out of memory because a message that your memory is low is a pretty good indicator that you might be seeing memory pressure.
If we look just at this sample XML for a RING_BUFFER_RESOURCE_MONITOR message it looks like this:
<Record id = "1" type ="RING_BUFFER_RESOURCE_MONITOR" time ="129031647">
<Effect type="APPLY_LOWPM" state="EFFECT_OFF" reversed="0">0</Effect>
<Effect type="APPLY_HIGHPM" state="EFFECT_IGNORE" reversed="0">128991020</Effect>
<Effect type="REVERT_HIGHPM" state="EFFECT_OFF" reversed="0">0</Effect>
The information presented in the MemoryNode and the MemoryRecord are useful for attempting to figure out what went wrong, but the key points are up in the ResourceMonitor element with the Notification, IndicatorsProcess, and IndicatorsSystem values. First, the notification tells us that this was RESOURCE_MEMPHYSICAL_LOW message that was captured in the ring buffers. That means that physical memory was low. The next two indicators let us know what was low. If the IndicatorsProcess returns 0 and the IndicatorsSystem returns a value then the problem was system wide. But, in our case the IndicatorsProcess has a value and IndicatorsSystem is returning 0. This means that this alert was for a single process that ran suffered from low memory, not the entire system. The values break down as follows:
High Physical Memory
Low Physical Memory
Low Virtual Memory
When you receive these, you now know that you’re looking at some degree of memory pressure within your system. Determining if it’s a process-only or a system-wide issue is helpful. Even if it’s process-only, you’re still looking at low memory on the system for the processes involved. That may mean that you’re low on memory, but it could also just mean that your processes need some rewriting.
Note: You get the indication that a process had memory issues, but it doesn’t tell you which process. You can capture these events using Extended Events and then marry them up with queries, but you can’t get directly from these messages to a particular system process.
Knowing that you have memory pressure from these alerts captured in the ring buffers is good, but you’re going to want to be able to capture this information programmatically. You’re not going to want to have to scroll through tons of data looking for the particular events that have occurred. Instead, let’s use XQuery to pull that information right out of the DMO:
AS (SELECT CAST(dorb.record AS XML) AS xRecord,
FROM sys.dm_os_ring_buffers AS dorb
WHERE dorb.ring_buffer_type = 'RING_BUFFER_RESOURCE_MONITOR'
SELECT xr.value('(ResourceMonitor/Notification)', 'varchar(75)') AS RmNotification,
xr.value('(ResourceMonitor/IndicatorsProcess)','tinyint') AS IndicatorsProcess,
xr.value('(ResourceMonitor/IndicatorsSystem)','tinyint') AS IndicatorsSystem,
DATEADD(ms, -1 * dosi.ms_ticks - rb.timestamp, GETDATE()) AS RmDateTime
FROM RingBuffer AS rb
CROSS APPLY rb.xRecord.nodes('Record') record (xr)
CROSS JOIN sys.dm_os_sys_info AS dosi
ORDER BY RmDateTime DESC;
This query shows the basic information available from the DMO and shows off one additional piece of information embedded in the results, the actual date and time derived from the timestamp value. If all you’re interested in is the fact of the memory issue, you can stop here. But if you’re interested in a little diagnostic work on top of this, you’re going to want to pull the rest of the memory information that’s available like this:
DATEADD(ms, -1 * dosi.ms_ticks - rb.timestamp, GETDATE()) AS RmDateTime,
xr.value('(MemoryNode/TargetMemory)','bigint') AS TargetMemory,
xr.value('(MemoryNode/ReserveMemory)','bigint') AS ReserveMemory,
xr.value('(MemoryNode/CommittedMemory)','bigint') AS CommitedMemory,
xr.value('(MemoryNode/SharedMemory)','bigint') AS SharedMemory,
xr.value('(MemoryNode/PagesMemory)','bigint') AS PagesMemory,
xr.value('(MemoryRecord/MemoryUtilization)','bigint') AS MemoryUtilization,
xr.value('(MemoryRecord/TotalPhysicalMemory)','bigint') AS TotalPhysicalMemory,
xr.value('(MemoryRecord/AvailablePhysicalMemory)','bigint') AS AvailablePhysicalMemory,
xr.value('(MemoryRecord/TotalPageFile)','bigint') AS TotalPageFile,
xr.value('(MemoryRecord/AvailablePageFile)','bigint') AS AvailablePageFile,
xr.value('(MemoryRecord/TotalVirtualAddressSpace)','bigint') AS TotalVirtualAddressSpace,
xr.value('(MemoryRecord/AvailableVirtualAddressSpace)','bigint') AS AvailableVirtualAddressSpace,
xr.value('(MemoryRecord/AvailableExtendedVirtualAddressSpace)','bigint') AS AvailableExtendedVirtualAddressSpace
Now the query includes all the available information from the XML. There is one exception. I left out AWE memory since most people are running 64 bit systems now and this value will always be zero. Now you information that will enable you to see where the memory issues are occurring.
As I said in the text, this won’t tell you where your memory issues are coming from, but using the sys.dm_os_ring_buffers DMO to look for the RING_BUFFER_RESOURCE_MONITOR event types will absolutely let you know that you are experiencing memory pressure. You can query this DMO directly as part of a monitoring solution, or you can capture the events using the resource_monitor_ring_buffer_recorded event. Either way, you’ll have to deal with the XML output. Hopefully these queries will help you track your memory issues in SQL Server.
For more information on how memory management works within SQL Server
For more detail on the RING_BUFFER_RESOURCE_MONITOR
Tips on improving performance by turning off ring buffer data collection
Grant Fritchey works for Red Gate Software as a Product Evangelist. Grant has 20 years experience in the industry as a DBA and developer. Grant is an active participant in the SQL Server Central and AskSQLServerCentral discussion forums. He writes articles for SQL Server Central and Simple-Talk. Grant is the author of the books "SQL Server 2012 Performance Tuning Distilled" and "SQL Server Execution Plans." He is co-author on the books "SQL Server Team-Based Development," "SQL Server MVP Deep Dives 2," and "Beginning SQL Server 2012 Administration." Grant blogs at http://scarydba.com
The MVP Monday Series is created by Melissa Travers. In this series we work to provide readers with a guest post from an MVP every Monday. Melissa is a Community Program Manager for Dynamics, Excel, Office 365, Platforms and SharePoint in the United States. She has been working with MVPs since her early days as Microsoft Exchange Support Engineer when MVPs would answer all the questions in the old newsgroups before she could get to them
Perfect timing on this post - I just started working with this DMO last week and was putting some queries together for it as well as learning about the information surrounding the topic. Ran into one problem with the queries in that "DATEADD(ms, -1 * dosi.ms_ticks - rb.timestamp, GETDATE()) AS RmDateTime" on my test system resulted in a value of -4402094259 for one of my servers, which is a bigint and incompatible with the DATEADD function. Simple enough to work around though by working in seconds instead of milliseconds. Thanks again!
Nice article, Grant;
Except for the "Msg 8115, Level 16, State 2, Line 1 Arithmetic overflow error converting expression to data type int." I like this article. The overflow error because dateadd only accepts int on my SQL2008R2 . I would have hoped ms fixed this with SQL2012, but according to bol it's still datatype int.
Jonathan Kehayias posted a similiar DMV query to this on the same day as you, here - www.sqlskills.com/.../jonathan
The way you and he calculate the event time are different.
You - DATEADD(ms, -1 * dosi.ms_ticks - rb.timestamp, GETDATE()) AS RmDateTime
JK - DATEADD (ss, (-1 * ((cpu_ticks / CONVERT (float, ( cpu_ticks / ms_ticks ))) - [timestamp])/1000), GETDATE()) AS EventTime,
His results look more accurate on my machine. I'm not trying to be clever, I'm a big fan of your books, just pointing it out. Maybe I'm confusing what RMDateTime represents.
And I tested & tested that date conversion because I did find multiple ways of doing it. Ah well, I'm not going to argue with Jonathan Kehayias on a technical point. I'll contact Microsoft and see if we can't modify that one bit of code. I'm glad the rest seems useful.
Thanks, very useful information.
Great information well written.
Just because I hate when this happens, someone else has passed this article off as their own:
Great article, inspired me to post a tutorial on XPath.
Useful post and community comments. Figured I’d note that one could, alternately, filter out values that exceed dateadd’s int requirement:
WHERE ( CONVERT( bigint, dosi.ms_ticks ) - CONVERT( bigint, rb.timestamp ) ) BETWEEN CONVERT(bigint, -2147483648 ) and CONVERT( bigint, 2147483647 )
Grant has a typo in the script:
should be written in such way
DATEADD(ms, -1 * (dosi.ms_ticks - rb.timestamp), GETDATE()) AS RmDateTime