Hi, it's the Debug Ninja back again with another debugging adventure. Recently I have encountered several instances where processes fail to initialize, and a review of available resources showed no obvious resource exhaustion. A more in-depth review found that there were no available string atoms in the global atom table.
Global atoms are organized on a per-session basis. If atoms cannot be allocated in session 0, services may fail to start or processes launched by various services may fail to start. However, a user logged in to a different session will not experience any such failures.
String atoms are numbered from 0xC000 through 0xFFFF, providing a maximum of 0x4000 atoms per session. For more information on atoms, and atom tables, see http://technet.microsoft.com/en-us/query/ms649053.
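The arithmetic behind that limit can be checked with a short sketch. The constant and function names here are my own, chosen for illustration; only the range bounds come from the text above.

```python
# Bounds of the string atom range, as stated in the article.
STRING_ATOM_MIN = 0xC000
STRING_ATOM_MAX = 0xFFFF


def string_atom_capacity() -> int:
    """Number of distinct atom values in the inclusive string atom range."""
    return STRING_ATOM_MAX - STRING_ATOM_MIN + 1


def is_string_atom(atom: int) -> bool:
    """True if the value falls inside the string atom range."""
    return STRING_ATOM_MIN <= atom <= STRING_ATOM_MAX


if __name__ == "__main__":
    print(hex(string_atom_capacity()))  # 0x4000
```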
When there are no more string atoms available, calls to APIs that allocate string atoms will fail. Because atoms are often allocated at process or DLL initialization time, the most common symptom is that processes fail to initialize. The process may exit cleanly without an error. You are likely experiencing this problem if you debug your application and find that the failure originates from an API that allocates string atoms, such as RegisterClass, RegisterClassEx, GlobalAddAtom, or AddAtom.
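To make the failure mode concrete, here is a toy model of a string atom table. It is not how win32k implements the table; the class and its methods are hypothetical and only mimic the observable behavior: names are case-insensitive, adding an existing name bumps a reference count, and adding a new name fails with 0 once all 0x4000 values are taken.

```python
class ToyAtomTable:
    """Simplified, hypothetical model of a per-session string atom table."""

    FIRST_ATOM = 0xC000
    LAST_ATOM = 0xFFFF

    def __init__(self):
        self._by_name = {}   # lowercased name -> atom value
        self._entries = {}   # atom value -> [lowercased name, reference count]
        # Reversed so that pop() hands out 0xC000 first.
        self._free = list(range(self.LAST_ATOM, self.FIRST_ATOM - 1, -1))

    def add(self, name):
        """Return the atom for name, or 0 if the table is full."""
        key = name.lower()
        atom = self._by_name.get(key)
        if atom is not None:
            self._entries[atom][1] += 1   # existing string: only the count grows
            return atom
        if not self._free:
            return 0                      # no atom values left
        atom = self._free.pop()
        self._by_name[key] = atom
        self._entries[atom] = [key, 1]
        return atom

    def delete(self, atom):
        """Drop one reference; free the atom when the count reaches zero."""
        entry = self._entries.get(atom)
        if entry is None:
            return False
        entry[1] -= 1
        if entry[1] == 0:
            del self._by_name[entry[0]]
            del self._entries[atom]
            self._free.append(atom)
        return True


if __name__ == "__main__":
    table = ToyAtomTable()
    # Simulate a leak: every string is unique and never deleted.
    for i in range(0x4000):
        table.add("Leaked%016X" % i)
    # An unrelated caller now fails, even though memory is plentiful.
    print(table.add("SomeWindowClass"))
```

In this model the final add returns 0, which mirrors what a process sees when it tries to register a window class in a session whose table has been exhausted by another program.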
To determine if the global string atom table is full you will need to perform a kernel debug. This can be a live debug or a post-mortem debug using a dump.
First identify the session where the failures have occurred and set the process context to a process in this session. In my example, w3wp.exe was launching a process and this process failed to initialize.
2: kd> !process 0 0 w3wp.exe
SessionId: 0 Cid: 1668 Peb: fffdf000 ParentCid: 08ec
DirBase: 8a2df000 ObjectTable: fffff8a0128bbe40 HandleCount: 441.
2: kd> .process /p /r fffffa8005083060
Implicit process is now fffffa80`05083060
Loading User Symbols
Next we need to analyze the global atom table. The pointer to the table is stored in the UserAtomTableHandle global.
2: kd> dq win32k!UserAtomTableHandle l1
The UserAtomTableHandle has a pointer to a handle table at offset 0x10 in 64-bit, and offset 0x8 in 32-bit. Note that although the atom table is defined as a _RTL_ATOM_TABLE, the format shown by dt is for user mode and does not apply to the UserAtomTableHandle in kernel mode.
2: kd> dq fffff8a0`05e5bc70+10 l1
2: kd> dt nt!_HANDLE_TABLE fffff8a0`05db7740
+0x000 TableCode : 0xfffff8a0`109c8001
+0x008 QuotaProcess : (null)
+0x010 UniqueProcessId : 0x00000000`00000184 Void
+0x018 HandleLock : _EX_PUSH_LOCK
+0x020 HandleTableList : _LIST_ENTRY [ 0xfffff8a0`05db7760 - 0xfffff8a0`05db7760 ]
+0x030 HandleContentionEvent : _EX_PUSH_LOCK
+0x038 DebugInfo : (null)
+0x040 ExtraInfoPages : 0n0
+0x044 Flags : 0
+0x044 StrictFIFO : 0y0
+0x048 FirstFreeHandle : 0x10004
+0x050 LastFreeHandleEntry : 0xfffff8a0`10ca4ff0 _HANDLE_TABLE_ENTRY
+0x058 HandleCount : 0x3fc0
+0x05c NextHandleNeedingPool : 0x10400
+0x060 HandleCountHighWatermark : 0x3fc1
The FirstFreeHandle contains the handle number that will be given to the next handle allocated from this table. This value is encoded; to get the next handle number, right shift the FirstFreeHandle by 2 bits.
2: kd> ?00010004>>2
Evaluate expression: 16385 = 00000000`00004001
The result from above, 0x4001, is greater than the number of possible string atoms. As I mentioned earlier, there is a limit of 0x4000 string atoms. Now we know that the session is out of string atoms.
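The same check can be written as a small helper for use outside the debugger. This is a sketch with names of my own choosing; the shift and the 0x4000 limit come from the steps above, and the comparison mirrors the one made in the article.

```python
# Limit on string atoms per session, from the article.
MAX_STRING_ATOMS = 0x4000


def next_handle_number(first_free_handle: int) -> int:
    """Decode FirstFreeHandle by shifting right 2 bits."""
    return first_free_handle >> 2


def atom_table_is_full(first_free_handle: int) -> bool:
    """True when the next handle number exceeds the string atom limit."""
    return next_handle_number(first_free_handle) > MAX_STRING_ATOMS


if __name__ == "__main__":
    value = 0x10004  # FirstFreeHandle from the dt output above
    print(hex(next_handle_number(value)), atom_table_is_full(value))
```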
The next step is to dump the string atoms to identify whether there is an observable pattern in the leaked strings. The !atom command only works in user mode, so we need to dump the kernel mode strings manually. An atom table is made up of multiple buckets. Each bucket is the head of a list of atoms. The buckets start at offset 0x20 in the atom table in 64-bit, and offset 0x10 in 32-bit.
2: kd> dq fffff8a0`05e5bc70+20
fffff8a0`05e5bc90 fffff8a0`05e5ba60 fffff8a0`05db7be0
fffff8a0`05e5bca0 fffff8a0`08cf1770 fffff8a0`05e5b3d0
fffff8a0`05e5bcb0 fffff8a0`05ea9020 fffff8a0`05e5b8e0
fffff8a0`05e5bcc0 fffff8a0`05ea9b10 fffff8a0`05ea9910
fffff8a0`05e5bcd0 fffff8a0`05ea9f00 fffff8a0`05e5b650
fffff8a0`05e5bce0 fffff8a0`05cda290 fffff8a0`05ea9e80
fffff8a0`05e5bcf0 fffff8a0`05e5b200 fffff8a0`05ea9e30
fffff8a0`05e5bd00 fffff8a0`05e5b7e0 fffff8a0`06c56210
2: kd> dq
fffff8a0`05e5bd10 fffff8a0`06d6b5a0 fffff8a0`05ea9d50
fffff8a0`05e5bd20 fffff8a0`05e5b790 fffff8a0`05e5b9d0
fffff8a0`05e5bd30 fffff8a0`06bd9bc0 fffff8a0`05ea9c90
fffff8a0`05e5bd40 fffff8a0`05e5b0c0 fffff8a0`06ae2020
fffff8a0`05e5bd50 fffff8a0`05e5b930 fffff8a0`04d2af40
fffff8a0`05e5bd60 fffff8a0`05e5b690 fffff8a0`05e5b980
fffff8a0`05e5bd70 fffff8a0`05e5b490 fffff8a0`05e5b410
fffff8a0`05e5bd80 fffff8a0`05e5ba20 fffff8a0`05e5b4f0
fffff8a0`05e5bd90 fffff8a0`05e5baa0 fffff8a0`05e5b390
fffff8a0`05e5bda0 fffff8a0`05e5b840 fffff8a0`05ea9c50
fffff8a0`05e5bdb0 fffff8a0`05e5b250 00000000`00000000
fffff8a0`05e5bdc0 00000000`00000000 00000000`00000000
fffff8a0`05e5bdd0 00000000`00000000 00000000`00000000
fffff8a0`05e5bde0 00000000`00000000 00000000`00000000
fffff8a0`05e5bdf0 00000000`00000000 00000000`00000000
fffff8a0`05e5be00 00000000`00000000 00000000`00000000
The quick and dirty way to dump the buckets is with !list. I am sure that some will say it is tedious to dump each bucket list by hand and that there are easier ways to accomplish this. To prevent this article from becoming a lesson on debugger scripting, I am leaving that as an exercise for the reader.
2: kd> !list "-t nt!_RTL_ATOM_TABLE_ENTRY.HashLink -e -x \"du @$extret+10\" fffff8a0`05e5ba60"
<snip strings that don't match a pattern>
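For readers who would rather not type the command once per bucket, one option outside the debugger is to paste the dq output into a small script that emits a !list command for every non-null bucket head. This is only a sketch: the function names are mine, it assumes the 64-bit dq layout shown above (an address column followed by data values), and it assumes the pasted text stops at the end of the bucket array.

```python
import re

# Same command as used above, with a placeholder for the bucket head.
LIST_TEMPLATE = ('!list "-t nt!_RTL_ATOM_TABLE_ENTRY.HashLink -e '
                 '-x \\"du @$extret+10\\" {head}"')

# A 64-bit value as printed by dq, e.g. fffff8a0`05e5ba60
QWORD = re.compile(r"[0-9a-fA-F]{8}`[0-9a-fA-F]{8}")


def bucket_heads(dq_output):
    """Return the non-null data values from pasted dq output, in order."""
    heads = []
    for line in dq_output.splitlines():
        values = QWORD.findall(line)
        # The first value on a line is the address column (or the address
        # typed in the command itself); the rest are bucket pointers.
        for value in values[1:]:
            if int(value.replace("`", ""), 16) != 0:
                heads.append(value)
    return heads


def list_commands(dq_output):
    """Build one !list command per non-empty bucket."""
    return [LIST_TEMPLATE.format(head=head) for head in bucket_heads(dq_output)]


if __name__ == "__main__":
    sample = (
        "2: kd> dq fffff8a0`05e5bc70+20\n"
        "fffff8a0`05e5bc90 fffff8a0`05e5ba60 fffff8a0`05db7be0\n"
        "fffff8a0`05e5bdb0 fffff8a0`05e5b250 00000000`00000000\n"
    )
    for command in list_commands(sample):
        print(command)
```

The printed lines can be pasted back into the debugger one at a time or saved to a file and run as a debugger script.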
Dumping the atoms, I found a continuous pattern of the string ControlOfs followed by 16 hexadecimal digits. Some time spent with your favorite search engine should find other reports of atom leaks involving the string ControlOfs, and that these leaks have been identified as a problem in some specific software. In this instance the programmer using that software needs to change their application to avoid the problem.
Posts like this give me a hard slap on the back when I think that I know a thing or two about possible resource exhaustion. Thanks, Ninja. Out of curiosity, was this on a 2003 or 2008/R2 box?
[Don't slap yourself too hard, I don't think many folks knew how to debug this. That's why I wrote it up, to increase awareness. The specific example shown here was from Windows Server 2008 R2 SP1, but the concept is the same across platforms. These steps should work equally well on Server 2003, 2008, etc.]
I hope that you file a bug report internally too. Stuff like this should either be made visible to the user (message box, event log, ...) or there shouldn't be a limit in the first place. The real problem is not that 3rd party software is buggy, the big problem is that 3rd party software can render the OS unusable.
[Hi deiruch, We are discussing internally how to make this more discoverable.
It is not necessarily an OS fault that an application can exhaust this resource. There are many mechanisms an application can abuse to impact the performance of other applications. In this instance the documentation for GlobalAddAtom states that for every call there must be a corresponding call to GlobalDeleteAtom; failing to do so is a bug in the application.]
I think you forgot to mention that RegisterWindowMessage also creates an atom that is never released. That is why the atom table is depleted so quickly when we call RWM several times. Do you know how to delete the atom leaked when using RWM?
[Jordi, RegisterWindowMessage does not provide a mechanism to unregister the message. The Windows Development Reference for RegisterWindowMessage does not directly say that it can exhaust the global atom table; however, it does inform a programmer that there are a limited number of message identifiers and that the message remains registered until the session ends. Samples which include this API use a pre-defined string, which increments a reference count rather than continuously creating new atoms.]
When the atom table is depleted it returns the error "System Error. Code: 8. Not enough storage is available to process this command".
Do you know if the CreateProcess (msdn.microsoft.com/.../ms682425(v=vs.85).aspx) and CoCreateInstance (msdn.microsoft.com/.../ms686615(v=vs.85).aspx) functions generate atoms as well?
[Hi Jordi. CreateProcess does not create an atom from the global atom table discussed here, although the process which is created may do so. CoCreateInstance doesn't use atoms from the table discussed in this article, although ITfCategoryMgr may use a different type of atom.]
Hi There -
I am debugging an issue with global atom leaks. I have a dump from our test computers that has an extremely similar set of leaked global atoms. I followed the procedure in this article and dumped out thousands of ControlOfs atoms. All signs on the Internet point to this as a bug in Delphi projects. However, we are using vanilla MSVC + Qt. I can't find any reference to ControlOfs in our source base, the Qt source base, or any of the binaries we use. Can you shed any light on the third-party software that causes this problem?
[I am only aware of ControlOfs atoms being allocated by the source you mentioned. Keep in mind that atoms may be allocated by any software running on the system, not necessarily your application. If you look at some of the documentation around the use of ControlOfs and atoms, you'll see that one of the numbers is the PID and one is the TID. If you track PIDs over time you may be able to match up a PID to a process name. One way to do this is to audit process tracking, and view the event 592 messages to find the PID you are interested in.]
[UPDATE: Sorry, I was wrong about the ControlOfs atom containing the PID. It contains the TID. You can use procmon with a filter on ThreadCreate to get a historical list of TIDs. There are likely other methods to get a list of thread IDs, procmon is the first one that came to mind.]
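Once you have a list of leaked atom strings, pulling out the thread IDs for comparison against a procmon capture can be scripted. The sketch below rests on an assumption that is not confirmed in this article: that the 16 hexadecimal digits are two 8-digit fields and that the thread ID is the second one, which is how this pattern is commonly described for the software in question. Check it against your own data before relying on it. The function names and the sample string are made up for illustration.

```python
import re

# ASSUMPTION: "ControlOfs" + 8 hex digits + 8 hex digits, thread ID last.
CONTROL_OFS = re.compile(r"^ControlOfs([0-9A-Fa-f]{8})([0-9A-Fa-f]{8})$")


def parse_control_ofs(atom_name):
    """Split a ControlOfs atom name into (first_field, thread_id), or None."""
    match = CONTROL_OFS.match(atom_name)
    if match is None:
        return None
    first_field, thread_id = (int(group, 16) for group in match.groups())
    return first_field, thread_id


def thread_ids(atom_names):
    """Collect the distinct thread IDs found in a list of atom names."""
    ids = set()
    for name in atom_names:
        parsed = parse_control_ofs(name)
        if parsed is not None:
            ids.add(parsed[1])
    return sorted(ids)


if __name__ == "__main__":
    # Hypothetical atom name, not taken from the dump in this article.
    names = ["ControlOfs0040000000001A2C", "SomethingElse"]
    for tid in thread_ids(names):
        print(tid, hex(tid))
```

procmon displays thread IDs in decimal, so the script prints both forms.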