• Ntdebugging Blog

    Identifying Global Atom Table Leaks

    Hi, it's the Debug Ninja back again with another debugging adventure.  Recently I have encountered several instances where processes failed to initialize, and a review of available resources showed no obvious resource exhaustion.  A more in-depth review found that there were no available string atoms in the global atom table.

     

    Global atoms are organized on a per-session basis.  If atoms cannot be allocated in session 0, services may fail to start or processes launched by various services may fail to start.  However, a user logged in to a different session will not experience any such failures.

     

    String atoms are numbered from 0xC000 through 0xFFFF, providing a maximum of 0x4000 atoms per session.  For more information on atoms, and atom tables, see http://technet.microsoft.com/en-us/query/ms649053.
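
    The arithmetic behind that limit can be checked with a small sketch.  The constant and function names below are made up for illustration; only the range boundaries come from the text above.

```python
# Range of string atom values as stated above (inclusive on both ends).
FIRST_STRING_ATOM = 0xC000
LAST_STRING_ATOM = 0xFFFF


def string_atom_capacity(first=FIRST_STRING_ATOM, last=LAST_STRING_ATOM):
    """Return how many atom values fit in the inclusive range [first, last]."""
    return last - first + 1


def is_string_atom(atom):
    """Return True if a 16-bit atom value falls in the string atom range."""
    return FIRST_STRING_ATOM <= atom <= LAST_STRING_ATOM


if __name__ == "__main__":
    print(hex(string_atom_capacity()))  # 0x4000
```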

     

    When there are no more string atoms available, calls to APIs that allocate string atoms will fail.  Because atoms are often allocated at process or DLL initialization time, the most common symptom is that processes fail to initialize.  The process may exit cleanly without an error.  You are likely experiencing this problem if you debug your application and find that the failure originates from an API that allocates string atoms, such as RegisterClass, RegisterClassEx, GlobalAddAtom, or AddAtom.

     

    To determine if the global string atom table is full you will need to perform a kernel debug.  This can be a live debug or a post-mortem debug using a dump.

     

    First identify the session where the failures have occurred and set the process context to a process in this session.  In my example, w3wp.exe was launching a process and this process failed to initialize.

     

    2: kd> !process 0 0 w3wp.exe

    PROCESS fffffa8005083060

        SessionId: 0  Cid: 1668    Peb: fffdf000  ParentCid: 08ec

        DirBase: 8a2df000  ObjectTable: fffff8a0128bbe40  HandleCount: 441.

        Image: w3wp.exe

    2: kd> .process /p /r fffffa8005083060

    Implicit process is now fffffa80`05083060

    Loading User Symbols

    .....

     

    Next we need to analyze the global atom table.  The pointer to the table is stored in the UserAtomTableHandle global.

     

    2: kd> dq win32k!UserAtomTableHandle l1

    fffff960`003bf7a8  fffff8a0`05e5bc70

     

    The UserAtomTableHandle has a pointer to a handle table at offset 0x10 in 64-bit, and offset 0x8 in 32-bit.  Note that although the atom table is defined as a _RTL_ATOM_TABLE, the format shown by dt is for user mode and does not apply to the UserAtomTableHandle in kernel mode.

     

    2: kd> dq fffff8a0`05e5bc70+10 l1

    fffff8a0`05e5bc80  fffff8a0`05db7740

    2: kd> dt nt!_HANDLE_TABLE fffff8a0`05db7740

       +0x000 TableCode        : 0xfffff8a0`109c8001

       +0x008 QuotaProcess     : (null)

       +0x010 UniqueProcessId  : 0x00000000`00000184 Void

       +0x018 HandleLock       : _EX_PUSH_LOCK

       +0x020 HandleTableList  : _LIST_ENTRY [ 0xfffff8a0`05db7760 - 0xfffff8a0`05db7760 ]

       +0x030 HandleContentionEvent : _EX_PUSH_LOCK

       +0x038 DebugInfo        : (null)

       +0x040 ExtraInfoPages   : 0n0

       +0x044 Flags            : 0

       +0x044 StrictFIFO       : 0y0

       +0x048 FirstFreeHandle  : 0x10004

       +0x050 LastFreeHandleEntry : 0xfffff8a0`10ca4ff0 _HANDLE_TABLE_ENTRY

       +0x058 HandleCount      : 0x3fc0

       +0x05c NextHandleNeedingPool : 0x10400

       +0x060 HandleCountHighWatermark : 0x3fc1

     

    The FirstFreeHandle contains the handle number that will be given to the next handle allocated from this table.  This value is encoded; to get the next handle number we need to right shift the FirstFreeHandle by 2 bits.

     

    2: kd> ?00010004>>2

    Evaluate expression: 16385 = 00000000`00004001

     

    The result from above, 0x4001, is greater than the number of possible string atoms.  As I mentioned earlier, there is a limit of 0x4000 string atoms.  Now we know that the session is out of string atoms.
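
    The same check can be written as a short sketch for use outside the debugger.  The function names are hypothetical; the 2-bit shift and the 0x4000 limit are the ones described above.

```python
MAX_STRING_ATOMS = 0x4000  # per-session limit quoted earlier in the article


def next_handle_number(first_free_handle):
    """Decode FirstFreeHandle by shifting it right by 2 bits."""
    return first_free_handle >> 2


def table_is_exhausted(first_free_handle, limit=MAX_STRING_ATOMS):
    """Return True when the next handle number is past the string atom limit."""
    return next_handle_number(first_free_handle) > limit


if __name__ == "__main__":
    # FirstFreeHandle value taken from the dt output above.
    print(hex(next_handle_number(0x10004)))  # 0x4001
    print(table_is_exhausted(0x10004))       # True
```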

     

    The next step is to dump the string atoms to identify whether there is an observable pattern in the leaked strings.  The !atom command only works in user mode, so we need to dump the kernel mode strings manually.  An atom table is composed of multiple buckets.  Each bucket is the head of a list of atoms.  The buckets start at offset 0x20 in the atom table in 64-bit, and offset 0x10 in 32-bit.

     

    2: kd> dq fffff8a0`05e5bc70+20

    fffff8a0`05e5bc90  fffff8a0`05e5ba60 fffff8a0`05db7be0

    fffff8a0`05e5bca0  fffff8a0`08cf1770 fffff8a0`05e5b3d0

    fffff8a0`05e5bcb0  fffff8a0`05ea9020 fffff8a0`05e5b8e0

    fffff8a0`05e5bcc0  fffff8a0`05ea9b10 fffff8a0`05ea9910

    fffff8a0`05e5bcd0  fffff8a0`05ea9f00 fffff8a0`05e5b650

    fffff8a0`05e5bce0  fffff8a0`05cda290 fffff8a0`05ea9e80

    fffff8a0`05e5bcf0  fffff8a0`05e5b200 fffff8a0`05ea9e30

    fffff8a0`05e5bd00  fffff8a0`05e5b7e0 fffff8a0`06c56210

    2: kd> dq

    fffff8a0`05e5bd10  fffff8a0`06d6b5a0 fffff8a0`05ea9d50

    fffff8a0`05e5bd20  fffff8a0`05e5b790 fffff8a0`05e5b9d0

    fffff8a0`05e5bd30  fffff8a0`06bd9bc0 fffff8a0`05ea9c90

    fffff8a0`05e5bd40  fffff8a0`05e5b0c0 fffff8a0`06ae2020

    fffff8a0`05e5bd50  fffff8a0`05e5b930 fffff8a0`04d2af40

    fffff8a0`05e5bd60  fffff8a0`05e5b690 fffff8a0`05e5b980

    fffff8a0`05e5bd70  fffff8a0`05e5b490 fffff8a0`05e5b410

    fffff8a0`05e5bd80  fffff8a0`05e5ba20 fffff8a0`05e5b4f0

    2: kd> dq

    fffff8a0`05e5bd90  fffff8a0`05e5baa0 fffff8a0`05e5b390

    fffff8a0`05e5bda0  fffff8a0`05e5b840 fffff8a0`05ea9c50

    fffff8a0`05e5bdb0  fffff8a0`05e5b250 00000000`00000000

    fffff8a0`05e5bdc0  00000000`00000000 00000000`00000000

    fffff8a0`05e5bdd0  00000000`00000000 00000000`00000000

    fffff8a0`05e5bde0  00000000`00000000 00000000`00000000

    fffff8a0`05e5bdf0  00000000`00000000 00000000`00000000

    fffff8a0`05e5be00  00000000`00000000 00000000`00000000

     

    The quick and dirty way to dump the buckets is with !list.  I am sure that some will say it is tedious to dump each bucket list by hand and that there are easier ways to accomplish this.  To prevent this article from becoming a lesson on debugger scripting, I am leaving that as an exercise to the reader.

     

    2: kd> !list "-t nt!_RTL_ATOM_TABLE_ENTRY.HashLink -e -x \"du @$extret+10\" fffff8a0`05e5ba60"

    du @$extret+10

    fffff8a0`05e5ba70  "Native"

     

    <snip strings that don't match a pattern>

     

    du @$extret+10

    fffff8a0`0838a120  "ControlOfs0210000000000700"

     

    du @$extret+10

    fffff8a0`0f7ff430  "ControlOfs021A000000000C30"

     

    du @$extret+10

    fffff8a0`162168c0  "ControlOfs020E000000001774"

     

    du @$extret+10

    fffff8a0`08c33870  "ControlOfs01F70000000007F4"

     

    du @$extret+10

    fffff8a0`07c46910  "ControlOfs0202000000000BF8"

     

    du @$extret+10

    fffff8a0`062aab50  "ControlOfs01F5000000001274"

     

    du @$extret+10

    fffff8a0`0777b150  "ControlOfs0202000000000C80"

     

    du @$extret+10

    fffff8a0`07dd3410  "ControlOfs0207000000000F00"

     

    du @$extret+10

    fffff8a0`0f01d190  "ControlOfs0214000000000DAC"
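
    For readers who would rather not type one !list command per bucket, here is a rough sketch that builds the commands from saved dq output.  It is only a text-processing helper, not a debugger script: the function names are made up, it assumes the 64-bit dq format shown above, and it reuses the +10 string offset from the 64-bit example.

```python
import re

# One 64-bit value as printed by dq, for example fffff8a0`05e5ba60.
QWORD = re.compile(r"[0-9a-f]{8}`[0-9a-f]{8}")

# Same command shape as the !list example in the article (64-bit offsets).
LIST_TEMPLATE = (
    '!list "-t nt!_RTL_ATOM_TABLE_ENTRY.HashLink -e -x '
    '\\"du @$extret+10\\" {head}"'
)


def bucket_heads(dq_output):
    """Return every non-zero bucket pointer found in saved dq output."""
    heads = []
    for line in dq_output.splitlines():
        fields = line.split()
        # Data lines start with an address; prompts such as "2: kd> dq" do not.
        if len(fields) < 2 or not QWORD.fullmatch(fields[0]):
            continue
        for value in fields[1:]:
            if QWORD.fullmatch(value) and int(value.replace("`", ""), 16) != 0:
                heads.append(value)
    return heads


def list_commands(dq_output):
    """Build one !list command per non-empty bucket."""
    return [LIST_TEMPLATE.format(head=head) for head in bucket_heads(dq_output)]


if __name__ == "__main__":
    sample = (
        "2: kd> dq fffff8a0`05e5bc70+20\n"
        "fffff8a0`05e5bdb0  fffff8a0`05e5b250 00000000`00000000\n"
    )
    for command in list_commands(sample):
        print(command)
```

    The generated lines can then be pasted into the debugger one at a time.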

     

    Dumping the atoms, I found a recurring pattern: the string ControlOfs followed by 16 hexadecimal digits.  Some time spent with your favorite search engine should find other reports of atom leaks involving the string ControlOfs, and that these leaks have been identified as a problem in some specific software.  In this instance the programmer using that software needs to change their application to avoid the problem.
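
    With thousands of atoms, spotting the pattern by eye gets tedious.  The sketch below is one possible way to summarize saved du output: it collapses long runs of hexadecimal digits into a placeholder and counts what remains.  The names and the 8-digit threshold are assumptions chosen for this example.

```python
import re
from collections import Counter

# A du output line: a 64-bit address followed by a quoted string.
DU_LINE = re.compile(r'^\s*[0-9a-f]{8}`[0-9a-f]{8}\s+"(.*)"\s*$')

# Runs of 8 or more hex digits are treated as the variable part of a name.
HEX_RUN = re.compile(r"[0-9A-Fa-f]{8,}")


def atom_strings(du_output):
    """Extract the quoted strings from saved du output."""
    strings = []
    for line in du_output.splitlines():
        match = DU_LINE.match(line)
        if match:
            strings.append(match.group(1))
    return strings


def pattern_counts(strings):
    """Count strings after replacing long hex runs with a placeholder."""
    return Counter(HEX_RUN.sub("<hex>", s) for s in strings)


if __name__ == "__main__":
    sample = (
        "du @$extret+10\n"
        'fffff8a0`05e5ba70  "Native"\n'
        "du @$extret+10\n"
        'fffff8a0`0838a120  "ControlOfs0210000000000700"\n'
        "du @$extret+10\n"
        'fffff8a0`0f7ff430  "ControlOfs021A000000000C30"\n'
    )
    for pattern, count in pattern_counts(atom_strings(sample)).most_common():
        print(count, pattern)
```

    A pattern whose count approaches the 0x4000 limit is the likely leak.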

    Stop 0x19 in a Large Pool Allocation

    Hello all, Scott Olson here again to share another interesting issue I recently debugged.  It involved pool corruption, and along the way I found that special pool does not work with large pool allocations (pool allocations larger than PAGE_SIZE).

     

    Here is an example of a valid large pool allocation.  Notice that the size is 0x1fb0, while PAGE_SIZE is 0x1000, or 4 KB.

     

    0: kd> !pool fffffa80`0dba6fa0

    Pool page fffffa800dba6fa0 region is Nonpaged pool

    *fffffa800dba5000 : large page allocation, Tag is Io  , size is 0x1fb0 bytes

                    Pooltag Io   : general IO allocations, Binary : nt!io

     

    In Windows 7, the end of a large pool allocation is followed by a block with the tag “Frag”, and then by a block tagged “Free” that covers the rest of the page.  That free block is placed on the free pool list, where it can satisfy allocations smaller than a page.

     

    0: kd> dc fffffa800dba5000 fffffa800dba5000+0x1fb0-4

    fffffa80`0dba5000  00558001 32373242 00000000 00000000  ..U.B272........

    fffffa80`0dba5010  55555555 55555555 98764321 01b75f55  UUUUUUUU!Cv.U_..

    fffffa80`0dba5020  00000001 00000001 704e6ff0 fffff981  .........oNp....

    …<cut>

    fffffa80`0dba6f80  55555555 55555555 55555555 55555555  UUUUUUUUUUUUUUUU

    fffffa80`0dba6f90  55555555 55555555 55555555 55555555  UUUUUUUUUUUUUUUU

    fffffa80`0dba6fa0  55555555 55555555 00001fb0 00000000  UUUUUUUU........

    0: kd> dc

    fffffa80`0dba6fb0  02010100 67617246 55555555 55555555  ....FragUUUUUUUU

    fffffa80`0dba6fc0  00040101 65657246 55555555 55555555  ....FreeUUUUUUUU

    fffffa80`0dba6fd0  00802170 fffff880 0e49cf70 fffffa80  p!......p.I.....

    fffffa80`0dba6fe0  15cc8fe8 fffff981 3b9c50a7 00000005  .........P.;....


    Displayed with the !pool command:

    0: kd> !pool fffffa80`0dba6fb0

    Pool page fffffa800dba6fb0 region is Nonpaged pool

    *fffffa800dba6fb0 size:   10 previous size:    0  (Allocated) *Frag

                    Owning component : Unknown (update pooltag.txt)

     fffffa800dba6fc0 size:   40 previous size:   10  (Free)       Free

     

    The example above demonstrates how this normally works.  The downside to this architecture is that if a driver were to overrun its pool allocation then special pool would not be useful because the large pool allocation has to be page-aligned. Special pool detects pool overruns by putting the data at the end of the page, which would not be feasible with a large pool allocation.
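
    To make the layout concrete, the sketch below computes where the Frag and Free blocks should sit for a given large pool allocation.  It is a model of the example above, not of the memory manager: the names are made up, the 0x10-byte Frag block size is taken from the !pool output for this 64-bit system, and allocations that end exactly on a page boundary are not handled.

```python
PAGE_SIZE = 0x1000
FRAG_BLOCK_SIZE = 0x10  # "size: 10" reported by !pool for the Frag block


def large_pool_layout(base, size):
    """Return the expected addresses that follow a large pool allocation."""
    end = base + size
    # Round the end of the allocation up to the next page boundary.
    page_end = (end + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1)
    free = end + FRAG_BLOCK_SIZE
    return {
        "frag": end,                   # Frag block starts right after the data
        "free": free,                  # Free block follows the Frag block
        "free_size": page_end - free,  # Free block runs to the end of the page
        "page_end": page_end,
    }


if __name__ == "__main__":
    layout = large_pool_layout(0xFFFFFA800DBA5000, 0x1FB0)
    for name, value in layout.items():
        print(f"{name:9} {value:#x}")
```

    For the allocation above this gives a Frag block at fffffa80`0dba6fb0 and a 0x40-byte Free block at fffffa80`0dba6fc0, matching the !pool output.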

     

    In Windows 7 there is a check when pool memory is freed that determines whether the owner wrote past the end of its allocation.  If so, the machine bug checks with a Stop 0x19 BAD_POOL_HEADER whose first parameter is 0x21.  Here is the definition along with what each parameter means:

     

    BAD_POOL_HEADER (19)

    The pool is already corrupt at the time of the current request.

    This may or may not be due to the caller.

    The internal pool links must be walked to figure out a possible cause of

    the problem, and then special pool applied to the suspect tags or the driver

    verifier to a suspect driver.

    Arguments:

    Arg1: 0000000000000021, the data following the pool block being freed is corrupt.  Typically this means the consumer (call stack ) has overrun the block.

    Arg2: fffffa800dc57000, The pool pointer being freed.

    Arg3: 0000000000002180, The number of bytes allocated for the pool block.

    Arg4: 006b0072006f0077, The corrupted value found following the pool block.
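
    Arg4 can give a first hint about what overwrote the block.  As a quick sketch, assuming the value is little-endian UTF-16 text (an assumption that happens to fit this dump, not a general rule), it can be decoded like this; the function name is made up.

```python
def decode_qword_as_utf16(value):
    """Interpret a 64-bit value as four little-endian UTF-16 code units."""
    return value.to_bytes(8, "little").decode("utf-16-le")


if __name__ == "__main__":
    # Arg4 from the bugcheck above.
    print(decode_qword_as_utf16(0x006B0072006F0077))  # work
```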

     

    Here is an example of what this corruption looks like compared to the above valid large pool allocation:

    0: kd> !pool fffffa800dc57000

    Pool page fffffa800dc57000 region is Nonpaged pool

    fffffa800dc57000 is not a valid large pool allocation, checking large session pool...

    fffffa800dc57000 is freed (or corrupt) pool

    Bad allocation size @fffffa800dc57000, zero is invalid

     

    ***

    *** An error (or corruption) in the pool was detected;

    *** Attempting to diagnose the problem.

    ***

    *** Use !poolval fffffa800dc57000 for more details.

     

     

    Pool page [ fffffa800dc57000 ] is __inVALID.

     

    Analyzing linked list...

    [ fffffa800dc57000 ]: invalid previous size [ 0x38 ] should be [ 0x0 ]

     

     

    Scanning for single bit errors...

     

    None found

     

    Next, I dump the allocation from start to end.  Notice that the size of the allocation is reported as argument 3 of the bugcheck.

     

    0: kd> dc fffffa800dc57000 fffffa800dc57000+2180-4

    fffffa80`0dc57000  00000038 0000000e 00000000 00000000  8...............

    fffffa80`0dc57010  a24da497 01ccc5d6 c827993c 41946d1f  ..M.....<.'..m.A

    fffffa80`0dc57020  c0d75c9b b7cff1a5 00000000 00000020  .\.......... ...

    fffffa80`0dc57030  000021e0 00000006 0000006c 00000110  .!......l.......

    fffffa80`0dc57040  00000208 000003b8 00000208 00000660  ............`...

    fffffa80`0dc57050  00000208 00000910 00000208 00000bb0  ................

    <cut>

    fffffa80`0dc59150  002d0033 00300031 0063002e 006d006f  3.-.1.0...c.o.m.

    fffffa80`0dc59160  006c002e 00660065 00680074 006e0061  ..l.e.f.t.h.a.n.

    fffffa80`0dc59170  006e0064 00740065 006f0077 006b0072  d.n.e.t.w.o.r.k.

     

    This should be the end of the allocation.  The next thing we see should be the “Frag” and “Free” tags.

     

    0: kd> dc

    fffffa80`0dc59180  003a0073 0061006d 0061006e 00650067  s.:.m.a.n.a.g.e.

    fffffa80`0dc59190  0065006d 0074006e 0038003a 00390036  m.e.n.t.:.8.6.9.

    fffffa80`0dc591a0  0062003a 00670069 0075006c 00790063  :.b.i.g.l.u.c.y.

    fffffa80`0dc591b0  0064002d 00740061 002d0061 006e0069  -.d.a.t.a.-.i.n.

    fffffa80`0dc591c0  00650064 00650078 002d0073 00740063  d.e.x.e.s.-.c.t.

    fffffa80`0dc591d0  006c0072 0031005f 00000031 00000000  r.l._.1.1.......

     

    We can clearly see that the Frag and Free tags have been overwritten by string data, which is the corruption that the check detected.  At this point, you would need to look at the current stack to determine which driver allocated the memory, and review the code to investigate when this corruption could have occurred.
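
    The dotted ASCII column of dc is awkward to read when the data is a wide string.  The sketch below, with made-up helper names, rebuilds the bytes from saved dc output and decodes them as UTF-16, which can make the overrunning string easier to recognize.  It assumes the 64-bit dc format shown above and that the data really is UTF-16 text.

```python
import re
import struct

# A dc line: a 64-bit address followed by up to four 32-bit values.
DC_LINE = re.compile(r"^\s*[0-9a-f]{8}`[0-9a-f]{8}((?:\s+[0-9a-f]{8}){1,4})")


def dc_to_bytes(dc_output):
    """Rebuild the raw bytes from the dword columns of saved dc output."""
    data = bytearray()
    for line in dc_output.splitlines():
        match = DC_LINE.match(line)
        if not match:
            continue
        for dword in match.group(1).split():
            # dc prints each dword as a number; memory order is little-endian.
            data += struct.pack("<I", int(dword, 16))
    return bytes(data)


def dc_to_text(dc_output):
    """Decode the dumped bytes as UTF-16 and strip trailing NULs."""
    return dc_to_bytes(dc_output).decode("utf-16-le").rstrip("\x00")


if __name__ == "__main__":
    sample = (
        "fffffa80`0dc59180  003a0073 0061006d 0061006e 00650067  s.:.m.a.n.a.g.e.\n"
        "fffffa80`0dc59190  0065006d 0074006e 0038003a 00390036  m.e.n.t.:.8.6.9.\n"
    )
    print(dc_to_text(sample))  # s:management:869
```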
