Pushlocks were a new locking primitive first introduced in Windows Server 2003 and are primarily used in place of spinlocks to protect key kernel data structures. Unfortunately, Pushlocks are not documented in the WDK, and are not available for public use; however, a few internal drivers do use them, so you might see them while debugging a machine. Also, I was digging around on MSDN for Pushlocks, and I found that the Filter Manager does expose certain APIs to use Pushlocks, so you are in luck if you are developing a filter driver!
Pushlocks are built on primitive gate objects, which are defined by KGATE structures. The gate object is a highly optimized version of the basic event object. By not having both the notification and synchronization versions of the basic event object, and by being the exclusive object that a thread can wait on, the code for acquiring and releasing a gate is heavily optimized. Gates even have their own dispatcher lock instead of acquiring the entire dispatcher database.
Unlike spinlocks, which must be acquired exclusively for all operations on a data structure, pushlocks can be shared by multiple “readers” and need only be acquired exclusively when a thread needs to modify the data structure.
When a thread acquires a normal pushlock, the pushlock code marks the pushlock as owned, if it is not owned already. If someone owns the pushlock exclusively, or the thread wants to own the pushlock exclusively while someone else has it in shared mode, the thread allocates a wait block on its stack, initializes a gate object in the wait block, and then add the wait block to the wait list associated with the pushlock. When the thread holding the pushlock finally releases it, it wakes the next waiter by signaling the event in the waiters wait block.
When debugging a machine, there is no easy way to figure out the current owner of the pushlock, apart from doing code review. By looking at the waitlist, you can always figure out the threads trying to get access to it, but since the gate does not keep track of the owner like a regular mutex, it is much harder to find the current owner.
For more details on the operation and structure of a pushlock, please review the Pushlocks section in Windows Internals book, under the System Mechanisms Chapter.
If a pushlock is held by one or more readers, threads that want to modify the data structure are queued for exclusive access. This queuing mechanism provides some of the same benefits of queued spinlocks—for example, FIFO ordering, elimination of race conditions, and reduced cache thrashing when more than one thread is waiting for the pushlock.
Another advantage of using a pushlock is the size. A regular resource object is 56 bytes, however a pushlock is the size of a pointer. Apart from a small memory footprint, this helps especially in the non-contended case, since pushlocks do not require lengthy operations to perform acquisition or release. Because the pushlock fits in one “machine word”, the CPU can use atomic operations to compare and exchange the old lock with the new one.
Pushlocks are also self-optimizing in the sense that the list of threads waiting on a pushlock will be periodically rearranged to provide fairer behavior when the pushlock is released.
A cache-aware pushlock adds to the basic pushlock by allocating a normal pushlock for each processor in the system and associating it with the cache-aware pushlock. When a thread wants to acquire a cache-aware pushlock for shared access, it simply acquires the pushlock on that processor; however if it needs to acquire the lock for exclusive access, it has to acquire the pushlocks for each processor in exclusive mode.
3: kd> !thread 8c9764c0
THREAD 8c9764c0 Cid 2410.1be4 Teb: 7ff9f000 Win32Thread: e5c6f298 GATEWAIT
Stack Init b386b000 Current b386a978 Base b386b000 Limit b3867000 Call 0
ChildEBP RetAddr Args to Child
b386a990 80833485 8c9764c0 8c9764e4 00000003 nt!KiSwapContext+0x26 (FPO: [Uses EBP] [0,0,4])
b386a9bc 8082ffe0 b06a6a03 e11e0b18 b386aa54 nt!KiSwapThread+0x2e5 (FPO: [Non-Fpo]) (CONV: fastcall)
b386a9e4 8087d722 00000000 e11e0b08 e11e0b18 nt!KeWaitForGate+0x152 (FPO: [Non-Fpo]) (CONV: fastcall)
e11e0b18 00000000 0c050204 7346744e e37b2808 nt!ExfAcquirePushLockExclusive+0x112 (FPO: [Non-Fpo]) (CONV: fastcall)
Above is a snipped output from a dump that I was recently looking at. From the stack, you can see the ExfAcquirePushLockExclusive call trying to acquire the pushlock, which then calls KEWaitForGate. In this case, the lock was already acquired, so this thread allocated a wait block on its stack, and then added itself to the waitlist.
Also, the stack is broken due to the fastcall, therefore the debugger cannot display it entirely. So we can manually reconstruct the stack by passing parameters to the kb command.
k[b|p|P|v] = BasePtr StackPtr InstructionPtr
To get the arguments, we first dump the stack manually using the dps command with the current esp.
3: kd> dps b386a978 l50
b386a980 8088dafe nt!KiSwapContext+0x26
b386a994 80833485 nt!KiSwapThread+0x2e5
b386a9c0 8082ffe0 nt!KeWaitForGate+0x152
b386a9e8 8087d722 nt!ExfAcquirePushLockExclusive+0x112
b386a9fc 8096e9a9 nt!SeOpenObjectAuditAlarm+0x1cf
b386aa24 80a60456 hal!KfLowerIrql+0x62
b386aa44 808b7a14 nt!CmpCheckRecursionAndRecordThreadInfo+0x2a
From the output above, we can see the stack. To reconstruct the stack, we can get the ebp, esp, and eip from the stack for the ExfAcquirePushLockExclusive frame, and pass it to the kb command. Voila!
3: kd> kb = b386aa40 b386a9e4 8087d722
b386aa40 808b7a14 b386ac04 e11e0b18 e11e0b18 nt!ExfAcquirePushLockExclusive+0x112
b386aa64 808b7b09 e11e0b18 b386aa80 e101bf40 nt!CmpCheckRecursionAndRecordThreadInfo+0x2a
b386aaa4 808da118 0000001c b386ab58 00000001 nt!CmpCallCallBacks+0x6b
b386ab90 80937942 e101bf40 00000000 89f13648 nt!CmpParseKey+0xd4
b386ac10 80933a76 00000000 b386ac50 00000040 nt!ObpLookupObjectName+0x5b0
b386ac64 808bb471 00000000 8e930480 00000d01 nt!ObOpenObjectByName+0xea
b386ad50 808897bc 0243eba0 00020019 0243eb68 nt!NtOpenKey+0x1ad
b386ad50 7c8285ec 0243eba0 00020019 0243eb68 nt!KiFastCallEntry+0xfc
WARNING: Frame IP not in any known module. Following frames may be wrong.
0243eba4 00000000 00000000 00000000 00000000 0x7c8285ec
Share this post :