Commenter Niels wonders when and how the registry was introduced to 16-bit Windows and how much of it carried over to Windows 95.
The 16-bit registry was extremely simple. There were just keys, no values. The only hive was HKEY_CLASSES_ROOT. All it was used for was COM objects and file associations. The registry was stored in the REG.DAT file, and its maximum size was 64KB.
It is my recollection that the registry was introduced in Windows 3.1, but Niels says it's not in a plain vanilla install, so I guess my memory is faulty.
None of the 16-bit registry code was carried over to Windows 95. Windows 95 extended the registry into kernel mode, added support for values and non-string data types, increased the maximum registry size (though if some people are to be believed, not by enough), and added a bunch of other hives, like HKEY_CURRENT_USER, HKEY_LOCAL_MACHINE, and HKEY_DYN_DATA. The old 16-bit registry code was woefully inadequate for all these new requirements (especially the kernel mode part), so it was all thrown out and a brand new registry written.
In the early days of the Windows 95 registry, the in-memory signature value used to identify the data structures that represent an open registry key was four bytes corresponding to the ASCII values of the initials of the two programmers who wrote it.
Last time we saw how to view the stack of threads that were terminated as part of process teardown from the kernel debugger. You can do the same thing from a user-mode debugger, and it's actually a bit easier there. (The user-mode debugger I'm using is the one that comes with the Debugging Tools for Windows, the debugging engine that goes by a number of different front-ends, such as ntsd, cdb, and windbg.)
A direct translation of the kernel-mode technique from last time would involve using the !vadump command and picking through for the memory blocks with candidate size and attributes. But there's an easier way.
Now would be a good point for me to remind you that this information is for debugging purposes only. The structures and offsets are all implementation details which can change from release to release.
Recall that the TEB begins with some pointers which bound the stack, and the seventh pointer is a self-pointer. What's even more useful is the thirteenth pointer (offset 0x30 for 32-bit TEBs, offset 0x60 for 64-bit TEBs), because that is where the PEB is stored.
Each process has a single global PEB, so all the TEBs will have the same PEB value at offset 0x30/0x60. And you can figure out the address of the current process's PEB either by using the !peb command or by simply looking at the TEB you already have.
0:000> dd fs:30 l1
0053:00000030  7efde000
Now you can search through memory looking for that value. If you see any hits at offset 0x30/0x60, then that's a candidate TEB.
The debugger normally limits memory scans to 256MB.
0:001> s 00000000 L 80000000 00 e0 fd 7e
^ Range error in 's 00000000 l 80000000 00 e0 fd 7e'
Therefore, you have to issue the search eight times (for 32-bit processes) to cover the 2GB user-mode address space.
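If you'd rather not type eight nearly identical commands, here's a minimal sketch in C that writes them out for you. It assumes a 32-bit process with 2GB of user-mode address space, and the function name format_search_command is made up for illustration. The PEB value is emitted low byte first, because that is how a 32-bit little-endian value sits in memory.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: write the debugger search command for one 256MB
   chunk (0 through 7) of a 2GB user-mode address space. The PEB value is
   written as four bytes in little-endian order. Returns the length of
   the command, as reported by snprintf. */
int format_search_command(char *buf, size_t size, unsigned chunk, uint32_t peb)
{
    assert(chunk < 8); /* eight chunks of 256MB cover 2GB */
    uint32_t start = (uint32_t)chunk * 0x10000000u;
    return snprintf(buf, size, "s %08x L 10000000 %02x %02x %02x %02x",
                    (unsigned)start,
                    (unsigned)(peb & 0xff),
                    (unsigned)((peb >> 8) & 0xff),
                    (unsigned)((peb >> 16) & 0xff),
                    (unsigned)((peb >> 24) & 0xff));
}
```

Call it in a loop for chunk 0 through 7 and paste the results into the debugger.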
0:001> s 00000000 L 10000000 00 e0 fd 7e
0009e01c  00 e0 fd 7e 00 d0 fd 7e-44 e0 09 00 7b ef 17 77  ...~...~D...{..w
0009fdc0  00 e0 fd 7e 44 00 00 00-f0 ee 3a 00 10 ef 3a 00  ...~D.....:...:.
0009fe34  00 e0 fd 7e 78 fe 09 00-02 9f 18 77 00 e0 fd 7e  ...~x......w...~
0:001> s 10000000 L 10000000 00 e0 fd 7e
0:001> s 20000000 L 10000000 00 e0 fd 7e
0:001> s 30000000 L 10000000 00 e0 fd 7e
0:001> s 40000000 L 10000000 00 e0 fd 7e
0:001> s 50000000 L 10000000 00 e0 fd 7e
0:001> s 60000000 L 10000000 00 e0 fd 7e
0:001> s 70000000 L 10000000 00 e0 fd 7e
7486af70  00 e0 fd 7e 00 00 00 00-b8 00 16 77 28 00 16 77  ...~.......w(..w
7efda030  00 e0 fd 7e 00 00 00 00-00 00 00 00 00 00 00 00  ...~............
7efdd030  00 e0 fd 7e 00 00 00 00-00 00 00 00 00 00 00 00  ...~............
Alternatively, you can use the "length sanity check override" by inserting a question mark after the L:
0:001> s 00000000 L?80000000 00 e0 fd 7e
0009e01c  00 e0 fd 7e 00 d0 fd 7e-44 e0 09 00 7b ef 17 77  ...~...~D...{..w
0009fdc0  00 e0 fd 7e 44 00 00 00-f0 ee 3a 00 10 ef 3a 00  ...~D.....:...:.
0009fe34  00 e0 fd 7e 78 fe 09 00-02 9f 18 77 00 e0 fd 7e  ...~x......w...~
7486af70  00 e0 fd 7e 00 00 00 00-b8 00 16 77 28 00 16 77  ...~.......w(..w
7efda030  00 e0 fd 7e 00 00 00 00-00 00 00 00 00 00 00 00  ...~............
7efdd030  00 e0 fd 7e 00 00 00 00-00 00 00 00 00 00 00 00  ...~............
From the above output, we see that we can quickly reject all but the last two entries because the offset within the page is not the magic value 0x30. (This is a 32-bit process.) Hooray, two debugger commands reduce the search space to just two pages!
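The filtering rule is simple enough to write down. The following is a rough sketch, not anything the debugger provides: the function names are invented, and it leans on the same implementation details as the rest of this discussion (the PEB pointer is the thirteenth pointer in the TEB, and candidate TEBs start on a 4KB page boundary).

```c
#include <assert.h>
#include <stdint.h>

/* Offset of the PEB pointer within a TEB: it is the thirteenth pointer,
   so it sits 12 pointer-widths from the start (0x30 or 0x60). */
unsigned peb_slot_offset(unsigned pointer_size)
{
    assert(pointer_size == 4 || pointer_size == 8);
    return 12 * pointer_size;
}

/* A search hit is worth a closer look only if its offset within the
   4KB page matches the PEB slot offset. */
int is_candidate_teb_hit(uint64_t hit_address, unsigned pointer_size)
{
    return (hit_address & 0xFFF) == peb_slot_offset(pointer_size);
}

/* Start of the candidate TEB that a plausible hit belongs to. */
uint64_t candidate_teb_base(uint64_t hit_address, unsigned pointer_size)
{
    return hit_address - peb_slot_offset(pointer_size);
}
```

Feeding it the six hits from the search keeps 7efda030 and 7efdd030 and throws out the rest, which matches the by-eye result.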
At this point, you can continue with the debugging technique from last time, looking at each candidate TEB to see if there's a valid stack in there.
Back in the day (and perhaps still true today), Charles Petzold's Programming Windows was the definitive source for learning to program Windows. The book is so old that even I used it to learn Windows programming, back when everything was 16-bit and uphill both ways. The most recent edition is Programming Windows, 5th Edition, which was published way back in 1998. What has he been doing since then? My guess would have been "sitting on a beach in Hawaiʻi," but apparently he's been writing books on C# and Windows Forms and WPF and Silverlight. Hey, I could still be right: Maybe he writes the books while sitting on a beach in Hawaiʻi.
It appears that Windows 8 has brought Mr. Petzold back to the topic of Windows programming, and despite his earlier claims that he has no plans to write a sixth edition of Programming Windows, it turns out that he's writing a sixth edition of Programming Windows specifically for Windows 8. (Perhaps he could subtitle his book The New Old Thing.)
Here's where it gets interesting.
Before the book officially releases (target date November 15), there will be two pre-release versions in eBook form, one based on the Consumer Preview of Windows 8 and one based on the Release Preview.
Now it gets really interesting: If you order the Consumer Preview eBook, it comes with free upgrades to the Release Preview eBook as well as the final eBook. (If you order the Release Preview eBook, then it comes with a free upgrade to the final eBook.)
Can it get even more interesting than that? You bet! Because the price of getting in on the action increases the longer you wait. Act now, and you can get the Consumer Preview eBook (and all the free upgrades that come with it) for just $10. Wait a few weeks, and it'll cost you $20. Wait another few months, and it'll cost you $30; after another few weeks the price goes up to $40, and if you are a lazy bum and wait for the final eBook to be released, it'll cost you $50.
But in order to take advantage of this offer, you have to follow the instructions on this blog entry from Microsoft Press (and read the mandatory legal mumbo-jumbo, because the lawyers always get their say).
Bonus chatter: One publisher asked me if I wanted to write a book on programming Windows 8, but I told them that I was too busy shipping Windows 8 to have any extra time to write a book about it. And it's a good thing I turned them down, because imagine if I decided to write the book and found that Charles Petzold was coming out of retirement to write his own book. My book would have done even worse than my first book, which didn't even have any competition!
Bonus disclaimer: Charles Petzold did not pay me to write this, nor did he offer me a cut of his royalties for shilling his book. But that doesn't mean I won't accept it! (Are you listening, Charles?)
As we saw some time ago, process shutdown is a multi-phase affair. After you call ExitProcess, all the threads are forcibly terminated. After that's done, each DLL is sent a DLL_PROCESS_DETACH notification. You may be debugging a problem with DLL_PROCESS_DETACH handling that suggests that some of those threads were not cleaned up properly. For example, you might assert that a reference count is zero, and you find during process shutdown that this assertion sometimes fires. Maybe you terminated a thread before it got a chance to release its reference? How can you test this theory if the thread is already gone?
It so happens that when all the threads are terminated during the early phase of process shutdown, the kernel is a bit lazy and doesn't free their stacks. It figures, hey, the entire process is going away soon, so the stack memory is going to be cleaned up as part of process termination. (It's sort of the kernel equivalent of not bothering to sweep the floor of a building that's about to be demolished.) You can use this to your advantage by grovelling the stacks that were left behind.
Hey, this is why you get called in to debug the hard stuff, right?
Before continuing, I need to emphasize that this information is for debugging purposes only. The structures and offsets are all implementation details which can change from release to release.
The first step is to identify where all the stacks are. The direct approach is difficult because the stacks can be all different sizes, so it's not easy to pick them out of a line-up. But one thing does come in a consistent size: The TEB.
From the kernel debugger, use the !process command to dump the process you are interested in, and from the header information, extract the VadRoot.
1: kd> !process -1
PROCESS 8731bd40  SessionId: 1  Cid: 0748    Peb: 7ffda000  ParentCid: 0620
    DirBase: 4247b000  ObjectTable: 96f66de0  HandleCount: 104.
    Image: oopsie.exe
    VadRoot 893de570 Vads 124 Clone 0 Private 518. Modified 643. Locked 0.
    DeviceMap 995628c0
Dump this VAD root with the !vad command, and pay attention only to the entries which say 1 Private READWRITE.
1: kd> !vad 893de570
VAD     level      start      end    commit
... ignore everything except "1 Private READWRITE" ...
8730a5f0 ( 6)         50       50         1 Private      READWRITE
9ab0cb40 ( 5)         60       7f         1 Private      READWRITE
893978b0 ( 6)         80       9f         1 Private      READWRITE
87302d30 ( 5)        110      110         1 Private      READWRITE
889693f8 ( 6)        120      121         1 Private      READWRITE
872f3fb8 ( 6)        170      170         1 Private      READWRITE
87089a80 ( 6)        1a0      1a0         1 Private      READWRITE
8cbf1cb0 ( 5)        1c0      1df         1 Private      READWRITE
88c079d0 ( 6)        1e0      1e0         1 Private      READWRITE
9abc33e0 ( 6)        410      48f         1 Private      READWRITE
873173b0 ( 7)        970      970         1 Private      READWRITE
8ca1c158 ( 7)      7ffd5    7ffd5         1 Private      READWRITE
88c02a78 ( 6)      7ffd6    7ffd6         1 Private      READWRITE
872f9298 ( 5)      7ffd7    7ffd7         1 Private      READWRITE
8750d210 ( 7)      7ffd8    7ffd8         1 Private      READWRITE
87075ce8 ( 6)      7ffda    7ffda         1 Private      READWRITE
87215da0 ( 4)      7ffdc    7ffdc         1 Private      READWRITE
872f2200 ( 6)      7ffdd    7ffdd         1 Private      READWRITE
8730a670 ( 5)      7ffdf    7ffdf         1 Private      READWRITE
(If you are debugging from user mode, then you can use !vadump but the output format is different.)
Each of these is a candidate TEB. In practice, TEBs tend to be allocated at the high end of memory, so the ones with a low start value are probably red herrings. Therefore, you should investigate these candidates in reverse order.
For each candidate, take the start address and append three zeroes. (Each page on x86 is 4KB, which conveniently maps to 1000 in hex.) Dump the first seven pointers of the TEB with the dp xxxxx000 L7 command.
1: kd> dp 7ffdf000 L7
7ffdf000  0016fbb0 00170000 0016b000 00000000
7ffdf010  00001e00 00000000 7ffdf000 ← hit
If the TEB is valid, then the seventh pointer points back to the start of the TEB. In a valid TEB, the second and third values are the stack limits; in this case, the candidate stack lives between 0016b000 and 00170000. (As a double-check, you can verify that the upper limit of the stack, 00170000 in this case, matches up with the end of a VAD allocation in the !vad output above.)
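The check you are doing by eye can be sketched in a few lines of C. This is only an illustration under the assumptions stated above (32-bit TEB, self-pointer in the seventh slot, stack limits in the second and third); the function name is invented, and the test that the stack range be nonempty is an extra sanity check of my own, not something the TEB layout promises.

```c
#include <assert.h>
#include <stdint.h>

/* first7[] holds the seven pointers printed by "dp <page> L7" for a
   32-bit candidate; page is the address that was dumped. */
int looks_like_teb32(const uint32_t first7[7], uint32_t page)
{
    assert((page & 0xFFF) == 0);       /* candidates are page-aligned */
    uint32_t stack_upper = first7[1];  /* second pointer */
    uint32_t stack_lower = first7[2];  /* third pointer */
    if (first7[6] != page)
        return 0;                      /* seventh pointer must point back at the TEB */
    return stack_lower < stack_upper;  /* extra check: limits describe a nonempty range */
}
```

For the dump above, the seven values are 0016fbb0, 00170000, 0016b000, 00000000, 00001e00, 00000000, 7ffdf000, and the function accepts them for page 7ffdf000.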
Now that you know where the stack is, you can dps it and look for EBP frames. (I usually start about two to four pages below the upper limit of the stack.) Test out each candidate EBP frame with the k= command until you find one that seems to be solid. Record this candidate stack trace in a text file for further study.
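What makes an EBP frame "seem solid" is mostly that the saved EBP at that address points to another plausible frame higher up the stack, and so on. Here's a rough sketch of that heuristic in C, operating on a copy of the stack memory. Everything in it is an assumption for illustration (the function name, the idea of copying the stack into an array); the debugger does not expose anything like this, and a long chain is a hint, not proof.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* words[i] holds the 32-bit value found at address base + 4*i, and
   limit is the upper bound of the stack. Starting at a candidate frame
   address, follow saved-EBP links and count how many frames chain
   together. Longer chains are more likely to be real. */
int ebp_chain_length(const uint32_t *words, uint32_t base, uint32_t limit, uint32_t frame)
{
    assert(words != NULL && base < limit);
    int length = 0;
    /* A frame needs room for the saved EBP and the return address. */
    while (frame >= base && frame < limit && limit - frame >= 8 && (frame & 3) == 0) {
        uint32_t saved_ebp = words[(frame - base) / 4];
        length++;
        if (saved_ebp <= frame)
            break;          /* links must move toward higher addresses */
        frame = saved_ebp;
    }
    return length;
}
```

You would still confirm each promising address with the k= command; the sketch only captures why some addresses look more promising than others.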
Repeat for each candidate TEB, and you will eventually reconstruct what each thread in the process was doing at the moment it was terminated. If you're really lucky, you might even see the code that incremented the reference count but was terminated before it could release it.
The above discussion also applies to debugging 64-bit processes. However, instead of looking for 1 Private READWRITE pages, you want to look for 2 Private READWRITE pages. As an additional wrinkle, if you are debugging ia64, then converting a page frame to a linear address is sadly not as simple as appending three zeroes. Pages on ia64 are 8KB, not 4KB, so you need to shift the value left by 13 bits: Add three zeroes and then multiply by two.
And finally, if you are debugging a 32-bit process on x64, then you want to look for 3 Private READWRITE pages, but add 2 before appending the three zeroes. That's because the TEB for a 32-bit process on x64 is really two TEBs glued together: A 64-bit TEB followed by a 32-bit TEB.
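To keep the cases straight, here is the arithmetic as a small C sketch. The enum and function names are invented for illustration, and the rules are just the ones described above, so they carry the same "implementation detail, subject to change" warning.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { TARGET_X86, TARGET_X64, TARGET_IA64, TARGET_WOW64 } Target;

/* Convert the "start" value from the !vad output into the address at
   which to dump the candidate TEB. */
uint64_t teb_address_from_vad_start(uint64_t start_page, Target target)
{
    switch (target) {
    case TARGET_X86:
    case TARGET_X64:
        return start_page << 12;        /* 4KB pages: append three zeroes */
    case TARGET_IA64:
        return start_page << 13;        /* 8KB pages: three zeroes, then double */
    case TARGET_WOW64:
        return (start_page + 2) << 12;  /* skip the 64-bit TEB that comes first */
    }
    assert(!"unknown target");
    return 0;
}
```

For example, a start value of 7ffdf on x86 gives 7ffdf000, which is the address dumped earlier.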
Note: I did not come up with this debugging technique on my own. I learned it from an even greater debugging genius.
Next time, we'll look at debugging this issue from a user-mode debugger.
Trivia: The informal term for these terminated-but-not-yet-completely-destroyed threads is ghost threads. The term was coined by the Exchange support team, because they often have to study server failures that require them to do this type of investigation, and they needed a cute name for it.
A customer reported a problem that occurred only when they installed a particular application. If they uninstalled it, then the problem went away. After installing the application, the "Run As" context menu option stopped working. The customer didn't provide any other details, but we were able to make an educated guess as to what was going on.
A common programming error in context menu extensions occurs in extensions which add only one menu item. These extensions ignore the parameters to IContextMenu::InvokeCommand and simply assume that the only reason the method can be called is if the user selected their menu item. After all, if you have only one invokable item, there's no need to figure out which one the user selected, because you have only one to begin with!
The problem is that a context menu extension can be invoked not because the user selected an item under its control but because a verb is being invoked programmatically, and each handler is being asked, "Do you know how to do this?"
The result is that the context menu host calls the extension to say, "If you know how to do runas, then please do so," and the extension says "Sure, we do that" and starts doing its thing. If you are unlucky and the grabby extension is asked the question before the actual runas extension, the runas command winds up being hijacked by the grabby extension.
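The fix is to actually look at the verb. What follows is a portable sketch of the decision, not real shell code: it uses plain C types instead of the Windows headers, and the names is_request_for_my_item, my_offset, and my_verb are hypothetical. The one detail taken from the real interface is that the verb is either a pointer to a name or a small integer (the menu item offset) smuggled inside the pointer with its high bits zero.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Portable stand-in for the verb passed to InvokeCommand: a value with
   no bits set above the low 16 is a menu item offset, not a string. */
static int verb_is_offset(const char *verb)
{
    return ((uintptr_t)verb >> 16) == 0;
}

/* Return 1 only if the request is really for our single menu item.
   my_offset and my_verb are hypothetical values chosen by the extension. */
int is_request_for_my_item(const char *verb, unsigned my_offset, const char *my_verb)
{
    assert(my_verb != NULL);
    if (verb_is_offset(verb))
        return (unsigned)(uintptr_t)verb == my_offset; /* user picked a menu item */
    return strcmp(verb, my_verb) == 0;                 /* invoked by name */
}
```

A real extension would compare verb names without regard to case and would fail the call when the answer is no, so that the host moves on to the next handler.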
(This is the same mistake that causes the Copy To and Move To commands to behave strangely if you add them to the context menu: They assume that the only reason they are invoked is that the user invoked their command, because they weren't designed to be hosted by context menus to begin with! They were designed to go into the toolbar, and the toolbar hosting code never invoked commands by name. It's like taking a ladder and using it as a bridge between two tall buildings. Sure, you can now cross from one building to another, but you also run a serious risk of falling to your death.)
A variation on the initial problem is "I found that after installing a particular program, I can't run anything from the Start menu." I know of at least two programs which install context menu extensions which steal the "open" command on executables.
This problem is sufficiently prevalent that there is a special compatibility flag that can be set on a shell extension to say, "This is a grabby shell extension that steals commands. Never ask it if it supports anything, because it will always say yes!"
Notice that the "MoveTo CopyTo Context Menu" is on the list, which I find interesting because MoveTo/CopyTo was never meant to go on the context menu in the first place. Going back to our analogy, it'd be as if the ladder company issued a safety bulletin to warn people of problems that can occur if you use it as a bridge between two tall buildings!
Mike Dunn wonders what the Microspeak term parking lot means.
I'm not familiar with this term either, and the first document I turned up during my search was a PowerPoint presentation that said "Avoid using Microsoft jargon terms, such as parking lot and dogfood."
Yeah, that wasn't much help.
From what I can gather, the term parking lot started out as a term used during brainstorming sessions. You've got a bunch of people in a conference room tossing out all sorts of ideas. The traditional way of organizing the ideas is to write each one on a Post-It® note and stick it on the whiteboard. As more and more notes appear, you start to organize them by grouping together similar ideas.
Every so often, you'll run into an idea that, while good, isn't really relevant to the problem you're trying to solve. You don't want to throw it away, so instead, you designate a corner of the whiteboard to be the place to "park" those ideas for later consideration. That corner of the whiteboard is nicknamed the parking lot.
The term parking lot then began to be applied to the document that collected all of these "parked" ideas, so they could be circulated to a more appropriate audience.
The term then expanded to refer to any document which served as the official repository of assorted suggestions for future work or discussion. (Known to some people simply as The List.) For example, there is a SharePoint List titled Active Issues and the subtitle parking lot for discussion topics in weekly XYZ meeting. Each item on the list is assigned to a particular person and assigned a priority.
I can't find any citations for parking lot being used as a way to say something like "we'll talk about this after the meeting is over," but I can see how it could be related to the sense of parking lot I was able to turn up: The parking lot is the list of things that aren't really relevant to the topic at hand but which are still worth discussing. We just won't discuss them here.
Commenter rs asks, "Why does Windows (historically) return 2 for MulDiv(1, -0x80000000, -0x80000000) while Wine returns zero?"
The MulDiv function multiplies the first two parameters and divides by the third. Therefore, the mathematically correct answer for MulDiv(1, -0x80000000, -0x80000000) is 1, because a × b ÷ b = a for all nonzero b.
So both Windows and Wine get it wrong. I don't know why Wine gets it wrong, but I dug through the archives to figure out what happened to Windows.
First, some background. What's the point of the MulDiv function anyway?
Back in the days of 16-bit Windows, floating point was very expensive. Most people did not have math coprocessors, so floating point was performed via software emulation. And the software emulation was slow. First, you issued a floating point operation on the assumption that you had a floating point coprocessor. If you didn't, then a coprocessor not available exception was raised. This exception handler had a lot of work to do.
It decoded the instruction that caused the exception and then emulated the operation. For example, if the bytes at the point of the exception were d9 45 08, the exception handler would have to figure out that the instruction was fld dword ptr ds:[di][8]. It then had to simulate the operation of that instruction. In this case, it would retrieve the caller's di register, add 8 to that value, load four bytes from that address (relative to the caller's ds register), expand them from 32-bit floating point to 80-bit floating point, and push them onto a pretend floating point stack. Then it advanced the instruction pointer three bytes and resumed execution.
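To give a feel for the first part of that work, here is a small C sketch that recognizes just this one instruction form and computes the offset it loads from. It is nowhere near a real emulator, and the type and function names are invented. The decoding rule it relies on is the standard 16-bit ModRM layout: the top two bits are the mode, the middle three select the operation within opcode d9, and the bottom three select the base register.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { unsigned mod, reg, rm; } ModRM;

/* Split a ModRM byte into its three fields. */
ModRM split_modrm(uint8_t byte)
{
    ModRM m = { byte >> 6, (byte >> 3) & 7, byte & 7 };
    return m;
}

/* If insn is "fld dword ptr [di+disp8]" (d9 /0 with mod=1, rm=5 in
   16-bit addressing), store the offset within the data segment that
   would be loaded from and return 1. Otherwise return 0. */
int fld_di_disp8_offset(const uint8_t insn[3], uint16_t di, uint16_t *offset_out)
{
    assert(insn != NULL && offset_out != NULL);
    ModRM m = split_modrm(insn[1]);
    if (insn[0] != 0xD9 || m.reg != 0 || m.mod != 1 || m.rm != 5)
        return 0;
    /* disp8 is sign-extended; the sum wraps at 16 bits */
    *offset_out = (uint16_t)(di + (int8_t)insn[2]);
    return 1;
}
```

For the bytes d9 45 08, the ModRM byte 45 splits into mod=1, reg=0, rm=5, and with di equal to 0x1000 the offset comes out as 0x1008. And that's before doing any of the actual floating point conversion.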
This took an instruction that with a coprocessor would take around 40 cycles (already slow) and ballooned its total execution time to a few hundred, probably a few thousand, cycles. (I didn't bother counting. Those who are offended by this horrific laziness on my part can apply for a refund.)
It was in this sort of floating point-hostile environment that Windows was originally developed. As a result, Windows has historically avoided using floating point and preferred to use integers. And one of the things you often have to do with integers is scale them by some ratio. For example, a horizontal dialog unit is ¼ of the average character width, and a vertical dialog unit is 1/8 of the average character height. If you have a value of, say, 15 horizontal dlu, the corresponding number of pixels is 15 × average character width ÷ 4. This multiply-then-divide operation is quite common, and that's the model that the MulDiv function is designed to help out with.
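Here's what the dialog unit calculation looks like as integer-only C. This is a simplified sketch rather than the real MulDiv: the function names are made up, it assumes non-negative inputs, and it does not handle a result that is too big for 16 bits. What it does show is the multiply-then-divide pattern with a wider intermediate and rounding to the nearest integer.

```c
#include <assert.h>
#include <stdint.h>

/* Scale value by numerator/denominator using a 32-bit intermediate and
   rounding to nearest. Sketch only: assumes non-negative inputs and a
   result that fits in 16 bits. */
static int16_t scale_round(int16_t value, int16_t numerator, int16_t denominator)
{
    assert(value >= 0 && numerator >= 0 && denominator > 0);
    int32_t product = (int32_t)value * numerator + denominator / 2;
    return (int16_t)(product / denominator);
}

/* A horizontal dialog unit is 1/4 of the average character width. */
int16_t horizontal_dlu_to_pixels(int16_t dlu, int16_t avg_char_width)
{
    return scale_round(dlu, avg_char_width, 4);
}

/* A vertical dialog unit is 1/8 of the average character height. */
int16_t vertical_dlu_to_pixels(int16_t dlu, int16_t avg_char_height)
{
    return scale_round(dlu, avg_char_height, 8);
}
```

With an average character width of 7 pixels, 15 horizontal dlu works out to 15 × 7 ÷ 4 = 26.25, which rounds to 26.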
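In particular, MulDiv took care of three things that a simple a × b ÷ c didn't. (And remember, we're in 16-bit Windows, so a, b and c are all 16-bit signed values.) It computed the intermediate product a × b as a 32-bit value so that it could not overflow, it rounded the result to the nearest integer rather than truncating, and it returned INT_MAX or INT_MIN when the result did not fit in 16 bits.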
The MulDiv function was written in assembly language, as was most of GDI at the time. Oh right, the MulDiv function was exported by GDI in 16-bit Windows. Why? Probably because they were the people who needed the function first, so they ended up writing it.
Anyway, after I studied the assembly language for the function, I found the bug. A shr instruction was accidentally coded as sar. The problem manifests itself only for the denominator −0x8000, because that's the only one whose absolute value has the high bit set.
The purpose of the sar instruction was to divide the denominator by two, so it can get the appropriate rounding behavior when there is a remainder. Reverse-compiling back into C, the function goes like this:
int16 MulDiv(int16 a, int16 b, int16 c)
{
 int16 sign = a ^ b ^ c; // sign of result

 // make everything positive; we will apply sign at the end
 if (a < 0) a = -a;
 if (b < 0) b = -b;
 if (c < 0) c = -c;

 // add half the denominator to get rounding behavior
 uint32 prod = UInt16x16To32(a, b) + c / 2;
 if (HIWORD(prod) >= c) goto overflow;
 int16 result = UInt32Div16To16(prod, c);
 if (result < 0) goto overflow;
 if (sign < 0) result = -result;
 return result;

overflow:
 return sign < 0 ? INT_MIN : INT_MAX;
}
Given that I've already told you where the bug is, it should be pretty easy to spot in the code above.
Anyway, when this assembly language function was ported to Win32, it was ported as, well, an assembly language function. And the port was so successful, it even preserved (probably by accident) the sign extension bug.
Mind you, it's a bug with amazing seniority.
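If you want to watch the bug happen, here's a portable C model of the 16-bit routine. It is my own reconstruction for illustration, not the original assembly: the function name and the buggy_sar switch are invented, and the magnitudes are kept in unsigned variables so that the only difference between the two modes is how the denominator is halved.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the 16-bit routine. buggy_sar selects how the denominator is
   halved: nonzero keeps the top bit (arithmetic shift, the bug), zero
   shifts in a zero (logical shift, the intent). */
int16_t muldiv16_model(int16_t a, int16_t b, int16_t c, int buggy_sar)
{
    int negative = (a ^ b ^ c) < 0;  /* sign of the result */

    /* magnitudes as unsigned 16-bit values; -0x8000 becomes 0x8000 */
    uint16_t ua = (uint16_t)(a < 0 ? -(int32_t)a : a);
    uint16_t ub = (uint16_t)(b < 0 ? -(int32_t)b : b);
    uint16_t uc = (uint16_t)(c < 0 ? -(int32_t)c : c);

    uint16_t half = (uint16_t)(uc >> 1);
    if (buggy_sar)
        half |= (uint16_t)(uc & 0x8000);  /* sar copies the high bit back in */

    uint32_t prod = (uint32_t)ua * ub + half;
    if ((prod >> 16) >= uc)               /* also catches uc == 0 */
        return negative ? INT16_MIN : INT16_MAX;

    uint32_t q = prod / uc;
    assert(q <= 0xFFFF);                  /* guaranteed by the check above */
    if (q > 0x7FFF)                       /* would be negative as int16 */
        return negative ? INT16_MIN : INT16_MAX;

    return (int16_t)(negative ? -(int32_t)q : (int32_t)q);
}
```

With the arithmetic shift, MulDiv(1, −0x8000, −0x8000) computes 0x8000 + 0xC000 = 0x14000, and dividing by 0x8000 gives 2. With the logical shift, it computes 0x8000 + 0x4000 = 0xC000, and dividing by 0x8000 gives the expected 1. Any other denominator has a magnitude with the high bit clear, so the two shifts agree.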
This upcoming Sunday is Mother's Day in the United States. In recognition of the holiday last year, a local church displayed the following message on its message board: "God couldn't be / everywhere / so God made mothers / German speaking."
This explains why your mother speaks German.
POIDH
The church in question has an evening German-language service, and the advertisement for that service juxtaposed against the Jewish proverb produced an unexpected result.
A computer running some tests encountered a mysterious crash:
eax=ffffffff ebx=00000000 ecx=038ef548 edx=17b060b4 esi=00000000 edi=038ef6f0
eip=14ae1b77 esp=038ef56c ebp=038ef574 iopl=0         nv up ei pl nz na po nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00010202
FOO!CFrameWnd::GetAssociatedWidget+0x47:
14ae1b77 8bd8            mov     ebx,eax
A colleague of mine quickly diagnosed the proximate cause.
*Something* marked the code page PAGE_READWRITE, instead of PAGE_EXECUTE_READ. I suspect a bug in a driver. FOO is just a victim here.
0:002> !vprot 14ae1b77
BaseAddress:       14ae1000
AllocationBase:    14ae0000
AllocationProtect: 00000080  PAGE_EXECUTE_WRITECOPY
RegionSize:        00001000
State:             00001000  MEM_COMMIT
Protect:           00000004  PAGE_READWRITE
Type:              01000000  MEM_IMAGE
This diagnosis was met with astonishment. "Wow! What made you think to check the protection on the code page?"
Well, let's see. We're crashing on a mov ebx, eax instruction. This does not access memory; it's a register-to-register operation. There's no way a properly functioning CPU can raise an exception on this instruction.
At this point, what possibilities remain?
(Note that the second and third options involve rejecting the assumption that the CPU is behaving properly.)
These are in increasing order of paranoia, so you naturally start with the least paranoid possibility.
Then, of course, there's the non-psychic solution: Ask the debugger for the exception record.
EXCEPTION_RECORD:  ffffffff -- (.exr 0xffffffffffffffff)
ExceptionAddress: 14ae1b77 (FOO!CFrameWnd::GetAssociatedWidget+0x00000047)
   ExceptionCode: c0000005 (Access violation)
  ExceptionFlags: 00000000
NumberParameters: 2
   Parameter[0]: 00000008
   Parameter[1]: 14ae1b77
Attempt to execute non-executable address 14ae1b77
That last line pretty much hands it to you on a silver platter.
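For reference, the two pieces of evidence can be read mechanically. This is a small C sketch with invented function names. The numeric values are the documented ones: the first parameter of an access violation is 0 for a read, 1 for a write, and 8 for an attempt to execute, and the executable page protections are the four values from 0x10 through 0x80.

```c
#include <stdint.h>

/* Interpret Parameter[0] of an access violation (c0000005). */
const char *access_violation_kind(uint32_t parameter0)
{
    switch (parameter0) {
    case 0:  return "read";
    case 1:  return "write";
    case 8:  return "execute";
    default: return "unknown";
    }
}

/* The executable protections are PAGE_EXECUTE (0x10), PAGE_EXECUTE_READ
   (0x20), PAGE_EXECUTE_READWRITE (0x40), and PAGE_EXECUTE_WRITECOPY
   (0x80). Anything without one of those bits cannot be executed. */
int protection_allows_execute(uint32_t protect)
{
    return (protect & 0xF0) != 0;
}
```

In this crash, Parameter[0] is 8 and the page's Protect value is 00000004, so both tell the same story: the CPU was asked to execute from a page that is not executable.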
One source of cheap amusement is searching for spelling errors in the registry. For example, one program tried to register a new file extension, or at least they tried, except that they spelled Extension wrong.
And they wonder why that feature never worked.
My discovery was that my registry contained the mysterious key HKEY_CURRENT_USER\S. After some debugging, I finally found the culprit. There was a program on my computer that did the equivalent of this:
RegCreateKeyA(HKEY_CURRENT_USER, (PCSTR)L"Software\\...", &hk);
One of my colleagues remarked, "With enough force, any peg will fit in any hole."
I suspect that the code was not that aggressively wrong. It was probably something more subtle.
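Whatever the actual code looked like, the reason the key ends up named S is easy to reproduce. Here's a portable C sketch, with an invented function name. It uses uint16_t instead of wchar_t because wchar_t is 16 bits on Windows but not everywhere else, and it assumes a little-endian machine, which is what Windows runs on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Length that an ANSI ("A") function would see if handed UTF-16 text
   through a cast. On a little-endian machine, the letter S (0x0053) is
   stored as the bytes 53 00, so the string appears to end after one
   character. */
size_t length_seen_by_ansi_function(const uint16_t *utf16)
{
    assert(utf16 != NULL);
    return strlen((const char *)utf16);
}
```

Pass it the UTF-16 text for "Software" and it reports a length of 1: the ANSI function sees the string "S" and nothing more.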