June, 2012

  • The Old New Thing

    2012 mid-year link clearance


    Another round of the semi-annual link clearance.

    And, as always, the obligatory plug for my column in TechNet Magazine:

  • The Old New Thing

    How did real-mode Windows patch up return addresses to discarded code segments?


    Last week, I described how real-mode Windows fixed up jumps to functions that got discarded. But what about return addresses to functions that got discarded?

    The naïve solution would be to allocate a special "return address recovery" function for each return address you found, but that idea comes with its own problems: You are patching addresses on the stack because you are trying to free up memory. It would be a bad idea to try to allocate memory while you're trying to free memory! What if in order to satisfy the allocation, you had to discard still more memory? You would start moving and patching stacks before they were fully patched from the previous round of memory motion. The stack patcher would get re-entered and see an inconsistent stack because the previous stack patcher didn't get a chance to finish. The result would be rampant memory corruption.

    Therefore, you have to preallocate your "return address recovery" functions. But you don't know how many return addresses you're going to need until you walk the stack (at which point it's too late), and you definitely don't want to preallocate the worst-case scenario since each stack can be up to 64KB in size, and can hold up to 16384 return addresses. You'd end up allocating nearly all your available memory just for return address recovery stubs!

    How did real-mode Windows solve this problem?

    It magically found a way to put ten pounds of flour in a five-pound bag.

    For each segment, there was a special "return thunk" that was shared by all return addresses which returned back into that segment. Since there is only one per segment, you can preallocate it as part of the segment bookkeeping data. To patch the return address, the original return address was overwritten by this shared return thunk. Okay, so you have 32 bits of information you need to save (the original return address, which consists of 16 bits for the segment and 16 bits for the offset), and you have a return thunk that captures the segment information. But you still have 16 bits left over. Where do you put the offset?

    We saw some time ago that the BP register associated with far calls was incremented before being pushed on the stack. This was necessary so that the stack patcher knew whether to decode the frame as a near frame or a far frame. But that wasn't the only rule associated with far stack frames. On entry to a far function,

    • The first thing you do is increment the BP register and push it onto the stack.
    • The second thing you do is push the DS register. (DS is the data segment register, which holds a segment containing data the caller function wanted to be able to access.)

    Every far call therefore looks like this on the stack:

    xxxx+6    CS of return address
    xxxx+4    IP of return address
    xxxx+2    pointer to next stack frame (bottom bit set)
    xxxx+0    DS of caller

    The stack patcher overwrote the saved CS:IP with the address of the return thunk. The return thunk describes the segment that got discarded, so the CS is implied by your choice of return thunk, but the stack patcher still needed to save the IP somewhere. So it saved it where the DS used to be.
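    As a rough illustration of that patching step (a sketch, not the actual kernel code), the frame can be modeled as four 16-bit words. The names FarFrame, IsFarFrame, and PatchFrame, and the way the thunk address is passed in, are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// The four words a far function leaves on the stack, lowest address first.
// A far return pops IP before CS, so IP sits at the lower address.
struct FarFrame {
    uint16_t savedDS;   // DS of caller
    uint16_t savedBP;   // pointer to next stack frame (bottom bit set)
    uint16_t returnIP;  // IP of return address
    uint16_t returnCS;  // CS of return address
};

// Far frames are recognizable by the odd saved BP.
inline bool IsFarFrame(const FarFrame& f) { return (f.savedBP & 1) != 0; }

// If the frame returns into discardedSeg, point it at the segment's shared
// return thunk and park the original IP in the slot that held DS.
// The displaced DS is handed back to the caller, who still has to find
// somewhere to keep it.
inline bool PatchFrame(FarFrame& f, uint16_t discardedSeg,
                       uint16_t thunkSeg, uint16_t thunkOff,
                       uint16_t* displacedDS)
{
    assert(displacedDS != nullptr);
    if (!IsFarFrame(f) || f.returnCS != discardedSeg) return false;
    *displacedDS = f.savedDS;
    f.savedDS  = f.returnIP;   // original IP now lives where DS was
    f.returnCS = thunkSeg;     // CS is implied by which thunk was chosen
    f.returnIP = thunkOff;
    return true;
}
```

    The sketch deliberately stops at the point where the displaced DS has no home yet.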

    Wait, you've just traded one problem for another. Sure, you found a place to put the IP, but now you have to find a place to put the DS.

    Here comes the magic.

    Recall that on the 8086, the combination segment:offset corresponds to the physical address segment×16 + offset. For example, the address 1234:0005 refers to physical byte 0x1234 * 16 + 0x0005 = 0x12345.

    Since the segment and offset were both 16-bit values, there were multiple ways to refer to the same physical address. For example, 1000:2345 would also resolve to physical address 0x12345. But there are other ways to refer to the same byte, like the not entirely obvious 0FFF:2355. In fact, there's a whole range of values you can use, starting from 0235:FFF5 and ending at 1234:0005. In general, there are 4096 different ways of referring to the same address.

    There's a bit of a problem with very low addresses, though. To access byte 0x00400, for example, you could use 0000:0400 through 0040:0000, but that's as far as you could go, so these very low addresses do not have the full complement of aliases.

    Aha, but they do if you take advantage of wraparound. Since the 8086 had only 20 address lines, any overflow in the calculations was simply taken mod 0x100000. Therefore, you could also use F041:FFF0 to refer to the address, because 0xF041 × 16 + 0xFFF0 = 0x100400 ≡ 0x00400. Wraparound allowed the full range of 4096 aliases since you could use F041:FFF0 to FFFF:0410, and then 0000:0400 to 0040:0000.
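    The alias arithmetic is easy to check by brute force. This is a small sketch; the function names Physical and CountAliases are made up for the illustration.

```cpp
#include <cassert>
#include <cstdint>

// Physical address of segment:offset on an 8086. Only 20 address lines
// exist, so anything that overflows is taken mod 0x100000.
constexpr uint32_t Physical(uint16_t seg, uint16_t off)
{
    return (static_cast<uint32_t>(seg) * 16 + off) & 0xFFFFF;
}

// Counts how many segment:offset pairs resolve to the given physical address.
inline int CountAliases(uint32_t phys)
{
    int count = 0;
    for (uint32_t seg = 0; seg <= 0xFFFF; ++seg) {
        // The offset that would be needed with this segment (mod 0x100000).
        uint32_t off = (phys - seg * 16) & 0xFFFFF;
        if (off <= 0xFFFF) ++count;   // usable only if it fits in 16 bits
    }
    return count;
}

static_assert(Physical(0x1234, 0x0005) == 0x12345, "plain case");
static_assert(Physical(0x0FFF, 0x2355) == 0x12345, "less obvious alias");
static_assert(Physical(0xF041, 0xFFF0) == 0x00400, "wraparound alias");
```

    With wraparound taken into account, CountAliases returns 4096 for both 0x12345 and the low address 0x00400.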

    Related reading: The story of the mysterious WINA20.386 file.

    Okay, back to stack patching.

    Once you consider aliasing, you realize that the 32 bits of flour actually had a lot of air in it. By rewriting the address of the return thunk into the form XXXX:000Y, you can see the 12 bits of air, and to stash the 12-bit value N into that air pocket, you would set the segment to XXXX−N and the offset to N×16+Y.
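    Here is a minimal sketch of the air-pocket trick, assuming the thunk's address has already been normalized to XXXX:000Y. The type FarPtr and the functions Stash, Unstash, and Canonical are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

struct FarPtr { uint16_t seg; uint16_t off; };

// Hide the 12-bit value n inside an alias of XXXX:000Y:
// segment becomes XXXX - n, offset becomes n*16 + Y.
inline FarPtr Stash(FarPtr canonical, uint16_t n)
{
    assert(canonical.off < 16 && n < 4096);
    return { static_cast<uint16_t>(canonical.seg - n),
             static_cast<uint16_t>(n * 16 + canonical.off) };
}

// The stashed value is the upper 12 bits of the offset.
inline uint16_t Unstash(FarPtr p)
{
    return static_cast<uint16_t>(p.off >> 4);
}

// Undo the aliasing to get back XXXX:000Y.
inline FarPtr Canonical(FarPtr p)
{
    return { static_cast<uint16_t>(p.seg + (p.off >> 4)),
             static_cast<uint16_t>(p.off & 0xF) };
}
```

    For example, stashing 0x123 into 2000:0008 yields 1EDD:1238, which refers to the same physical byte.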

    Recall that we were looking for a place to put that saved DS value, which is a 16-bit value, and we have found 12 bits of air in which to save it. We need to find 4 more bits of air somewhere.

    The next trick is realizing that DS is not an arbitrary 16-bit value. It's a 16-bit segment value that was obtained from the Windows memory manager. Therefore, if the Windows memory manager imposed a generous artificial limit of 4096 segments, it could convert the DS segment value into an integer segment index.

    That index got saved in the upper 12 bits of the offset.

    Okay, let's see what happens when the code tries to unwind to the patched return address.

    The function whose return address got patched goes through the usual function epilogue. It pops what it thinks is the original DS off the stack, even though that DS has been secretly replaced by the original return address's IP. The epilogue then pops the old BP, decrements it, and returns to the return thunk. The return thunk now knows where the real return address is (it knows which segment it is responsible for, and it can figure out the IP from the incoming DS register). It can also study its own IP to extract the segment index and from that recover the original DS value. Now that it knows what the original code was trying to do, it can reload the segment, restore the registers to their proper values, and jump to the original return address inside the newly-loaded segment.

    I continue to be amazed at how real-mode Windows managed to get so much done with so little.

    Exercise: The arbitrary limit of 4096 segments was quite generous, seeing as the maximum number of selectors in protected mode was defined by the processor to be 8191. What small change could you make to expand the segment limit in real mode to match that of protected mode?

  • The Old New Thing

    You still need the "safe" functions even if you check string lengths ahead of time


    Commenter POKE53280,0 claims, "If one validates parameters before using string functions (which quality programmers should do), the 'safe' functions have no reason to exist."

    Consider the following function:

    int SomeFunction(const char *s)
    {
      char buffer[256];
      if (strlen(s) >= 256) return ERR;
      strcpy(buffer, s);
      ...
    }

    What could possibly go wrong? You check the length of the string, and if it doesn't fit in the buffer, then you reject it. Therefore, you claim, the strcpy is safe.

    What could possibly go wrong is that the length of the string can change between the time you check it and the time you use it.

    char attack[512] = "special string designed to trigger a "
                       "buffer overflow and attack your machine. [...]";
    void Thread1()
    {
     char c = attack[256];
     while (true) {
      attack[256] ^= c;
     }
    }
    void Thread2()
    {
     while (true) {
      SomeFunction(attack);
     }
    }

    The first thread changes the length of the string rapidly between 255 and 511, between a string that passes validation and a string that doesn't, and more specifically between a string that passes validation and a string that, if it snuck through validation, would pwn the machine.

    The second thread keeps handing this string to SomeFunction. Eventually, the following race condition will be hit:

    • Thread 1 changes the length to 255.
    • Thread 2 checks the length and when it reaches attack[256], it reads zero and concludes that the string length is less than 256.
    • Thread 1 changes the length to 511.
    • Thread 2 copies the string and when it reaches attack[256], it reads nonzero and keeps copying, thereby overflowing its buffer.

    Oops, you just fell victim to the Time-of-check-to-time-of-use attack (commonly abbreviated TOCTTOU).
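    The window closes when the length check and the copy are one and the same pass over the source, which is the idea behind the bounded string functions. The following is only a sketch of that idea; BoundedCopy is a hypothetical helper, not the implementation of any real "safe" function.

```cpp
#include <cassert>
#include <cstddef>

// Copies at most cap-1 characters into dst and always null-terminates it.
// Returns false if src did not fit (dst then holds a truncated copy).
// Each source byte is read exactly once, and no write ever lands outside
// dst[0..cap-1], no matter what happens to src in the meantime.
inline bool BoundedCopy(char* dst, std::size_t cap, const char* src)
{
    assert(cap > 0);
    std::size_t i = 0;
    for (; i + 1 < cap; ++i) {
        char ch = src[i];              // the one and only read of this byte
        if (ch == '\0') {
            dst[i] = '\0';
            return true;               // whole string fit
        }
        dst[i] = ch;
    }
    dst[i] = '\0';                     // out of room: terminate what we have
    return src[i] == '\0';             // fit exactly only if src ends here
}
```

    If another thread lengthens the string mid-copy, the worst outcome is a truncated buffer and a failure return rather than an overflow.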

    Now, the code above as written is not technically a vulnerability because you haven't crossed a security boundary. The attack code and the vulnerable code are running under the same security context. To make this a true vulnerability, you need to have the attack code running in a lower security context than the vulnerable code. For example, if the threads were running user-mode code and SomeFunction is a kernel-mode function, then you have a real vulnerability. Of course, if SomeFunction were at the boundary between user-mode and kernel-mode, then it has other things it needs to do, like verify that the memory is in fact readable by the process.

    A more interesting case where you cross a security boundary is if the two threads are running code driven from an untrusted source; for example, they might be threads in a script interpreter, and the toggling of attack[256] is being done by a function on a Web page.

    // this code is in some imaginary scripting language
    var attack = new string("...");
    procedure Thread1()
    {
     var c = attack[256];
     while (true) attack[256] ^= c;
    }
    handler OnClick()
    {
     new backgroundTask(Thread1);
     while (true) foo(attack);
    }

    When the user clicks on the button, the script interpreter creates a background thread and starts toggling the length of the string under the instructions of the script. Meanwhile, the main thread calls foo repeatedly. And suppose the interpreter's implementation of foo goes like this:

    void interpret_foo(const function_args& args)
    {
     if (args.GetLength() != 1) wna("foo");
     if (args.GetArgType(0) != V_STRING) wta("foo", 0, V_STRING);
     char *s = args.PinArgString(0);
     SomeFunction(s);
    }

    The script interpreter has kindly converted the script code into the equivalent native code, and now you have a problem. Assuming the user doesn't get impatient and click "Stop script", the script will eventually hit the race condition and cause a buffer overflow in SomeFunction.

    And then you get to scramble a security hotfix.

  • The Old New Thing

    How did my hard drive turn into a TARDIS?


    A customer observed that the entry for a network drive looked like this in My Computer, well, except that there was a network drive icon instead of ASCII art.

    O Public (\\server) (S:)
    3.81TB free of 2.5GB

    How is it possible for a 2.5GB drive to have 3.81TB free?

    While there have certainly been examples of Explorer showing confusing values, the reason for the strange results was, at least this time, not Explorer's fault.

    This particular network drive actually reported (via GetDiskFreeSpaceEx) that it had more free space than drive space. Explorer is dutifully reporting the information it was given, because it doesn't try to second-guess the file system. If a network drive wants to report that it is a TARDIS, then it's a TARDIS.

  • The Old New Thing

    Thanks for reminding me what to do when the elevator is out of order


    Every few years, the building maintenance people have to perform tests on the elevators to ensure they meet safety regulations. And the real estate department sends out the usual notice informing the building occupants that the elevators in the building will be taken out of service at various times during the day. They were kind enough to include the following advice:

    If an elevator is non-responsive and/or has out of order signage posted, please use another available elevator.

    One of my colleagues sarcastically remarked, "Wow, thank goodness they sent that email. I'd have no idea what to do had I seen a non-responsive elevator with an 'out of order' sign posted on it."

  • The Old New Thing

    How does Explorer determine the delay between clicking on an item and initiating an edit?


    Ian Boyd wants to know why the specific value of 500ms was chosen as the edit delay in Windows Explorer.

    Because it's your double-click time.

    Since the double-click action (execute) is not an extension of the single-click action (edit), Explorer (and more generally, list view) waits for the double-click timeout before entering edit mode so it can tell whether that first click was really a single-click on its own or a single-click on the way to a double-click.

    If the timeout were shorter than the double-click time, then double-clicking an item would end up with the first click triggering edit mode and the second click selecting text inside the editor.

    (If the timeout were longer, then everything would still work, but you would just have to wait longer.)
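    The waiting logic can be sketched as a tiny state machine. This is an illustration only, not how the list view is actually written: ClickTracker and Action are hypothetical names, timestamps are plain millisecond counters, and on Windows the timeout value would come from GetDoubleClickTime.

```cpp
#include <cassert>
#include <cstdint>

enum class Action { None, Edit, Execute };

class ClickTracker {
public:
    explicit ClickTracker(uint32_t doubleClickMs) : timeout_(doubleClickMs) {}

    // Called for each click on the item.
    Action OnClick(uint32_t nowMs) {
        if (pending_ && nowMs - firstClickMs_ <= timeout_) {
            pending_ = false;          // second click arrived in time
            return Action::Execute;
        }
        pending_ = true;               // might be the start of a double-click
        firstClickMs_ = nowMs;
        return Action::None;           // too early to decide
    }

    // Called when a timer fires; assumed to run promptly after the timeout.
    Action OnTimer(uint32_t nowMs) {
        if (pending_ && nowMs - firstClickMs_ > timeout_) {
            pending_ = false;
            return Action::Edit;       // it was a lone single click after all
        }
        return Action::None;
    }

private:
    uint32_t timeout_;
    uint32_t firstClickMs_ = 0;
    bool pending_ = false;
};
```

    With a 500ms timeout, clicks at 1000 and 1300 produce Execute, whereas a single click at 3000 produces Edit only once the timer fires after 3500.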

    Ian says, "Through my testing it does not appear linked to configurable double-click timeout." My guess is that Ian changed the double-click timeout not by calling SetDoubleClickTime but by whacking a registry value directly. The values in the registry are loaded and cached at logon; you can update them all you want at runtime, nobody will care.

  • The Old New Thing

    How did real-mode Windows fix up jumps to functions that got discarded?


    In a discussion of how real-mode Windows walked stacks, commenter Matt wonders about fixing jumps in the rest of the code to the discarded functions.

    I noted in the original article that "there are multiple parts to the solution" and that stack-walking was just one piece. Today, we'll look at another piece: Inter-segment fixups.

    Recall that real-mode Windows ran on an 8086 processor, a simple processor with no memory manager, no CPU privilege levels, and no concept of task switching. Memory management in real-mode Windows was handled manually by the real-mode kernel, and the way it managed memory was by loading code from disk on demand, and discarding code when under memory pressure. (It didn't discard data because it wouldn't know how to regenerate it, and it can't swap it out because there was no swap file.)

    There were a few flags you could attach to a segment. Of interest for today's discussion are movable (and it was spelled without the "e") and discardable. If a segment was not movable (known as fixed), then it was loaded into memory and stayed there until the module was unloaded. If a segment was movable, then the memory manager was allowed to move it around when it needed to defragment memory in order to satisfy a large memory allocation. And if a segment was discardable, then it could even be evicted from memory to make room for other stuff.

    Movable?  Discardable?  Meaning
    No        No            Cannot be moved or discarded
    No        Yes           (invalid combination)
    Yes       No            Can be moved in memory
    Yes       Yes           Can be moved or purged from memory

    I'm going to combine the movable and discardable cases, since the effect is the same for the purpose of today's discussion, the difference being that with discardable memory, you also have the option of throwing the memory out entirely.

    First of all, let's get the easy part out of the way. If you had an intra-segment call (calling a function in your own segment), then there was no work that needed to be done. Real-mode Windows always discarded full segments, so if your segment was running code, it was by definition present, and therefore any other code in that segment was also present. The hard part is the inter-segment calls.

    As it happens, an old document on the 16-bit Windows executable file format gives you some insight into how things worked, if you sit down and puzzle it out hard enough.

    Let's start with the GetProcAddress function. When you call GetProcAddress, the kernel needs to locate the address of the function inside the target module. The loader consults the Entry Table to find the function you're asking for. As you can see, there are three types of entries in the Entry Table: unused entries (representing ordinals with no associated function), fixed segment entries, and movable segment entries. Obviously, if the match is in an unused entry, the return value is NULL because there is no such function. If the match is in a fixed entry, that's pretty easy too: Look up the segment number in the target module's segment list and combine it with the specified offset. Since the segment is fixed, you can just return the raw pointer directly, since the code will never move.

    The tricky part is if the function is in a movable segment. If you look at the document, it says that "a moveable segment entry is 6 bytes long and has the following format." It starts with a byte of flags (not important here), a two-byte INT 3Fh instruction, a one-byte segment number, and the offset within the segment.

    What's the deal with the INT 3Fh instruction? It seems rather pointless to specify that a file format requires some INT 3Fh instructions scattered here and there. Why not get rid of it to save some space in the file?

    If you called GetProcAddress and the result was a function in a movable segment, then GetProcAddress didn't actually return the address of the target function. It returned the address of the INT 3Fh instruction! (Thankfully, the Entry Table is always a fixed segment, so we don't have to worry about the Entry Table itself being discarded.)

    (Now you see why the file format includes these strange INT 3Fh instructions: The file format was designed to be loaded directly into memory. When the loader loads the entry table, it just slurps it into memory and bingo, it's ready to go, INT 3Fh instructions and all!)

    Since GetProcAddress returned the address of the INT 3Fh instruction, calls to imported functions didn't actually go straight to the target. Instead, you called the INT 3Fh instruction, and it was the INT 3Fh handler which said, "Gosh, somebody is trying to call code in another segment. Is that segment loaded?" It took the return address of the interrupt and used it to locate the segment number and offset. If the segment in question was already in memory, then the handler jumped straight to the segment at the specified offset. You got the function call you wanted, just in a roundabout way.

    If the segment wasn't loaded, then the INT 3Fh handler loaded it (which might trigger a round of discarding and consequent stack patching), then jumped to the newly-loaded segment at the specified offset. An even more roundabout function call.
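    To make the byte layout concrete, here is a sketch of decoding the six-byte movable entry, and of what the INT 3Fh handler can read through the interrupt's return address, which points just past the two-byte instruction. The struct and function names are hypothetical, and the entry is treated as a plain byte array.

```cpp
#include <cassert>
#include <cstdint>

struct MovableEntry {
    uint8_t  flags;
    uint8_t  segmentNumber;
    uint16_t offset;
};

struct Target {
    uint8_t  segmentNumber;
    uint16_t offset;
};

// Decodes: flags, INT 3Fh (bytes CD 3F), segment number, 16-bit offset.
// Returns false if the INT 3Fh instruction is not where it should be.
inline bool DecodeMovableEntry(const uint8_t* p, MovableEntry* out)
{
    assert(p != nullptr && out != nullptr);
    if (p[1] != 0xCD || p[2] != 0x3F) return false;
    out->flags = p[0];
    out->segmentNumber = p[3];
    out->offset = static_cast<uint16_t>(p[4] | (p[5] << 8)); // little-endian
    return true;
}

// The handler's view: the return address points at the segment-number byte,
// with the offset right behind it.
inline Target TargetFromReturnAddress(const uint8_t* ret)
{
    return { ret[0], static_cast<uint16_t>(ret[1] | (ret[2] << 8)) };
}
```

    What the handler does next (jump directly, or load the segment first) depends on whether the segment is present, which this sketch does not model.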

    Okay, so that's the case where a function pointer is obtained by calling GetProcAddress. But it turns out that a lot of stuff inside the kernel turns into GetProcAddress at the end of the day.

    Suppose you have some code that calls a function in another segment within the same module. As we saw earlier, fixups are threaded through the code segment, and if you scroll down to the Per Segment Data section of that old document, you'll see a description of the way the relocation records are expressed. A call to a function in another segment within the same module requires an INTERNALREF fixup, and as you can see in the document, there are two types of INTERNALREF fixups, ones which refer to fixed segments and ones which refer to movable segments.

    The easy case is a reference to a fixed segment. In that case, the kernel can just look up where it put that segment, add in the offset, and patch that address into the code segment. Since it's a fixed segment, the patch will never have to be revisited.

    The hard case is a reference to a movable segment. In that case, you can see that the associated information in the fixup table is the "ordinal number index into [the] Entry Table."

    Aha, you now realize that the Entry Table is more than just a list of your exported functions. It's also a list of all the functions in movable segments that are called from other segments. In a sense, these are "secret exports" in your module. (However, you can't get to them by GetProcAddress because GetProcAddress knows how to keep a secret.)

    To fix up a reference to a function in a movable segment, the kernel calls the SecretGetProcAddress (not its real name) function, which as we saw before, returns not the actual function pointer but rather a pointer to the magic INT 3Fh in the Entry Table. It is that pointer which is patched into your code segment, so that when your code calls what it thinks is a function in another segment, it's really calling the Entry Table, which as we saw before, loads the code in the target segment if necessary before jumping to it.

    Matt wrote, "If the kernel wants to discard that procedure, it has to find that jump address in my code, and redirect it to a page fault handler, so that when my process gets to it, it will call the procedure and fault the code back in. How does it find all of the references to that function across the program, so that it can patch them all up?" Now you know the answer: It finds all of those references because it already had to find them when applying fixups. It doesn't try to find them at discard time; it finds them when it loads your segment. (Exercise: Why doesn't it need to reapply fixups when a segment moves?)

    All inter-segment function pointers were really pointers into the Entry Table. You passed a function pointer to be used as a callback? Not really; you really passed a pointer to your own Entry Table. You have an array of function pointers? Not really; you really have an array of pointers into your Entry Table. It wasn't actually hard for the kernel to find all of these because you had to declare them in your fixup table in the first place.

    It is my understanding that the INT 3Fh trick came from the overlay manager which was included with the Microsoft C compiler. (The Zortech C compiler followed a similar model.)

    Note: While the above discussion describes how things worked in principle, there are in fact a few places where the actual implementation differs from the description above, although not in any way that fundamentally affects the design.

    For example, real-mode Windows did a bit of optimization in the INT 3Fh stubs. If the target segment was in memory, then it replaced the INT 3Fh instruction with a direct jmp xxxx:yyyy to the target, effectively precalculating the jump destination when a segment is loaded rather than performing the calculation each time a function in that segment is called.

    By an amazing coincidence, the code sequence

        int 3fh
        db  entry_segment
        dw  entry_offset

    is five bytes long, which is the exact length of a jmp xxxx:yyyy instruction. Phew, the patch just barely fits!
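    The in-place patch can be sketched as follows. This is an illustration under assumptions: PatchStubToJump and its loadedSeg parameter are hypothetical, and the far jump uses the standard 8086 encoding (opcode EA, then the 16-bit offset, then the 16-bit segment).

```cpp
#include <cassert>
#include <cstdint>

// Overwrites the five bytes "int 3fh / db segment / dw offset" with
// "jmp loadedSeg:offset". stub points at the INT 3Fh opcode; loadedSeg is
// the real-mode segment at which the target code currently resides.
inline void PatchStubToJump(uint8_t* stub, uint16_t loadedSeg)
{
    assert(stub[0] == 0xCD && stub[1] == 0x3F);   // must still be INT 3Fh
    uint16_t off = static_cast<uint16_t>(stub[3] | (stub[4] << 8));
    stub[0] = 0xEA;                               // far jmp opcode
    stub[1] = static_cast<uint8_t>(off & 0xFF);   // offset, low byte first
    stub[2] = static_cast<uint8_t>(off >> 8);
    stub[3] = static_cast<uint8_t>(loadedSeg & 0xFF);
    stub[4] = static_cast<uint8_t>(loadedSeg >> 8);
}
```

    Note that the patch overwrites the segment-number byte, so putting the INT 3Fh back when the segment is discarded needs that number from somewhere else; the sketch does not cover that.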

  • The Old New Thing

    When the default pushbutton is invoked, the invoke goes to the top-level dialog


    One quirk of nested dialogs lies in what happens when the user presses Enter to invoke the default pushbutton: The resulting WM_COMMAND message goes to the top-level dialog, even if the default pushbutton belongs to a sub-dialog.

    Why doesn't it send the WM_COMMAND to the parent of the default pushbutton? I mean, the dialog manager knows the handle of the button, so it can send the message to the button's parent, right?

    Well, the dialog manager knows the handle of a button. But not necessarily the button. Recall that if focus is not on a pushbutton, then the dialog manager sets the default pushbutton based on the control ID returned by the DM_GETDEFID message, and it does this by just searching the dialog for a control with that ID. If you have two controls with the same ID, it picks one of them arbitrarily. So far so bad.

    It's like having two John Smiths living in your house, one in the second bedroom and one living in the guest room. The post office is very strict and won't let you write "John Smith, Second Bedroom, 1 Main Street" and "John Smith, Guest Room, 1 Main Street." All you're allowed to write is a name and an address. Therefore, all the mail addressed to "John Smith, 1 Main Street" ends up in a single mailbox labeled "1 Main Street" and now you have to figure out who gets each piece of mail.

    Okay, so we saw that when converting an ID to a window, and there are multiple windows with the same ID, the dialog manager will just pick one arbitrarily. And if it picks the wrong one, it would have sent the WM_COMMAND to the wrong dialog procedure entirely! At least by sending it to the top-level dialog, it says, "Dude, I think it's this window but I'm not sure, so if you have some really clever way of telling which is which, you can try to sort it out." And now that the WM_COMMAND sometimes goes to the top-level dialog, you're pretty much stuck having it always go to the top-level dialog for consistency. It's better to be consistently wrong in a predictable manner (so people can work around it reliably) than to be mostly-right and occasionally-completely-wrong.

    Third rationale: Because you're asking for code to be written to handle a case that people shouldn't have gotten into in the first place. (Namely, duplicate control IDs.)

    Whatever the reason, it's something you need to be on the lookout for. If you did everything right and avoided control ID duplication, then the workaround in your WM_COMMAND handler is straightforward:

    void OnCommand(HWND hwnd, int id, HWND hwndCtl, UINT codeNotify)
    {
        if (hwndCtl != nullptr) {
            HWND hwndCtlParent = GetParent(hwndCtl);
            if (hwndCtlParent != nullptr &&
                hwndCtlParent != hwnd &&
                IsChild(hwnd, hwndCtlParent)) {
                FORWARD_WM_COMMAND(hwndCtlParent, id,
                                   hwndCtl, codeNotify, SendMessage);
                return; // forwarded to the control's actual parent
            }
        }
        ... the message was for me after all, so let's handle it...
        switch (id) {
        ...
        }
    }

    When we get the WM_COMMAND message, we first check that it really came from one of our direct children. If not, then we forward the message on to the control's actual parent. (The window that should have gotten the message in the first place in an ideal world.)

    Exercise: Under what circumstances can the above workaround fail? (Not counting the scenario we've spent the past three days discussing.)

    Anyway, back to the question from last time: How does the property sheet manager deal with multiple property sheets pages having conflicting control IDs? In addition to what we previously discussed, another mitigating factor is that the property sheet manager keeps only one child dialog visible at a time. This takes the hidden child dialogs out of the running for most dialog-related activities, such as dialog navigation, since invisible controls cannot be targets of dialog navigation. Furthermore, hidden child dialogs are skipped when searching for keyboard accelerators, thereby avoiding the problem of hidden accelerators. So as long as the property sheet manager makes sure that focus doesn't stay on a hidden control after a page change, there shouldn't be any notifications coming from a hidden child dialog. The only conflicts it needs to worry about are conflicts between the page and the frame.

  • The Old New Thing

    Counting down to the last day of school, as students do it


    Today is the last day of school in Seattle public schools. My friend the seventh-grade teacher told me that students count down to the last day of school in a rather unusual way.

    Some people might count the number of calendar days until the end of school. For example, if there are 35 days between today and the last day of school, we say that it's 35 days until the end of school.

    Others might count only the number of school days before school is out. If today is Monday, and the last day of school is Friday, then there are five days of school remaining.

    But students, or at least seventh-grade students, count the days differently.

    First of all, you don't count today. Because today has already started, so you may as well treat it as already over.

    Next, you don't count the last day of school itself, because nobody gets anything done on the last day anyway; it's basically just one big party.

    Also, if there are any half-days or days with early dismissal, don't count those either, because days where you aren't there the whole day don't count, because the class periods are so short you don't get anything done.

    Similarly, days with special events like a field trip don't count.

    Furthermore, you don't count Mondays, because on Monday, you're still fresh off the weekend and you won't be concentrating on school anyway.

    Conversely, you don't count Fridays, because you've already mentally checked out.

    The result of all these calculations is that the students will cheerfully calculate that there are only 25 days of school left, when it's still only late April.

  • The Old New Thing

    When embedding a dialog inside another, make sure you don't accidentally create duplicate control IDs


    The WS_EX_CONTROLPARENT extended style (known in dialog templates as DS_CONTROL) instructs the dialog manager that the dialog's children should be promoted into the dialog's parent. This is easier to explain in pictures than in text. Given the following window hierarchy:

    main dialog
      A
      B
        B1
        B2
      C
      D

    The result of the WS_EX_CONTROLPARENT extended style being set is that the children of B are treated as if they were direct children of the main dialog, and the window B itself logically vanishes.

    main dialog
      A
      B1
      B2
      C
      D

    The WS_EX_CONTROLPARENT extended style means "Hello, I am not really a dialog control. I am a control parent. In other words, I have children, and those children are controls." (Some people erroneously put the WS_EX_CONTROLPARENT extended style on the main dialog itself. That's wrong, but fortunately it also has no effect.)

    Okay, this is all stuff you already know. So why am I bringing up this topic? I sort of gave it away in the subject line: When you use WS_EX_CONTROLPARENT, you need to be careful that the controls that you are promoting into the parent don't conflict with controls already in the parent, or with controls promoted into the parent by another sibling.

    Suppose, in the above scenario, that window C also had the WS_EX_CONTROLPARENT extended style, and it had children C1 and C2. Not only do you have to watch out that B1 and B2 don't conflict with the controls A or D, you also have to watch out that they don't conflict with C1 or C2 either.
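    One way to catch such conflicts ahead of time is to walk the hierarchy the way the promotion rule describes and count control IDs. The sketch below works on a made-up in-memory model (the Window struct and both functions are hypothetical), not on real window handles.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Toy model of a window: an ID, a flag standing in for the control-parent
// extended style, and child windows.
struct Window {
    int id;
    bool controlParent;
    std::vector<Window> children;
};

// Tallies control IDs as they appear after promotion: a control-parent
// child logically vanishes and its children count as direct children.
inline void CollectIds(const Window& w, std::map<int, int>& counts)
{
    for (const Window& child : w.children) {
        if (child.controlParent) {
            CollectIds(child, counts);
        } else {
            ++counts[child.id];
        }
    }
}

// Returns the IDs that occur more than once, in ascending order.
inline std::vector<int> DuplicateIds(const Window& dialog)
{
    std::map<int, int> counts;
    CollectIds(dialog, counts);
    std::vector<int> dups;
    for (const auto& entry : counts) {
        if (entry.second > 1) dups.push_back(entry.first);
    }
    return dups;
}
```

    If B hosts controls 101 and 102 while C hosts 102 and 103, the function reports 102 as the conflict.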

    "But Mister Wizard, the property sheet control hosts multiple child dialogs, and since they can be provided by third parties, it's entirely possible (and likely) that there will be conflicts between B1 and, say, C2. Why doesn't this create a problem?"

    Well, Timmy, most of the time, it doesn't, because notifications are fired to the control's parent, and in the case of child dialogs, the child dialog's child controls fire their notifications to the child dialog. So as long as the identifiers are unique within the child dialog, you won't have a problem. This isn't the entire answer, however, but to understand it further, we'll need to look at another consequence of control ID conflicts, which we'll take up next time.
