History

  • The Old New Thing

    How did the X-Mouse setting come to be?

    • 34 Comments

    Commenter HiTechHiTouch wants to know whether the "X-Mouse" feature went through the "every request starts at −100 points filter", and if so, how did it manage to gain 99 points?

    The X-Mouse feature is ancient and long predates the "−100 points" rule. It was added back in the days when a developer could add a random rogue feature because he liked it.

    But I'm getting ahead of myself.

    Rewind back to 1995. Windows 95 had just shipped, and some of the graphics people had shifted their focus to DirectX. The DirectX team maintained a very close relationship with the video game software community, and a programmer at one of the video game software companies mentioned in passing as part of some other conversation, "Y'know, one thing I miss from my X-Windows workstation is the ability to set focus to a window by just moving the mouse into it."

    As it happened, that programmer happened to mention it to a DirectX team member who used to be on the shell team, so the person he mentioned it to actually knew a lot about all this GUI programming stuff. Don't forget, in the early days of DirectX, it was a struggle convincing game vendors to target this new Windows 95 operating system; they were all accustomed to writing their games to run under MS-DOS. Video game programmers didn't know much about programming for Windows because they had never done it before.

    That DirectX team member sat down and quickly pounded out the first version of what eventually became known to the outside world as the X-Mouse PowerToy. He gave a copy to that programmer whose request was made almost as an afterthought, and he was thrilled that he could move focus around with the mouse the way he was used to.

    "Hey, great little tool you got there. Could you tweak it so that when I move the mouse into a window, it gets focus but doesn't come to the top? Sorry I didn't mention that originally; I didn't realize you were going to interpret my idle musing as a call to action!"

    The DirectX team member added the feature and added a check-box to the X-Mouse PowerToy to control whether the window is brought to the top when it is activated by mouse motion.

    "This is really sweet. I hate to overstay my welcome, but could you tweak it so that it doesn't change focus until my mouse stays in the window for a while? Again, sorry I didn't mention that originally."

    Version three of X-Mouse added the ability to set a delay before it moved the focus. And that was the version of X-Mouse that went into the PowerToys.

    When the Windows NT folks saw the X-Mouse PowerToy, they said, "Aw shucks, we can do that too!" And they added the three System­Parameters­Info values I described in an earlier article so as to bring Windows NT up to feature parity with X-Mouse.

    It was a total rogue feature.

  • The Old New Thing

    Why am I in the Quake credits?

    • 13 Comments

    Anon wants to know why I am listed in the credits for the video game Quake under the "Special Thanks" section. "Were you an early tester/debugger?"

    I've never played a game of Quake in my entire life. I (and most of the rest of the Windows 95 team) played DOOM, but after a while, first-person-shooter games started giving me a headache. By the time Quake came out, I had already abandoned playing FPS games.

    I don't remember what it was that I did specifically, but it was along the lines of helping them with various technical issues related to running under Windows. At the time, I was a kernel developer, and the advice I gave was almost certainly related to memory management and swapping.

    Sorry it wasn't anything exciting.

  • The Old New Thing

    How did real-mode Windows implement its LRU algorithm without hardware assistance?

    • 29 Comments

    I noted some time ago that real-mode Windows had to do all its memory management without any hardware assistance. And yet, along the way, they managed to implement an LRU-based discard algorithm. Gabe is really interested in how that was done.

    As we saw a few months ago, inter-segment calls were redirected through a little stub which either jumped directly to the target (if it was in memory) or loaded the target (possibly discarding other memory to make room) before jumping to it. And we saw that the executable format had INT 3Fh instructions baked into it so that the Entry Table could be loaded directly into memory for execution.

    As it happens, Windows didn't take advantage of that feature, because it wanted to do more.

    When it came time to load the Entry Table, the loader did a little rewriting, converting each

        db  flags
        INT 3Fh
        db  entry_segment
        dw  entry_offset
    

    sequence into

        db  flags
        sar byte ptr cs:[xxx], 1
        INT 3Fh
        db  entry_segment
        dw  entry_offset
    

    where the xxx refers to a table of bytes in the Entry Table preallocated for this purpose, initialized to 1's.

    What is "this purpose"?

    Whenever anybody needed the address of an inter-segment function, instead of return the address of the int 3Fh, the kernel returned the address of the sar instruction. The sar instruction stands for shift arithmetic right, For a byte value, this means to shift the bits right one place, but keep the high-order bit the same.

    a b c d e f g h
    a a b c d e f g

    Okay, so what was the effect of sticking this little sar instruction at the start of every inter-segment call? Since the values in the table were initialized to 1, a right arithmetic shift changed the 1 to 0. Therefore, each time an inter-segment call was performed, the corresponding byte in the table was set to zero.

    Hooray, a software-implemented Accessed bit!

    Every 250 milliseconds, Windows scanned and reset the Access bits, using the data to maintain an LRU-list of all the segments in the system. That way, when it was time to discard some memory, it could discard the least recently used ones first.

    Today, a timer that runs continuously at 250ms would incur the wrath of the power management team. But back in the days of real-mode Windows, there was no power management. Like Chuck Norris, PCs ran at only one power level: Awesome.

    I continue to be amazed at how much Windows 1.0 accomplished with so little.

    [Raymond is currently away; this message was pre-recorded.]

  • The Old New Thing

    What does the HTOBJECT hit-test code do?

    • 25 Comments

    Leo Davidson observes that a hit-test code is defined for HTOBJECT, but it is not documented, and wonders what's up.

    #define HTOBJECT            19
    

    The HTOBJECT is another one of those features that never got implemented. The code does nothing and nobody uses it. It was added back in Windows 95 for reasons lost to the mists of time, but when the reason for adding it vanished (maybe a feature got cut), it was too late to remove it from the header file because that would require renumbering HTCLOSE and HTHELP, two values which were in widespread use already. So the value just stayed in the header file, taking up space but accomplishing nothing.

  • The Old New Thing

    The continuing battle between people who offer a service and others who want to hack into the service

    • 32 Comments

    In the history of the Internet, there have been many cases of one company providing a service, and others trying to piggyback off the service through a nonstandard client. The result is usually a back-and-forth where the provider changes the interface, the piggybacker reverse-engineers the interface, back and forth, until one side finally gives up.

    Once upon a time, there was one company with a well-known service, and another company that was piggybacking off it. (I first heard this story from somebody who worked at the piggybacking company.) The back-and-forth continued for several rounds, until the provider made a change to the interface that ended the game: They exploited a buffer overflow bug in their own client. The server sent an intentional buffer overflow to the client, resulting in the client being pwned by the server. I'm not sure what happened next, but presumably the server sent some exploit code to the client and waited for the client to respond in a manner that confirmed that the exploit had executed.

    With that discovery, the people from the piggybacking company gave up. They weren't going to introduce an intentional security flaw into their application. The service provider could send not only the exploit but also some code to detect and disable the rogue client.

    By an amazing stroke of good fortune, I happened to also hear the story of this battle from somebody who worked at the provider. He said that they had a lot of fun fighting this particular battle and particularly enjoyed timing the releases so they caused maximum inconvenience for their adversaries, like, for example, 2am on Saturday.

    Reminder: The ground rules prohibit "trying to guess the identity of a program whose name I did not reveal."

  • The Old New Thing

    Tracking shortcuts and the early history of multiple monitors

    • 25 Comments
    Commenter Roni put two suggestions in the suggestion box in the same entry, which is a problem for me because I feel like I'm forced to answer both of them or neither.

    The first question suggestion has to do with how shortcuts can find their targets even if they've been renamed. This is something I had covered nearly a year before the question was asked, so the reason I'm not answering that question isn't that I'm ignoring the question. It's that I already answered it.

    While I'm at it, here are other questions that I've already answered:

    The other question was a series of questions about the history of multiple monitor support in Windows.

    Actually, I think I've already discussed all of the parts of this question suggestion, so today's entry is more like a clip show. "Remember the first time I talked about multiple monitors?"

    Windows 98 was the first version of Windows to support multiple monitors. (Code to support multiple monitors started being written shortly after Windows 95 was out the door, so my guess is that the preliminary design work overlapped the end of the Windows 95 project.) To facilitate development of code that takes advantage of multiple monitors, the multimon.h header file was introduced so you could code as if multiple monitor support was present in the operating system, and it would emulate the multimon APIs (with a single monitor) if running on Windows 95.

    In Windows 98, the maximum number of monitors was nine. There was no restriction on color depth or resolution, because the most common configuration involved one powerful graphics card combined with one really lame one.

    When support for multiple monitors was ported to Windows NT, the Windows NT folks figured they could one-up the Windows 98 team. Literally. The maximum number of monitors was increased from nine to ten. Who knows, maybe someday it will go to eleven.

  • The Old New Thing

    How did real-mode Windows patch up return addresses to discarded code segments?

    • 19 Comments

    Last week, I described how real-mode Windows fixed up jumps to functions that got discarded. But what about return addresses to functions that got discarded?

    The naïve solution would be to allocate a special "return address recovery" function for each return address you found, but that idea comes with its own problems: You are patching addresses on the stack because you are trying to free up memory. It would be a bad idea to try to allocate memory while you're trying to free memory! What if in order to satisfy the allocation, you had to discard still more memory? You would start moving and patching stacks before they were fully patched from the previous round of memory motion. The stack patcher would get re-entered and see an inconsistent stack because the previous stack patcher didn't get a chance to finish. The result would be rampant memory corruption.

    Therefore, you have to preallocate your "return address recovery" functions. But you don't know how many return addresses you're going to need until you walk the stack (at which point it's too late), and you definitely don't want to preallocate the worst-case scenario since each stack can be up to 64KB in size, and can hold up to 16384 return addresses. You'd end up allocating nearly all your available memory just for return address recovery stubs!

    How did real-mode Windows solve this problem?

    It magically found a way to put ten pounds of flour in a five-pound bag.

    For each segment, there was a special "return thunk" that was shared by all return addresses which returned back into that segment. Since there is only one per segment, you can preallocate it as part of the segment bookkeeping data. To patch the return address, the original return address was overwritten by this shared return thunk. Okay, so you have 32 bits of information you need to save (the original return address, which consists of 16 bits for the segment and 16 bits for the offset), and you have a return thunk that captures the segment information. But you still have 16 bits left over. Where do you put the offset?

    We saw some time ago that the BP register associated with far calls was incremented before being pushed on the stack. This was necessary so that the stack patcher knew whether to decode the frame as a near frame or a far frame. But that wasn't the only rule associated with far stack frames. On entry to a far function,

    • The first thing you do is increment the BP register and push it onto the stack.
    • The second thing you do is push the DS register. (DS is the data segment register, which holds a segment containing data the caller function wanted to be able to access.)

    Every far call therefore looks like this on the stack:

    xxxx+6IP of return address
    xxxx+4CS of return address
    xxxx+2pointer to next stack frame (bottom bit set)
    xxxx+0DS of caller

    The stack patcher overwrote the saved CS:IP with the address of the return thunk. The return thunk describes the segment that got discarded, so the CS is implied by your choice of return thunk, but the stack patcher still needed to save the IP somewhere. So it saved it where the DS used to be.

    Wait, you've just traded one problem for another. Sure, you found a place to put the IP, but now you have to find a place to put the DS.

    Here comes the magic.

    Recall that on the 8086, the combination segment:offset corresponds to the physical address segment×16 + offset. For example, the address 1234:0005 refers to physical byte 0x1234 * 16 + 0x0005 = 0x12345.

    Since the segment and offset were both 16-bit values, there were multiple ways to refer to the same physical address. For example, 1000:2345 would also resolve to physical address 0x12345. But there are other ways to refer to the same byte, like the not entirely obvious 0FFF:2355. In fact, there's a whole range of values you can use, starting from 0235:FFF5 and ending at 1234:0005. In general, there are 4096 different ways of referring to the same address.

    There's a bit of a problem with very low addresses, though. To access byte 0x00400, for example, you could use 0000:0400 through 0040:0000, but that's as far as you could go, so these very low addresses do not have the full complement of aliases.

    Aha, but they do if you take advantage of wraparound. Since the 8086 had only 20 address lines, any overflow in the calculations was simply taken mod 0x100000. Therefore, you could also use F041:FFF0 to refer to the address, because 0xF041 × 16 + 0xFFF0 = 0x100400 ≡ 0x00400. Wraparound allowed the full range of 4096 aliases since you could use F041:FFF0 to FFFF:0410, and then 0000:0400 to 0040:0000.

    Related reading: The story of the mysterious WINA20.386 file.

    Okay, back to stack patching.

    Once you consider aliasing, you realize that the 32 bits of flour actually had a lot of air in it. By rewriting the address of the return thunk into the form XXXX:000Y, you can see the 12 bits of air, and to stash the 12-bit value N into that air pocket, you would set the segment to XXXX−N and the offset to N×16+Y.

    Recall that we were looking for a place to put that saved DS value, which is a 16-bit value, and we have found 12 bits of air in which to save it. We need to find 4 more bits of air somewhere.

    The next trick is realizing that DS is not an arbitrary 16-bit value. It's a 16-bit segment value that was obtained from the Windows memory manager. Therefore, if the Windows memory manager imposed a generous artificial limit of 4096 segments, it could convert the DS segment value into an integer segment index.

    That index got saved in the upper 12 bits of the offset.

    Okay, let's see what happens when the code tries to unwind to the patched return address.

    The function whose return address got patched goes through the usual function epilogue. It pops what it thinks is the original DS off the stack, even though that DS has been secretly replaced by the original return address's IP. The epilogue then pops the old BP, decrements it, and returns to the return thunk. The return thunk now knows where the real return address is (it knows which segment it is responsible for, and it can figure out the IP from the incoming DS register). It can also study its own IP to extract the segment index and from that recover the original DS value. Now that it knows what the original code was trying to do, it can reload the segment, restore the registers to their proper values, and jump to the original return address inside the newly-loaded segment.

    I continue to be amazed at how real-mode Windows managed to get so much done with so little.

    Exercise: The arbitrary limit of 4096 segments was quite generous, seeing as the maximum number of selectors in protected mode was defined by the processor to be 8191. What small change could you make to expand the segment limit in real mode to match that of protected mode?

  • The Old New Thing

    How did real-mode Windows fix up jumps to functions that got discarded?

    • 25 Comments

    In a discussion of how real-mode Windows walked stacks, commenter Matt wonders about fixing jumps in the rest of the code to the discarded functions.

    I noted in the original article that "there are multiple parts to the solution" and that stack-walking was just one piece. Today, we'll look at another piece: Inter-segment fixups.

    Recall that real-mode Windows ran on an 8086 processor, a simple processor with no memory manager, no CPU privilege levels, and no concept of task switching. Memory management in real-mode Windows was handled manually by the real-mode kernel, and the way it managed memory was by loading code from disk on demand, and discarding code when under memory pressure. (It didn't discard data because it wouldn't know how to regenerate it, and it can't swap it out because there was no swap file.)

    There were a few flags you could attach to a segment. Of interest for today's discussion are movable (and it was spelled without the "e") and discardable. If a segment was not movable (known as fixed), then it was loaded into memory and stayed there until the module was unloaded. If a segment was movable, then the memory manager was allowed to move it around when it needed to defragment memory in order to satisfy a large memory allocation. And if a segment was discardable, then it could even be evicted from memory to make room for other stuff.

    MovableDiscardableMeaning
    NoNoCannot be moved or discarded
    NoYes(invalid combination)
    YesNoCan be moved in memory
    YesYesCan be moved or purged from memory

    I'm going to combine the movable and discardable cases, since the effect is the same for the purpose of today's discussion, the difference being that with discardable memory, you also have the option of throwing the memory out entirely.

    First of all, let's get the easy part out of the way. If you had an intra-segment call (calling a function in your own segment), then there was no work that needed to be done. Real-mode Windows always discarded full segments, so if your segment was running code, it was by definition present, and therefore any other code in that segment was also present. The hard part is the inter-segment calls.

    As it happens, an old document on the 16-bit Windows executable file format gives you some insight into how things worked, if you sit down and puzzle it out hard enough.

    Let's start with the GetProcAddress function. When you call GetProcAddress, the kernel needs to locate the address of the function inside the target module. The loader consults the Entry Table to find the function you're asking for. As you can see, there are three types of entries in the Entry Table. Unused entries (representing ordinals with no associated function), fixed segment entries, and movable segment entries. Obviously, if the match is in an unused entry, the return value is NULL because there is no such function. If the match is in a fixed entry, that's pretty easy too: Look up the segment number in the target module's segment list and combine it with the specified offset. Since the segment is fixed, you can just return the raw pointer directly, since the code will never move.

    The tricky part is if the function is in a movable segment. If you look at the document, it says that "a moveable segment entry is 6 bytes long and has the following format." It starts with a byte of flags (not important here), a two-byte INT 3Fh instruction, a one-byte segment number, and the offset within the segment.

    What's the deal with the INT 3Fh instruction? It seems rather pointless to specify that a file format requires some INT 3Fh instructions scattered here and there. Why not get rid of it to save some space in the file?

    If you called GetProcAddress and the result was a function in a movable segment, the GetProcAddress didn't actually return the address of the target function. It returned the address of the INT 3Fh instruction! (Thankfully, the Entry Table is always a fixed segment, so we don't have to worry about the Entry Table itself being discarded.)

    (Now you see why the file format includes these strange INT 3Fh instructions: The file format was designed to be loaded directly into memory. When the loader loads the entry table, it just slurps it into memory and bingo, it's ready to go, INT 3Fh instructions and all!)

    Since GetProcAddress returned the address of the INT 3Fh instruction, calls to imported functions didn't actually go straight to the target. Instead, you called the INT 3Fh instruction, and it was the INT 3Fh handler which said, "Gosh, somebody is trying to call code in another segment. Is that segment loaded?" It took the return address of the interrupt and used it to locate the segment number and offset. If the segment in question was already in memory, then the handler jumped straight to the segment at the specified offset. You got the function call you wanted, just in a roundabout way.

    If the segment wasn't loaded, then the INT 3Fh handler loaded it (which might trigger a round of discarding and consequent stack patching), then jumped to the newly-loaded segment at the specified offset. An even more roundabout function call.

    Okay, so that's the case where a function pointer is obtained by calling GetProcAddress. But it turns out that a lot of stuff inside the kernel turns into GetProcAddress at the end of the day.

    Suppose you have some code that calls a function in another segment within the same module. As we saw earlier, fixups are threaded through the code segment, and if you scroll down to the Per Segment Data section of that old document, you'll see a description of the way the relocation records are expressed. A call to a function to a segment within the same module requires an INTERNALREF fixup, and as you can see in the document, there are two types of INTERNALREF fixups, ones which refer to fixed segments and ones which refer to movable segments.

    The easy case is a reference to a fixed segment. In that case, the kernel can just look up where it put that segment, add in the offset, and patch that address into the code segment. Since it's a fixed segment, the patch will never have to be revisited.

    The hard case is a reference to a movable segment. In that case, you can see that the associated information in the fixup table is the "ordinal number index into [the] Entry Table."

    Aha, you now realize that the Entry Table is more than just a list of your exported functions. It's also a list of all the functions in movable segments that are called from other segments. In a sense, these are "secret exports" in your module. (However, you can't get to them by GetProcAddress because GetProcAddress knows how to keep a secret.)

    To fix up a reference to a function in a movable segment, the kernel calls the SecretGetProcAddress (not its real name) function, which as we saw before, returns not the actual function pointer but rather a pointer to the magic INT 3Fh in the Entry Table. It is that pointer which is patched into your code segment, so that when your code calls what it thinks is a function in another segment, it's really calling the Entry Table, which as we saw before, loads the code in the target segment if necessary before jumping to it.

    Matt wrote, "If the kernel wants to discard that procedure, it has to find that jump address in my code, and redirect it to a page fault handler, so that when my process gets to it, it will call the procedure and fault the code back in. How does it find all of the references to that function across the program, so that it can patch them all up?" Now you know the answer: It finds all of those references because it already had to find them when applying fixups. It doesn't try to find them at discard time; it finds them when it loads your segment. (Exercise: Why doesn't it need to reapply fixups when a segment moves?)

    All inter-segment function pointers were really pointers into the Entry Table. You passed a function pointer to be used as a callback? Not really; you really passed a pointer to your own Entry Table. You have an array of function pointers? Not really; you really have an array of pointers into your Entry Table. It wasn't actually hard for the kernel to find all of these because you had to declare them in your fixup table in the first place.

    It is my understanding that the INT 3Fh trick came from the overlay manager which was included with the Microsoft C compiler. (The Zortech C compiler followed a similar model.)

    Note: While the above discussion describes how things worked in principle, there are in fact a few places where the actual implementation differs from the description above, although not in any way that fundamentally affects the design.

    For example, real-mode Windows did a bit of optimization in the INT 3Fh stubs. If the target segment was in memory, then it replaced the INT 3Fh instruction with a direct jmp xxxx:yyyy to the target, effectively precalculating the jump destination when a segment is loaded rather than performing the calculation each time a function in that segment is called.

    By an amazing coincidence, the code sequence

        int 3fh
        db  entry_segment
        dw  entry_offset
    

    is five bytes long, which is the exact length of a jmp xxxx:yyyy instruction. Phew, the patch just barely fits!

  • The Old New Thing

    What is the history of the GetRandomRgn function?

    • 26 Comments

    An anonymous commenter was curious about how the Get­Random­Rgn function arrived at its strange name, what the purpose of the third parameter is, and why it is inconsistent between Windows 95 and Windows NT.

    The sense of word "random" here is hacker slang rather than its formal probabilistic definition, specifically sense 2: "Assorted, undistinguished", perhaps with a bit of sense 4 sprinkled in: "Not well organized." (Commenter Gabe suggested that a better name would have been Get­Specific­Rgn.)

    Once upon a time, when men were men and Windows was 16-bit, there was an internal function used to communicate between the window manager and GDI in order to set up device contexts. Internally, the region was called the Rao Region, named after Rao Remala, the programmer who invented it, and the function that calculated the Rao Region was rather uncreatively called Compute­Rao­Rgn.

    When porting to 32-bit Windows, the Windows NT and Windows 95 teams both found that they needed this same internal communication between the window manager and GDI. GDI already has a bunch of functions named Get­Xxx­Rgn, so instead of writing a separate marshaler for each one, they opted to write a single Get­Random­Rgn function which takes an integer which serves as a function code, specifying which region the caller actually wants. (I suspect the Windows 95 team followed the cue of the Windows NT team, since Windows NT ran into the problem first.)

    Since this was an internal function, it didn't matter that the name was a bit cutesy, nor did it matter what coordinate system it used, as long as the window manager and GDI agreed on the name and coordinate system. The Windows 95 team still had a lot of 16-bit code that they needed to be compatible with, so they chose to generate the Rao region the same way that the 16-bit Compute­Rao­Rgn function did it. The Windows NT folks, on the other hand, decided that it was more convenient for them that this internal function use screen coordinates, so that's what it returns on Windows NT.

    Get­Random­Rgn isn't really a function that was designed to be public. It was just an internal helper function that outsiders discovered and relied upon to the point that that it became a compatibility constraint so strong that it turned into a de facto documented function. And all the weirdness you see behind it is the weirdness of a function never intended for public consumption.

    The introduction of the Desktop Window Manager in Windows Vista changed the way the visible region was managed (since all windows are logically visible even when occluded because their drawing is redirected to an off-screen surface), but the Get­Random­Rgn function has to keep track of the "visible region" anyway, for compatibility.

  • The Old New Thing

    Why is the Close button in the upper right corner?

    • 47 Comments
    Chris wants to know how the close button ended up to the right of the minimize and maximize/restore buttons. "In OS/2, it is on the left, which left the two other buttons in place."

    I don't know why the Close button went to the upper right instead of going to the left of the other buttons, but I'm going to guess. (That's what I do around here most of the time anyway; I just don't usually call it out.)

    Two words: Fitts's Law.

    The corners of the screen are very valuable, because users can target them with very little effort. You just slam the mouse in the direction you want, and the cursor goes into the corner. And since closing a window is a much more common operation than minimizing, maximizing, and restoring it, it seems a natural choice to give the close button the preferred location.

    Besides, maximizing and restoring a window already have very large targets, namely the entire caption. You can double-click the caption to maximize, and double-click again to restore. The restore even gets you a little bit of Fitt's Law action because the top of the screen makes the height of the caption bar effectively infinite.

Page 5 of 48 (479 items) «34567»