History

  • The Old New Thing

    For Honor, For Excellence, For Pizza

    • 18 Comments

    Hacker News member citizenlow recalls the time I went over after hours to help out the Money team debug a nasty kernel issue. They were running into mysterious crashes during their stress testing and asked for my help in debugging it.

    I helped out other teams quite a bit, like writing a new version of Dr. Watson for the Windows 98 team or writing a new version of the MSConfig tool based on a sketch on a bar napkin. And for a time, I followed the official policy for moonlighting to make sure everybody understood that I was doing work outside the boundaries of my official job duties.

    When the Money folks asked me for help, I told them that before I could help them, they would have to help me fill out some paperwork.

    • Who will you be working for? Microsoft Corporation.
    • Where will you be doing the work? Office XX/YYYY on Microsoft Redmond Campus.
    • When will the work begin and end? Begin on YYYY/MM/DD at 5pm, ending YYYY/MM/DD at 11pm.
    • How much will you be paid for this work?

    The Money folks were not sure how to answer that last question, since they didn't have any formal budget or procedures for hiring an outside consultant, much less any procedures for hiring one from inside the company.

    I told them, "Just write One slice of pizza."

    Nobody from the Personnel department seemed to notice the odd circumstances of this moonlighting request; they simply rubber-stamped it and put it in my file.

    The crash, it turns out, was in Windows itself. There was a bug in the special compiler the Languages team produced to help build certain components of Windows 95 which resulted in an incorrect address computation under a particularly convoluted boundary condition. The Money folks had merely stumbled across this bug as part of their regular testing. I notified the appropriate people, and the Windows team applied a workaround in their code to tickle the compiler into generating the correct code.

    As I recall, the pizza was just fine. It was just your average delivery pizza, nothing gourmet or anything. Not that it had to be, because I wasn't there for the pizza.

  • The Old New Thing

    What happened to the Shut Down menu in classic Task Manager?

    • 61 Comments

    The great thing about open comments is that anybody can use them to introduce their favorite gripe as long as it shares at least four letters of the alphabet in common with the putative topic of the base article.

    xpclient "asks" why the Shut Down menu was removed from Task Manager. I put the word "asks" in quotation marks, because it's really a complaint disguised as a question. As in "Why do you guys suck?"

    The first thing to understand is that classic Task Manager has been in a state of sustained engineering since Windows 2000. In other words, the component is there, but there is no serious interest in improving it. (That's why it wasn't updated to call EnableThemeDialogTexture on its pages.) It's not like there's a Task Manager Team of five people permanently dedicated to making Task Manager as awesome as possible for every release of Windows. Rather, the responsibility for maintaining Task Manager is sort of tacked onto somebody whose primary responsibilities are for other parts of the system.

    There are a lot of Windows components in this state of "internal sustained engineering." The infamous "Install font" dialog, for example. The responsibility for maintaining these legacy components is spread out among the product team so that on average, teams are responsible both for cool, exciting things and some not-so-cool, legacy things.

    (On the other hand, according to xpclient, an app must be serving its users really well if it hasn't changed much, so I guess that Install font dialog is the best dialog box in all of Windows at serving its users, seeing as it hasn't changed since 1995.)

    The engineering budget for these components in internal sustained engineering is kept to a minimum, both because there is no intention of adding new features, and also because the components are so old that there is unlikely to be any significant work necessary in the future.

    Every so often, some work becomes necessary, and given that the engineering interest and budget are both very low, the simplest way out when faced with a complicated problem in a rarely-used feature is simply to remove the rarely-used feature.

    And that's what happened to the Shut Down menu. (Note that it's two words "Shut down" since it is being used as a verb, not a noun.) Given the changes to power management in Windows Vista, the algorithm used by Task Manager was no longer accurate. And instead of keeping Task Manager updated with every change, the Shutdown user interface design team agreed to give the Task Manager engineering team a break and say, "Y'know, the Shut Down menu on Task Manager is rarely-used, so we'll let you guys off the hook on this one, so you don't keep getting weekly requests from us to change the way Shut Down works."

    I remember, back in the days of Windows XP, seeing the giant spreadsheet used by the person responsible for overall design of the Shutdown user interface. It tracked the gazillion group policies, user settings, and system configurations which all affect how shutting down is presented to the user. Removing the column for Task Manager from the spreadsheet probably was met with a huge sigh of relief, not just from the Task Manager engineering team, but also from the person responsible for the spreadsheet.

    Remember, engineering is about trade-offs. If you decide to spend more effort making Task Manager awesome, you lose the ability to expend that effort on something else. (And given that you are expending effort in a code base that is relatively old and not fresh in the minds of the people who would be making those changes, you also increase the likelihood that you're going to introduce a bug along the way.)

  • The Old New Thing

    10 is the new 6

    • 35 Comments

    While it may no longer be true that everything at Microsoft is built using various flavors of Visual C++ 5.0, 6.0, and 7.0, there is still a kernel of truth in it: A lot of customers are still using Visual C++ 6.0.

    That's why the unofficial slogan for Visual C++ 2010 was 10 is the new 6. Everybody on the team got a T-shirt with the slogan (because you don't have a product until you have a T-shirt).

  • The Old New Thing

    Who would ever write a multi-threaded GUI program?

    • 37 Comments

    During the development of Windows 95, the user interface team discovered that a component provided by another team didn't work well under multi-threaded conditions. It was documented that the Initialize function had to be the first call made by a thread into the component.

    The user interface team discovered that if one thread called Initialize, and then used the component, then everything worked great. But if a second thread called Initialize, the component crashed whenever the second thread tried to use it.

    The user interface team reported this bug back to the team that provided the component, and some time later, an updated version of the component was delivered.

    Technically, the bug was fixed. When the second thread called Initialize, the function now failed with ERROR_NOT_SUPPORTED.

    The user interface team went back to the team that provided the component. "It's nice that your component detects that it is being used by a multi-threaded client and fails the second thread's attempt to initialize it. But given that design, how can a multi-threaded client use your component?"

    The other team's reply was, "It doesn't matter. Nobody writes multi-threaded GUI programs."

    The user interface team had to politely reply, "Um, we are. The next version of Windows will be built on a multi-threaded shell."

    The other team said, "Oh, um, we weren't really expecting that. Hang on, we'll get back to you."

    The idea that somebody might write a multi-threaded program that used their component caught them by surprise, and they had to come up with a new design of their component that supported multiple threads in a clean way. It was a lot of work, but they came through, and Windows 95 could continue with its multi-threaded shell.

  • The Old New Thing

    The code names for various subprojects within Windows 95

    • 25 Comments

    Most people know that Windows 95 was code-named Chicago. The subprojects of Windows 95 also had their code names, in part because code names are cool, and in part because these projects were already under way by the time somebody decided to combine them into one giant project.

    Component            Code Name
    16-bit DOS kernel    Jaguar
    32-bit DOS kernel    Cougar
    Win32 kernel         Panther
    User interface       Stimpy

    Even when they were separate projects, the first three teams worked closely together, so the names followed a pattern of ferocious cats. My guess is that when the user interface team chose their code name, they heard that the other guys were naming themselves after cats, so they picked a cat, too.

    I don't know whether they did that on purpose or by accident, but the cat they picked was not ferocious at all. Instead, they picked a cartoon cat.

    Bonus trivia: When the feature to show a special message after Windows had shut down was first added, the shutdown bitmap was a screen shot of Ren and Stimpy saying good-bye. Fortunately, we remembered to replace them before shipping.

    If you were paying attention: You would have noticed that code names get reused a lot, not because of any connection between the projects but purely by coincidence.

  • The Old New Thing

    When was the WM_COPYDATA message introduced, and was it ported downlevel?

    • 9 Comments

    Gabe wondered when the WM_COPY­DATA message was introduced.

    The WM_COPY­DATA message was introduced by Win32. It did not exist in 16-bit Windows.

    But it was there all along.

    The WM_COPYDATA message was carefully designed so that it worked in 16-bit Windows automatically. In other words, you retained your source code compatibility between 16-bit and 32-bit Windows without having to do a single thing. Phew, one fewer breaking change between 16-bit and 32-bit Windows.

    As Neil noted, there's nothing stopping you from sending message 0x004A in 16-bit Windows with a window handle in the wParam and a pointer to a COPY­DATA­STRUCT in the lParam. Since all 16-bit applications ran in the same address space, the null marshaller successfully marshals the data between the two processes.

    In a sense, support for the WM_COPY­DATA message was ported downlevel even before the message existed!
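
    To make the "null marshaller" idea concrete, here is a toy model in portable C. It is not real Windows code: the struct merely mirrors the documented fields of COPYDATASTRUCT, 0x004A is the message number quoted above, and every sim_ and Sim name (the fake window procedure, the fake SendMessage) is hypothetical and exists only for illustration. The point it tries to show is that when sender and receiver share an address space, "marshalling" amounts to handing over the pointer unchanged.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Message number from the article; the SIM_ prefix marks this as a model. */
#define SIM_WM_COPYDATA 0x004A

/* Mirrors the documented COPYDATASTRUCT fields (dwData, cbData, lpData). */
typedef struct {
    uintptr_t dwData;   /* application-defined tag */
    uint32_t  cbData;   /* size of the payload in bytes */
    void     *lpData;   /* pointer to the payload */
} SimCopyData;

/* Hypothetical receiver state: where the "other application" keeps its copy. */
char      g_received[64];
uintptr_t g_received_tag;

/* Hypothetical window procedure of the receiving application. */
long sim_receiver_wndproc(unsigned msg, uintptr_t wParam, void *lParam)
{
    (void)wParam;                        /* would be the sender's window handle */
    if (msg != SIM_WM_COPYDATA)
        return 0;
    const SimCopyData *cds = lParam;
    if (cds->cbData > sizeof g_received)
        return 0;                        /* refuse oversized payloads */
    memcpy(g_received, cds->lpData, cds->cbData);
    g_received_tag = cds->dwData;
    return 1;                            /* handled */
}

/* "Null marshaller": same address space, so the pointer is passed as-is. */
long sim_send_message(unsigned msg, uintptr_t wParam, void *lParam)
{
    return sim_receiver_wndproc(msg, wParam, lParam);
}

/* Sends a NUL-terminated string; returns 1 if the receiver accepted it. */
int sim_send_string(const char *text)
{
    SimCopyData cds;
    cds.dwData = 1;
    cds.cbData = (uint32_t)(strlen(text) + 1);
    cds.lpData = (void *)text;
    return (int)sim_send_message(SIM_WM_COPYDATA, 0, &cds);
}
```

    Calling sim_send_string("hello") leaves a copy of the string in g_received. In 32-bit Windows the system has to copy the buffer across process boundaries; in this single-address-space model there is nothing to copy until the receiver decides to.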

  • The Old New Thing

    Letting the boss think your project is classier than it really is

    • 16 Comments

    Once upon a time, there was a team developing two versions of a product, the first a short-term project to ship soon, and the other a more ambitious project to ship later. (Sound familiar?) They chose to assign the projects code names Ren and Stimpy, in honor of the lead characters from the eponymous cartoon series.

    Over time, the two projects merged, and the code name that stuck was Ren.

    When the project came up in a meeting with Bill Gates, it was mentioned verbally but never spelled out, and since Bill wasn't closely tuned into popular culture, he mapped the sound /rɛn/ not to the hairless Mexican dog but to Christopher Wren, architect of St. Paul's Cathedral. In follow-up email, he consistently referred to the project by the name "Wren".

    The Ren team liked the fact that their name gave the boss the impression that the project was going to be a masterpiece of architectural beauty, so they never told him he got the name wrong.

    Even though it has nothing to do with the story: The project in question is the one that eventually became known to the world as Outlook.

  • The Old New Thing

    Why does the common file save dialog create a temporary file and then delete it?

    • 29 Comments

    When you call GetSaveFileName, the common file save dialog will ask the user to choose a file name, and just before it returns it does a little create/delete dance where it creates the file the user entered, and then deletes it. What's up with that?

    This is a leftover from the ancient days of 16-bit Windows 3.1, back when file systems were real file systems and didn't have this namby-pamby "long file name" or "security" nonsense. (Insert sound effect of muscle flexing and guttural grunting.)

    Back in those days, the file system interface was MS-DOS, and MS-DOS didn't have a way to query security attributes because, well, the file systems of the day didn't have security attributes to query in the first place.

    But network servers did.

    If you mapped a network drive from a server running one of those fancy new file systems, then you were in a situation where your computer didn't know anything about file system security, but the server did. The only way to find out whether you had permission to create a file in a directory was to try it and see whether it worked or whether it failed with the error ERROR_ACCESS_DENIED (or, as it was called back in the MS-DOS days, "5").

    Another reason why a server might reject a file name was that it contained a character that, while legal in Windows, was not legal on the server. At the time, the most common reason for this was that you used a so-called "extended character" (in other words, a character outside the ASCII range like an accented lowercase e) which was part of your local code page but not on the server's.

    Yet another possibility was that the file name you chose would exceed the server's path name limit. For example, suppose the server is running Windows for Workgroups (which has a 64-character maximum path name limit), and it shared out C:\some\deep\directory\on\the\server as \\server\share. If you mapped M: to \\server\share, then the maximum path name on M: was only about 30 characters because C:\some\deep\directory\on\the\server used up half of your 64-character limit.
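
    The arithmetic behind that "about 30 characters" can be checked with a few lines of C. This is only a sketch: remaining_path_budget is a hypothetical helper, and it ignores details such as the separator between the shared directory and the rest of the path, which would eat into the budget further.

```c
#include <string.h>

/* Path limit quoted in the article for Windows for Workgroups. */
enum { SERVER_MAX_PATH = 64 };

/* Characters left for the client once the server-side prefix is counted.
   Hypothetical helper; returns 0 if the prefix alone reaches the limit. */
size_t remaining_path_budget(const char *server_prefix)
{
    size_t used = strlen(server_prefix);
    return used >= SERVER_MAX_PATH ? 0 : SERVER_MAX_PATH - used;
}
```

    The shared directory C:\some\deep\directory\on\the\server is 36 characters long, so remaining_path_budget returns 28 for it, and a path on M: that looks perfectly short to the client can still be too long for the server.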

    The only way to tell whether the file could be created, then, was to try to create it and see what happens. After creating the test file (to see if it could), the common file save dialog immediately deleted it in order to cover its tracks. (This could lead to some weird behavior if users picked a directory where they had permission to create files but no permission to delete files that they created!)

    This "test to see if I can create the file by creating it" behavior has been carried forward ever since, but you can suppress it by passing the OFN_NOTESTFILECREATE flag.
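
    The create/delete dance can be modeled in portable C as follows. This is a sketch of the technique, not the dialog's actual implementation: probe_can_create is a hypothetical name, and leaving an already existing file untouched is an assumption made here so that the probe never destroys data.

```c
#include <stdio.h>

/* Returns 1 if `path` exists or could be created, 0 if creation failed.
   Hypothetical helper modeling the "try it and see" probe. */
int probe_can_create(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (f != NULL) {            /* already exists: leave it alone (assumption) */
        fclose(f);
        return 1;
    }
    f = fopen(path, "wb");      /* the "create" half of the dance */
    if (f == NULL)
        return 0;               /* access denied, illegal character, path too long... */
    fclose(f);
    remove(path);               /* the "delete" half: cover our tracks */
    return 1;
}
```

    Note that the sketch ignores the return value of remove, which is exactly the weird case mentioned above: if you may create files but not delete them, the probe file stays behind. In real Win32 code, a program that wants to skip the probe sets OFN_NOTESTFILECREATE in the Flags member of the OPENFILENAME structure it passes to GetSaveFileName.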

  • The Old New Thing

    Why is Rundll32 called Rundll32 and not just Rundll?

    • 35 Comments

    There is an oft-abused program named rundll32.exe. Why does its name end in 32? Why not just call it rundll.exe? (I will for the moment ignore the rude behavior of calling people stupid under the guise of asking a question.)

    Because there needed to be a way to distinguish the 16-bit version from the 32-bit version.

    Windows 95 had both rundll.exe (the 16-bit version) and rundll32.exe (the 32-bit version). Of course, with the gradual death of support for 16-bit Windows, the 16-bit rundll.exe is now just a footnote in history, leaving just the 32-bit version.

    But why did the two have to have different names? Why not just use the same name (rundll.exe) for both, putting the 16-bit version in the 16-bit system directory and the 32-bit version in the 32-bit system directory?

    Because Windows 95 didn't have separate 16-bit and 32-bit system directories. There was just one system directory called SYSTEM and everything hung out there, both 16-bit and 32-bit, like one big happy family.

    Well, maybe not a happy family.

    At any rate, when 64-bit Windows was introduced, the plan was to avoid the crazy mishmash and instead put the 32-bit files in one directory and the 64-bit files in another. That way, no files needed to be renamed, and your batch file that ran rundll32.exe with some goofy command line still worked, even on 64-bit Windows.

  • The Old New Thing

    What happened in real-mode Windows when somebody did a longjmp into a discardable segment?

    • 14 Comments

    During the discussion of how real-mode Windows handled return addresses into discarded segments, Gabe wondered, "What happens when somebody does a long­jmp into a discardable segment?"

    I'm going to assume that everybody knows how long­jmp traditionally works so I can go straight to the analysis.

    The reason long­jmp is tricky is that it has to jump to a return address that isn't on the stack. (The return address was captured in the jmp_buf.) If that segment got relocated or discarded, then the jump target is no longer valid. It would have gotten patched to a return thunk if it were on the stack, but since it's in a jmp_buf, the stack walker didn't see it, and the result is a return address that is no longer valid. (There is a similar problem if the data segment or stack segment got relocated. Exercise: Why don't you have to worry about the data segment or stack segment being discarded?)
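
    For readers who want a refresher on the traditional behavior assumed above, here is a minimal sketch in standard C. The function names are hypothetical. The thing to notice is that the resume point lives in the jmp_buf, not in a return address on the stack, which is precisely why a stack walker that patches return addresses would never see it.

```c
#include <setjmp.h>

static jmp_buf g_resume_point;   /* where setjmp recorded the resume point */
static int     g_error_code;     /* value handed back across the jump */

/* Abandons the current call chain and resumes at the setjmp site. */
void fail_deep_inside(int code)
{
    g_error_code = code;
    longjmp(g_resume_point, 1);  /* never returns */
}

/* Returns the code passed to fail_deep_inside, delivered via longjmp. */
int run_with_recovery(int code)
{
    if (setjmp(g_resume_point) != 0)
        return g_error_code;     /* arrived here via longjmp */
    fail_deep_inside(code);
    return 0;                    /* not reached */
}
```

    A call such as run_with_recovery(7) returns 7: control leaves fail_deep_inside without returning through it and reappears at the setjmp call, using only what was stored in g_resume_point.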

    Recall that when a segment got discarded, all return addresses which pointed into that segment were replaced with return thunks. I didn't mention it explicitly in the original discussion, but there are three properties of return thunks which will help us here:

    • It is safe to invoke a return thunk even if the associated code segment is in memory. All that happens is that the "ensure the segment is present" step is a nop, and the return thunk simply continues with its work of recovering the original state.
    • It is safe to abandon a return thunk without needing to do any special cleanup. All the state used by the return thunk is stored in the patched stack itself, so if you want to abandon a return thunk, all you need to do is free the stack space.
    • It is safe to reuse a return thunk. Since they are statically allocated, you can use them over and over as long as the associated code segment has not been freed.

    The first property (idempotence of the return thunk) is no accident. It's required behavior in order for return thunks to work at all! After all, if the segment was loaded (say by a direct call or some other return thunk), then the return thunk needs to say, "Well, I guess that was easy," and simply skip the "load the target segment" step. (It still needs to do the rest of the work, of course.)

    The second property (abandonment) is also no accident. An application might decide to exit without returning all the way to Win­Main (the equivalent of calling Exit­Process instead of returning from Win­Main). This would abandon all the stack frames between the exit point and the Win­Main.

    The third property (reuse) is a happy accident. (Well, it was probably designed in for the purpose we're about to put it to right here.)

    Okay, now let's look at the jump buffer again. If you've been following along so far, you may have guessed the solution: Pre-patch the return address as if it had already been discarded. If it turns out that the segment was discarded, then the return thunk will restore it. If the segment is present (either because it was never discarded, or because it was discarded and reloaded, possibly at a new address), the return thunk will figure out where the code is and jump to it.

    Actually, since the state is being recorded in a jmp_buf, the tight space constraints of stack patching do not apply here. If it turns out you need 20 bytes of memory to record this information, then go ahead and make your jmp_buf 20 bytes. You don't have to try to make it all fit inside an existing stack frame.

    The jmp_buf therefore doesn't have to try to play the crazy air-squeezing games that stack patching did. It can record the return thunk, the handles to the data and stack segments, and the return IP without any encoding at all. And in fact, the long­jmp function doesn't need to invoke the return thunk directly. It can just extract the segment number after the initial INT 3Fh and pass that directly to the segment loader.

    (There is a little hitch if the address being returned to is fixed; in that case, there is no return thunk. But that just makes things easier: The lack of a return thunk means that the return address cannot be relocated, so there is no patching needed at all!)
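
    The idea of recording "which segment, and where inside it" instead of a raw address can be illustrated with a toy model. Everything below is hypothetical: the segment table, the made-up load addresses, and all Sim and sim_ names are inventions for illustration and do not reflect the real kernel's data structures or the actual jmp_buf layout.

```c
enum { SIM_SEGMENTS = 4 };

typedef struct {
    int      present;   /* 1 if the segment is currently in memory */
    unsigned base;      /* current (simulated) load address */
} SimSegment;

typedef struct {
    int      segment;   /* which code segment to resume in */
    unsigned offset;    /* resume IP relative to the segment start */
} SimJmpBuf;

SimSegment g_segments[SIM_SEGMENTS];
unsigned   g_next_base = 0x1000;   /* next simulated load address */
int        g_loads = 0;            /* how many real loads happened */

/* Idempotent "ensure the segment is present" step: a no-op if loaded. */
void sim_ensure_loaded(int seg)
{
    if (g_segments[seg].present)
        return;
    g_segments[seg].present = 1;
    g_segments[seg].base = g_next_base;   /* may differ from the old address */
    g_next_base += 0x1000;
    g_loads++;
}

/* Discarding only marks the segment absent; records pointing at it survive. */
void sim_discard(int seg)
{
    g_segments[seg].present = 0;
}

/* setjmp side: record segment and offset rather than a raw address. */
SimJmpBuf sim_setjmp_record(int seg, unsigned offset)
{
    SimJmpBuf jb = { seg, offset };
    return jb;
}

/* longjmp side: reload if necessary, then compute where the code lives now. */
unsigned sim_longjmp_target(const SimJmpBuf *jb)
{
    sim_ensure_loaded(jb->segment);
    return g_segments[jb->segment].base + jb->offset;
}
```

    In this model, a record taken while segment 2 sits at 0x1000 resolves to 0x1024 for offset 0x24. After sim_discard(2), the same record resolves to 0x2024, because the segment is reloaded at a new address and the target is recomputed rather than remembered.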

    This magic with return thunks and segment reloading is internal to the operating system, so the core set­jmp and long­jmp functionality was provided by the kernel rather than the C runtime library in a pair of functions called Catch and Throw. The C runtime's set­jmp and long­jmp functions merely forwarded to the kernel versions.
