March, 2006

  • The Old New Thing

    Why are there two copies of Notepad?

    You may have noticed that there's a copy of Notepad in %windir%\notepad.exe and another in %windir%\system32\notepad.exe. Why two?

    Compatibility, of course.

    Windows 95 put Notepad in the Windows directory. Windows NT put it in the System32 directory.

    Notepad is perhaps the most commonly hardcoded program in Windows. Many Setup programs use it to view the Readme file, and you can use your imagination to come up with other places where a program or batch file or printed instructions will hard-code the path to Notepad.

    In order to be compatible with programs designed for Windows 95, there needs to be a copy of Notepad in the Windows directory. And in order to be compatible with programs designed for Windows NT, there also needs to be a copy in the System32 directory.

    And now that Notepad exists in both places, new programs have a choice of Notepads, and since there is no clear winner, half of them will choose the one in the Windows directory and half will choose the one in the System32 directory, thereby ensuring the continued existence of two copies of Notepad for years to come.

  • The Old New Thing

    The network interoperability compatibility problem, first follow-up of many

    Okay, there were an awful lot of comments yesterday and it will take me a while to work through them all. But I'll start with some more background on the problem and clarify some issues that people had misinterpreted.

    As a few people surmised, the network file server software in question is Samba, a version of which comes with most Linux distributions. (I'll have to do a better job next time of disguising the identities of the parties involved.) Samba is also very popular as the network file server for embedded devices such as network-attached storage. The bug in question is fixed in the latest version of Samba, but none of the major distributions have picked up the fix yet. Not that that helps the network-attached storage scenario any.

    It appears that a lot of people thought the buggy driver was running on the Windows Vista machine, since they started talking about driver certification and blocking its installation. The problem is not on the Windows Vista machine; the problem is on the file server, which is running Linux. WHQL does not certify Linux drivers, it can't stop you from installing a driver on some other Linux machine, and it certainly can't download an updated driver and somehow upgrade your Linux machine for you. Remember, the bug is on the server, which is another computer running some other operating system. Asking Windows to update the driver on the remote server makes about as much sense as asking Internet Explorer to upgrade the version of Apache running on slashdot.org. You're the client; you have no power over the server.

    Some people lost sight of the network-attached storage scenario, probably because they weren't familiar with the term. A network-attached storage device is a self-contained device consisting of a large hard drive, a tiny computer, and a place to plug in a network cable. The computer has an operating system burned into its ROMs (often a cut-down version of Linux with Samba), and when you turn it on, the device boots the computer, loads the operating system, and acts as a file server on your network. Since everything is burned into ROM, claiming that the driver will get upgraded and the problem will eventually be long forgotten is wishful thinking. It's not like you can download a new Samba driver and install it into your network-attached storage device. You'll have to wait for the manufacturer to release a new ROM.

    As for detecting a buggy driver, the CIFS protocol doesn't really give the client much information about what's running on the server, aside from a "family" field that identifies the general category of the server (OS/2, Samba, Windows NT, etc.). All that a client can tell, therefore, is "Well, the server is running some version of Samba." It can't tell whether it's a buggy version or a fixed version. The only way to tell that you are talking to a buggy server is to wait for the bug to happen.

    (Which means that people who said, "Windows Vista should just default to the slow version," are saying that they want Windows Vista to run slow against Samba servers and fast against Windows NT servers. This plays right into the hands of the conspiracy theorists.)

    My final remark for today is explaining how a web site can "bloat the cache" of known good/bad servers and create a denial of service if the cache did not have a size cap: First, set up a DNS server that directs all requests for *.hackersite.com to your Linux machine. On this Linux machine, install one of the buggy versions of Samba. Now serve up this web page:

    <IFRAME SRC="\\a1.hackersite.com\b" HEIGHT=1 WIDTH=1></IFRAME>
    <IFRAME SRC="\\a2.hackersite.com\b" HEIGHT=1 WIDTH=1></IFRAME>
    <IFRAME SRC="\\a3.hackersite.com\b" HEIGHT=1 WIDTH=1></IFRAME>
    <IFRAME SRC="\\a4.hackersite.com\b" HEIGHT=1 WIDTH=1></IFRAME>
    ...
    <IFRAME SRC="\\a10000.hackersite.com\b" HEIGHT=1 WIDTH=1></IFRAME>
    

    Each of those IFRAMEs displays an Explorer window with the contents of the directory \\a1.hackersite.com\b. (Since all the names resolve to the same machine, all the \\*.hackersite.com machines are really the same.) In that directory, put 200 files, so as to trigger the "more than 100 files" bug and force Windows Vista to cache the server as a "bad" server. In this way, you forced Windows Vista to create ten thousand records for the ten thousand bad servers you asked to be displayed. Throw in a little more script and you can turn this into a loop that accesses millions of "different" servers (all really the same server). If the "bad server" cache did not have a cap, you just allowed a bad server to consume megabytes of memory that will never be freed until the computer is rebooted. Pretty neat trick.

    Even worse, if you proposed preserving this cache across reboots, then you're going to have to come up with a place to save this information. Whether you decide that it goes in a file or in the registry, the point is that an attacker can use this "bloat attack" and cause the poor victim's disk space/registry usage to grow without bound until they run out of quota. And once they hit quota, be it disk quota or registry quota, not only do bad things start happening, but they don't even know what file or registry key they have to delete to get back under quota.

    Next time, I'll start addressing some of the proposals that people came up with, pointing out disadvantages that they may have missed in their analysis.

  • The Old New Thing

    Restating the obvious about the WM_COMMAND message

    I'm satisfied with the MSDN documentation for the WM_COMMAND message, but for the sake of mind-numbing completeness, I'm going to state the obvious in the hope that you, dear readers, can use this technique to fill in the obvious in other parts of MSDN.

    The one-line summary of the WM_COMMAND message says, "The WM_COMMAND message is sent when the user selects a command item from a menu, when a control sends a notification message to its parent window, or when an accelerator keystroke is translated." In a nutshell, there are three scenarios that generate a WM_COMMAND message, namely the three listed above. You want to think of the menu and accelerator scenarios of the WM_COMMAND message as special cases of the control scenario.

    The high-order word of the wParam parameter "specifies the notification code if the message is from a control". What does "control" mean here? Remember that you have to take things in context. The WM_COMMAND message is being presented in the context of Win32 in general, and in the context of the window manager in particular. Windows such as edit boxes, push buttons, and list boxes are commonly called "controls", as are all the window classes in the "common controls library". In the world of the window manager, a "control" is a window whose purpose is to provide some degree of interactivity (which, in the case of the static control, might be no interactivity at all) in the service of its parent window. The fact that the WM_COMMAND is used primarily in the context of dialog boxes further emphasizes the point that the term "control" here is just a synonym for "child window".

    What does "notification code" mean here? Control notification codes are arbitrary 16-bit values defined by the control itself. By convention, they are named xxN_xxxx, where the "N" stands for "notification". Be careful, however, not to confuse this with notification codes associated with the WM_NOTIFY message. Fortunately, every notification code specifies in its documentation whether it arrives as a WM_COMMAND notification or a WM_NOTIFY notification. A modern control designer is more likely to use WM_NOTIFY notifications since they allow additional information to be passed with the notification. The WM_COMMAND message, by comparison, passes only the notification itself; the other parameters to the WM_COMMAND message are forced, as we'll see below. If WM_NOTIFY is superior to WM_COMMAND, why do some controls use WM_COMMAND? Because WM_NOTIFY wasn't available until Windows 95. Controls that were written prior to Windows 95 had to content themselves with the WM_COMMAND message.

    "If the message is from an accelerator, this value [the high-order word of the wParam parameter] is 1." Remember, we're still in the context of the window manager, and in particular in the context of the WM_COMMAND message. The accelerator here refers to messages generated by the call to TranslateAccelerator in the message loop.

    "If the message is from a menu, this value is zero." If the WM_COMMAND message was triggered by the user selecting an item from a menu, then the high-order word of the wParam is zero.

    The low-order word of the wParam parameter "specifies the identifier of the menu item, control, or accelerator." The identifier of a menu item or accelerator is the command code you associated with it in your menu or accelerator template or (in the case of a menu item) when you manually created the menu item with a function like InsertMenuItem. (You probably named your menu item identifiers and accelerator identifiers IDM_something.) The identifier of a control is determined by the creator of the control; recall that the hMenu parameter to the CreateWindow and CreateWindowEx functions is treated as a child window identifier if you're creating a child window. It is that identifier that is the control identifier. (You can retrieve the identifier for a control by calling the GetDlgCtrlID function.)

    Finally, the lParam parameter is the "handle to the control sending the message if the message is from a control. Otherwise, this parameter is NULL." If the notification is generated by a child window (with a notification code appropriate for that child window, obviously), then that child window handle is passed as the lParam. If the notification is generated by an accelerator or a menu, then the lParam is zero.
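
    As a rough sketch of the rules so far, the following portable C fragment classifies a WM_COMMAND by its lParam and the high-order word of its wParam. The names CommandSource, ClassifyCommand, and ClassifyCommandExamples, as well as the MY_-prefixed macros, are made up for illustration so that the fragment compiles without windows.h; in a real window procedure you would use LOWORD and HIWORD on the actual wParam. Checking the lParam first is one reading of the documentation quoted above, not something the documentation spells out.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the Win32 LOWORD/HIWORD/MAKEWPARAM macros. */
#define MY_LOWORD(w) ((uint16_t)((uintptr_t)(w) & 0xFFFF))
#define MY_HIWORD(w) ((uint16_t)(((uintptr_t)(w) >> 16) & 0xFFFF))
#define MY_MAKEWPARAM(lo, hi) \
    ((uintptr_t)((uint32_t)(uint16_t)(lo) | ((uint32_t)(uint16_t)(hi) << 16)))

typedef enum {
    SOURCE_MENU,        /* high word 0, lParam 0 */
    SOURCE_ACCELERATOR, /* high word 1, lParam 0 */
    SOURCE_CONTROL,     /* lParam is the control's window handle */
    SOURCE_UNKNOWN      /* matches none of the documented patterns */
} CommandSource;

/* Classify a WM_COMMAND from its raw wParam and lParam values. */
CommandSource ClassifyCommand(uintptr_t wParam, intptr_t lParam)
{
    if (lParam != 0)
        return SOURCE_CONTROL;   /* a control always passes its handle */
    switch (MY_HIWORD(wParam)) {
    case 0:  return SOURCE_MENU;
    case 1:  return SOURCE_ACCELERATOR;
    default: return SOURCE_UNKNOWN;
    }
}

/* Usage: identifier 100 arriving by each of the three routes. */
void ClassifyCommandExamples(void)
{
    /* Menu item 100 selected. */
    assert(ClassifyCommand(MY_MAKEWPARAM(100, 0), 0) == SOURCE_MENU);
    /* Accelerator 100 typed. */
    assert(ClassifyCommand(MY_MAKEWPARAM(100, 1), 0) == SOURCE_ACCELERATOR);
    /* Control 100 sent some notification code; 0x1234 plays the role of its handle. */
    assert(ClassifyCommand(MY_MAKEWPARAM(100, 0x0300), 0x1234) == SOURCE_CONTROL);
}
```

    In every case the identifier is still MY_LOWORD(wParam); only the other two pieces tell you where the command came from.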

    Notice that nearly all of the parameters to the WM_COMMAND message are forced, once you've decided what notification you're generating.

    If you are generating a notification from a control, you must pass the notification code in the high word of the wParam, the control identifier in the low word of the wParam, and the control handle as the lParam. In other words, once you've decided that the hwndC window wants to send a CN_READY notification, you have no choice but to type

    SendMessage(GetParent(hwndC), WM_COMMAND,
                MAKEWPARAM(GetDlgCtrlID(hwndC), CN_READY),
                (LPARAM)hwndC);
    

    In other words, all control notifications take the form

    SendMessage(GetParent(hwndC), WM_COMMAND,
                MAKEWPARAM(GetDlgCtrlID(hwndC), notificationCode),
                (LPARAM)hwndC);
    

    where hwndC is the control generating the notification and notificationCode is the notification code. Of course, you can use PostMessage instead of SendMessage if you would rather post the notification than send it.

    The other two cases (accelerators and menus) are not cases you would normally code up, since you typically let the TranslateAccelerator function deal with accelerators and let the menu system deal with menu identifiers. But if for some reason, you wanted to pretend that the user had typed an accelerator or selected a menu item, you can generate the notification manually by following the rules set out in the documentation.

    // simulate the accelerator IDM_WHATEVER
    SendMessage(hwnd, WM_COMMAND,
                MAKEWPARAM(IDM_WHATEVER, 1),
                0);
    

    Here, hwnd is the window that you want to pretend was the window passed to the TranslateAccelerator function, and IDM_WHATEVER is the accelerator identifier.

    Simulating a menu selection is exactly the same, except that (according to the rules above), you set the high-order word of the wParam to zero.

    // simulate the menu item IDM_WHATEVER
    SendMessage(hwnd, WM_COMMAND,
                MAKEWPARAM(IDM_WHATEVER, 0),
                0);
    

    Here, hwnd is the window associated with the menu. A window can be associated with a menu either by being created with the menu (having passed the menu handle to the CreateWindow or CreateWindowEx function explicitly, or having it done implicitly by including it with the class registration) or by having been passed explicitly as the window parameter to a function like TrackPopupMenu.

    One significant difference between the accelerator/menu case and the control notification case is that accelerator and menu identifiers are defined by the calling application, whereas control notifications are defined by the control.

    You may have noticed the opportunity to "pun" the control notification codes. If a control defines a notification code as zero, then it will "look like" a menu item selection, since the high-order word of the wParam in the case of a menu item selection is zero. The button control takes advantage of this pun:

    #define BN_CLICKED          0
    

    This means that when the user clicks a button control, the WM_COMMAND message that is generated "smells like" a menu selection notification. You probably take advantage of this in your dialog procedure without even realizing it.
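
    To make the pun concrete, here is a small sketch of the kind of dispatch a dialog procedure typically performs. The identifier IDC_SAVE, the functions OnCommand and PunDemo, and the MY_-prefixed macros are hypothetical stand-ins so that the fragment compiles without windows.h. Because the handler switches only on the low-order word, a button click (notification code BN_CLICKED, which is zero) and a menu selection (high-order word zero) with the same identifier produce the same wParam and land in the same case.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for MAKEWPARAM/LOWORD; the real ones come from windows.h. */
#define MY_MAKEWPARAM(lo, hi) \
    ((uintptr_t)((uint32_t)(uint16_t)(lo) | ((uint32_t)(uint16_t)(hi) << 16)))
#define MY_LOWORD(w) ((uint16_t)((uintptr_t)(w) & 0xFFFF))

#define MY_BN_CLICKED 0    /* same value as BN_CLICKED */
#define IDC_SAVE      101  /* hypothetical ID shared by a button and a menu item */

/* Returns 1 if the command was handled, 0 otherwise. */
int OnCommand(uintptr_t wParam)
{
    switch (MY_LOWORD(wParam)) {  /* the high-order word is ignored */
    case IDC_SAVE:
        return 1;                 /* reached by button click or menu selection */
    default:
        return 0;
    }
}

/* A button click and a menu selection with the same ID look identical. */
void PunDemo(void)
{
    uintptr_t fromButton = MY_MAKEWPARAM(IDC_SAVE, MY_BN_CLICKED);
    uintptr_t fromMenu   = MY_MAKEWPARAM(IDC_SAVE, 0);
    assert(fromButton == fromMenu);
    assert(OnCommand(fromButton) && OnCommand(fromMenu));
}
```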

    (The static control also takes advantage of this pun:

    #define STN_CLICKED         0
    

    but in order for the static control to generate the STN_CLICKED notification, you have to set the SS_NOTIFY style.)

    I stated at the start that the accelerator and menu scenarios are just special cases of the control scenario. If you take the pieces of the WM_COMMAND message apart, you'll see that they fall into two categories:

    • What happened? (Notification code.)
    • Whom did it happen to? (Control handle and ID.)

    In the case of a menu or an accelerator, the "What happened?" is "The user clicked on the menu (0)" or "The user typed the accelerator (1)". The "Whom did it happen to?" is "This menu ID" or "This accelerator ID". Since the notification is not coming from a control, the control handle is NULL.

    I apologize to all you Win32 programmers for whom this is just stating the obvious.

    Now that you're an expert on the WM_COMMAND message, perhaps you can solve this person's problem.

  • The Old New Thing

    How would you solve this compatibility problem: Network interoperability

    Okay, everybody, here's your chance to solve a compatibility problem. There is no answer yet; I'm looking to see how you folks would attack it. This is a real bug in the Windows Vista database.

    A beta tester reported that Explorer fails to show more than about a hundred files per directory from file servers running a particular brand of the file server software. The shell and networking teams investigated the problem together and tracked it down to the server incorrectly handling certain types of directory queries. Although the server claims to support both slow and fast queries, if you try a fast query, it returns only the first hundred or so files and then gives up with a strange error code. On the other hand, if Explorer switches to the slow query, then everything works fine. (Windows XP always used the slow query.) Additional data: An update to the server software was released earlier this year which claims to fix the bug. However (as of this writing), all of the vendor's distributors continue to ship the buggy version of the driver.

    What should we do? Here are some options. Choose one of the below or make up your own!

    Do nothing

    Make no accommodation for this particular buggy protocol implementation. People who are running that particular implementation will get incomplete directory listings. Publish a Knowledge Base article describing the problem and directing customers to contact the vendor for an updated driver.

    Advantages:

    • Operating system remains "pure", unsullied by compatibility hacks.

    Disadvantages:

    • Customers with this problem may not even realize that they have it.
    • Even if customers notice something wrong, they won't necessarily know to search for the vendor's name (as opposed to the distributor's name) in the Knowledge Base to see if there are any known interoperability problems with it.
    • And even if the customer finds the Knowledge Base article, they will have to bypass their distributor and get the driver directly from the vendor. This may invalidate their support contract with the distributor.
    • If the file server software is running on network attached storage, the user likely doesn't even know what driver is running inside the sealed plastic case. Upgrading the server software will have to wait for the distributor to issue a firmware upgrade. Until then, the user will experience temporary data loss. (Those files beyond the first hundred are invisible.)
    • If the customer does not own the file server, the best they can do is ask the file server's administrator to upgrade their driver and hope the administrator agrees to do so.
    • Since Windows XP didn't use fast queries, it didn't have this problem. Users will interpret it as a bug in Windows Vista.

    Auto-detect the buggy driver and put up a warning dialog

    Explorer should recognize the strange error code and display an error message to the user saying, "The server \\servername appears to be running an old version of the XYZ driver that does not report the contents of large directories properly. Not all items in the directory are shown here. Please contact the administrator of the machine \\servername to have the driver upgraded." (Possibly with a "Don't show this dialog again" check-box.)

    Advantages:

    • Users are told why they are getting incomplete results.

    Disadvantages:

    • There's not much the user can do about the incomplete results. It looks like a "Ha ha, you lose" dialog.
    • Users often don't know who the administrators of a file server are, so telling them to contact the administrator merely leads to a frustrated, "And who is that, huh?", or even worse, "That's me! And I have no idea what this dialog box is telling me to do." (Consider the network attached storage device.)
    • The administrator of that machine might have his/her reasons for not upgrading the driver (for example, because it voids the support contract), but they will keep getting pestered by users thanks to this new dialog.
    • Since Windows XP didn't use fast queries, it didn't have this problem. Users will interpret it as a bug in Windows Vista.

    Auto-detect the buggy driver and work around it next time

    Explorer should recognize the strange error code and say, "Oh, this server must have the buggy driver. It's too late to do anything about the current directory information, but I'll remember that I should do things the slow way in the future when talking to this server."

    To avoid denial-of-service attacks, remember only the last 16 (say) servers that exhibit the problem. (If the list of "known bad" servers were unbounded, then an attacker could consume all the memory on your computer by creating a server that responded to a billion different names and using HTTP redirects to get you to visit all of those servers in turn.)
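
    A minimal sketch of such a capped list, assuming a most-recently-seen eviction policy and case-sensitive name comparison (neither is specified here, and the names BadServerCache, bad_cache_remember, and bad_cache_contains are hypothetical). When the list is full, remembering a new server pushes out the one seen longest ago, so memory use stays fixed no matter how many names an attacker invents.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define BAD_SERVER_CAP  16   /* "the last 16 (say) servers" */
#define SERVER_NAME_MAX 256  /* assumed limit on a name, terminator included */

typedef struct {
    char names[BAD_SERVER_CAP][SERVER_NAME_MAX]; /* names[0] is the most recent */
    int  count;
} BadServerCache;

/* Returns the slot holding `name`, or -1 if it is not cached. */
static int bad_cache_find(const BadServerCache *c, const char *name)
{
    for (int i = 0; i < c->count; i++)
        if (strcmp(c->names[i], name) == 0)
            return i;
    return -1;
}

bool bad_cache_contains(const BadServerCache *c, const char *name)
{
    return bad_cache_find(c, name) >= 0;
}

/* Record `name` as the most recently seen bad server, evicting the oldest
   entry if the cache is full. Returns false if the name is too long to store. */
bool bad_cache_remember(BadServerCache *c, const char *name)
{
    char copy[SERVER_NAME_MAX];
    assert(c != NULL && name != NULL);
    if (strlen(name) >= SERVER_NAME_MAX)
        return false;
    strcpy(copy, name);  /* private copy in case `name` points into the cache */

    int pos = bad_cache_find(c, copy);
    if (pos < 0) {
        /* New entry: grow if there is room, else reuse the last (oldest) slot. */
        if (c->count < BAD_SERVER_CAP)
            c->count++;
        pos = c->count - 1;
    }
    /* Slide entries 0..pos-1 down one slot, then put the name at the front. */
    memmove(c->names[1], c->names[0], (size_t)pos * SERVER_NAME_MAX);
    strcpy(c->names[0], copy);
    return true;
}
```

    With this policy, remembering a seventeenth distinct server evicts the first one, which is the behavior listed among the disadvantages of this option.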

    Advantages:

    • Windows auto-detects the problem and works around it.

    Disadvantages:

    • The first directory listing of a large directory from a buggy server will be incorrect. If that first directory listing is for something that has a long lifetime (for example, Explorer's folder tree), then the incorrect data will persist for a long time.
    • If you regularly visit more than 16 (say) buggy servers, then when you visit the seventeenth, the first one falls out of the cache and will return incorrect data the first time you visit a large directory.
    • May also have to develop and test a mechanism so that network administrators can deploy a "known bad list" of servers to all the computers on their network. In this way, servers on the "known bad list" won't have the "first directory listing is bad" problem.
    • Since Windows XP didn't use fast queries, it didn't have this problem. Users will interpret it as a bug in Windows Vista.

    Have a configuration setting to put the network client into "slow mode"

    Add a configuration setting to the Windows network client to tell it "If somebody asks whether a server supports fast queries, always say No, even if the server says Yes." In this manner, no program will attempt to use fast queries; they will all use slow queries. Directory queries will run slower, but at least they will work.

    Advantages:

    • With the setting set to "slow mode", you never get any incomplete directory listings.

    Disadvantages:

    • Since the detection is not automatic, you have many of the same problems as "Do nothing". Customers have to know that they have a problem and know what to search for before they can find the configuration setting in the Knowledge Base. Until then, the behavior looks like a bug in Windows Vista.
    • This punishes file servers that are not buggy by making them use slow queries even though they support fast queries.

    Have a configuration setting to put Explorer into "slow mode"

    Add a configuration setting to Explorer to tell it "Always issue slow queries; never issue fast queries." Directory queries will run slower, but at least they will work. But this affects only Explorer; other programs which ask the server "Do you support fast queries?" will receive an affirmative response and attempt to use fast queries, only to rediscover the problem that Explorer worked around.

    Advantages:

    • With the setting set to "slow mode", you never get any incomplete directory listings.

    Disadvantages:

    • Every program that uses fast queries must have its own setting for disabling fast queries and running in "slow mode".
    • Plus all the same disadvantages as putting the setting in the network client.

    Disable "fast mode" by default

    Stop supporting "fast mode" in the network client since it is unreliable; there are some servers that don't handle "fast mode" correctly. This forces all programs to use "slow mode". Optionally, have a configuration setting to re-enable "fast mode".

    Advantages:

    • All directory listings are complete. Everything just works.

    Disadvantages:

    • The "fast mode" feature may as well never have been created: It's off by default and nobody will bother turning it on since everything works "well enough".
    • People will accuse Microsoft of unfair business practices since the client will run in "slow mode" even if the server says it supports "fast mode". "Obviously, Microsoft did this in order to boost sales of its competing product which doesn't have this artificial and gratuitous speed limiter."

    Something else

    Be creative. Make sure to list both advantages and disadvantages of your proposal.

  • The Old New Thing

    Basic ground rules for programming - function parameters and how they are used

    There are some basic ground rules that apply to all system programming, so obvious that most documentation does not bother explaining them because these rules should have been internalized by practitioners of the art to the point where they need not be expressed. In the same way that when plotting driving directions you wouldn't even consider taking a shortcut through somebody's backyard or going the wrong way down a one-way street, and in the same way that an experienced chess player doesn't even consider illegal moves when deciding what to do next, an experienced programmer doesn't even consider violating the following basic rules without explicit permission in the documentation to the contrary:

    • Everything not defined is undefined. This may be a tautology, but it is a useful one. Many of the rules below are just special cases of this rule.
    • All parameters must be valid. The contract for a function applies only when the caller adheres to the conditions, and one of the conditions is that the parameters are actually what they claim to be. This is a special case of the "everything not defined is undefined" rule.
      • Pointers are not NULL unless explicitly permitted otherwise.
      • Pointers actually point to what they purport to point to. If a function accepts a pointer to a CRITICAL_SECTION, then you really have to pass a pointer to a valid CRITICAL_SECTION.
      • Pointers are properly aligned. Pointer alignment is a fundamental architectural requirement, yet something many people overlook having been pampered by a processor architecture that is very forgiving of alignment errors.
      • The caller has the right to use the memory being pointed to. This means no pointers to memory that has been freed or memory that the caller does not have control over.
      • All buffers are valid to the size declared or implied. If you pass a pointer to a buffer and say that it is ten bytes in length, then the buffer really needs to be ten bytes in length.
      • Handles refer to valid objects that have not been destroyed. If a function wants a window handle, then you really have to pass a valid window handle.
    • All parameters are stable.
      • You cannot change a parameter while the function call is in progress.
      • If you pass a pointer, the pointed-to memory will not be modified by another thread for the duration of the call.
      • You can't free the pointed-to memory either.
    • The correct number of parameters is passed with the correct calling convention. This is another special case of the "everything not defined is undefined" rule.
      • Thank goodness modern compilers refuse to pass the wrong number of parameters, though you'd be surprised how many people manage to sneak the wrong number of parameters past the compiler anyway, usually by devious casting.
      • When invoking a method on an object, the this parameter is the object. Again, this is something modern compilers handle automatically, though people using COM from C (and yes they exist) have to pass the this parameter manually, and occasionally they mess up.
    • Function parameter lifetime.
      • The called function can use the parameters during the execution of the function.
      • The called function cannot use the parameters once the function has returned. Of course, if the caller and the callee have agreed on a means of extending the lifetime, then those rules apply.
        • The lifetime of a parameter that is a pointer to a COM object can be extended by the use of the IUnknown::AddRef method.
        • Many functions are passed parameters with the express intent that they be used after the function returns. It is then the caller's responsibility to ensure that the lifetime of the parameter is at least as long as the function needs it. For example, if you register a callback function, then the callback function needs to be valid until you deregister the callback function.
    • Input buffers.
      • A function is permitted to read from the full extent of the buffer provided by the caller, even if not all of the buffer is required to determine the result.
    • Output buffers.
      • An output buffer cannot overlap an input buffer or another output buffer.
      • A function is permitted to write to the full extent of the buffer provided by the caller, even if not all of the buffer is required to hold the result.
      • If a function needs only part of a buffer to hold the result of a function call, the contents of the unused portion of the buffer are undefined.
      • If a function fails and the documentation does not specify the buffer contents on failure, then the contents of the output buffer are undefined. This is a special case of the "everything not defined is undefined" rule.
      • Note that COM imposes its own rules on output buffers. COM requires that all output buffers be in a marshallable state even on failure. For objects that require nontrivial marshalling (interface pointers and BSTRs being the most common examples), this means that the output pointer must be NULL on failure.
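
    A few of these rules can be illustrated with a small sketch. The function CopyName below is hypothetical, not a real API. Its contract requires an output buffer that is valid for the declared size and does not overlap the input, and in return it promises a null-terminated result on success. On failure it leaves an empty string, but a caller may rely on that only because this function's documentation says so.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical function. Contract:
 *  - src points to a valid null-terminated string (not NULL).
 *  - dst points to a writable buffer of at least dstSize bytes,
 *    with dstSize >= 1, that does not overlap src.
 *  - On success, dst holds a copy of src and the function returns true.
 *  - On failure (buffer too small), dst holds an empty string and the
 *    function returns false. Without this sentence, the buffer contents
 *    on failure would be undefined.
 */
bool CopyName(char *dst, size_t dstSize, const char *src)
{
    /* Debug-only checks of what can be checked cheaply; the rest of the
       contract (valid memory, true size, no overlap) is on the caller. */
    assert(dst != NULL && src != NULL && dstSize >= 1);

    size_t needed = strlen(src) + 1;
    if (needed > dstSize) {
        dst[0] = '\0';
        return false;
    }
    memcpy(dst, src, needed);  /* safe because the buffers must not overlap */
    return true;
}

/* Usage: the buffer really is as large as the caller says it is. */
bool CopyNameDemo(void)
{
    char buffer[10];
    return CopyName(buffer, sizeof buffer, "notepad");
}
```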

    (Remember, every statement here is a basic ground rule, not an absolute inescapable fact. Assume every sentence here is prefaced with "In the absence of indications to the contrary". If the caller and callee have agreed on an exception to the rule, then that exception applies. For example, a pointer that is prototyped as volatile is explicitly marked as "This value can change from another thread," so the rule against modifying function parameters does not apply to such a pointer.)

    Coming up with this was hard, in the same way it's hard to come up with a list of illegal chess moves. The rules are so automatic that they aren't really rules so much as things that simply are and it would be crazy even to consider otherwise. As a result, I'm sure there are other "rules so obvious they need not be said" that are missing. (For example, "You cannot terminate a thread while it is inside somebody else's function.")

    One handy rule of thumb for what you can do to a function call is to ask, "How would I like it if somebody did that to me?" (This is a special case of the "Imagine if this were possible" test.)

  • The Old New Thing

    The ForceAutoLogon setting doesn't do what most people think

    The folks on the logon team wish me to remind you that the ForceAutoLogon setting does more than just log on an account automatically. They've had to deal with large numbers of people who set the key without really understanding what it does, and then getting into trouble because what they get is not what they expected.

    In addition to logging on an account automatically, the ForceAutoLogon setting also logs you back on after you log off. It is designed for machines running as kiosks or other publicly accessible scenarios where you want the kiosk account to be the only account available. Even if the user manages to fiddle with the machine and log off the kiosk user, the logon system will just log the kiosk user back on.

    As a result, setting the ForceAutoLogon setting effectively locks out all users aside from the one you are forcing. If you do this to one of your machines, you'd better have some other way of administering the machine. (Typically, this is done via remote administration.)

  • The Old New Thing

    Solving one problem by creating a bigger problem

    Often, people will not even realize that their solution to a problem merely replaces it with another problem. The quip attributed to Jamie Zawinski captures the sentiment:

    Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.

    For example, in response to "How do I write a batch file that..." some people will say, "First, install <perl|bash|monad|...>". This doesn't actually solve the problem; it merely replaces it with a different problem.

    In particular, if the solution begins with "First, install..." you've pretty much lost out of the gate. Solving a five-minute problem by taking a half hour to download and install a program is a net loss. In a corporate environment, adding a program to a deployment is extraordinarily expensive. You have to work with your company's legal team to make sure the licensing terms for the new program are acceptable and do not create undue risk from a legal standpoint. What is your plan of action if the new program stops working, and your company starts losing tens of thousands of dollars a day? You have to do interoperability testing to make sure the new program doesn't conflict with the other programs in the deployment. (In the non-corporate case, you still run the risk that the new program will conflict with one of your existing programs.)

    Second, many of these "solutions" require that you abandon your partial solution so far and rewrite it in the new model. If you've invested years in tweaking a batch file and you just need one more thing to get that new feature working, and somebody says, "Oh, what you need to do is throw away your batch file and start over in this new language," you're unlikely to take up that suggestion.

    So be careful when you suggest a solution that has a high activation energy. Sure, something could be taken care of by a one-line perl script, but getting perl onto the machine is hardly a one-line endeavor.

  • The Old New Thing

    Why doesn't Windows File Protection use ACLs to protect files?

    • 57 Comments

    Windows File Protection works by replacing files after they have been overwritten. Why didn't Windows just apply ACLs to deny write permission to the files?

    We tried that. It didn't work.

    Programs expect to be able to overwrite the files. A program's setup would run, decide that it needed to "update" some system file, and attempt to overwrite it. If the system tried to stop the file from being overwritten, the setup program would halt and report that it was unable to install the file. Even if the operating system detected that somebody was trying to overwrite a system file and instead gave them a handle to NUL, those programs would nevertheless notice that they had been hoodwinked because as a "verification" step, they would open the file they had just copied and compare it against the "master copy" on the installation CD.

    The solution was to let the program think it had won, and then, when it wasn't looking, put the original back.
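
    To make the "let it win, then put the original back" idea concrete, here is a minimal sketch. It is not how Windows File Protection is implemented; the file names, the backup copy, and the helper functions restore_if_changed and demo are all hypothetical and exist only for illustration.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def digest(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def restore_if_changed(protected, backup):
    """Put the backup copy back if the protected file no longer matches it."""
    if digest(protected) == digest(backup):
        return False
    shutil.copyfile(backup, protected)
    return True


def demo():
    # Hypothetical file names inside a throwaway directory.
    with tempfile.TemporaryDirectory() as folder:
        protected = Path(folder) / "system.dll"
        backup = Path(folder) / "backup.dll"
        protected.write_bytes(b"original")
        backup.write_bytes(b"original")
        # A setup program "updates" the file, and its verification step passes.
        protected.write_bytes(b"setup's version")
        assert protected.read_bytes() == b"setup's version"
        # Later, when the setup program isn't looking, the original goes back.
        restored = restore_if_changed(protected, backup)
        return restored, protected.read_bytes()
```

    The point of the sketch is the ordering: the overwrite succeeds and the setup program's verification passes, and only afterward is the file reverted.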

    Now that Windows File Protection has been around for a few years, software installers have learned that it's not okay to overwrite system files (and trying to do it won't work anyway), so starting in Windows Vista, the Windows File Protection folks have started taking stronger steps to protect system files, and this includes using ACLs to make the files harder to replace. Presumably, they will have compatibility plans in place to accommodate programs whose setup really wants to overwrite a file.

  • The Old New Thing

    Why doesn't the window manager just take over behavior that used to be within the application's purview?

    • 59 Comments

    A commenter named "Al" wondered why the window manager couldn't just take over behavior that used to be within the application's purview, such as painting the non-client area, in order to avoid problems with applications not responding to messages promptly enough. If the window manager were being rewritten, then perhaps it could. But to do it now would introduce many compatibility issues.

    First, there are many applications that have subtle dependencies on message ordering or receiving certain types of messages at certain times, even though there is no actual guarantee in the specification that such messages be delivered. There are a large number of applications that rely on WM_PAINT messages being delivered even if there is nothing to paint, because they defer some critical computations until the first WM_PAINT message, and if something that requires the result of that computation happens before a WM_PAINT, they crash. For example, if you launch a program minimized, then right-click on the taskbar button for the program's main window, these programs would crash because the code that handles the system menu either uses a pointer variable that the WM_PAINT handler initializes or divides by a global variable whose default value is zero but whose value is calculated during WM_PAINT handling. To accommodate these programs, the window manager is forced to send "dummy" WM_PAINT messages with an empty rcPaint. Such messages appear to accomplish nothing, but the hidden agenda is that the program gets its cherished WM_PAINT message and can perform whatever operations keep it from crashing later on.
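
    The failure mode is easier to see in miniature. The following is a toy model in Python, not Win32 code; the LazyWindow class, its handlers, and open_system_menu are hypothetical stand-ins for a program that hides its initialization inside its paint handler.

```python
class LazyWindow:
    """Hypothetical window that defers setup to its first paint."""

    def __init__(self):
        self.menu_items = None  # only filled in by on_paint

    def on_paint(self, paint_rect):
        # Critical initialization hidden inside the paint handler.
        if self.menu_items is None:
            self.menu_items = ["Restore", "Close"]
        # An empty rectangle means there is nothing to draw.
        return paint_rect != (0, 0, 0, 0)

    def on_system_menu(self):
        # Raises TypeError if on_paint has never run.
        return len(self.menu_items)


def open_system_menu(window, send_dummy_paint):
    """Model the window manager showing the system menu."""
    if send_dummy_paint:
        # The "dummy" paint: nothing to draw, but the handler still runs.
        window.on_paint((0, 0, 0, 0))
    return window.on_system_menu()
```

    Calling open_system_menu(LazyWindow(), False) raises an exception, which is the toy equivalent of the crash, whereas passing True lets the menu code run because the empty paint has already triggered the hidden initialization.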

    Second, removing customizability of message behavior from the window manager would prevent programs from customizing their appearance in nonstandard ways. Media players are perhaps the most popular example of programs that want to override normal non-client painting in order to present a totally customized window to the user. Would you be happy if a change to Windows meant that you could no longer "skin" your favorite media player application?

    That said, there have been changes to the window manager over the years to maintain this "air of customizability" while simultaneously intervening on behalf of the user to keep things from going completely to the dogs. For example, if a window stops painting for an extended period of time, Windows takes it upon itself to paint the window with a standard caption bar (even if the application wanted to customize the caption bar), just so that the user is able to see something.

    Another example of this "message virtualization" is what happens to a window that has stopped responding: the window manager appends the phrase "(Not responding)" to its caption, captures the window contents as they were last visible, draws those captured contents until the application wakes up from its slumber, and even allows you to move, resize, minimize, and close the unresponsive window. The infrastructure necessary to support this behavior is quite extensive, because the window manager needs to maintain two sets of bookkeeping. The first is "What the application thinks the window state is"; if the application asks for the size of its hung window, it needs to be told, "Oh, you're still that size you were before, don't you worry your pretty little head", even though the actual window size on the screen has changed significantly. The second is what has actually happened to the window on the screen in the meantime. Once the hung window starts responding to messages again, all the activity that happened "while it was away" needs to be replayed to get the window "back up to speed" with the state of the world. Interesting things happen if the program wanted to customize one of the actions that happened to the "virtual window". For example, it might want to reject certain window sizes or display a special message before minimizing. Resolving these conflicts in a manner that doesn't cause applications to crash outright is another of the difficulties of trying to get the virtual and real window states back into sync.
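
    A rough model of that double bookkeeping might look like the sketch below. It is only an illustration: the GhostedWindow class and its methods are hypothetical, it tracks nothing but size, and the policy of snapping the screen back to whatever the application accepts is an assumption of the sketch, not a description of what the window manager actually does.

```python
class GhostedWindow:
    """Hypothetical model of a hung window with two sets of state."""

    def __init__(self, size):
        self.app_size = size      # what the application believes
        self.screen_size = size   # what the user actually sees
        self.pending = []         # actions taken while the app was hung

    def user_resize(self, size):
        # The user resizes the hung window; only the screen state changes.
        self.screen_size = size
        self.pending.append(size)

    def app_query_size(self):
        # The hung application is told the size it last knew about.
        return self.app_size

    def resume(self, app_accepts):
        # Replay queued resizes; the application may veto some of them.
        for size in self.pending:
            if app_accepts(size):
                self.app_size = size
        self.pending.clear()
        # Assumed policy: resolve conflicts by matching the application.
        self.screen_size = self.app_size
```

    Even in this tiny version, the awkward case is visible: if the application rejects a size the user has already been looking at, somebody has to decide which of the two states wins.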

    In a sense, therefore, the window manager does take over selected behaviors that used to be within the application's purview, but it has to do it in a delicate enough manner that neither the application nor the end user will even realize that it's happening. And that's what makes it hard.

  • The Old New Thing

    Before you develop a solution, make sure you really understand the problem

    • 21 Comments

    A common obstacle when trying to help people solve their problems is that what people ask for and what they actually want are not always the same thing.

    For technical problems, you often get a question that makes you shake your head in disbelief, but upon closer questioning, you find that the person really doesn't want what they're asking for. What they really want is something else, but they've already "solved" half of the problem and only need your help with the other half—the half that doesn't make any sense. For example, the literal answer to "How do I write a regular expression that matches everything except XYZ" is often a horrible mess, but if you dig deeper, you'll find that they really don't need a regular expression that matches everything except XYZ; they just simplified their problem to "I know, I'll use regular expressions" and ended up creating a bigger problem. (The best solution is often a mix of regular expressions and simple program logic.)
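
    As an illustration, suppose the real goal turns out to be "keep the lines that do not contain XYZ" (one possible reading of the question; the names below are hypothetical). The literal regular expression works, but a plain match combined with a "not" in the program logic says the same thing far more readably.

```python
import re

# Matching the unwanted text is easy to write and easy to read.
UNWANTED = re.compile(r"XYZ")

# The literal "everything except" pattern needs a lookahead at every position.
EVERYTHING_EXCEPT = re.compile(r"^(?:(?!XYZ).)*$")


def keep_lines(lines):
    """Return the lines that do not contain the unwanted text."""
    return [line for line in lines if not UNWANTED.search(line)]
```

    Both approaches select the same lines in this sketch, but only one of them can be understood at a glance by whoever has to maintain it next.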

    This problem also exists in user interface design. Rick Schaut describes one case where a user asked for a feature, when what they really wanted was an entirely different feature. Understanding the customer's problem is the first step towards solving it.
