June, 2012

  • The Old New Thing

    Now that Windows makes it harder for your program to block shutdown, how do you block shutdown?

    • 62 Comments

    Up until Windows XP, applications could intercept the WM_QUERY­END­SESSION message and tell Windows, "No, don't shut down." If they were polite about it, they would also inform the user which application blocked system shutdown and why. And if they were really polite about it, they would even provide a way for the user to say, "I don't care; shut down anyway."

    As I noted some time ago, Windows Vista made it harder for applications to block shutdown. Applications are given two seconds to clean up, and then it's game over.

    Okay, now the game of walls and ladders continues. The power management folks created an escape hatch for applications which are doing things like burning a CD or controlling an industrial lathe, where shutting down the machine may not be in the user's best interest. (The user ends up with a coaster or a factory on fire.) But since they created the escape hatch, they get to control the keys to the hatch, too.

    The Shutdown­Block­Reason­Create function lets you register your application window with a custom message that is displayed to the user when they try to shut down the computer. When the danger-time is over, you call Shutdown­Block­Reason­Destroy to say that the coast is clear and shutdown is once again permitted.

    Mind you, these blocks are merely advisory. If users really want to create a coaster or burn down their factory, they can click Force shut down. One nice thing about making Windows responsible for the warning message is that if multiple applications want to block shutdown, all of them can be displayed in a single dialog, and the user only needs to click Force shut down once.

    Further guidance on system shutdown and the use of these functions can be found in the Application Shutdown Changes in Windows Vista document, which was the source material for most of this blog entry.

  • The Old New Thing

    You still need the "safe" functions even if you check string lengths ahead of time

    • 69 Comments

    Commenter POKE53280,0 claims, "If one validates parameters before using string functions (which quality programmers should do), the 'safe' functions have no reason to exist."

    Consider the following function:

    int SomeFunction(const char *s)
    {
      char buffer[256];
      if (strlen(s) >= 256) return ERR;
      strcpy(buffer, s);
      ...
    }
    

    What could possibly go wrong? You check the length of the string, and if it doesn't fit in the buffer, then you reject it. Therefore, you claim, the strcpy is safe.

    What could possibly go wrong is that the length of the string can change between the time you check it and the time you use it.

    char attack[512] = "special string designed to trigger a "
                       "buffer overflow and attack your machine. [...]";
    
    void Thread1()
    {
     char c = attack[256];
     while (true) {
      attack[256] ^= c;
     }
    }
    
    void Thread2()
    {
     while (true) {
      SomeFunction(attack);
     }
    }
    

    The first thread changes the length of the string rapidly between 255 and 511, between a string that passes validation and a string that doesn't, and more specifically between a string that passes validation and a string that, if it snuck through validation, would pwn the machine.

    The second thread keeps handing this string to Some­Function. Eventually, the following race condition will be hit:

    • Thread 1 changes the length to 255.
    • Thread 2 checks the length and when it reaches attack[256], it reads zero and concludes that the string length is less than 256.
    • Thread 1 changes the length to 511.
    • Thread 2 copies the string and when it reaches attack[256], it reads nonzero and keeps copying, thereby overflowing its buffer.

    Oops, you just fell victim to a time-of-check-to-time-of-use attack (commonly abbreviated TOCTTOU).
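    One way to close this particular window, sketched here under some assumptions: copy the string into your own buffer with a length-limited copy, and make the decision based on the copy rather than on the caller's memory. The names copy_bounded and SomeFunctionSafe, and the values of OK and ERR, are made up for illustration; the original code does not define them. It is the same idea that length-limited "safe" string functions are built on, written out by hand.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative values; the original snippet leaves ERR undefined. */
enum { OK = 0, ERR = -1 };

/* Hypothetical helper: copies src into dst, never writing more than
   dst_size bytes. Returns true if the whole string (including its
   terminator) fit, false if it had to be cut off. */
bool copy_bounded(char *dst, size_t dst_size, const char *src)
{
    if (dst_size == 0) return false;
    for (size_t i = 0; i < dst_size; i++) {
        char ch = src[i];      /* each source byte is read once */
        dst[i] = ch;
        if (ch == '\0') return true;
    }
    dst[dst_size - 1] = '\0';  /* ran out of room: terminate and fail */
    return false;
}

int SomeFunctionSafe(const char *s)
{
    char buffer[256];
    /* The length limit is enforced by the copy itself, so there is no
       separate "check" step for another thread to invalidate. */
    if (!copy_bounded(buffer, sizeof buffer, s)) return ERR;
    /* ... from here on, work only with buffer, never with s ... */
    return OK;
}
```

    Because the limit is enforced during the copy, a string that grows after it was measured cannot run past the end of buffer. (Strictly speaking, another thread modifying s during the copy is still a data race as far as the C language is concerned; the sketch only shows that the bound no longer depends on an earlier, separate check.)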

    Now, the code above as written is not technically a vulnerability because you haven't crossed a security boundary. The attack code and the vulnerable code are running under the same security context. To make this a true vulnerability, you need to have the attack code running in a lower security context than the vulnerable code. For example, if the threads were running user-mode code and SomeFunction is a kernel-mode function, then you have a real vulnerability. Of course, if SomeFunction were at the boundary between user-mode and kernel-mode, then it has other things it needs to do, like verify that the memory is in fact readable by the process.

    A more interesting case where you cross a security boundary is if the two threads are running code driven from an untrusted source; for example, they might be threads in a script interpreter, and the toggling of attack[256] is being done by a function on a Web page.

    // this code is in some imaginary scripting language
    
    var attack = new string("...");
    procedure Thread1()
    {
     var c = attack[256];
     while (true) attack[256] ^= c;
    }
    
    handler OnClick()
    {
     new backgroundTask(Thread1);
     while (true) foo(attack);
    }
    

    When the user clicks on the button, the script interpreter creates a background thread and starts toggling the length of the string under the instructions of the script. Meanwhile, the main thread calls foo repeatedly. And suppose the interpreter's implementation of foo goes like this:

    void interpret_foo(const function_args& args)
    {
     if (args.GetLength() != 1) wna("foo");
     if (args.GetArgType(0) != V_STRING) wta("foo", 0, V_STRING);
     char *s = args.PinArgString(0);
     SomeFunction(s);
     args.ReleasePin(0);
    }
    

    The script interpreter has kindly converted the script code into the equivalent native code, and now you have a problem. Assuming the user doesn't get impatient and click "Stop script", the script will eventually hit the race condition and cause a buffer overflow in Some­Function.

    And then you get to scramble a security hotfix.

  • The Old New Thing

    Why do you have to wait for Windows Error Reporting to check for solutions before it restarts the application?

    • 47 Comments

    Leo Davidson wonders why you have to wait for Windows Error Reporting to check for solutions before it restarts the application. Why not do the two in parallel?

    Well, for one thing, when the application restarts, it might freak out if it sees a dead copy of itself. I know for sure that I would freak out if I woke up one morning and saw my own dead body lying next to me.

    While Windows Error Reporting is checking for a solution, it still has access to the carcass of the crashed application, because it may need to refer to it in order to answer follow-up questions from the server. ("Hey, was version 3.14 of PI.DLL loaded into the process when it crashed? If so, then I may have an idea what went wrong.") And so that, if you ask it to submit the crash to Microsoft, it can grab the information it needs in order to generate the crash report.

    Now suppose you start up a new copy of the application right away. If the application is a single-instance program, it will look around for another copy of itself, and hey look, it'll find its own lifeless body in the middle of the computer version of an autopsy. It will then try to send messages to the dead program, saying, "Hey, the user wants to open document X; could you do that for me?" And it won't get a response because, well, the program is dead. It's never going to respond.

    Some programs don't even try to pass information along. They just find the existing copy of the program and call SetForegroundWindow on its main window, thereby switching to it. Of course, what they actually did was switch to a crashed program.

    Even worse, what if the second copy of the program tries to extract information from the existing copy of itself? If the existing copy crashed, it's highly likely that the crash was caused by corruption in the program's internal data structures. When the second copy tries to extract the corrupted data, it may itself crash. Immediately launching the replacement program creates a very quickly-growing pile of dead programs, and your screen basically gets spammed with Windows Error Reporting dialogs faster than you can click OK.

    The crashed program has effectively launched a denial of service attack against itself.

    Before trying to start the program again, Windows makes sure that the previous copy has received a proper burial. Because few programs are prepared to see their own cadaver.

    Bonus chatter: Another common scenario is that the program crashes at startup. Automatically restarting the program would just launch another copy that immediately crashes. Again, you get into the situation where you get a dozen copies of the program launched per second, all of which immediately crash.

  • The Old New Thing

    What is the history of the GetRandomRgn function?

    • 26 Comments

    An anonymous commenter was curious about how the Get­Random­Rgn function arrived at its strange name, what the purpose of the third parameter is, and why it is inconsistent between Windows 95 and Windows NT.

    The sense of the word "random" here is hacker slang rather than its formal probabilistic definition, specifically sense 2: "Assorted, undistinguished", perhaps with a bit of sense 4 sprinkled in: "Not well organized." (Commenter Gabe suggested that a better name would have been GetSpecificRgn.)

    Once upon a time, when men were men and Windows was 16-bit, there was an internal function used to communicate between the window manager and GDI in order to set up device contexts. Internally, the region was called the Rao Region, named after Rao Remala, the programmer who invented it, and the function that calculated the Rao Region was rather uncreatively called Compute­Rao­Rgn.

    When porting to 32-bit Windows, the Windows NT and Windows 95 teams both found that they needed this same internal communication between the window manager and GDI. GDI already has a bunch of functions named Get­Xxx­Rgn, so instead of writing a separate marshaler for each one, they opted to write a single Get­Random­Rgn function which takes an integer which serves as a function code, specifying which region the caller actually wants. (I suspect the Windows 95 team followed the cue of the Windows NT team, since Windows NT ran into the problem first.)

    Since this was an internal function, it didn't matter that the name was a bit cutesy, nor did it matter what coordinate system it used, as long as the window manager and GDI agreed on the name and coordinate system. The Windows 95 team still had a lot of 16-bit code that they needed to be compatible with, so they chose to generate the Rao region the same way that the 16-bit Compute­Rao­Rgn function did it. The Windows NT folks, on the other hand, decided that it was more convenient for them that this internal function use screen coordinates, so that's what it returns on Windows NT.

    GetRandomRgn isn't really a function that was designed to be public. It was just an internal helper function that outsiders discovered and relied upon to the point that it became a compatibility constraint so strong that it turned into a de facto documented function. And all the weirdness you see behind it is the weirdness of a function never intended for public consumption.

    The introduction of the Desktop Window Manager in Windows Vista changed the way the visible region was managed (since all windows are logically visible even when occluded because their drawing is redirected to an off-screen surface), but the Get­Random­Rgn function has to keep track of the "visible region" anyway, for compatibility.

  • The Old New Thing

    How did real-mode Windows fix up jumps to functions that got discarded?

    • 25 Comments

    In a discussion of how real-mode Windows walked stacks, commenter Matt wonders about fixing jumps in the rest of the code to the discarded functions.

    I noted in the original article that "there are multiple parts to the solution" and that stack-walking was just one piece. Today, we'll look at another piece: Inter-segment fixups.

    Recall that real-mode Windows ran on an 8086 processor, a simple processor with no memory manager, no CPU privilege levels, and no concept of task switching. Memory management in real-mode Windows was handled manually by the real-mode kernel, and the way it managed memory was by loading code from disk on demand, and discarding code when under memory pressure. (It didn't discard data because it wouldn't know how to regenerate it, and it couldn't swap it out because there was no swap file.)

    There were a few flags you could attach to a segment. Of interest for today's discussion are movable (and it was spelled without the "e") and discardable. If a segment was not movable (known as fixed), then it was loaded into memory and stayed there until the module was unloaded. If a segment was movable, then the memory manager was allowed to move it around when it needed to defragment memory in order to satisfy a large memory allocation. And if a segment was discardable, then it could even be evicted from memory to make room for other stuff.

    Movable  Discardable  Meaning
    No       No           Cannot be moved or discarded
    No       Yes          (invalid combination)
    Yes      No           Can be moved in memory
    Yes      Yes          Can be moved or purged from memory

    I'm going to combine the movable and discardable cases, since the effect is the same for the purpose of today's discussion, the difference being that with discardable memory, you also have the option of throwing the memory out entirely.

    First of all, let's get the easy part out of the way. If you had an intra-segment call (calling a function in your own segment), then there was no work that needed to be done. Real-mode Windows always discarded full segments, so if your segment was running code, it was by definition present, and therefore any other code in that segment was also present. The hard part is the inter-segment calls.

    As it happens, an old document on the 16-bit Windows executable file format gives you some insight into how things worked, if you sit down and puzzle it out hard enough.

    Let's start with the GetProcAddress function. When you call GetProcAddress, the kernel needs to locate the address of the function inside the target module. The loader consults the Entry Table to find the function you're asking for. As you can see, there are three types of entries in the Entry Table. Unused entries (representing ordinals with no associated function), fixed segment entries, and movable segment entries. Obviously, if the match is in an unused entry, the return value is NULL because there is no such function. If the match is in a fixed entry, that's pretty easy too: Look up the segment number in the target module's segment list and combine it with the specified offset. Since the segment is fixed, you can just return the raw pointer directly, since the code will never move.

    The tricky part is if the function is in a movable segment. If you look at the document, it says that "a moveable segment entry is 6 bytes long and has the following format." It starts with a byte of flags (not important here), a two-byte INT 3Fh instruction, a one-byte segment number, and the offset within the segment.

    What's the deal with the INT 3Fh instruction? It seems rather pointless to specify that a file format requires some INT 3Fh instructions scattered here and there. Why not get rid of it to save some space in the file?

    If you called GetProcAddress and the result was a function in a movable segment, then GetProcAddress didn't actually return the address of the target function. It returned the address of the INT 3Fh instruction! (Thankfully, the Entry Table is always a fixed segment, so we don't have to worry about the Entry Table itself being discarded.)

    (Now you see why the file format includes these strange INT 3Fh instructions: The file format was designed to be loaded directly into memory. When the loader loads the entry table, it just slurps it into memory and bingo, it's ready to go, INT 3Fh instructions and all!)

    Since GetProcAddress returned the address of the INT 3Fh instruction, calls to imported functions didn't actually go straight to the target. Instead, you called the INT 3Fh instruction, and it was the INT 3Fh handler which said, "Gosh, somebody is trying to call code in another segment. Is that segment loaded?" It took the return address of the interrupt and used it to locate the segment number and offset. If the segment in question was already in memory, then the handler jumped straight to the segment at the specified offset. You got the function call you wanted, just in a roundabout way.

    If the segment wasn't loaded, then the INT 3Fh handler loaded it (which might trigger a round of discarding and consequent stack patching), then jumped to the newly-loaded segment at the specified offset. An even more roundabout function call.
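    The load-on-demand behavior can be modeled in portable C. This is a toy model, not the real-mode kernel's code: the Segment and EntryThunk structures and the functions load_segment, discard_segment, and call_through_thunk are all hypothetical names, and an ordinary function pointer stands in for the code at one offset inside a segment.

```c
#include <stdbool.h>
#include <stddef.h>

typedef int (*SegmentFunc)(int);

/* Hypothetical stand-in for a movable/discardable code segment. */
typedef struct {
    bool loaded;        /* is the segment currently in memory? */
    int load_count;     /* how often it had to be brought in from "disk" */
    SegmentFunc code;   /* the function living in this segment */
} Segment;

/* Hypothetical stand-in for one movable entry in the Entry Table. */
typedef struct {
    Segment *segment;
} EntryThunk;

int double_it(int x) { return 2 * x; }

/* Pretend to read the segment from disk. */
void load_segment(Segment *seg)
{
    seg->code = double_it;
    seg->loaded = true;
    seg->load_count++;
}

/* Pretend the memory manager evicted the segment. */
void discard_segment(Segment *seg)
{
    seg->code = NULL;
    seg->loaded = false;
}

/* Plays the role of the INT 3Fh handler: make sure the target segment
   is present, then transfer control to it. */
int call_through_thunk(EntryThunk *thunk, int arg)
{
    if (!thunk->segment->loaded) {
        load_segment(thunk->segment);
    }
    return thunk->segment->code(arg);
}
```

    Callers hold only the EntryThunk, never the function pointer itself, so the segment can come and go without any caller noticing. That is the property the roundabout call buys you.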

    Okay, so that's the case where a function pointer is obtained by calling GetProcAddress. But it turns out that a lot of stuff inside the kernel turns into GetProcAddress at the end of the day.

    Suppose you have some code that calls a function in another segment within the same module. As we saw earlier, fixups are threaded through the code segment, and if you scroll down to the Per Segment Data section of that old document, you'll see a description of the way the relocation records are expressed. A call to a function in another segment within the same module requires an INTERNALREF fixup, and as you can see in the document, there are two types of INTERNALREF fixups: ones which refer to fixed segments and ones which refer to movable segments.

    The easy case is a reference to a fixed segment. In that case, the kernel can just look up where it put that segment, add in the offset, and patch that address into the code segment. Since it's a fixed segment, the patch will never have to be revisited.

    The hard case is a reference to a movable segment. In that case, you can see that the associated information in the fixup table is the "ordinal number index into [the] Entry Table."

    Aha, you now realize that the Entry Table is more than just a list of your exported functions. It's also a list of all the functions in movable segments that are called from other segments. In a sense, these are "secret exports" in your module. (However, you can't get to them by GetProcAddress because GetProcAddress knows how to keep a secret.)

    To fix up a reference to a function in a movable segment, the kernel calls the SecretGetProcAddress (not its real name) function, which as we saw before, returns not the actual function pointer but rather a pointer to the magic INT 3Fh in the Entry Table. It is that pointer which is patched into your code segment, so that when your code calls what it thinks is a function in another segment, it's really calling the Entry Table, which as we saw before, loads the code in the target segment if necessary before jumping to it.

    Matt wrote, "If the kernel wants to discard that procedure, it has to find that jump address in my code, and redirect it to a page fault handler, so that when my process gets to it, it will call the procedure and fault the code back in. How does it find all of the references to that function across the program, so that it can patch them all up?" Now you know the answer: It finds all of those references because it already had to find them when applying fixups. It doesn't try to find them at discard time; it finds them when it loads your segment. (Exercise: Why doesn't it need to reapply fixups when a segment moves?)

    All inter-segment function pointers were really pointers into the Entry Table. You passed a function pointer to be used as a callback? Not really; you really passed a pointer to your own Entry Table. You have an array of function pointers? Not really; you really have an array of pointers into your Entry Table. It wasn't actually hard for the kernel to find all of these because you had to declare them in your fixup table in the first place.

    It is my understanding that the INT 3Fh trick came from the overlay manager which was included with the Microsoft C compiler. (The Zortech C compiler followed a similar model.)

    Note: While the above discussion describes how things worked in principle, there are in fact a few places where the actual implementation differs from the description above, although not in any way that fundamentally affects the design.

    For example, real-mode Windows did a bit of optimization in the INT 3Fh stubs. If the target segment was in memory, then it replaced the INT 3Fh instruction with a direct jmp xxxx:yyyy to the target, effectively precalculating the jump destination when a segment is loaded rather than performing the calculation each time a function in that segment is called.

    By an amazing coincidence, the code sequence

        int 3fh
        db  entry_segment
        dw  entry_offset
    

    is five bytes long, which is the exact length of a jmp xxxx:yyyy instruction. Phew, the patch just barely fits!
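    To see the byte counts line up, here is a small sketch that builds the five-byte stub and then overwrites it with a far jump. The function names are invented for illustration. The encodings are the standard 8086 ones: CD 3F for int 3fh, and EA followed by a 16-bit offset and a 16-bit segment for a direct far jmp. How the kernel actually carried out the patch is not something the article spells out, so treat this as a model of the layout only.

```c
#include <stdint.h>

enum { STUB_SIZE = 5 };

/* Lay out "int 3fh / db entry_segment / dw entry_offset". */
void write_int3f_stub(uint8_t stub[STUB_SIZE],
                      uint8_t segment_number, uint16_t offset)
{
    stub[0] = 0xCD;                       /* INT opcode */
    stub[1] = 0x3F;                       /* interrupt number 3Fh */
    stub[2] = segment_number;             /* db entry_segment */
    stub[3] = (uint8_t)(offset & 0xFF);   /* dw entry_offset, low byte */
    stub[4] = (uint8_t)(offset >> 8);     /* dw entry_offset, high byte */
}

/* Overwrite the same five bytes with "jmp segment:offset". */
void patch_far_jmp(uint8_t stub[STUB_SIZE],
                   uint16_t segment_address, uint16_t offset)
{
    stub[0] = 0xEA;                               /* far JMP opcode */
    stub[1] = (uint8_t)(offset & 0xFF);           /* offset, low byte */
    stub[2] = (uint8_t)(offset >> 8);             /* offset, high byte */
    stub[3] = (uint8_t)(segment_address & 0xFF);  /* segment, low byte */
    stub[4] = (uint8_t)(segment_address >> 8);    /* segment, high byte */
}
```

    Note that the stub stores a one-byte segment number (an index into the module's segment list), whereas the jump needs the segment's actual 16-bit address, which is known only once the segment is in memory.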

  • The Old New Thing

    Don't be helpless: What might be the reason for a "Path not found" error?

    • 74 Comments

    Internally at Microsoft, we have a programmer's tool which I will call Program Q. On the peer-to-peer mailing list for Program Q, somebody asked the following question:

    When I try to do a q edit template template_name, instead of opening an editor window where I can modify the template, I get the following error:

    Error opening for write:
    C:\Users\Waldo\AppData\Local\Temp\2\t144t4.tmp
    The system cannot find the path specified.
    

    Can you help resolve this error?

    Okay, there is already everything you need in the error message. The program even converted the error number to error text for you. You just have to read it and think about what it's telling you.

    The file is C:\Users\Waldo\AppData\Local\Temp\2\t144t4.tmp. Therefore the path is C:\Users\Waldo\AppData\Local\Temp\2. I leave you to determine the next step in the investigation.

    That was apparently not enough of a nudge in the right direction.

    While the error message does say "The system cannot find the path specified," the fact remains that I did not specify a path at all. The path in the error message is completely unknown to me. I could try to navigate to that path in Windows Explorer, but I doubt that this has anything to do with resolving the problem.

    Normally, I get an editor window that lets me edit the template, but instead I get this strange error message which I've never seen before.

    I did not try to navigate to the path mentioned in the error message simply because the mentioned Temp folder C:\Users\Waldo\AppData\Local\Temp is completely empty!

    The helplessness is so thick you can cut it with a knife! I also find it astonishing that the person thinks that verifying whether the path can be found is totally unrelated to resolving a "Path not found" error.

    Don't forget, this is a programmer's tool. One should assume that the people who use it have some level of technical skill!

    Okay, first we have a "Path not found" error, and there is a fully-qualified file name whose path couldn't be found. First thing to check is whether the path really exists. From the most recent reply, one can see that the answer is "No, it does not exist." The 2 subdirectory is missing from the Temp directory.

    Okay, so we verified that the error message is valid. The next thing to determine is where the program got this path from. The person already recognized that it was the Temp directory, and it shouldn't be a huge deductive leap to determine that the path probably came from the TEMP or TMP environment variable.

    The observation that the Temp directory is completely empty suggests that the person, in an obsessive-compulsive fit, deleted everything from the Temp directory, including the 2 subdirectory. Too bad that their TEMP environment variable still contained a reference to it.

    As a result, any program that wants to create a temporary file will try to create it in a directory that doesn't exist. Result: "Path not found."
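    The first two steps of that investigation are easy to automate. A minimal sketch, assuming a C runtime that provides stat in sys/stat.h; the helper names directory_exists and temp_directory_setting are made up for this example and have nothing to do with Program Q.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/stat.h>

/* Some C runtimes provide S_IFMT/S_IFDIR but not the S_ISDIR macro. */
#ifndef S_ISDIR
#define S_ISDIR(mode) (((mode) & S_IFMT) == S_IFDIR)
#endif

/* Step 1: does the path from the error message really exist,
   and is it a directory? */
bool directory_exists(const char *path)
{
    struct stat info;
    if (path == NULL) return false;
    if (stat(path, &info) != 0) return false;
    return S_ISDIR(info.st_mode) != 0;
}

/* Step 2: where did the program probably get that path from?
   Returns the value of TEMP, or of TMP if TEMP is unset or empty,
   or NULL if neither variable is set. */
const char *temp_directory_setting(void)
{
    const char *value = getenv("TEMP");
    if (value == NULL || value[0] == '\0') {
        value = getenv("TMP");
    }
    return value;
}
```

    If directory_exists(temp_directory_setting()) comes back false, you have reproduced the diagnosis above: the environment points at a directory that is no longer there.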

    The fix: Re-create the 2 subdirectory that you mistakenly deleted. (And yes, this fixed the problem.)

    It somehow seemed completely surprising to this person that a "Path not found" error could possibly mean that a path couldn't be found.

  • The Old New Thing

    Why does PrintWindow hate CS_PARENTDC? Because EVERYBODY hates CS_PARENTDC!

    • 16 Comments

    Commenter kero wants to know why the Print­Window function hates CS_PARENT­DC. (And CS_CLASS­DC, and CS_OWN­DC.)

    Because everybody hates CS_PARENT­DC! (And CS_CLASS­DC, and CS_OWN­DC.)

    We saw earlier that these class styles violate widely-held assumptions about how drawing works. I mean, who would have thought that asking for two device contexts would give you the same one back twice? Or that changes to one device context would secretly modify another (because they're really the same)? Or that a window procedure assumes that it will see only one device context ever?

    The Print­Window function is really in a pickle when faced with a window with one of these class styles, because the whole point of Print­Window is to render into some other device context. The Print­Window function says "Render into this other device context, please," and the window acts like a bratty two-year-old who refuses to eat food from anything other than his favorite monkey plate. "This is not my monkey plate. I will now throw a tantrum."

    The Print­Window function passes a custom device context as a parameter to the WM_PRINT message, and the window says, "Hey, no fair! My class styles say that you aren't allowed to pass any old device context; you have to pass mine. I will now take my broccoli and mash it all over the table."

    Oh well. At least it tried.

    Yet another reason to avoid these weird class styles.

  • The Old New Thing

    Globalization quiz: In honor of, well, that's part of the quiz

    • 46 Comments
    The Corporate Citizenship Tools; Microsoft Local Language Program Web site contains a map of the world, coded by region. There was a bug on the map. See if you can spot it:

      Asia
      Europe
      Middle East & Africa
      North America
      South America
      South Pacific

    After I pointed out the error, they fixed the map on their Web page, so no fair clicking through to the Local Language Program Web page and comparing the pictures!

    Non-useful hint: I chose the publication date of this quiz in honor of the answer.

    Bonus chatter: Inside the answer.

  • The Old New Thing

    How did my hard drive turn into a TARDIS?

    • 44 Comments

    A customer observed that the entry for a network drive looked like this in My Computer, well, except that there was a network drive icon instead of ASCII art.

    O Public (\\server) (S:)
     
     
    3.81TB free of 2.5GB

    How is it possible for a 2.5GB drive to have 3.81TB free?

    While there have certainly been examples of Explorer showing confusing values, the reason for the strange results was, at least this time, not Explorer's fault.

    This particular network drive actually reported (via Get­Disk­Free­Space­Ex) that it had more free space than drive space. Explorer is dutifully reporting the information it was given, because it doesn't try to second-guess the file system. If a network drive wants to report that it is a TARDIS, then it's a TARDIS.

  • The Old New Thing

    It's not a good idea to give multiple controls on a dialog box the same ID

    • 6 Comments

    When you build a dialog, either from a template or by explicitly calling Create­Window, one of the pieces of information about each control is a child window identifier. And it's probably in your best interest to make sure two controls on the dialog don't have the same ID number.

    Of course, one consequence of giving two controls the same ID number is that the GetDlgItem function won't know which one to return when you say, "Please give me control number 100." This naturally has a cascade effect on all the other functions which are built atop the GetDlgItem function, such as SetDlgItemInt.

    Another reason to avoid duplication is that many notification messages include the control ID, and if you have a duplicate, you won't know which one generated the notification. Okay, this isn't actually the case, because the notification messages typically also include the window handle, so you can use the window handle to distinguish between your two controls both with ID=100. Though it means that you can't use a simple switch statement any more.

    (See sidebar for discussion of when duplicate IDs are acceptable.)

    Most of the time, you get away with the duplicate IDs because you can use the window handle to distinguish them. But there is one notable case where duplicate IDs cause problems: Identifying the default pushbutton on the dialog.

    One of the things that the dialog manager does when it builds a dialog box from a template is keep an eye out for a button control with the BS_DEF­PUSH­BUTTON style. When it finds one, it remembers the control ID so that it can restore it as the default pushbutton when focus is on a non-pushbutton control. (When focus is on a pushbutton, then that button becomes the default pushbutton.)

    The dialog manager records the initial default pushbutton by sending itself the DM_SET­DEF­ID message, and the default handler merely records the value in a safe place so it can return it when somebody sends the DM_GET­DEF­ID message. You can send the DM_SET­DEF­ID message yourself if you want to change the default pushbutton, and that's where the trouble comes in.

    The only parameter to the DM_SET­DEF­ID message is the ID of the dialog control you want to be the new default, so if your dialog box has two controls with that ID, then you've created a bit of a problem. When the user hits the Enter key, the dialog manager wants to fire a WM_COMMAND notification for the default button, but it sees two of them and gets confused.

    Actually, it doesn't really get confused. It just picks one of them arbitrarily and ignores the other one.

    And then confusion sets in.

    If the two buttons with conflicting IDs do different things, then your code which receives the WM_COMMAND notification may end up seeing the notification coming from the wrong control. For example, suppose you inadvisedly assign ID number 100 to both the Reformat and Scan buttons (and out of an abundance of caution, set Scan as the default pushbutton). When the user hits Enter, the dialog manager sends a DM_GETDEFID message to say, "Hey, what's the default pushbutton?" The message returns 100, and now the dialog manager is stuck saying to itself, "Um, there are two 100's. Eeny, meeny, miny, moe. Okay, it's the Reformat button." Boom, hard drive reformatted.

    From the same dialog template, suppose you realize, "Oh, I don't want to let the user reformat the hard drive until they've entered a volume label," so you disable the Reformat button if the volume label field is blank. The user hits Enter, and remember, you set Scan as the default button. But since Reformat and Scan have the same control ID, the dialog manager once again plays eeny-meeny-miney-moe, and say it picks the Reformat button. But it also sees that the Reformat button is disabled, so it just beeps.

    And then your user wonders why, when they hit Enter and the Scan button is the default pushbutton, it doesn't scan but instead just beeps.
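    Both outcomes can be reproduced with a toy model. This is not the dialog manager's code: the Control structure and both functions are hypothetical, and "first match wins" is an assumed stand-in for whatever arbitrary choice the dialog manager makes.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a dialog control. */
typedef struct {
    int id;
    const char *label;
    bool enabled;
} Control;

/* Look up a control by ID. With duplicate IDs, only the first match
   is ever found (an assumption made for this model). */
const Control *find_by_id(const Control *controls, size_t count, int id)
{
    for (size_t i = 0; i < count; i++) {
        if (controls[i].id == id) return &controls[i];
    }
    return NULL;
}

/* Model of the user hitting Enter: returns the label of the button
   that gets invoked, or NULL if all the user gets is a beep. */
const char *press_enter(const Control *controls, size_t count,
                        int default_id)
{
    const Control *target = find_by_id(controls, count, default_id);
    if (target == NULL || !target->enabled) return NULL;
    return target->label;
}
```

    With Reformat and Scan both carrying ID 100 and Reformat listed first, press_enter lands on Reformat even though Scan was meant to be the default. Disable Reformat and the same call returns NULL, which is the beep.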

    Okay, all this discussion seems pretty obvious, doesn't it, but as we dig deeper into the dialog manager, you'll see how the principle of "Don't create a dialog box with conflicting dialog control IDs" has perhaps unexpected consequences.

    Sidebar: If the duplications are all among controls that do not raise notifications and which you do not need to identify programmatically, then you're not going to run into much trouble at all. By convention, the "control ID for controls where I don't care about the ID" is −1, although you can use any number you like; the window manager doesn't care, as long as it doesn't collide with the ID of a control that you do care about.

    Note that some resource management tools (such as translation toolkits, or interactive dialog editors) assume that there are no duplicate IDs aside from the special don't-care value −1, so if you're going to use duplicate IDs because you don't care, you'd be well-served to stick to the −1 convention.

    Bonus chatter: Why doesn't DM_SET­DEF­ID take a window handle instead of a control ID? That would solve the problem of multiple controls with the same ID, since you now have the window handle, which identifies the control uniquely.

    Yeah, it could've done that. Though it would also have created problems if the default pushbutton is destroyed, and that happens more often than you think.

    Remember back in the early 16-bit days, we didn't have parameter validation, so a dangling window handle meant that you crashed when you tried to use it. (Or worse, the window handle could have been reused for another totally unrelated window! Window handle reuse was much more common in 16-bit Windows.) Mapping the window handle back to an ID and then converting the ID to a window on demand meant that you never keep a window handle around, which means that you never have to worry about the handle going bad.

    Making the DM_SET­DEF­ID message handle-based would also make it harder for somebody to pull the "Create two controls with the same ID but hide one of them at runtime" trick described above, because they would also have to remember to send a hypothetical DM_SET­DEF­HWND message whenever they pulled the switcheroo.

    And besides, the only people this design change helps out are people who put multiple visible controls on a dialog box with the same ID. You don't optimize for the case where somebody is mis-using your system.
