June, 2012

  • The Old New Thing

    Now that Windows makes it harder for your program to block shutdown, how do you block shutdown?

    • 62 Comments

    Up until Windows XP, applications could intercept the WM_QUERY­END­SESSION message and tell Windows, "No, don't shut down." If they were polite about it, they would also inform the user which application blocked system shutdown and why. And if they were really polite about it, they would even provide a way for the user to say, "I don't care; shut down anyway."

    As I noted some time ago, Windows Vista made it harder for applications to block shutdown. Applications are given two seconds to clean up, and then it's game over.

    Okay, now the game of walls and ladders continues. The power management folks created an escape hatch for applications which are doing things like burning a CD or controlling an industrial lathe, where shutting down the machine may not be in the user's best interest. (The user ends up with a coaster or a factory on fire.) But since they created the escape hatch, they get to control the keys to the hatch, too.

    The Shutdown­Block­Reason­Create function lets you register your application window with a custom message that is displayed to the user when they try to shut down the computer. When the danger-time is over, you call Shutdown­Block­Reason­Destroy to say that the coast is clear and shutdown is once again permitted.

    Mind you, these blocks are merely advisory. If users really want to create a coaster or burn down their factory, they can click Force shut down. One nice thing about making Windows responsible for the warning message is that if multiple applications want to block shutdown, all of them can be displayed in a single dialog, and the user only needs to click Force shut down once.

    Further guidance on system shutdown and the use of these functions can be found in the Application Shutdown Changes in Windows Vista document, which was the source material for most of this blog entry.

  • The Old New Thing

    You still need the "safe" functions even if you check string lengths ahead of time

    • 69 Comments

    Commenter POKE53280,0 claims, "If one validates parameters before using string functions (which quality programmers should do), the 'safe' functions have no reason to exist."

    Consider the following function:

    int SomeFunction(const char *s)
    {
      char buffer[256];
      if (strlen(s) ≥ 256) return ERR;
      strcpy(buffer, s);
      ...
    }
    

    What could possibly go wrong? You check the length of the string, and if it doesn't fit in the buffer, then you reject it. Therefore, you claim, the strcpy is safe.

    What could possibly go wrong is that the length of the string can change between the time you check it and the time you use it.

    char attack[512] = "special string designed to trigger a "
                       "buffer overflow and attack your machine. [...]";
    
    void Thread1()
    {
     char c = attack[256];
     while (true) {
      attack[256] ^= c;
     }
    }
    
    void Thread2()
    {
     while (true) {
      SomeFunction(attack);
     }
    }
    

    The first thread changes the length of the string rapidly between 255 and 511, between a string that passes validation and a string that doesn't, and more specifically between a string that passes validation and a string that, if it snuck through validation, would pwn the machine.

    The second thread keeps handing this string to Some­Function. Eventually, the following race condition will be hit:

    • Thread 1 changes the length to 255.
    • Thread 2 checks the length and when it reaches attack[256], it reads zero and concludes that the string length is less than 256.
    • Thread 1 changes the length to 511.
    • Thread 2 copies the string and when it reaches attack[256], it reads nonzero and keeps copying, thereby overflowing its buffer.

    Oops, you just fell victim to the Time-of-check-to-time-of-use attack (commonly abbreviated TOCTTOU).

    Now, the code above as-written is not technically a vulnerability because you haven't crossed a security boundary. The attack code and the vulnerable code are running under the same security context. To make this a true vulnerability, you need to have the attack code running in a lower security context from the vulnerable code. For example, if the threads were running user-mode code and Some­Function is a kernel-mode function, then you have a real vulnerability. Of course, if Some­Function were at the boundary between user-mode and kernel-mode, then it has other things it needs to do, like verify that the memory is in fact readable by the process.

    A more interesting case where you cross a security boundary is if the two threads are running code driven from an untrusted source; for example, they might be threads in a script interpreter, and the toggling of attack[256] is being done by a function on a Web page.

    // this code is in some imaginary scripting language
    
    var attack = new string("...");
    procedure Thread1()
    {
     var c = attack[256];
     while (true) attack[256] ^= c;
    }
    
    handler OnClick()
    {
     new backgroundTask(Thread1);
     while (true) foo(attack);
    }
    

    When the user clicks on the button, the script interpret creates a background thread and starts toggling the length of the string under the instructions of the script. Meanwhile, the main thread calls foo repeatedly. And suppose the interpreter's implementation of foo goes like this:

    void interpret_foo(const function_args& args)
    {
     if (args.GetLength() != 1) wna("foo");
     if (args.GetArgType(0) != V_STRING) wta("foo", 0, V_STRING);
     char *s = args.PinArgString(0);
     SomeFunction(s);
     args.ReleasePin(0);
    }
    

    The script interpreter has kindly converted the script code into the equivalent native code, and now you have a problem. Assuming the user doesn't get impatient and click "Stop script", the script will eventually hit the race condition and cause a buffer overflow in Some­Function.

    And then you get to scramble a security hotfix.

  • The Old New Thing

    Why do you have to wait for Windows Error Reporting to check for solutions before it restarts the application?

    • 47 Comments

    Leo Davidson wonders why you have to wait for Windows Error Reporting to check for solutions before it restarts the application. Why not do the two in parallel?

    Well, for one thing, when the application restarts, it might freak out if it sees a dead copy of itself. I know for sure that I would freak out if I woke up one morning and saw my own dead body lying next to me.

    While Windows Error Reporting is checking for a solution, it still has access to the carcass of the crashed application, because it may need to refer to it in order to answer follow-up questions from the server. ("Hey, was version 3.14 of PI.DLL loaded into the process when it crashed? If so, then I may have an idea what went wrong.") And so that, if you ask it to submit the crash to Microsoft, it can grab the information it needs in order to generate the crash report.

    Now suppose you start up a new copy of the application right away. If the application is a single-instance program, it will look around for another copy of itself, and hey look, it'll find its own lifeless body in the middle of the computer version of an autopsy. It will then try to send messages to the dead program, saying, "Hey, the user wants to open document X; could you do that for me?" And it won't get a response because, well, the program is dead. It's never going to respond.

    Some programs don't even try to pass information along. They just find the existing copy of the program, and call Set­Foreground­Window on its main window, thereby switching to it. Of course, what they tried to do was switched to a crashed program.

    Even worse, what if the second copy of the program tries to extract information from the existing copy of itself? If the existing copy crashed, it's highly likely that the crash was caused by corruption in the program's internal data structures. When the second copy tries to extract the corrupted data, it may itself crash. Immediately launching the replacement program creates a very quickly-growing pile of dead programs, and your screen basically gets spammed with Windows Error Reporting dialogs faster than you can click OK.

    The crashed program has effectively launched a denial of service attack against itself.

    Before trying to start the program again, Windows makes sure that the previous copy has received a proper burial. Because few programs are prepared to see their own cadaver.

    Bonus chatter: Another common scenario is that the program crashes at startup. Automatically restarting the program would just launch another copy that immediately crashes. Again, you get into the situation where you get a dozen copies of the program launched per second, all of which immediately crash.

  • The Old New Thing

    What is the history of the GetRandomRgn function?

    • 26 Comments

    An anonymous commenter was curious about how the Get­Random­Rgn function arrived at its strange name, what the purpose of the third parameter is, and why it is inconsistent between Windows 95 and Windows NT.

    The sense of word "random" here is hacker slang rather than its formal probabilistic definition, specifically sense 2: "Assorted, undistinguished", perhaps with a bit of sense 4 sprinkled in: "Not well organized." (Commenter Gabe suggested that a better name would have been Get­Specific­Rgn.)

    Once upon a time, when men were men and Windows was 16-bit, there was an internal function used to communicate between the window manager and GDI in order to set up device contexts. Internally, the region was called the Rao Region, named after Rao Remala, the programmer who invented it, and the function that calculated the Rao Region was rather uncreatively called Compute­Rao­Rgn.

    When porting to 32-bit Windows, the Windows NT and Windows 95 teams both found that they needed this same internal communication between the window manager and GDI. GDI already has a bunch of functions named Get­Xxx­Rgn, so instead of writing a separate marshaler for each one, they opted to write a single Get­Random­Rgn function which takes an integer which serves as a function code, specifying which region the caller actually wants. (I suspect the Windows 95 team followed the cue of the Windows NT team, since Windows NT ran into the problem first.)

    Since this was an internal function, it didn't matter that the name was a bit cutesy, nor did it matter what coordinate system it used, as long as the window manager and GDI agreed on the name and coordinate system. The Windows 95 team still had a lot of 16-bit code that they needed to be compatible with, so they chose to generate the Rao region the same way that the 16-bit Compute­Rao­Rgn function did it. The Windows NT folks, on the other hand, decided that it was more convenient for them that this internal function use screen coordinates, so that's what it returns on Windows NT.

    Get­Random­Rgn isn't really a function that was designed to be public. It was just an internal helper function that outsiders discovered and relied upon to the point that that it became a compatibility constraint so strong that it turned into a de facto documented function. And all the weirdness you see behind it is the weirdness of a function never intended for public consumption.

    The introduction of the Desktop Window Manager in Windows Vista changed the way the visible region was managed (since all windows are logically visible even when occluded because their drawing is redirected to an off-screen surface), but the Get­Random­Rgn function has to keep track of the "visible region" anyway, for compatibility.

  • The Old New Thing

    How did real-mode Windows fix up jumps to functions that got discarded?

    • 25 Comments

    In a discussion of how real-mode Windows walked stacks, commenter Matt wonders about fixing jumps in the rest of the code to the discarded functions.

    I noted in the original article that "there are multiple parts to the solution" and that stack-walking was just one piece. Today, we'll look at another piece: Inter-segment fixups.

    Recall that real-mode Windows ran on an 8086 processor, a simple processor with no memory manager, no CPU privilege levels, and no concept of task switching. Memory management in real-mode Windows was handled manually by the real-mode kernel, and the way it managed memory was by loading code from disk on demand, and discarding code when under memory pressure. (It didn't discard data because it wouldn't know how to regenerate it, and it can't swap it out because there was no swap file.)

    There were a few flags you could attach to a segment. Of interest for today's discussion are movable (and it was spelled without the "e") and discardable. If a segment was not movable (known as fixed), then it was loaded into memory and stayed there until the module was unloaded. If a segment was movable, then the memory manager was allowed to move it around when it needed to defragment memory in order to satisfy a large memory allocation. And if a segment was discardable, then it could even be evicted from memory to make room for other stuff.

    MovableDiscardableMeaning
    NoNoCannot be moved or discarded
    NoYes(invalid combination)
    YesNoCan be moved in memory
    YesYesCan be moved or purged from memory

    I'm going to combine the movable and discardable cases, since the effect is the same for the purpose of today's discussion, the difference being that with discardable memory, you also have the option of throwing the memory out entirely.

    First of all, let's get the easy part out of the way. If you had an intra-segment call (calling a function in your own segment), then there was no work that needed to be done. Real-mode Windows always discarded full segments, so if your segment was running code, it was by definition present, and therefore any other code in that segment was also present. The hard part is the inter-segment calls.

    As it happens, an old document on the 16-bit Windows executable file format gives you some insight into how things worked, if you sit down and puzzle it out hard enough.

    Let's start with the GetProcAddress function. When you call GetProcAddress, the kernel needs to locate the address of the function inside the target module. The loader consults the Entry Table to find the function you're asking for. As you can see, there are three types of entries in the Entry Table. Unused entries (representing ordinals with no associated function), fixed segment entries, and movable segment entries. Obviously, if the match is in an unused entry, the return value is NULL because there is no such function. If the match is in a fixed entry, that's pretty easy too: Look up the segment number in the target module's segment list and combine it with the specified offset. Since the segment is fixed, you can just return the raw pointer directly, since the code will never move.

    The tricky part is if the function is in a movable segment. If you look at the document, it says that "a moveable segment entry is 6 bytes long and has the following format." It starts with a byte of flags (not important here), a two-byte INT 3Fh instruction, a one-byte segment number, and the offset within the segment.

    What's the deal with the INT 3Fh instruction? It seems rather pointless to specify that a file format requires some INT 3Fh instructions scattered here and there. Why not get rid of it to save some space in the file?

    If you called GetProcAddress and the result was a function in a movable segment, the GetProcAddress didn't actually return the address of the target function. It returned the address of the INT 3Fh instruction! (Thankfully, the Entry Table is always a fixed segment, so we don't have to worry about the Entry Table itself being discarded.)

    (Now you see why the file format includes these strange INT 3Fh instructions: The file format was designed to be loaded directly into memory. When the loader loads the entry table, it just slurps it into memory and bingo, it's ready to go, INT 3Fh instructions and all!)

    Since GetProcAddress returned the address of the INT 3Fh instruction, calls to imported functions didn't actually go straight to the target. Instead, you called the INT 3Fh instruction, and it was the INT 3Fh handler which said, "Gosh, somebody is trying to call code in another segment. Is that segment loaded?" It took the return address of the interrupt and used it to locate the segment number and offset. If the segment in question was already in memory, then the handler jumped straight to the segment at the specified offset. You got the function call you wanted, just in a roundabout way.

    If the segment wasn't loaded, then the INT 3Fh handler loaded it (which might trigger a round of discarding and consequent stack patching), then jumped to the newly-loaded segment at the specified offset. An even more roundabout function call.

    Okay, so that's the case where a function pointer is obtained by calling GetProcAddress. But it turns out that a lot of stuff inside the kernel turns into GetProcAddress at the end of the day.

    Suppose you have some code that calls a function in another segment within the same module. As we saw earlier, fixups are threaded through the code segment, and if you scroll down to the Per Segment Data section of that old document, you'll see a description of the way the relocation records are expressed. A call to a function to a segment within the same module requires an INTERNALREF fixup, and as you can see in the document, there are two types of INTERNALREF fixups, ones which refer to fixed segments and ones which refer to movable segments.

    The easy case is a reference to a fixed segment. In that case, the kernel can just look up where it put that segment, add in the offset, and patch that address into the code segment. Since it's a fixed segment, the patch will never have to be revisited.

    The hard case is a reference to a movable segment. In that case, you can see that the associated information in the fixup table is the "ordinal number index into [the] Entry Table."

    Aha, you now realize that the Entry Table is more than just a list of your exported functions. It's also a list of all the functions in movable segments that are called from other segments. In a sense, these are "secret exports" in your module. (However, you can't get to them by GetProcAddress because GetProcAddress knows how to keep a secret.)

    To fix up a reference to a function in a movable segment, the kernel calls the SecretGetProcAddress (not its real name) function, which as we saw before, returns not the actual function pointer but rather a pointer to the magic INT 3Fh in the Entry Table. It is that pointer which is patched into your code segment, so that when your code calls what it thinks is a function in another segment, it's really calling the Entry Table, which as we saw before, loads the code in the target segment if necessary before jumping to it.

    Matt wrote, "If the kernel wants to discard that procedure, it has to find that jump address in my code, and redirect it to a page fault handler, so that when my process gets to it, it will call the procedure and fault the code back in. How does it find all of the references to that function across the program, so that it can patch them all up?" Now you know the answer: It finds all of those references because it already had to find them when applying fixups. It doesn't try to find them at discard time; it finds them when it loads your segment. (Exercise: Why doesn't it need to reapply fixups when a segment moves?)

    All inter-segment function pointers were really pointers into the Entry Table. You passed a function pointer to be used as a callback? Not really; you really passed a pointer to your own Entry Table. You have an array of function pointers? Not really; you really have an array of pointers into your Entry Table. It wasn't actually hard for the kernel to find all of these because you had to declare them in your fixup table in the first place.

    It is my understanding that the INT 3Fh trick came from the overlay manager which was included with the Microsoft C compiler. (The Zortech C compiler followed a similar model.)

    Note: While the above discussion describes how things worked in principle, there are in fact a few places where the actual implementation differs from the description above, although not in any way that fundamentally affects the design.

    For example, real-mode Windows did a bit of optimization in the INT 3Fh stubs. If the target segment was in memory, then it replaced the INT 3Fh instruction with a direct jmp xxxx:yyyy to the target, effectively precalculating the jump destination when a segment is loaded rather than performing the calculation each time a function in that segment is called.

    By an amazing coincidence, the code sequence

        int 3fh
        db  entry_segment
        dw  entry_offset
    

    is five bytes long, which is the exact length of a jmp xxxx:yyyy instruction. Phew, the patch just barely fits!

  • The Old New Thing

    Don't be helpless: What might be the reason for a "Path not found" error?

    • 74 Comments

    Internally at Microsoft, we have a programmer's tool which I will call Program Q. On the peer-to-peer mailing list for Program Q, somebody asked the following question:

    When I try to do a q edit template template_name, instead of opening an editor window where I can modify the template, I get the following error:

    Error opening for write:
    C:\Users\Waldo\AppData\Local\Temp\2\t144t4.tmp
    The system cannot find the path specified.
    

    Can you help resolve this error?

    Okay, there is already everything you need in the error message. The program even converted the error number to error text for you. You just have to read it and think about what it's telling you.

    The file is C:\Users\Waldo\AppData\Local\Temp\2\t144t4.tmp. Therefore the path is C:\Users\Waldo\AppData\Local\Temp\2. I leave you to determine the next step in the investigation.

    That was apparently not enough of a nudge in the right direction.

    While the error message does say "The system cannot find the path specified," the fact remains that I did not specify a path at all. The path in the error message is completely unknown to me. I could try to navigate to that path in Windows Explorer, but I doubt that this has anything to do with resolving the problem.

    Normally, I get an editor window that lets me edit the template, but instead I get this strange error message which I've never seen before.

    I did not try to navigate to the path mentioned in the error message simply because the mentioned Temp folder C:\Users\Waldo\AppData\Local\Temp is completely empty!

    The helplessness is so thick you can cut it with a knife! I also find it astonishing that the person thinks that verifying whether the path can be found is totally unrelated to resolving a "Path not found" error.

    Don't forget, this is a programmer's tool. One should assume that the people who use it have some level of technical skill!

    Okay, first we have a "Path not found" error, and there is a fully-qualified file name whose path couldn't be found. First thing to check is whether the path really exists. From the most recent reply, one can see that the answer is "No, it does not exist." The 2 subdirectory is missing from the Temp directory.

    Okay, so we verified that the error message is valid. The next thing to determine is where the program got this path from. The person already recognized that it was the Temp directory, and it shouldn't be a huge deductive leap to determine that the path probably came from the TEMP or TMP environment variable.

    The observation that the Temp directory is completely empty suggests that the person, in an obsessive-compulsive fit, deleted everything from the Temp directory, including the 2 subdirectory. Too bad that their TEMP environment variable still contained a reference to it.

    As a result, any program that wants to create a temporary file will try to create it in a directory that doesn't exist. Result: "Path not found."

    The fix: Re-create the 2 subdirectory that you mistakenly deleted. (And yes, this fixed the problem.)

    It somehow seemed completely surprising to this person that a "Path not found" error could possibly mean that a path couldn't be found.

  • The Old New Thing

    Why does PrintWindow hate CS_PARENTDC? Because EVERYBODY hates CS_PARENTDC!

    • 16 Comments

    Commenter kero wants to know why the Print­Window function hates CS_PARENT­DC. (And CS_CLASS­DC, and CS_OWN­DC.)

    Because everybody hates CS_PARENT­DC! (And CS_CLASS­DC, and CS_OWN­DC.)

    We saw earlier that these class styles violate widely-held assumptions about how drawing works. I mean, who would have thought that asking for two device contexts would give you the same one back twice? Or that changes to one device context would secretly modify another (because they're really the same)? Or that a window procedure assumes that it will see only one device context ever?

    The Print­Window function is really in a pickle when faced with a window with one of these class styles, because the whole point of Print­Window is to render into some other device context. The Print­Window function says "Render into this other device context, please," and the window acts like a bratty two-year-old who refuses to eat food from anything other than his favorite monkey plate. "This is not my monkey plate. I will now throw a tantrum."

    The Print­Window function passes a custom device context as a parameter to the WM_PRINT message, and the window says, "Hey, no fair! My class styles say that you aren't allowed to pass any old device context; you have to pass mine. I will now take my broccoli and mash it all over the table."

    Oh well. At least it tried.

    Yet another reason to avoid these weird class styles.

  • The Old New Thing

    Globalization quiz: In honor of, well, that's part of the quiz

    • 46 Comments
    The Corporate Citizenship Tools; Microsoft Local Language Program Web site contains a map of the world, coded by region. There was a bug on the map. See if you can spot it:

      Asia
      Europe
      Middle East & Africa
      North America
      South America
      South Pacific

    After I pointed out the error, they fixed the map on their Web page, so no fair clicking through to the Local Language Program Web page and comparing the pictures!

    Non-useful hint: I chose the publication date of this quiz in honor of the answer.

    Bonus chatter: Inside the answer.

  • The Old New Thing

    How did my hard drive turn into a TARDIS?

    • 44 Comments

    A customer observed that the entry for a network drive looked liked this in My Computer, well, except that there was a network drive icon instead of ASCII art.

    O Public (\\server) (S:)
     
     
    3.81TB free of 2.5GB

    How is it possible for a 2.5GB drive to have 3.81TB free?

    While there have certainly been examples of Explorer showing confusing values the reason for the strange results was, at least this time, not Explorer's fault.

    This particular network drive actually reported (via Get­Disk­Free­Space­Ex) that it had more free space than drive space. Explorer is dutifully reporting the information it was given, because it doesn't try to second-guess the file system. If a network drive wants to report that it is a TARDIS, then it's a TARDIS.

  • The Old New Thing

    When embedding a dialog inside another, make sure you don't accidentally create duplicate control IDs

    • 5 Comments

    The WS_EX_CONTROL­PARENT extended style (known in dialog templates as DS_CONTROL) instructs the dialog manager that the dialog's children should be promoted into the dialog's parent. This is easier to explain in pictures than in text. Given the following window hierarchy:

                           
      hdlgMain  
           
             
    A   B
    WS_EX_CON-
    TROLPARENT
      C   D
           
         
      B1   B2  

    The result of the WS_EX_CONTROL­PARENT extended style being set is that the children of B are treated as if they were direct children of the main dialog, and the window B itself logically vanishes.

                               
      hdlgMain  
           
               
    A   B1   B2   C   D

    The WS_EX_CONTROL­PARENT extended style means "Hello, I am not really a dialog control. I am a control parent. In other words, I have children, and those children are controls." (Some people erroneously put the WS_EX_CONTROL­PARENT extended style on the main dialog itself. That's wrong, but fortunately it also has no effect.)

    Okay, this is all stuff you already know. So why am I bringing up this topic? I sort of gave it away in the subject line: When you use WS_EX_CONTROL­PARENT, you need to be careful that the controls that you are promoting into the parent don't conflict with controls already in the parent, or with controls promoted into the parent by another sibling.

    Suppose, in the above scenario, that window C also had the WS_EX_CONTROL­PARENT extended style, and it had children C1 and C2. Not only do you have to watch out that B1 and B2 don't conflict with the controls A or D, you also have to watch out that it doesn't conflict with C1 or C2 either.

    "But Mister Wizard, the property sheet control hosts multiple child dialogs, and since they can be provided by third parties, it's entirely possible (and likely) that there will be conflicts between B1 and, say, C2. Why doesn't this create a problem?"

    Well, Timmy, most of the time, it doesn't, because notifications are fired to the control's parent, and in the case of child dialogs, the child dialog's child controls fire their notifications to the child dialog. So as long as the identifiers are unique within the child dialog, you won't have a problem. This isn't the entire answer, however, but to understand it further, we'll need to look at another consequence of control ID conflicts, which we'll take up next time.

Page 1 of 3 (25 items) 123