• The Old New Thing

    Modality, part 9: Setting the correct owner for modal UI, practical exam


    Here's a question that came from a customer. You can answer it yourself based on what you know about modal UI. (If you're kind of rusty on the topic, you can catch up here.) I've left in some irrelevant details just to make things interesting.

    Our program launches a helper program to display an Aero Wizard to guide the user through submitting a service request.


    There are no EnableWindow calls leading up to or returning from this call; DoModal handles things nicely.

    When the user clicks "Cancel" to cancel the service request, we use a TaskDialog so we can get the new look for our confirmation message box. The task dialog setup goes like this:

    TASKDIALOGCONFIG config = { 0 };
    config.cbSize = sizeof(config);
    config.hwndParent = hwndMainFrame;
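
    For context, a complete call built on that setup might look like the following sketch; the instruction text, button choices, and wrapper function are illustrative, not taken from the customer's code:

```cpp
#include <windows.h>
#include <commctrl.h>

// Sketch: show a Yes/No confirmation task dialog owned by hwndMainFrame.
// The strings and buttons here are illustrative placeholders.
int ConfirmCancel(HWND hwndMainFrame)
{
    TASKDIALOGCONFIG config = { 0 };
    config.cbSize = sizeof(config);
    config.hwndParent = hwndMainFrame;
    config.dwCommonButtons = TDCBF_YES_BUTTON | TDCBF_NO_BUTTON;
    config.pszMainInstruction = L"Cancel this service request?";

    int nButton = 0;
    if (SUCCEEDED(TaskDialogIndirect(&config, &nButton, NULL, NULL))) {
        return nButton; // IDYES or IDNO
    }
    return IDNO; // treat failure as "don't cancel"
}
```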

    When the user clicks "Yes" to confirm the cancel, some other window, rather than our main frame, becomes active.

    On a hunch, I replaced the task dialog with a Win32 message box

    MessageBox(hwndMainFrame, ...);

    and bingo, we get the correct behavior. When our wizard exits, the main frame receives focus.

    I believe that the "automatic" modal behavior that comes with DoModal() that takes care of disabling and reenabling the main frame is somehow getting short-circuited by using TaskDialog from inside our PSM_QUERYCANCEL message handler.

    Right now, we've switched to MessageBox, but we would much prefer to use the task dialog.

    Although it's not common, it is legal for a window's parent or owner to belong to another thread or process. But it definitely makes things trickier to manage, because it attaches the input queues of the two threads, and you now have two threads coöperating to manage a single window hierarchy.

    Is the cross-process window hierarchy a contributing factor to the problem, or is it just a red herring?

  • The Old New Thing

    How to turn off the exception handler that COM "helpfully" wraps around your server


    Historically, COM placed a giant try/except around your server's methods. If your server encountered what would normally be an unhandled exception, the giant try/except would catch it and turn it into the error RPC_E_SERVERFAULT. It then marked the exception as handled, so that the server remained running, thereby "improving robustness by keeping the server running even when it encountered a problem."

    Mind you, this was actually a disservice.

    The fact that an unhandled exception occurred means that the server was in an unexpected state. By catching the exception and saying, "Don't worry, it's all good," you end up leaving a corrupted server running. For example:

    HRESULT CServer::DoOneWork(...)
    {
     CWork *pwork = m_listWorkPending.RemoveFirst();
     if (pwork) {
      pwork->UpdateTimeStamp();
      pwork->FrobWidget();
      pwork->ReversePolarity(); // suppose a bug makes this crash
      pwork->UnfrobWidget();
      delete pwork;
     }
     return S_OK;
    }

    Suppose there's a bug somewhere that causes pwork->ReversePolarity() to crash. Maybe the problem is that the neutrons aren't flowing, so there's no polarity to reverse. Maybe the polarizer is not properly initialized. Whatever; it doesn't matter what the problem is, just assume there's a bug that prevents it from working.

    With the global try/except, COM catches the exception and returns RPC_E_SERVERFAULT back to the caller. Your server remains up and running, ready for another request. Mind you, your server is also corrupted. The widget never got unfrobbed, the timestamp refers to work that never completed, and the CWork that you removed from the pending work list got leaked.

    But, hey, your server stayed up.

    A few hours later, the server starts returning E_OUTOFMEMORY errors (because of all the leaked work items), you get errors because there are too many outstanding frobs, and the client hangs because it's waiting for a completion notification on that work item that you lost track of. You debug the server to see why everything is so screwed up, but you can't find anything wrong. "I don't understand why we are leaking frobs. Every time we frob a widget, there's a call to unfrob right after it!"

    You eventually throw up your hands in resignation. "I can't figure it out. There's no way we can be leaking frobs."

    Even worse, the inconsistent object state can be a security hole. An attacker tricks you into reversing the polarity of a nonexistent neutron flow, which causes you to leave the widget frobbed by mistake. Bingo, frobbing a widget makes it temporarily exempt from unauthorized polarity changes, and now the bad guys can change the polarity at will. Now you have to chase a security vulnerability where widgets are being left frobbed, and you still can't find it.

    Catching all exceptions and letting the process continue running assumes that a server can recover from an unexpected failure. But this is absurd. You already know that the server is unrecoverably toast: It crashed!

    Much better is to let the server crash so that the crash dump can be captured at the point of the failure. Now you have a fighting chance of figuring out what's going on.

    But how do you turn off that massive try/except? You didn't put it in your code; COM created it for you.

    Enter IGlobalOptions: Set the COMGLB_EXCEPTION_HANDLING property to COMGLB_EXCEPTION_DONOT_HANDLE, which means "Please don't try to 'help' me by catching all exceptions. If a fatal exception occurs in my code, then go ahead and let the process crash." In Windows 7, you can ask for the even stronger COMGLB_EXCEPTION_DONOT_HANDLE_ANY, which means "Don't even try to catch 'nonfatal' exceptions."

    Wait, what's a 'fatal' exception?

    A 'fatal' exception, at least as COM interprets it, is an exception like STATUS_ACCESS_VIOLATION or STATUS_ILLEGAL_INSTRUCTION. (A complete list is in this sample Rpc exception filter.) On the other hand a 'nonfatal' exception is something like a C++ exception or a CLR exception. You probably want an unhandled C++ or CLR exception to crash your server, too; after all, it would have crashed your program if it weren't running as a server. Therefore, my personal recommendation is to use COMGLB_EXCEPTION_DONOT_HANDLE_ANY whenever possible.
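
    Putting that advice into code, a server might opt out during startup with something along these lines (a minimal sketch; error handling is reduced to bare HRESULT propagation, and the function name is illustrative):

```cpp
#include <windows.h>
#include <objbase.h>
#include <objidl.h>

// Sketch: ask COM not to swallow exceptions raised inside this server's
// methods. Call once, early in initialization, after CoInitialize(Ex).
HRESULT DisableComExceptionMasking()
{
    IGlobalOptions *pOptions = NULL;
    HRESULT hr = CoCreateInstance(CLSID_GlobalOptions, NULL,
                                  CLSCTX_INPROC_SERVER,
                                  IID_IGlobalOptions, (void**)&pOptions);
    if (SUCCEEDED(hr)) {
        // COMGLB_EXCEPTION_DONOT_HANDLE_ANY requires Windows 7 or later;
        // on older systems, fall back to COMGLB_EXCEPTION_DONOT_HANDLE.
        hr = pOptions->Set(COMGLB_EXCEPTION_HANDLING,
                           COMGLB_EXCEPTION_DONOT_HANDLE_ANY);
        pOptions->Release();
    }
    return hr;
}
```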

    "That's great, but why is the default behavior the dangerous 'silently swallow exceptions' mode?"

    The COM folks have made numerous attempts to change the default from the dangerous mode to one of the safer modes, but the application compatibility consequences have always been too great. Turns out there are a lot of servers that actually rely on COM silently masking their exceptions.

    But at least now you won't be one of them.

  • The Old New Thing

    Why didn't they use the Space Shuttle to rescue the Apollo 13 astronauts?


    Many decisions make sense only in the context of history.

    Much like the moviegoers who were puzzled why NASA didn't just use the Space Shuttle to rescue the Apollo 13 astronauts, computer users of today, when looking back on historical decisions, often make assumptions based on technology that didn't exist.

    Consider, for example, pointing out that the absence of a console subsystem in Windows 3.1 was no excuse for not porting the ipconfig program as a character-mode application. "Sure maybe you didn't have a console subsystem, but why not just use the DOS box?"

    The MS-DOS prompt is a virtual machine running a copy of MS-DOS. Since it's a virtual machine, as far as the MS-DOS prompt is concerned, it's just running all by its happy self on a dedicated computer running MS-DOS. In reality, of course, it's running inside a simulator being controlled by Windows, but the point of the simulation is so that old applications can continue to run even though they think they're running under MS-DOS.

    "There wasn't any security in place with Win 3.1, so any program run from a DOS box should have been able to affect anything on the system."

    Since the MS-DOS prompt ran in a virtual machine, everything it did was under the supervision of the virtual machine manager. If it tried to access memory it didn't have permission to access, an exception would be raised and handled by the virtual machine manager. If it tried to execute a privileged instruction, an exception would be raised, and the virtual machine manager would step in with a "Nope, I'm not going to let you do that" and terminate the virtual machine. In a sense, programs running in the MS-DOS prompt actually ran with more protection and isolation than Windows applications running on the desktop, because Windows created a whole separate universe for each MS-DOS prompt.

    One of the consequences of virtualization is that programs running in the MS-DOS prompt are plain old MS-DOS applications, not Windows applications. There is no Windows API in MS-DOS, so there is no Windows API in the MS-DOS prompt either. (It's like running Windows inside a virtual machine on your Linux box and wondering why your Windows program can't call XCreateWindow. It can't call XCreateWindow because that's a function on the host system, not in the virtual machine.)

    Okay, but let's suppose, just for the sake of argument, that somebody poked a hole in the virtual machine and provided a way for MS-DOS programs to call WinSock APIs.

    You still wouldn't want ipconfig to be an MS-DOS program.

    Recall that Windows 3.1 ran in one of two modes, either standard mode or enhanced mode. Standard mode is the version designed for the 80286 processor. It didn't have virtual memory or support for virtual machines. When you ran an MS-DOS prompt, standard mode Windows would freeze all your Windows programs and effectively put itself into suspended animation. It then ran your MS-DOS program (full-screen since there was no Windows around to put it in a window), and when your MS-DOS program exited, Windows would rouse itself from its slumber and bring things back to the way they were before you opened that MS-DOS prompt.

    It would kind of suck if getting your computer's IP address meant stopping all your work, shutting down Windows (effectively), and switching the video adapter into character mode, just so it could print 16 characters to the screen.

    "Well, who cares about standard mode Windows any more? Let's say that it only works in enhanced mode. Enhanced mode can multi-task MS-DOS prompts and run them in a window."

    Recall that the minimum memory requirement for Windows 3.1 in enhanced mode was 1664KB. Given that each MS-DOS box took up about 1MB of memory, you're saying that displaying 16 characters of information is going to consume over half of your computer's memory?

    "Okay, helpdesk wants to know my IP address so they can troubleshoot my computer. In order to do that, I have to run this program, but first I need to save all my work and exit all my programs in order to free up enough memory to run the program they want me to run."

    Better to just write a simple Windows application.

    Bonus commentary: 640k asks, "Why wasn't winipcfg called ipconfig?"

    Right. "Let's have two completely different and incompatible programs with the same name." See how far you get with that.

  • The Old New Thing

    Don't just stand around saying somebody should do something: Be someone


    On one of the frivolous mailing lists in the Windows project, somebody spotted some behavior that seemed pretty bad and filed a bug on it. The project was winding down, with fewer and fewer bugs being accepted by the release management team each day, so it was not entirely surprising that this particular bug was also rejected. News of this smackdown threw the mailing list into a fit of apoplexy.

    Don't they realize how bad this bug is? Somebody should reactivate this bug.
    Yeah, this is really serious. I don't think they understood the scope of the problem. Somebody should mark the bug for reconsideration.
    Definitely. Someone should reactivate the bug and reassert the issue.

    After about a half dozen messages like this, I couldn't take it any longer.

    I can't believe I'm reading this.

    I decided to Be Someone. I reactivated the bug and included a justification.

    Don't just stand around saying somebody should do something. Be that someone.

    (And even though it's not relevant to the story, the bug was ultimately accepted by the release management team the second time around, because all the discussion of the bug gave the bug representative more information with which to argue why the bug should be fixed.)

  • The Old New Thing

    Was showing the column header in all Explorer views a rogue feature?


    User :( asks whether the Explorer feature that shows the column headers in all views was a rogue feature or a planned one.

    If it was a rogue feature, it was a horribly badly hidden one.

    One of the important characteristics of the rogue feature is that you not be able to tell that the feature is there unless you go looking for it. If the feature is right there on the screen as soon as you open an Explorer window, odds are that somebody is going to notice and say something about it. (For example, the designer who is responsible for Explorer is probably going to notice that every screenshot of Explorer doesn't match the spec.)

    That's why rogue features typically take the form of a hotkey or holding a modifier key in conjunction with another operation.

    Writing up this characterization of rogue features reminds me that I myself am responsible for a rogue feature of Windows 95. Go to an MS-DOS box, select Edit, then Mark, then select a section of the window for copying. If you hold the Ctrl key while dragging the mouse, the shape of the selection changes from a box to, um, I'm not sure the shape has a name. But it includes all the text between the start point and the end point, as if the contents of the window had come from an edit control.


    Since this was a rogue feature, it was never tested, and I suspect that it didn't work on Hebrew or Arabic systems.

    I must've chickened out, or maybe my rogue feature was found out, because the code for streamed selections was ifdef'd out before Windows 95 shipped. So at least I can still honestly say that I never shipped a rogue feature.

  • The Old New Thing

    What's the difference between an asynchronous PIPE_WAIT pipe and a PIPE_NOWAIT pipe?


    When you operate on named pipes, you have a choice of opening them in PIPE_WAIT mode or PIPE_NOWAIT mode. When you read from a PIPE_WAIT pipe, the read blocks until data becomes available in the pipe. When you read from a PIPE_NOWAIT pipe, then the read completes immediately even if there is no data in the pipe. But how is this different from a PIPE_WAIT pipe opened in asynchronous mode by passing FILE_FLAG_OVERLAPPED?

    The difference is in when the I/O is deemed to have completed.

    When you issue an overlapped read against a PIPE_WAIT pipe, the call to ReadFile returns immediately, but the completion actions do not occur until there is data available in the pipe. (Completion actions are things like setting the event, running the completion routine, or queueing a completion to an I/O completion port.) On the other hand, when you issue a read against a PIPE_NOWAIT pipe, the call to ReadFile returns immediately with the result: if the pipe is empty, the read completes with zero bytes read and the error ERROR_NO_DATA.

    Here's a timeline, for people who prefer tables.

    Event                  Asynchronous PIPE_WAIT                      PIPE_NOWAIT
    Pipe initially empty
    ReadFile               Returns immediately with                    Returns immediately with
                           ERROR_IO_PENDING                            ERROR_NO_DATA;
                                                                       I/O completes with 0 bytes
    Time passes
    Data available         I/O completes with n > 0 bytes

    If you use the PIPE_NOWAIT flag, then the only way to know whether there is data is to poll for it. There is no way to be notified when data becomes available.

    As the documentation notes, PIPE_NOWAIT remains solely for compatibility with LAN Manager 2.0. Since the only way to use pipes created as PIPE_NOWAIT is to poll them, this is obviously not a recommended model for a multitasking operating system.
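
    To make the timeline concrete, here is a sketch of the overlapped (asynchronous PIPE_WAIT) pattern described above; the handle hPipe is assumed to have been opened with FILE_FLAG_OVERLAPPED, and error handling is abbreviated:

```cpp
#include <windows.h>

// Sketch: issue an overlapped read. ReadFile returns at once with
// ERROR_IO_PENDING, and the completion (here, the event signal) fires
// only when data actually arrives in the pipe.
DWORD ReadWhenDataArrives(HANDLE hPipe, void *buffer, DWORD cbBuffer)
{
    OVERLAPPED ov = { 0 };
    ov.hEvent = CreateEvent(NULL, TRUE, FALSE, NULL);

    DWORD cbRead = 0;
    if (!ReadFile(hPipe, buffer, cbBuffer, &cbRead, &ov) &&
        GetLastError() == ERROR_IO_PENDING) {
        // The call returned immediately; block until the I/O completes.
        WaitForSingleObject(ov.hEvent, INFINITE);
        GetOverlappedResult(hPipe, &ov, &cbRead, FALSE);
    }
    CloseHandle(ov.hEvent);
    return cbRead;
}
```

    Contrast this with a PIPE_NOWAIT pipe, where the only option is to call ReadFile in a loop and check for ERROR_NO_DATA each time, i.e., to poll.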

  • The Old New Thing

    The MARGINS parameter to the DwmExtendFrameIntoClientArea function controls how far the frame extends into the client area


    A customer wrote a program that calls DwmExtendFrameIntoClientArea to extend the frame over the entire client area, but then discovered that this made programming difficult:

    I have a window which I want to have a glassy border but an opaque body. I made my entire window transparent by calling DwmExtendFrameIntoClientArea, and I understand that this means that I am now responsible for managing the alpha channel when drawing so that the body of my window remains opaque while the glassy border is transparent. Since most GDI functions are not alpha-aware, this management is frustrating. Is there a better way? In pictures, I only want the red portion of the diagram below to be on glass; the inside yellow part should be opaque like normal. Is there an API that can do this?

    This customer's excitement about the glass frame is like somebody who buys a pallet of tangerine juice even though he only wanted two glasses. And now he has questions about how to store the rest of the tangerine juice he didn't want.

    This customer, it appears, passed −1 as the MARGINS to DwmExtendFrameIntoClientArea, which means "Bring it on, baby! Give me all tangerine all the time everywhere!" If you want the glass to extend into only part of your client area, then say so: Set the MARGINS to the thickness of the glass border (the thickness of the red portion of the above diagram).
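
    In code, the fix looks something like this sketch (the border thicknesses are illustrative example values; use the actual thickness of your glass region):

```cpp
#include <windows.h>
#include <dwmapi.h>

// Sketch: extend the frame only a fixed border into the client area,
// rather than passing -1 margins ("all glass, all the time").
void ExtendGlassBorder(HWND hwnd)
{
    // cxLeftWidth, cxRightWidth, cyTopHeight, cyBottomHeight
    MARGINS margins = { 8, 8, 27, 8 }; // example values
    DwmExtendFrameIntoClientArea(hwnd, &margins);
}
```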

  • The Old New Thing

    My, what strange NOPs you have!


    While cleaning up my office, I ran across some old documents which reminded me that there are a lot of weird NOP instructions in Windows 95.

    Certain early versions of the 80386 processor (manufactured prior to 1987) are known as B1 stepping chips. These early versions of the 80386 had some obscure bugs that affected Windows. For example, if the instruction following a string operation (such as movs) uses opposite-sized addresses from that in the string instruction (for example, if you performed a movs es:[edi], ds:[esi] followed by a mov ax, [bx]) or if the following instruction accessed an opposite-sized stack (for example, if you performed a movs es:[edi], ds:[esi] on a 16-bit stack, and the next instruction was a push), then the movs instruction would not operate correctly. There were quite a few of these tiny little "if all the stars line up exactly right" chip bugs.

    Most of the chip bugs only affected mixed 32-bit and 16-bit operations, so if you were running pure 16-bit code or pure 32-bit code, you were unlikely to encounter any of them. And since Windows 3.1 did very little mixed-bitness programming (user-mode code was all-16-bit and kernel-mode code was all-32-bit), these defects didn't really affect Windows 3.1.

    Windows 95, on the other hand, contained a lot of mixed-bitness code since it was the transitional operating system that brought people using Windows out of the 16-bit world into the 32-bit world. As a result, code sequences that tripped over these little chip bugs turned up not infrequently.

    An executive decision had to be made whether to continue supporting these old chips or whether to abandon them. A preliminary market analysis of potential customers showed that there were enough computers running old 80386 chips to be worth making the extra effort to support them.

    Everybody who wrote assembly language code was alerted to the various code sequences that would cause problems on a B1 stepping, so that they wouldn't generate those code sequences themselves, and so they could be on the lookout for existing code that might have problems. To supplement the manual scan, I wrote a program that studied all the Windows 95 binaries trying to find these troublesome code sequences. When it brought one to my attention, I studied the offending code, and if I agreed with the program's assessment, I notified the developer who was responsible for the component in question.

    In nearly all cases, the troublesome code sequences could be fixed by judicious insertion of NOP statements. If the problem was caused by "instruction of type X followed by instruction of type Y", then you can just insert a NOP between the two instructions to "break up the party" and sidestep the problem. Sometimes, the standard NOP would end up classified as an instruction of type Y, so you had to insert a special kind of NOP, one that was not of type Y.

    For example, here's one code sequence from a function which does color format conversion:

            push    si          ; borrow si temporarily
            ; build second 4 pixels
            movzx   si, bl
            mov     ax, redTable[si]
            movzx   si, cl
            or      ax, blueTable[si]
            movzx   si, dl
            or      ax, greenTable[si]
            shl     eax, 16     ; move pixels to high word
            ; build first 4 pixels
            movzx   si, bh
            mov     ax, redTable[si]
            movzx   si, ch
            or      ax, blueTable[si]
            movzx   si, dh
            or      ax, greenTable[si]
            pop     si
            stosd   es:[edi]    ; store 8 pixels
            db      67h, 90h    ; 32-bit NOP fixes stos (B1 stepping)
            dec     wXE

    Note that we couldn't use just any old NOP; we had to use a NOP with a 32-bit address override prefix. That's right, this isn't just a regular NOP; this is a 32-bit NOP.

    From a B1 stepping-readiness standpoint, the folks who wrote in C had a little of the good news/bad news thing going. The good news is that the compiler did the code generation and you didn't need to worry about it. The bad news is that you also were dependent on the compiler writers to have taught their code generator how to avoid these B1 stepping pitfalls, and some of them were quite subtle. (For example, there was one bug that manifested itself in incorrect instruction decoding if a conditional branch instruction had just the right sequence of taken/not-taken history, and the branch instruction was followed immediately by a selector load, and one of the first two instructions at the destination of the branch was itself a jump, call, or return. The easy workaround: Insert a NOP between the branch and the selector load.)

    On the other hand, some quirks of the B1 stepping were easy to sidestep. For example, the B1 stepping did not support virtual memory in the first 64KB of memory. Fine, don't use virtual memory there. If virtual memory was enabled, if a certain race condition was encountered inside the hardware prefetch, and if you executed a floating point coprocessor instruction that accessed memory at an address in the range 0x800000F8 through 0x800000FF, then the CPU would end up reading from addresses 0x000000F8 through 0x000000FF instead. This one was easy to work around: Never allocate valid memory at 0x80000xxx. Another reason for the no man's land in the address space near the 2GB boundary.

    I happened to have an old computer with a B1 stepping in my office. It ran slowly, but it did run. I think the test team "re-appropriated" the computer for their labs so they could verify that Windows 95 still ran correctly on a computer with a B1 stepping CPU.

    Late in the product cycle (after Final Beta), upper management reversed their earlier decision and decided not to support the B1 chip after all. Maybe the testers were finding too many bugs where other subtle B1 stepping bugs were being triggered. Maybe the cost of having to keep an eye on all the source code (and training/retraining all the developers to be aware of B1 issues) exceeded the benefit of supporting a shrinking customer base. For whatever reason, B1 stepping support was pulled, and customers with one of these older chips got an error message when they tried to install Windows 95. And just to make it easier for the product support people to recognize this failure, the error code for the error message was Error B1.

  • The Old New Thing

    The message text limit for the Marquee screen saver is 255, even if you bypass the dialog box that prevents you from entering more than 255 characters


    If you find an old Windows XP machine and fire up the configuration dialog for the Marquee screen saver, you'll see that the text field for entering the message won't let you type more than 255 characters. That's because the Marquee screen saver uses a 255-character buffer to hold the message, and the dialog box figures there's no point in letting you type in a message longer than the screen saver can display.

    A customer decided to bypass the configuration dialog and change the text in the screen saver by editing the settings directly, and then complained that the Marquee screen saver truncated the message at 255 characters.

    Well, yeah, because the limit is 255 characters. That's what the dialog box was trying to tell you. If you bypass the dialog box and whack the setting directly, that doesn't change the Marquee screen saver. Its limit is still 255 characters.

    It's like attaching a gizmo to the gas pump fuel nozzle to disable the auto-stop feature because you want to put 20 gallons of gas into your 15-gallon tank. Disabling the auto-stop will not make the gas tank any bigger. All it does is make it easier to accidentally overflow your gas tank's buffer.

  • The Old New Thing

    Why does pasting a string containing an illegal filename character into a rename edit box delete the characters from the clipboard, too?


    Ane asks why, if you have a string with an illegal filename character on the clipboard, and you paste that string into a rename edit box, do the illegal characters get deleted not just from the edit box but also from the clipboard?

    Basically, it's a bug, the result of a poor choice of default in an internal helper class.

    There is an internal helper class for "monitoring an edit control" with options to do things like remove illegal characters. This helper class was written back in 1998, presumably with the intention of being used somewhere, but it never did get hooked up. Maybe the feature it was originally written for got cancelled, I can't quite tell. At any rate, this helper class had many options, one of which was "When pasting text containing illegal characters, should I filter the illegal characters from the clipboard, too?", and for some reason it defaulted to Yes. (I can see why the default was Yes from a coding standpoint. It was actually less work to filter the characters from the clipboard than it was to preserve them, but it's a bad default from an API design standpoint.)

    Anyway, this helper class sat unused for a few years, but in 2000, Explorer decided to use this helper class so it could filter illegal characters out of file names when you used the Rename command. The code that uses this helper class chose which options it wanted, and probably due to oversight, the "preserve clipboard contents when pasting" flag was not specified.

    So yeah, it's just a bug. But then again, it's a bug that's been around for over a decade, so who knows if there's somebody out there that relies on it.
