November, 2011

  • The Old New Thing

    Stupid Raymond talent: Screaming carrier

    • 109 Comments

    Similar to Mike, I was able to scream (not whistle: scream) a 300 baud carrier tone. This skill proved useful when I was in college and the mainframe system was down. Instead of sitting around waiting for the system to come back, I just went about my regular business around campus. Every so often, I would go to a nearby campus phone (like a free public phone but it can only make calls to other locations on campus), dial the 300 baud dial-up number, and scream the carrier tone. If I got a response, that meant that the mainframe was back online and I should wrap up what I was doing and head back to the lab.

    Mind you, this skill isn't very useful nowadays.

    What stupid computer talent do you have?

  • The Old New Thing

    We've traced the call and it's coming from inside the house: A function call that always fails

    • 64 Comments

    A customer reported that they had a problem with a particular function added in Windows 7. The tricky bit was that the function was used only on very high-end hardware, not the sort of thing your average developer has lying around.

    GROUP_AFFINITY GroupAffinity;
    ... code that initializes the GroupAffinity structure ...
    if (!SetThreadGroupAffinity(hThread, &GroupAffinity, NULL));
    {
     printf("SetThreadGroupAffinity failed: %d\n", GetLastError());
     return FALSE;
    }
    

    The customer reported that the function always failed with error 122 (ERROR_INSUFFICIENT_BUFFER) even though the buffer seemed perfectly valid.

    Since most of us don't have machines with more than 64 processors, we couldn't run the code on our own machines to see what happens. People asked some clarifying questions, like whether this code is compiled 32-bit or 64-bit (thinking that maybe there is an issue with the emulation layer), until somebody noticed that there was a stray semicolon at the end of the if statement.

    The customer was naturally embarrassed, but was gracious enough to admit that, yup, removing the semicolon fixed the problem.
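
    The effect of the stray semicolon can be reproduced without any high-end hardware. What follows is a minimal sketch, in which PretendApiCall is a hypothetical stand-in for a call that always succeeds. It also hints at why the reported error code was so misleading: since the real call presumably never failed, GetLastError would just have returned whatever stale value an earlier call left behind.

```cpp
#include <cassert>

// Hypothetical stand-in for an API call; it always reports success.
bool PretendApiCall() { return true; }

// Buggy version: the semicolon ends the if statement, so the braces that
// follow are an ordinary block that runs no matter what the call returned.
bool BuggyCaller() {
    if (!PretendApiCall());
    {
        return false;  // always reached
    }
    return true;       // never reached
}

// Fixed version: the block now belongs to the if statement.
bool FixedCaller() {
    if (!PretendApiCall())
    {
        return false;  // reached only when the call fails
    }
    return true;
}
```

    Many compilers can warn about an empty statement used as the body of an if, which is one more reason to keep warnings turned on.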

    This reminds me of an incident many years ago. I was having a horrible time debugging a simple loop. It looked like the compiler was on drugs and was simply ignoring my loop conditions and always dropping out of the loop. At wit's end, I asked a colleague to come to my office and serve as a second set of eyes. I talked him through the code as I single-stepped:

    "Okay, so we set up the loop here..."

    NODE pn = GetActiveNode();
    

    "And we enter the loop, continuing while the node still needs processing."

    if (pn->NeedsProcessing())
    {
    

    "Okay, we entered the loop. Now we realign the skew rods on the node."

     pn->RealignSkewRods();
    

    "If the treadle is splayed, we need to calibrate the node against it."

     if (IsSplayed()) pn->Recalibrate(this);
    

    "And then we loop back to see if there is more work to be done on this node."

    }
    

    "But look, even though the node needs processing «view node members», we don't loop back. We just drop out of the loop. What's going on?"

    Um, that's an if statement up there, not a while statement.

    A moment of silence while I process this piece of information.

    "All right then, sorry to bother you, hey, how about that sporting event last night, huh?"

  • The Old New Thing

    The Control Panel search results understand common misspellings, too

    • 41 Comments

    Here's a little trick. Open your Start menu and type scrensaver into the search box. That's right, spell it with only one e.

    Hey, it still found the Control Panel options for managing your screen saver.

    If you enable Improve my search results by using online Help in Windows Help and Support, this sends your search query to a back-end server to see if there's updated online help content related to your search. And the people who develop the online help content look over those queries to see if, for example, there is a category of issues people are searching for help with and not finding anything.

    It also means that they get to see a lot of misspellings. And that information is useful, because it means that the Control Panel search provider can be given a table of the most popular misspellings and how to fix them, which in turn benefits people who unchecked the Improve my search results by using online Help option (say because they don't have Internet access or because they are afraid of the black helicopters).

    So now, when you search for scrensaver, you are directed to information on screen savers, thanks to the millions of other people who spelled it wrong before you.
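
    A table of popular misspellings can be as simple as a lookup from the wrong spelling to the right one. The sketch below is only an illustration of that idea: the function names and every table entry other than scrensaver are made up, and nothing here is the actual Control Panel search provider.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical table of popular misspellings. The entries are invented for
// illustration and are not the real Control Panel data.
const std::unordered_map<std::string, std::string>& MisspellingTable() {
    static const std::unordered_map<std::string, std::string> table = {
        {"scrensaver", "screensaver"},
        {"moniter", "monitor"},
        {"pasword", "password"},
    };
    return table;
}

// Returns the corrected query if the input is a known misspelling,
// otherwise returns the input unchanged.
std::string NormalizeQuery(const std::string& query) {
    const auto& table = MisspellingTable();
    auto it = table.find(query);
    return it != table.end() ? it->second : query;
}
```

    Because the table ships with the search provider, the lookup works even with no network connection at all.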

    Obligatory link: The so-called God mode.

  • The Old New Thing

    The life story of the SwitchToThisWindow function

    • 36 Comments

    Commenters Mick and Nick (you guys ever considered teaming up and forming a morning radio show?) are interested in the life story of the Switch­To­This­Window function.

    The Switch­To­This­Window function was originally added in enhanced mode Windows 3.0 in order to support switching out of fullscreen MS-DOS sessions. Recall that enhanced mode Windows 3.0 was actually three operating systems in one: There was a 32-bit virtual machine manager, and inside one virtual machine ran a copy of standard-mode Windows,¹ and inside all the others ran a copy of MS-DOS. This meant that when you pressed a key when in an MS-DOS session, the keyboard interrupt went to the MS-DOS program and not to Windows.

    When you pressed Alt+Tab, some crazy magic had to happen. The virtual machine manager had to "un-press" the Alt key in the MS-DOS program, then synchronize the shift states of the Windows virtual machine to match the one from the MS-DOS virtual machine. (For example, if you had the shift key down in the MS-DOS virtual machine, it had to simulate pressing the shift key in the Windows virtual machine so the two shift states were back in sync.) And then it could simulate pressing the Tab key, at which point the Windows virtual machine would see the Alt+Tab sequence and put up the Alt+Tab interface.

    That's how things worked if you were running in a windowed MS-DOS session. But if you were in a fullscreen MS-DOS session, things worked differently. Switching back to Windows would mean a display mode reset (which can take a second or longer), and then all the applications on your desktop had to redraw themselves (and probably paging quite a bit in order to do so). This definitely failed to meet the responsiveness people expected from Alt+Tab, so the virtual machine manager pulled a trick: If you pressed Alt+Tab while in a fullscreen MS-DOS session, then instead of switching back to the Windows virtual machine, the virtual machine manager displayed a text-mode version of the Alt+Tab interface.

    I will stop to let the craziness of that sink in: The virtual machine manager had its own Alt+Tab interface built out of text mode.

    Anyway, when you finally released the Alt key and completed the Alt+Tab sequence, the virtual machine manager needed to tell Windows, "Hey, like, pretend that an Alt+Tab thingie just happened, okay?"

    That is what the Switch­To­This­Window function was for. It was the function the virtual machine manager called to tell Windows to switch to a window as if the user had selected it via Alt+Tab (because that is, in fact, what the user did, just via the text-mode interface rather than the graphical one).

    A similar thing happened if you pressed Alt+Esc (or Alt+Shift+Esc) in a fullscreen MS-DOS session. That's why there's a second parameter to indicate whether the switch should be done "in the style of Alt+Tab" or "in the style of Alt+Esc."

    The function was undocumented because it existed only for the virtual machine manager to call in order to coordinate its actions with Windows user interface so that you had one big happy Alt+Tab family.

    The text-mode Alt+Tab interface disappeared in Windows 95, but the Switch­To­This­Window function hung around because it wasn't causing anybody any harm, and there was at the time no formal process in place to deprecate and eventually remove an API, not even an internal undocumented one.

    In the Windows XP SP1 timeframe, a bunch of lawyers decided that some functions in Windows needed to be documented. The precise rules for determining which functions needed to be documented and which didn't need to be documented were rather complicated. (Some people applied an algorithm different from the ones those lawyers used and came up with a list of functions that are "missing", when all that they really came up with is a list of functions different from the list those lawyers came up with.)²

    Anyway, the Switch­To­This­Window function got caught in the dragnet, so it got documented. Mind you, like it says right at the top of the documentation, there is no guarantee that the function will continue to exist; it can vanish at any time. Although there is documentation, it has the logical status of an internal function, and internal functions have a tendency to change or vanish entirely. Perhaps someday a new chapter will be added to the life story of Switch­To­This­Window: "The Switch­To­This­Window was removed in Windows Q" for some value of Q.

    Footnotes

    ¹ Not true, but true enough. Don't make me bring back the Nitpicker's Corner.

    ² I will delete any comments on the subject of the algorithm by which those lawyers determined which functions needed to be documented, or on the documentation itself.

    Bonus chatter: As far as I can determine, Switch­To­This­Window just does a Set­Foreground­Window on the window you're switching to, possibly posting it a WM_SYS­COMMAND/SC_RESTORE message, and moving the previous foreground window to the bottom of the Z-order if switched via Alt+Esc. It doesn't provide any special secret sauce for bypassing the normal foreground activation rules. The process that calls Switch­To­This­Window still requires foreground-change permission.

  • The Old New Thing

    How can I extend the deadline for responding to the PBT_APMSUSPEND message?

    • 33 Comments

    A customer observed that starting in Windows Vista, the deadline for responding to the PBT_APMSUSPEND message was reduced from twenty seconds to two seconds. Their program needs more than two seconds to prepare for a suspend and they wanted to know how they could extend the deadline. The program communicates with a device, and if they don't properly prepare the device for suspend, it has problems when the system resumes.

    No, you cannot extend the deadline for responding to the PBT_APMSUSPEND message. The two second deadline is hard-coded; it is not configurable.

    The whole point of reducing the deadline from twenty to two seconds is to ensure that when users press the Sleep button on their computers, the computer actually goes to sleep reasonably promptly. If there were a way to extend the deadline, then you're just reintroducing the bad situation in Windows XP where the user hits the Sleep button on the laptop, but the laptop just sits there taunting you. Meanwhile, the flight attendant is standing there getting angrier at you for not putting your laptop away. (Or worse: You tell the laptop to Sleep and toss it into your bag, and then when you reach your destination, you discover that your laptop is really warm, and the battery is dead.)

    Besides, imagine if ten apps did this. Your app asks for ten extra seconds, another app asks for another twenty seconds, next thing you know all the deadline extensions add up to five minutes.

    The real solution is to fix the driver for this device so it can prepare the device properly when it is told that the machine is about to go into a low power state. (The two-second limit applies to applications but not drivers. At least not yet.)

  • The Old New Thing

    If you protect a write with a critical section, you may also want to protect the read

    • 30 Comments

    It is common to have a critical section which protects against concurrent writes to a variable or collection of variables. But if you protect a write with a critical section, you may also want to protect the read, because the read might race against the write.

    Adam Rosenfield shared his experience with this issue in a comment from a few years back. I'll reproduce the example here in part to save you the trouble of clicking, but also to make this entry look longer and consequently make it seem like I'm actually doing some work (when in fact Adam did nearly all of the work):

    class X {
     volatile int mState;
     CRITICAL_SECTION mCrit;
     HANDLE mEvent;
    };
    
    X::Wait() {
     while(mState != kDone) {
       WaitForSingleObject(mEvent, INFINITE);
     }
    }
    
    X::~X() {
     DeleteCriticalSection(&mCrit);
    }
    
    X::SetState(int state) {
     EnterCriticalSection(&mCrit);
     // do some state logic
     mState = state;
     SetEvent(mEvent);
     LeaveCriticalSection(&mCrit);
    }
    
    Thread1()
    {
     X x;
     ... spawn off thread 2 ...
     x.Wait();
    }
    
    Thread2(X* px)
    {
     ...
     px->SetState(kDone);
     ...
    }
    

    There is a race condition here:

    • Thread 1 calls X::Wait and waits.
    • Thread 2 calls X::SetState.
    • Thread 2 gets pre-empted immediately after calling Set­Event.
    • Thread 1 wakes up from the Wait­For­Single­Object call, notices that mState == kDone, and therefore returns from the X::Wait method.
    • Thread 1 then destructs the X object, which destroys the critical section.
    • Thread 2 finally runs and tries to leave a critical section that has been destroyed.

    The fix was to enclose the read of mState inside a critical section:

    X::Wait() {
     while(1) {
       EnterCriticalSection(&mCrit);
       int state = mState;
       LeaveCriticalSection(&mCrit);
       if(state == kDone)
         break;
       WaitForSingleObject(mEvent, INFINITE);
     }
    }
    

    Forgetting to enclose the read inside a critical section is a common oversight. I've made it myself more than once. You say to yourself, "I don't need a critical section here. I'm just reading a value which can safely be read atomically." But you forget that the critical section isn't just for protecting the write to the variable; it's also to protect all the other actions that take place under the critical section.
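
    Here is the same idea in portable C++. This is a sketch rather than Adam's code: std::mutex and std::condition_variable are swapped in for the CRITICAL_SECTION and the event, and the class and member names are hypothetical. What carries over is that Wait reads mState while holding the same lock that SetState holds, so Wait cannot see kDone and return while SetState is still partway through its locked region.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

enum State { kIdle, kDone };

class Waitable {
public:
    // Writer: every change to mState happens under mLock.
    void SetState(State state) {
        std::lock_guard<std::mutex> guard(mLock);
        mState = state;
        mChanged.notify_all();  // signalled while still holding mLock
    }

    // Reader: mState is also read under mLock, so this cannot observe kDone
    // and return until SetState has released the lock.
    void Wait() {
        std::unique_lock<std::mutex> guard(mLock);
        mChanged.wait(guard, [this] { return mState == kDone; });
    }

    // Reads the current state, again under the lock.
    State Peek() {
        std::lock_guard<std::mutex> guard(mLock);
        return mState;
    }

private:
    std::mutex mLock;
    std::condition_variable mChanged;
    State mState = kIdle;
};
```

    In use, one thread would call Wait while another calls SetState(kDone); if the state is already kDone, Wait returns right away.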

    And just to make it so I actually did some work today, I leave you with this puzzle based on an actual customer problem:

    class BufferPool {
    public:
     BufferPool() { ... }
     ~BufferPool() { ... }
    
     Buffer *GetBuffer()
     {
      Buffer *pBuffer = FindFreeBuffer();
      if (pBuffer) {
       pBuffer->mIsFree = false;
      }
      return pBuffer;
     }
    
     void ReturnBuffer(Buffer *pBuffer)
     {
      pBuffer->mIsFree = true;
     }
    
    private:
     Buffer *FindFreeBuffer()
     {
      EnterCriticalSection(&mCrit);
      Buffer *pBuffer = NULL;
      for (int i = 0; i < 8; i++) {
       if (mBuffers[i].mIsFree) {
        pBuffer = &mBuffers[i];
        break;
       }
      }
      LeaveCriticalSection(&mCrit);
      return pBuffer;
     }
    private:
     CRITICAL_SECTION mCrit;
     Buffer mBuffers[8];
    };
    

    The real class was significantly more complicated than this, but I've distilled the problem to its essence.

    The customer added, "I tried declaring mIs­Free as a volatile variable, but that didn't seem to help."
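
    For readers who want a nudge (spoiler ahead), here is one plausible reading of the puzzle, not an official answer. In the class above, the search for a free buffer happens inside the critical section, but the write that claims the buffer happens outside it, and the write in ReturnBuffer takes no lock at all. The sketch below uses std::mutex in place of the CRITICAL_SECTION and a hypothetical class name, and does the find and the claim as one locked step.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>

struct Buffer {
    bool mIsFree = true;
};

class BufferPoolSketch {
public:
    // Finding a free buffer and marking it in use happen under one lock,
    // so two callers cannot both claim the same buffer.
    Buffer* GetBuffer() {
        std::lock_guard<std::mutex> guard(mLock);
        for (Buffer& buffer : mBuffers) {
            if (buffer.mIsFree) {
                buffer.mIsFree = false;
                return &buffer;
            }
        }
        return nullptr;  // pool exhausted
    }

    // The write that frees a buffer takes the same lock as the reads above.
    void ReturnBuffer(Buffer* pBuffer) {
        std::lock_guard<std::mutex> guard(mLock);
        pBuffer->mIsFree = true;
    }

private:
    std::mutex mLock;
    Buffer mBuffers[8];
};
```

    Under this reading, volatile would not help because the trouble is not a stale read of one variable; it is that the check and the claim are two separate steps.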

  • The Old New Thing

    How can I tell whether a DLL has been registered?

    • 27 Comments

    A customer pointed out that you can use regsvr32 to register a DLL or to unregister it, but how do you query whether a DLL has been registered?

    DLL registration (via regsvr32) is not declarative; it is procedural. A DLL does not provide a manifest of things it would like to happen when installed. Instead, the DLL merely provides two functions for regsvr32 to call, one for registration (DllRegisterServer) and another for unregistration (DllUnregisterServer). All the regsvr32 program does is call those functions.

    How those functions perform their registration and unregistration is not specified. Most of the time, those functions merely write some registry settings, but the DllRegisterServer is not limited to that. For example, the DllRegisterServer function might write some values only conditionally, say, only if the user is running a specific version of Windows. Or it might back up the old value of a registry key before it overwrites it. It might create or modify files as part of its installation or configure your firewall settings or look for and uninstall previous versions of the same DLL.

    By convention, the DllRegisterServer performs whatever operations are necessary for DLL registration, and the DllUnregisterServer reverses those operations, but since those functions are provided by the DLL, there's no guarantee that that's what actually happens. Who knows, maybe DllRegisterServer formats your hard drive. A DllRegisterServer function might just return S_OK without doing anything. How can you tell whether a function with no side effects has been called?

    Given that DLL registration can encompass arbitrary operations, there is no general-purpose way of determining whether registration has taken place for an arbitrary DLL.

    To determine whether a DLL has been registered, you need to bring in domain-specific knowledge. If you know that a DLL registers a COM object with a particular CLSID, you can check whether that CLSID is indeed registered.
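
    As a sketch of that kind of domain-specific check, the function below looks for a CLSID key under HKEY_CLASSES_ROOT. The function name is hypothetical, and the test is deliberately narrow: finding the key tells you only that somebody registered that CLSID, not that this particular DLL did it or that the registration is complete. The Win32 calls are guarded so the sketch still compiles elsewhere, where it simply reports false.

```cpp
#include <cassert>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Returns true if HKEY_CLASSES_ROOT\CLSID\<clsid> exists.
// clsid is the string form of the CLSID, including the braces.
bool IsClsidRegistered(const std::string& clsid) {
#ifdef _WIN32
    const std::string subkey = "CLSID\\" + clsid;
    HKEY key = nullptr;
    LONG status = RegOpenKeyExA(HKEY_CLASSES_ROOT, subkey.c_str(), 0,
                                KEY_READ, &key);
    if (status != ERROR_SUCCESS) return false;
    RegCloseKey(key);
    return true;
#else
    (void)clsid;
    return false;  // no registry to consult on other platforms
#endif
}
```

    On Windows this needs to be linked against advapi32, and the CLSID you pass in has to come from your own knowledge of what the DLL registers.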

  • The Old New Thing

    Why not use animated GIFs as a lightweight alternative to AVIs in the animation common control?

    • 27 Comments

    Commenter Vilx- wondered why animated GIFs weren't used as the animation format for the shell animation common control. After all, "they are even more lightweight than AVIs."

    Animated GIFs are certainly more lightweight than general AVIs, since AVI is just a container format, so decoding a general AVI means decoding any encoding format invented now or in the future. On the other hand, the shell common control imposed enough limits on the type of AVIs it could handle to the point where what was left was extremely lightweight, certainly much more lightweight than an animated GIF.

    Think about it: To use an animated GIF, you need a GIF decoder. And a GIF decoder is already significantly larger (both in terms of code and memory) than the RLE-8 decoder. Also significantly more complicated, and therefore significantly more likely to have bugs. Whereas RLE-8 is so simple there isn't much that can go wrong, and the RLE-8 decoder had been around since Windows 3.0, so it was already a known quantity. All you have to do to invoke the RLE-8 decoder is call Set­DI­Bits­To­Device. One line of code is hard to beat.
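
    To give a sense of how little there is to RLE-8, here is a sketch of a decoder for the BI_RLE8 encoding as it is documented for bitmaps. It is not the Windows decoder, the function name is hypothetical, and it takes a few liberties: rows are written in the order they are decoded (real DIBs are normally bottom-up) and pixels the data skips over are left as zero.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Decodes BI_RLE8-style data into a width*height array of palette indices.
// Returns false if the input is truncated or a run overruns its row.
bool DecodeRle8(const std::vector<uint8_t>& in, size_t width, size_t height,
                std::vector<uint8_t>& out) {
    out.assign(width * height, 0);
    size_t x = 0, y = 0, i = 0;
    while (i + 1 < in.size()) {
        uint8_t first = in[i], second = in[i + 1];
        i += 2;
        if (first > 0) {                 // encoded run: repeat `second`
            if (y >= height || x + first > width) return false;
            for (size_t n = 0; n < first; n++) out[y * width + x++] = second;
        } else if (second == 0) {        // escape 0: end of line
            x = 0;
            y++;
        } else if (second == 1) {        // escape 1: end of bitmap
            return true;
        } else if (second == 2) {        // escape 2: delta, skip right and down
            if (i + 1 >= in.size()) return false;
            x += in[i];
            y += in[i + 1];
            i += 2;
        } else {                         // absolute run: `second` literal pixels
            if (i + second > in.size()) return false;
            if (y >= height || x + second > width) return false;
            for (size_t n = 0; n < second; n++) out[y * width + x++] = in[i++];
            if (second & 1) i++;         // literal runs are padded to even length
        }
    }
    return false;  // ran out of data before the end-of-bitmap marker
}
```

    That is the entire format: repeated runs, literal runs, and three escape codes. A GIF decoder, by contrast, has to implement LZW decompression before it can produce a single pixel.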

    Windows 95 did not come with a GIF decoder. Remember, Internet Explorer 1.0 did not come with Windows 95; it was part of the Plus! pack. As I recall, at the time Windows 95 released to manufacturing, the Plus! pack was still under development. (And at the time the animation common control was being designed, Internet Explorer didn't exist. Heck, Mosaic didn't exist!) Plus the fact that the common controls were available in both 16-bit and 32-bit versions—in fact it was the 16-bit versions that were written first since Windows 95 didn't have good Win32 support at the start of the project. More accurately, Windows 95 didn't have any Win32 support at the start of the project.

    So I'm kind of amused by the description of GIF as a lightweight animation encoding algorithm. Compared to RLE, the GIF format weighs a ton!

  • The Old New Thing

    How can I tell whether a COM pointer to a remote object is still valid?

    • 27 Comments

    A customer asked the rather suspicious question, "How do I check whether a pointer is valid in another process?"

    This question should make your head boggle with bewilderment. First of all, we've moved beyond Crash­Program­Randomly to Crash­Some­Other­Program­Randomly. Second of all, what the heck are you doing with a pointer in another process? You can't do anything with it!

    After some back-and-forth¹ we managed to tease the real question out of the customer: How can I tell whether a COM pointer to a remote object is still valid?

    The easy answer is "Don't worry. COM will take care of it." Just call the method on the object. If the remote object is not valid, you will get an error back, like RPC_E_DISCONNECTED or RPC_E_SERVER_DIED or RPC_E_SERVER_DIED_DNE or HRESULT_FROM_WIN32(RPC_S_SERVER_UNAVAILABLE). When you get an error like that, you'll know that the remote object is no longer valid, and you can respond accordingly.

    What if you want your program to be a little proactive and prune dead remote objects instead of just noticing that they're dead the next time you want to use them?

    Some people "solve" this problem by performing a Query­Interface on a newly-generated interface ID. Since the IID has never been seen before, COM cannot consult its cache of previously-queried interfaces and must remote the call, at which point the death of the remote object will be detected. (The second rule for implementing Query­Interface exists in part so that COM can optimize Query­Interface of remote objects.) The problem with this technique is that by subverting the cache, you also end up polluting it. Each time you generate a new IID and do a dummy Query­Interface on it, you add another dummy entry to the Query­Interface cache. This wastes memory keeping track of interfaces that nobody will ever ask for again, and may even push out information about interfaces that your program actually uses!

    The COM folks tell me that your program should just accept the fact that the other process can go away at any time. Instead of making some sort of decision based on whether the other process is still there (since a response of "yeah, it's still here" could be wrong by the time you act on it), you should just call the method and accept that it may fail because the other process vanished while you weren't looking.
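
    In code, "just call the method and accept that it may fail" comes down to classifying the result after the fact. The sketch below assumes the usual numeric values of the error codes named earlier and copies them into local constants so that it builds without the Windows headers; the kXxx names and the helper function are made up for this example.

```cpp
#include <cassert>
#include <cstdint>

// Local copies of the HRESULT values discussed above. In a real program
// these come from the Windows headers under the names in the comments.
using HResultValue = std::uint32_t;
constexpr HResultValue kRpcEServerDied        = 0x80010007u;  // RPC_E_SERVER_DIED
constexpr HResultValue kRpcEServerDiedDne     = 0x80010012u;  // RPC_E_SERVER_DIED_DNE
constexpr HResultValue kRpcEDisconnected      = 0x80010108u;  // RPC_E_DISCONNECTED
constexpr HResultValue kRpcSServerUnavailable = 0x800706BAu;  // HRESULT_FROM_WIN32(RPC_S_SERVER_UNAVAILABLE)

// Returns true if the result of a remote call says the object on the other
// end has gone away, in which case the caller should drop its proxy.
bool IsRemoteObjectGone(HResultValue hr) {
    switch (hr) {
    case kRpcEServerDied:
    case kRpcEServerDiedDne:
    case kRpcEDisconnected:
    case kRpcSServerUnavailable:
        return true;
    default:
        return false;  // success, or a failure unrelated to the connection
    }
}
```

    The caller invokes the method it actually wanted, and only if the result lands in this set does it prune the object from its list. No separate "are you still there?" call is needed.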

    Footnote

    ¹ The customer first explained that their server process created an object and gave a pointer to that object to the client. The client then registered a callback object with the server, and the server wanted to check that the client object was still valid before invoking any methods on it. When asked, "Why not just use COM?" the customer replied, "We are using COM. We create the object on the server via Co­Create­Instance, then register the client object via a method on our interface."

    The customer was under the impression that when a COM pointer refers to an object in another process, you just get that pointer from the other process.

    If you think about it, this makes no sense at all. How could any of your method calls work? You call pRemote­Object->AddRef() and the compiler is going to dereference the pRemote­Object pointer, and then crash because the pointer would refer to memory in another process. I guess the customer was under the impression that some magic voodoo happens so that the CPU knows that "Oh wait, this pointer really belongs to another process, let me go fetch the memory from that other process. Okay, and now you want to call a function pointer in another process? Okay, um, let me magically merge the two processes together so the remote code running in that other process can access the objects in your process." Or something.

    When you have a COM pointer to an object in another process, the pointer that you have is a proxy which accepts method calls and marshals the call to the real object somewhere else.

  • The Old New Thing

    Why can't I install this DLL via Regsvr32 /i?

    • 26 Comments

    A customer asked for help installing a particular DLL. They ran the command regsvr32 /i SomeDll.dll but got the error "SomeDll.dll was loaded, but the DllInstall entry point was not found. This file can not be registered."

    A DLL needs to be specifically written to be used with the regsvr32 command. You can't just grab some random DLL and expect regsvr32 to work. As we saw last week, the regsvr32 program merely loads the specified DLL and calls an entry point established by convention. If the DLL was not written to be used with regsvr32 then the conventional entry point will not be found, and you get an error message.

    The /i switch to regsvr32 instructs the program to look for the entry point known as Dll­Install. By convention, the Dll­Install function performs installation and setup of a DLL, but since it's just some function exported by a DLL, it could do anything it wants (or nothing at all).

    You can't just grab a random DLL and expect regsvr32 to do anything meaningful with it. The DLL has to be designed to operate with regsvr32.
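
    The load-and-call convention is short enough to sketch. The code below is an assumption-laden illustration, not the source of regsvr32: the function and enum names are hypothetical, it handles only parameterless exports such as DllRegisterServer (DllInstall takes arguments and would need a different function pointer type), and it skips the COM/OLE initialization a real tool would likely do first. The Win32 calls are guarded so that the sketch compiles on other platforms, where it just reports that it cannot run.

```cpp
#include <cassert>
#ifdef _WIN32
#include <windows.h>
#endif

enum CallResult { kOk, kLoadFailed, kEntryPointMissing, kFunctionFailed, kNotWindows };

// Loads the DLL, looks up a parameterless HRESULT-returning export by name,
// and calls it.
CallResult CallDllExport(const char* dllPath, const char* exportName) {
#ifdef _WIN32
    HMODULE module = LoadLibraryA(dllPath);
    if (!module) return kLoadFailed;

    typedef HRESULT (WINAPI *EntryPoint)(void);
    EntryPoint entry =
        reinterpret_cast<EntryPoint>(GetProcAddress(module, exportName));

    // A DLL that was not written for this convention fails right here.
    CallResult result = kEntryPointMissing;
    if (entry) {
        result = SUCCEEDED(entry()) ? kOk : kFunctionFailed;
    }
    FreeLibrary(module);
    return result;
#else
    (void)dllPath;
    (void)exportName;
    return kNotWindows;  // the Win32 calls above exist only on Windows
#endif
}
```

    Calling CallDllExport with a DLL that lacks the named export comes back with kEntryPointMissing, which is the programmatic version of the error message the customer saw.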

    Handing random DLLs to regsvr32 is like dialing a random telephone number, sending a tone at 1170Hz and getting upset when you don't get a 2125Hz tone in response.

    The number one hit for a search on what does regsvr32 do is an old Knowledge Base article which explains what regsvr32 does, and it even contains a sample program which emulates regsvr32 so you can use it to debug your DLL. (The sample program hasn't been updated to support the /i flag, which I leave as an exercise.)

    One day, I received a piece of email from another employee whom I had never met nor had ever heard of. It didn't even begin with an introduction; it just jumped right in as if we'd been friends for years. "I'm trying to debug a problem where regsvr32 cannot register my DLL. It gives the error 'The specified procedure could not be found.' I saw a blog entry written by you and am trying to understand what our problem is."

    This blog thing has backfired. The reason I write these articles is to get people to stop asking me questions. (The mechanism for that being to give the answer out in public for everyone to see.) Instead, it turns into "Hi, I found an article you wrote about X, which ipso facto makes you not only the world's foremost authority on X, but also the world's leading support technician on X."

    News flash: Posting a blog entry about something on the Internet should not be taken as evidence that the author is an expert on that subject. (One might argue that it in fact is more likely to be the opposite.)

    At the time, I wasn't aware of the knowledge base article that explains what regsvr32 does and how to debug it, so I couldn't point to it. I wrote back, "All regsvr32 does is Load­Library, Get­Proc­Address, and then calls the function. You can write your own test that does the same thing. You do not require any expertise from me."

    Less than an hour later, I received a reply: "Thanks. I figured it out. There was an older version of the DLL in the path ahead of the one I was trying to register."

    And I never did figure out which blog entry I wrote that made them think I was an expert on regsvr32. Maybe the person worked in Microsoft Research and used a prototype of their machine that predicts the future, and used it to predict that I was going to write about regsvr32 two years later.
