• The Old New Thing

    You can name your car, and you can name your kernel objects, but there is a qualitative difference between the two

    • 30 Comments

    A customer reported that the Wait­For­Single­Object appeared to be unreliable.

    We have two threads, one that waits on an event and the other that signals the event. But we found that sometimes, signaling the event does not wake up the waiting thread. We have to signal it twice. What are the conditions under which Wait­For­Single­Object will ignore a signal?

    // cleanup and error checking elided for expository purposes
    
    void Thread1()
    {
      // Create an auto-reset event, initially unsignaled
      HANDLE eventHandle = CreateEvent(NULL, FALSE, FALSE, TEXT("MyEvent"));
    
      // Kick off the background thread and give it the handle
      CreateThread(..., Thread2, eventHandle, ...);
    
      // Wait for the event to be signaled
      WaitForSingleObject(eventHandle, INFINITE);
    }
    
    DWORD CALLBACK Thread2(void *eventHandle)
    {
     ResetEvent(eventHandle); // start with a clean slate
     DoStuff();
     // All the calls to SetEvent succeed.
     SetEvent(eventHandle); // this does not always wake up Thread1
     SetEvent(eventHandle); // need to add this line
     return 0;
    }

    Remember, you generally shouldn't start with the conspiracy theory. The problem is most likely close to home.

    People offered a variety of theories as to what may be wrong. One possibility is that some other code in the process is calling Reset­Event on the event handle. Another is that some other code in the process has a bug where it is calling Reset­Event on the wrong event handle.

    I asked about the name.

    I have a friend who names her car. Whenever she gets a new car, she agonizes over what to call it. She'll drive it for a few days to see what its personality is and eventually choose a name that suits the vehicle. And thereafter, whenever she refers to her car, she uses the name. (She also assigns the car a gender.)

    If you like naming your car, then that's great. But there's a difference between naming your car and naming your kernel objects. When you give your car a name, that name is just for your private use. On the other hand, if you give your kernel object a name, other people can use that name to access your object. And once they have access to your object, they can do funky things to it, like reset it.

    Imagine if you decided to name your car Clara, and any time somebody shouted, "Clara, where are you?" your car horn honked. I'm assuming your car has voice recognition software. Also that your car has the personality of a puppy. Work with me here.

    Even scarier: Any time somebody shouted, "Clara, open the trunk," your car trunk unlocked.

    That's what happens when you name your kernel objects. Anybody who knows the name (and has appropriate access) can open the object and start doing things to it. Presumably that's why you named your kernel object in the first place: You want this to happen. You gave your object a name specifically to allow other people to come in and access the same object.

    In the above example, I saw that the event had a very generic-sounding name, My­Event. That sounds like the name that some other similarly uncreative application developer might have chosen.

    And indeed, that was the reason. There was another application which was creating an event that coincidentally has the same name, so instead of creating a new object, the kernel returned a handle to the existing one. The other application called Wait­For­Single­Object on the event, and so when the customer's program called Set­Event, it woke the other application instead. So this bug has a double-whammy: Not only does it cause your program to miss a signal, it causes the other program to receive a signal when it wasn't expecting one. Two bugs for the price of one.

    Note that no matter how clever you are at choosing a name for your event, you will always have this problem, because even if you called it Super­Secret­Never­Gonna­Find­It­75, there's a program out there that knows the secret name: Namely your own program! If you run two copies of your program, they will both be manipulating the same Super­Secret­Never­Gonna­Find­It­75, and then you're back where you started. When the first copy of the program calls Set­Event, it may wake up the second copy.

    (This is the same principle behind the conclusion that a single-instance program is its own denial of service.)

    Kernel objects should not be named unless you intend them to be shared, because once you name them, you open yourself to issues like this. If you name a kernel object, it must be because you want another process to access it, not because you think giving it a name is kind of cute.

    I suspect a lot of people give their kernel objects names not because they intend them to be shared, but because they see that the Create­Event function has a lpName parameter, and they think, "Well, I guess giving it a name would be nice. Maybe I can use it for debugging purposes or something," not realizing that giving it name actually introduced a bug. Another possibility is that they see that there is a lpName parameter and think, "Gosh, I must give this event a name."

    Kernel object names are optional. Don't give them a name unless you intend them to be shared.

  • The Old New Thing

    Why does misdetected Unicode text tend to show up as Chinese characters?

    • 34 Comments

    If you take an ASCII string and cast it to Unicode,¹ the results are usually nonsense Chinese. Why does ASCII→Unicode mojibake result in Chinese? Why not Hebrew or French?

    The Latin alphabet in ASCII lives in the range 0x41 through 0x7A. If this gets misinterpreted as UTF-16LE, the resulting characters are of the form U+XXYY where XX and YY are in the range 0x41 through 0x7A. Generously speaking, this means that the results are in the range U+4141 through U+7A7A. This overlaps the following Unicode character ranges:

    • CJK Unified Ideographs Extension A (U+3400 through U+4DBF)
    • Yijing Hexagram Symbols (U+4DC0 through U+4DFF)
    • CJK Unified Ideographs (U+4E00 through U+9FFF)

    But you never see the Yijing hexagram symbols because that would require YY to be in the range 0xC0 through 0xFF, which is not valid ASCII. That leaves only CJK Unified Ideographs of one sort of another.

    That's why ASCII misinterpreted as Unicode tends to result in nonsense Chinese.

    The CJK Unified Ideographs are by far the largest single block of Unicode characters in the BMP, so just by purely probabilistic arguments, a random character in BMP is most likely to be Chinese. If you look at a graphic representation of what languages occupy what parts of the BMP, you'll see that it's a sea of pink (CJK) and red (East Asian), occasionally punctuated by other scripts.

    It just so happens that the placement of the CJK ideographs in the BMP effectively guarantees it.

    Now, ASCII text is not all just Latin letters. There are space and punctuation marks, too, so you may see an occasional character from another Unicode range. But most of the time, it's a Latin letter, which means that most of the time, your mojibake results in Chinese.

    ¹ Remember, in the context of Windows, "Unicode" is generally taken to be shorthand for UTF-16LE.

  • The Old New Thing

    Simulating media controller buttons like Play and Pause

    • 18 Comments

    Today's Little Program simulates pressing the Play/Pause button on your fancy keyboard. This might be useful if you want to write a program that converts some other input (say, gesture detection) into media controller events.

    One way of doing this is to take advantage of the Def­Window­Proc function, since the default behavior for the WM_APP­COMMAND message is to pass the message up the parent chain, and if it still can't find a handler, it hands the message to the shell for global processing.

    Remember, don't fumble around. If you want to send a message to a window, then send a message to a window. Don't broadcast a message to every window in the system (resulting in mass chaos).

    Take the scratch program and make this simple addition:

    void OnChar(HWND hwnd, TCHAR ch, int cRepeat)
    {
     if (ch == ' ') {
      SendMessage(hwnd, WM_APPCOMMAND, (WPARAM)hwnd,
          MAKELONG(0, FAPPCOMMAND_KEY | APPCOMMAND_MEDIA_PLAY_PAUSE));
     }
    }
    
     HANDLE_MSG(hwnd, WM_CHAR, OnChar);
    

    When you press the space bar in the scratch application, it pretends that you instead pressed the Play/Pause button on your fancy keyboard with no shift modifiers.

    The scratch program doesn't do anything with the key, so it ends up falling through to Def­Window­Proc, which eventually hands the key to the shell and any other registered shell hooks. If you have a program like Windows Media Player which registers for shell events, it will see the notification and pause/resume playback.

    Of course, this assumes that the program you want to talk to listens globally for the keypress. If you want to make the current foreground program respond as if you had pressed the Play/Pause, you can just inject the keypress.

    int __cdecl main(int, char**)
    {
     INPUT inputs[2] = {};
     inputs[0].type = INPUT_KEYBOARD;
     inputs[0].ki.wVk = VK_MEDIA_PLAY_PAUSE;
     inputs[0].ki.wScan = 0x22;
     inputs[0].ki.dwFlags = KEYEVENTF_EXTENDEDKEY;
     inputs[1].type = INPUT_KEYBOARD;
     inputs[1].ki.wVk = VK_MEDIA_PLAY_PAUSE;
     inputs[1].ki.wScan = 0x22;
     inputs[0].ki.dwFlags = KEYEVENTF_EXTENDEDKEY | KEYEVENTF_KEYUP;
     SendInput(2, inputs, sizeof(INPUT));
     return 0;
    }
    

    Note, however, that since we didn't do anything about the state of modifier keys, if the user happens to have the shift key down at the time you injected the message, the application is going to be told, "Hey, do your play/pause thing, and if you change behavior when the shift key is down, here's your chance."

    But what did you expect from a Little Program?

  • The Old New Thing

    Marshaling won't get in your way if it isn't needed

    • 10 Comments

    I left an exercise at the end of last week's article: "Why is the RPC_X_NULL_REF_POINTER error raised only sometimes?"

    COM subscribes to the principle that if no marshaling is needed, then an interface pointer points directly at the object with no COM code in between.

    If the current thread is running in a single-threaded apartment, and it creates a COM object with thread affinity (also known as an "apartment-model object"; yes, the name is confusing), then the thread gets a pointer directly to the object. When you call p->Query­Interface(), you are calling directly into the Query­Interface implementation provided by the object.

    This principle has its pluses and minuses.

    People concerned with high performance pretty much insist that COM stay out of the way and get involved only when necessary. They consider it a plus that if there is no marshaling involved, then all pointers are direct pointers, and calls go straight to the target object without a single instruction of COM-provided code getting in the way.

    One downside of this is that every object is responsible for its own compatibility hacks. If there are bugs in the implementation of IUnknown::Query­Interface, then each object is on its own for working around them. There is no opportunity for the system to enforce correct behavior because there is no system code running. Each object becomes responsible for its own enforcement.

    Therefore, the answer to "Why is the RPC_X_NULL_REF_POINTER error raised only sometimes?" is "The marshaler is involved only sometimes."

    If the object being called belongs to the same apartment as the thread that is calling into it, then there is no marshaler, and the call goes directly to the object. Since there is no marshaler, the marshaler isn't around to enforce marshaling rules. It's up to the object to enforce marshaling rules, and if the object chooses not to, then you get into the cases where a method call works when the object is unmarshaled and fails when the object is marshaled.

  • The Old New Thing

    If a process crashes while holding a mutex, why is its ownership magically transferred to another process?

    • 29 Comments

    A customer was observing strange mutex ownership behavior. They had two processes that used a mutex to coordinate access to some shared resource. When the first process crashed while owning the mutex, they found that the second process somehow magically gained ownership of that mutex. Specifically, when the first process crashed, the second process could take the mutex, but when it released the mutex, the mutex was still not released. They discovered that in order to release the mutex, the second process had to call Release­Mutex twice. It's as if the claim on the mutex from the crashed process was secretly transferred to the second process.

    My psychic powers told me that that's not what was happening. I guessed that their code went something like this:

    // code in italics is wrong
    bool TryToTakeTheMutex()
    {
     return WaitForSingleObject(TheMutex, TimeOut) == WAIT_OBJECT_0;
    }
    

    The code failed to understand the consequences of WAIT_ABANDONED.

    In the case where the mutex was held by the first process when it crashed, the second process will attempt to claim the mutex, and it will succeed, and the return code from Wait­For­Single­Object will be WAIT_ABANDONED. Their code treated that value as a failure code rather than a modified success code.

    The second program therefore claimed the mutex without realizing it. That is what led the customer to believe that ownership was being magically transferred to the second program. It wasn't magic. The second program misinterpreted the return code.

    The second program saw that Try­To­Take­The­Mutex "failed", and it went off and did something else for a while. Then the next time it called Try­To­Take­The­Mutex, the function succeeded: It was a successful recursive acquisition, but the program thought it was the initial acquisition.

    The customer didn't reply back, so we never found out whether that was the actual problem, but I suspect it was.

  • The Old New Thing

    What is the story of the mysterious DS_RECURSE dialog style?

    • 12 Comments

    There are a few references to the DS_RECURSE dialog style scattered throughout MSDN, and they are all of the form "Don't use it." But if you look in your copy of winuser.h, there is no sign of DS_RECURSE anywhere. This obviously makes it trivial to avoid using it because you couldn't use it even if you wanted it, seeing as it doesn't exist.

    "Do not push the red button on the control panel!"

    There is no red button on the control panel.

    "Well, that makes it easy not to push it."

    As with many of these types of stories, the answer is rather mundane.

    When nested dialogs were added to Windows 95, the flag to indicate that a dialog is a control host was DS_RECURSE. The name was intended to indicate that anybody who is walking a dialog looking for controls should recurse into this window, since it has more controls inside.

    The window manager folks later decided to change the name, and they changed it to DS_CONTROL. All documentation that was written before the renaming had to be revised to change all occurrences of DS_RECURSE to DS_CONTROL.

    It looks like they didn't quite catch them all: There are two straggling references in the Windows Embedded documentation. My guess is that the Windows Embedded team took a snapshot of the main Windows documentation, and they took their snapshot before the renaming was complete.

    Unfortunately, I don't have any contacts in the Windows Embedded documentation team, so I don't know whom to contact to get them to remove the references to flags that don't exist.

  • The Old New Thing

    The grand ambition of giving your project the code name Highlander

    • 24 Comments

    Code name reuse is common at Microsoft, and there have been several projects at code-named Highlander, the movie with the tag line There can be only one. (Which makes the whole thing kind of ironic.)

    I was able to find a few of these projects. There are probably more that I couldn't find any records of.

    Two of the projects I found did not appear to be named Highlander for any reason beyond the fact that the person who chose the name was a fan of the movie.

    Another project code named Highlander was an internal IT effort to simplify the way it did something really full of acronyms that I don't understand. ("Reduce the architectural footprint of the XYZZY QX Extranet.") There used to be something like five different systems for doing this thing (whatever it is), and they wanted to consolidate them down to one.

    The project code named Highlander that people outside Microsoft will recognize is the one now known as Microsoft Account, but which started out as Passport.¹ Its goal was to provide single sign-on capability, so that you need to remember "only one" password.

    The last example is kind of complicated. There was a conflict between two teams. Team A was responsible for a client/server product and developed both the server back-end software as well as the client. Meanwhile, Team 1 wrote an alternative client with what they believed was a more user-friendly interface. A battle ensued between the two teams to write the better client, and management decided to go with Team 1's version.

    But Team A did not go down without a fight. Rather than putting their project to rest, Team A doubled down and tried to make an even more awesome client, which they code-named Highlander. The idea was that their project was engaged in an epic battle with Team 1, and the tag line There can be only one reflected their belief that the battle was to the death, and that their project would emerge victorious. This being back in the day when playing music on your Web page was cool, they even set up their internal Web site so that it played the Highlander theme music when you visited.

    They were correct in that there was ultimately only one.

    The bad news for them was that Team 1 was the winner of the second battle as well.

    To me, the moral of the story is to keep your project code name humble.

    Reminder: The ground rules for this site prohibits trying to guess the identity of a program whose name I intentionally did not reveal.

    ¹ The Wikipedia entry for Microsoft Account erroneously claims that the project was once known as Microsoft Wallet. That claim isn't even supported by the Web site they cite as a reference. The Web site says, "Microsoft Wallet has been updated to use Microsoft Passport technology." In other words, "Wallet now uses Passport for authentication." This is like seeing the sentence "Microsoft Active Directory uses Kerberos for authentication" and concluding "Kerberos was once named Microsoft Active Directory."

  • The Old New Thing

    Receiving a notification any time the selection changes in an Explorer window

    • 1 Comments

    Today's Little Program merely prints a message whenever the user changes the selection on the desktop. I chose the desktop for expediency, since it saves me the trouble of coming up with a way for the user to specify which Explorer window they want to track. Also, all I do is print a message saying "Selection changed!"; actually getting the selection was covered earlier in both C++ and script.

    Remember that Little Programs do little to no error checking.

    #define STRICT
    #include <windows.h>
    #include <ole2.h>
    #include <shlobj.h>
    #include <shdispid.h>
    #include <atlbase.h>
    #include <stdio.h>
    
    class CShellFolderViewEventsSink :
        public CDispInterfaceBase<DShellFolderViewEvents>
    {
    public:
     CShellFolderViewEventsSink() { }
    
     HRESULT SimpleInvoke(
        DISPID dispid, DISPPARAMS *pdispparams, VARIANT *pvarResult)
     {
      switch (dispid) {
      case DISPID_SELECTIONCHANGED:
       printf("Selection changed!\n");
       break;
      }
      return S_OK;
     }
    };
    
    int __cdecl wmain(int, wchar_t **)
    {
     CCoInitialize init;
     CComPtr<IShellFolderViewDual> spFolderView;
     GetDesktopAutomationObject(IID_PPV_ARGS(&spFolderView));
    
     CComPtr<CShellFolderViewEventsSink> spSink;
     spSink.Attach(new CShellFolderViewEventsSink());
     spSink->Connect(spFolderView);
    
     MessageBox(NULL, TEXT("Click OK when bored."), TEXT("Title"), MB_OK);
    
     spSink->Disconnect();
     return 0;
    }
    

    Our CShell­Folder­View­Events­Sink simply prints the message whenever it receives a DISPID_SELECTION­CHANGED event.

    Sure, this program isn't useful on its own, but you can incorporate into a program that uses an Explorer Browser so that your application can do something based on the current selection. (For example, if your program is using an Explorer Browser to let the user select files for upload, you can display the total file size of the current selection.)

  • The Old New Thing

    I marked my parameter as [optional], so why do I get an RPC error when I pass NULL?

    • 14 Comments

    Consider the following interface declaration in an IDL file:

    // Code in italics is wrong
    
    interface IFoo : IUnknown
    {
        HRESULT Cancel([in, optional, string] LPCWSTR pszReason);
    };
    

    The idea here is that you want to be able to call the Cancel method as pFoo->Cancel(NULL) if you don't want to provide a reason.

    If you try this, you'll find that the call sometimes fails with error 0x800706F4, which decodes to HRESULT_FROM_WIN32(RPC_X_NULL_REF_POINTER). What's going on here?

    The optional attribute does not mean what you think it means. To a C or C++ programmer, an "optional" pointer parameter typically means that it is valid to pass NULL/nullptr as the parameter value. But that's not what it means to the IDL compiler.

    To the IDL compiler, optional parameters are hints to the scripting engine that the parameter should be passed as VT_ERROR/DISP_E_PARAM­NOT­FOUND. The attribute is meaningful only when applied to parameters of type VARIANT or VARIANT*.

    What you actually want is the unique attribute. This somewhat confusingly-named attribute means "The parameter is allowed to be a null pointer." Therefore, the interface should have been written as

    interface IFoo : IUnknown
    {
        HRESULT Cancel([in, unique, string] LPCWSTR pszReason);
    };
    

    At the lowest level in the marshaler, pointer parameters are marked as ref, unique, or ptr. ref parameters may not be null, whereas unique and ptr parameters are allowed to be null. Larry Osterman explained to me that the default for interface pointers (anything derived from IUnknown) is unique and the default for all other pointer types is ref. Therefore, if you want to say that NULL is a valid value for a non-interface pointer parameter, you must say so explicitly by annotating the parameter as [unique].

    It's probably too late to change the behavior of MIDL to reject the [optional] tag on non-VARIANT parameters because in the decades since the attribute was introduced, it's probably being used incorrectly approximately twenty-five bazillion times, and making it an error would break a lot of code. (Even if you just made it a warning, that wouldn't help because a lot of people treat warnings as errors.)

    Exercise: Why is the RPC_X_NULL_REF_POINTER error raised only sometimes?

  • The Old New Thing

    The stream pointer position in IDataObject::GetData and IDataObject::GetDataHere is significant

    • 5 Comments

    An oft-overlooked detail of the IData­Object::Get­Data and IData­Object::Get­Data­Here methods is the position of the stream pointer when the result is a stream. These rules are buried in the documentation, so I'm going to call them out louder.

    Let's look at IData­Object::Get­Data first.

    If IData­Object::Get­Data returns a stream, then the stream pointer must be positioned at the end of the stream before the stream is returned. In other words, the last thing you do before returning the stream is seek it to the end. The contents of the data object are assumed to extend from the start of the stream to the stream's position as returned by IData­Object::Get­Data. (And no, I don't know why this rule exists.)

    I messed this up in my demonstration of how to drag a stream. Let's fix it.

      case DATA_FILECONTENTS:
        pmed->tymed = TYMED_ISTREAM;
        pmed->pstm = SHOpenRegStream(HKEY_LOCAL_MACHINE,
           TEXT("Hardware\\Description\\System\\CentralProcessor\\0"),
           TEXT("~MHz"), STGM_READ);
        if (pmed->pstm) {
          LARGE_INTEGER liZero = { 0, 0 };
          pmed->pstm->Seek(liZero, STREAM_SEEK_END, NULL);
        }
        return pmed->pstm ? S_OK : E_FAIL;
      }
    

    But what if you don't know the stream size? For example, what if the stream is coming from a live download? What if the stream doesn't support seeking? What if the stream is infinite? In those cases, you don't really have a choice. You just leave the stream pointer at the beginning and hope for the best. (Fortunately, at least in the case of virtual file content, the shell is okay with people who leave the stream pointer at the start of the stream. Probably for reasons like this.)

    There is a similar detail with IData­Object::Get­Data­Here: If you are asked to produce the data into an existing stream, you should write the data starting at the stream's current position and leave the stream pointer at the end of the data you just wrote.

Page 7 of 438 (4,378 items) «56789»