• The Old New Thing

    Why does misdetected Unicode text tend to show up as Chinese characters?


    If you take an ASCII string and cast it to Unicode,¹ the results are usually nonsense Chinese. Why does ASCII→Unicode mojibake result in Chinese? Why not Hebrew or French?

    The Latin alphabet in ASCII lives in the range 0x41 through 0x7A. If this gets misinterpreted as UTF-16LE, the resulting characters are of the form U+XXYY where XX and YY are in the range 0x41 through 0x7A. Generously speaking, this means that the results are in the range U+4141 through U+7A7A. This overlaps the following Unicode character ranges:

    • CJK Unified Ideographs Extension A (U+3400 through U+4DBF)
    • Yijing Hexagram Symbols (U+4DC0 through U+4DFF)
    • CJK Unified Ideographs (U+4E00 through U+9FFF)

    But you never see the Yijing hexagram symbols because that would require YY to be in the range 0xC0 through 0xFF, which is not valid ASCII. That leaves only CJK Unified Ideographs of one sort or another.

    That's why ASCII misinterpreted as Unicode tends to result in nonsense Chinese.

    The CJK Unified Ideographs are by far the largest single block of Unicode characters in the BMP, so just by purely probabilistic arguments, a random character in the BMP is most likely to be Chinese. If you look at a graphic representation of what languages occupy what parts of the BMP, you'll see that it's a sea of pink (CJK) and red (East Asian), occasionally punctuated by other scripts.

    It just so happens that the placement of the CJK ideographs in the BMP effectively guarantees it.

    Now, ASCII text is not all just Latin letters. There are space and punctuation marks, too, so you may see an occasional character from another Unicode range. But most of the time, it's a Latin letter, which means that most of the time, your mojibake results in Chinese.

    ¹ Remember, in the context of Windows, "Unicode" is generally taken to be shorthand for UTF-16LE.

  • The Old New Thing

    Simulating media controller buttons like Play and Pause


    Today's Little Program simulates pressing the Play/Pause button on your fancy keyboard. This might be useful if you want to write a program that converts some other input (say, gesture detection) into media controller events.

    One way of doing this is to take advantage of the Def­Window­Proc function, since the default behavior for the WM_APP­COMMAND message is to pass the message up the parent chain, and if it still can't find a handler, it hands the message to the shell for global processing.

    Remember, don't fumble around. If you want to send a message to a window, then send a message to a window. Don't broadcast a message to every window in the system (resulting in mass chaos).

    Take the scratch program and make this simple addition:

    void OnChar(HWND hwnd, TCHAR ch, int cRepeat)
    {
     if (ch == ' ') {
      SendMessage(hwnd, WM_APPCOMMAND, (WPARAM)hwnd,
          MAKELONG(0, FAPPCOMMAND_KEY | APPCOMMAND_MEDIA_PLAY_PAUSE));
     }
    }

     HANDLE_MSG(hwnd, WM_CHAR, OnChar);

    When you press the space bar in the scratch application, it pretends that you instead pressed the Play/Pause button on your fancy keyboard with no shift modifiers.

    The scratch program doesn't do anything with the key, so it ends up falling through to Def­Window­Proc, which eventually hands the key to the shell and any other registered shell hooks. If you have a program like Windows Media Player which registers for shell events, it will see the notification and pause/resume playback.

    Of course, this assumes that the program you want to talk to listens globally for the keypress. If you want to make the current foreground program respond as if you had pressed the Play/Pause, you can just inject the keypress.

    int __cdecl main(int, char**)
    {
     INPUT inputs[2] = {};
     // Key down
     inputs[0].type = INPUT_KEYBOARD;
     inputs[0].ki.wVk = VK_MEDIA_PLAY_PAUSE;
     inputs[0].ki.wScan = 0x22;
     inputs[0].ki.dwFlags = KEYEVENTF_EXTENDEDKEY;
     // Key up
     inputs[1].type = INPUT_KEYBOARD;
     inputs[1].ki.wVk = VK_MEDIA_PLAY_PAUSE;
     inputs[1].ki.wScan = 0x22;
     inputs[1].ki.dwFlags = KEYEVENTF_EXTENDEDKEY | KEYEVENTF_KEYUP;
     SendInput(2, inputs, sizeof(INPUT));
     return 0;
    }

    Note, however, that since we didn't do anything about the state of modifier keys, if the user happens to have the shift key down at the time you injected the message, the application is going to be told, "Hey, do your play/pause thing, and if you change behavior when the shift key is down, here's your chance."

    But what did you expect from a Little Program?

  • The Old New Thing

    Marshaling won't get in your way if it isn't needed


    I left an exercise at the end of last week's article: "Why is the RPC_X_NULL_REF_POINTER error raised only sometimes?"

    COM subscribes to the principle that if no marshaling is needed, then an interface pointer points directly at the object with no COM code in between.

    If the current thread is running in a single-threaded apartment, and it creates a COM object with thread affinity (also known as an "apartment-model object"; yes, the name is confusing), then the thread gets a pointer directly to the object. When you call p->Query­Interface(), you are calling directly into the Query­Interface implementation provided by the object.

    This principle has its pluses and minuses.

    People concerned with high performance pretty much insist that COM stay out of the way and get involved only when necessary. They consider it a plus that if there is no marshaling involved, then all pointers are direct pointers, and calls go straight to the target object without a single instruction of COM-provided code getting in the way.

    One downside of this is that every object is responsible for its own compatibility hacks. If there are bugs in the implementation of IUnknown::Query­Interface, then each object is on its own for working around them. There is no opportunity for the system to enforce correct behavior because there is no system code running. Each object becomes responsible for its own enforcement.

    Therefore, the answer to "Why is the RPC_X_NULL_REF_POINTER error raised only sometimes?" is "The marshaler is involved only sometimes."

    If the object being called belongs to the same apartment as the thread that is calling into it, then there is no marshaler, and the call goes directly to the object. Since there is no marshaler, the marshaler isn't around to enforce marshaling rules. It's up to the object to enforce marshaling rules, and if the object chooses not to, then you get into the cases where a method call works when the object is unmarshaled and fails when the object is marshaled.

  • The Old New Thing

    If a process crashes while holding a mutex, why is its ownership magically transferred to another process?


    A customer was observing strange mutex ownership behavior. They had two processes that used a mutex to coordinate access to some shared resource. When the first process crashed while owning the mutex, they found that the second process somehow magically gained ownership of that mutex. Specifically, when the first process crashed, the second process could take the mutex, but when it released the mutex, the mutex was still not released. They discovered that in order to release the mutex, the second process had to call Release­Mutex twice. It's as if the claim on the mutex from the crashed process was secretly transferred to the second process.

    My psychic powers told me that that's not what was happening. I guessed that their code went something like this:

    // Wrong: treats WAIT_ABANDONED as a failure
    bool TryToTakeTheMutex()
    {
     return WaitForSingleObject(TheMutex, TimeOut) == WAIT_OBJECT_0;
    }

    The code failed to understand the consequences of WAIT_ABANDONED.

    In the case where the mutex was held by the first process when it crashed, the second process will attempt to claim the mutex, and it will succeed, and the return code from Wait­For­Single­Object will be WAIT_ABANDONED. Their code treated that value as a failure code rather than a modified success code.

    The second program therefore claimed the mutex without realizing it. That is what led the customer to believe that ownership was being magically transferred to the second program. It wasn't magic. The second program misinterpreted the return code.

    The second program saw that Try­To­Take­The­Mutex "failed", and it went off and did something else for a while. Then the next time it called Try­To­Take­The­Mutex, the function succeeded: It was a successful recursive acquisition, but the program thought it was the initial acquisition.

    The customer never replied, so we never found out whether that was the actual problem, but I suspect it was.

  • The Old New Thing

    What is the story of the mysterious DS_RECURSE dialog style?


    There are a few references to the DS_RECURSE dialog style scattered throughout MSDN, and they are all of the form "Don't use it." But if you look in your copy of winuser.h, there is no sign of DS_RECURSE anywhere. This obviously makes it trivial to avoid using it because you couldn't use it even if you wanted to, seeing as it doesn't exist.

    "Do not push the red button on the control panel!"

    There is no red button on the control panel.

    "Well, that makes it easy not to push it."

    As with many of these types of stories, the answer is rather mundane.

    When nested dialogs were added to Windows 95, the flag to indicate that a dialog is a control host was DS_RECURSE. The name was intended to indicate that anybody who is walking a dialog looking for controls should recurse into this window, since it has more controls inside.

    The window manager folks later decided to change the name, and they changed it to DS_CONTROL. All documentation that was written before the renaming had to be revised to change all occurrences of DS_RECURSE to DS_CONTROL.

    It looks like they didn't quite catch them all: There are two straggling references in the Windows Embedded documentation. My guess is that the Windows Embedded team took a snapshot of the main Windows documentation, and they took their snapshot before the renaming was complete.

    Unfortunately, I don't have any contacts in the Windows Embedded documentation team, so I don't know whom to contact to get them to remove the references to flags that don't exist.

  • The Old New Thing

    The grand ambition of giving your project the code name Highlander


    Code name reuse is common at Microsoft, and there have been several projects code-named Highlander, the movie with the tag line There can be only one. (Which makes the whole thing kind of ironic.)

    I was able to find a few of these projects. There are probably more that I couldn't find any records of.

    Two of the projects I found did not appear to be named Highlander for any reason beyond the fact that the person who chose the name was a fan of the movie.

    Another project code named Highlander was an internal IT effort to simplify the way it did something really full of acronyms that I don't understand. ("Reduce the architectural footprint of the XYZZY QX Extranet.") There used to be something like five different systems for doing this thing (whatever it is), and they wanted to consolidate them down to one.

    The project code named Highlander that people outside Microsoft will recognize is the one now known as Microsoft Account, but which started out as Passport.¹ Its goal was to provide single sign-on capability, so that you need to remember "only one" password.

    The last example is kind of complicated. There was a conflict between two teams. Team A was responsible for a client/server product and developed both the server back-end software as well as the client. Meanwhile, Team 1 wrote an alternative client with what they believed was a more user-friendly interface. A battle ensued between the two teams to write the better client, and management decided to go with Team 1's version.

    But Team A did not go down without a fight. Rather than putting their project to rest, Team A doubled down and tried to make an even more awesome client, which they code-named Highlander. The idea was that their project was engaged in an epic battle with Team 1, and the tag line There can be only one reflected their belief that the battle was to the death, and that their project would emerge victorious. This being back in the day when playing music on your Web page was cool, they even set up their internal Web site so that it played the Highlander theme music when you visited.

    They were correct in that there was ultimately only one.

    The bad news for them was that Team 1 was the winner of the second battle as well.

    To me, the moral of the story is to keep your project code name humble.

    Reminder: The ground rules for this site prohibit trying to guess the identity of a program whose name I intentionally did not reveal.

    ¹ The Wikipedia entry for Microsoft Account erroneously claims that the project was once known as Microsoft Wallet. That claim isn't even supported by the Web site they cite as a reference. The Web site says, "Microsoft Wallet has been updated to use Microsoft Passport technology." In other words, "Wallet now uses Passport for authentication." This is like seeing the sentence "Microsoft Active Directory uses Kerberos for authentication" and concluding "Kerberos was once named Microsoft Active Directory."

  • The Old New Thing

    Receiving a notification any time the selection changes in an Explorer window


    Today's Little Program merely prints a message whenever the user changes the selection on the desktop. I chose the desktop for expediency, since it saves me the trouble of coming up with a way for the user to specify which Explorer window they want to track. Also, all I do is print a message saying "Selection changed!"; actually getting the selection was covered earlier in both C++ and script.

    Remember that Little Programs do little to no error checking.

    #define STRICT
    #include <windows.h>
    #include <ole2.h>
    #include <shlobj.h>
    #include <shdispid.h>
    #include <atlbase.h>
    #include <stdio.h>

    // CDispInterfaceBase, CCoInitialize, and GetDesktopAutomationObject
    // are assumed to be the helpers from earlier articles in this series.
    class CShellFolderViewEventsSink :
        public CDispInterfaceBase<DShellFolderViewEvents>
    {
    public:
     CShellFolderViewEventsSink() { }

     HRESULT SimpleInvoke(
        DISPID dispid, DISPPARAMS *pdispparams, VARIANT *pvarResult)
     {
      switch (dispid) {
      case DISPID_SELECTIONCHANGED:
       printf("Selection changed!\n");
       break;
      }
      return S_OK;
     }
    };

    int __cdecl wmain(int, wchar_t **)
    {
     CCoInitialize init;
     CComPtr<IShellFolderViewDual> spFolderView;
     GetDesktopAutomationObject(IID_PPV_ARGS(&spFolderView));

     CComPtr<CShellFolderViewEventsSink> spSink;
     spSink.Attach(new CShellFolderViewEventsSink());
     spSink->Connect(spFolderView);

     MessageBox(NULL, TEXT("Click OK when bored."), TEXT("Title"), MB_OK);

     spSink->Disconnect();
     return 0;
    }

    Our CShell­Folder­View­Events­Sink simply prints the message whenever it receives a DISPID_SELECTION­CHANGED event.

    Sure, this program isn't useful on its own, but you can incorporate it into a program that uses an Explorer Browser so that your application can do something based on the current selection. (For example, if your program is using an Explorer Browser to let the user select files for upload, you can display the total file size of the current selection.)

  • The Old New Thing

    I marked my parameter as [optional], so why do I get an RPC error when I pass NULL?


    Consider the following interface declaration in an IDL file:

    // This code is wrong
    interface IFoo : IUnknown
    {
        HRESULT Cancel([in, optional, string] LPCWSTR pszReason);
    };

    The idea here is that you want to be able to call the Cancel method as pFoo->Cancel(NULL) if you don't want to provide a reason.

    If you try this, you'll find that the call sometimes fails with error 0x800706F4, which decodes to HRESULT_FROM_WIN32(RPC_X_NULL_REF_POINTER). What's going on here?

    The optional attribute does not mean what you think it means. To a C or C++ programmer, an "optional" pointer parameter typically means that it is valid to pass NULL/nullptr as the parameter value. But that's not what it means to the IDL compiler.

    To the IDL compiler, optional parameters are hints to the scripting engine that the parameter should be passed as VT_ERROR/DISP_E_PARAM­NOT­FOUND. The attribute is meaningful only when applied to parameters of type VARIANT or VARIANT*.

    What you actually want is the unique attribute. This somewhat confusingly-named attribute means "The parameter is allowed to be a null pointer." Therefore, the interface should have been written as

    interface IFoo : IUnknown
    {
        HRESULT Cancel([in, unique, string] LPCWSTR pszReason);
    };

    At the lowest level in the marshaler, pointer parameters are marked as ref, unique, or ptr. ref parameters may not be null, whereas unique and ptr parameters are allowed to be null. Larry Osterman explained to me that the default for interface pointers (anything derived from IUnknown) is unique and the default for all other pointer types is ref. Therefore, if you want to say that NULL is a valid value for a non-interface pointer parameter, you must say so explicitly by annotating the parameter as [unique].

    It's probably too late to change the behavior of MIDL to reject the [optional] tag on non-VARIANT parameters because in the decades since the attribute was introduced, it has probably been used incorrectly approximately twenty-five bazillion times, and making it an error would break a lot of code. (Even if you just made it a warning, that wouldn't help because a lot of people treat warnings as errors.)

    Exercise: Why is the RPC_X_NULL_REF_POINTER error raised only sometimes?

  • The Old New Thing

    The stream pointer position in IDataObject::GetData and IDataObject::GetDataHere is significant


    An oft-overlooked detail of the IData­Object::Get­Data and IData­Object::Get­Data­Here methods is the position of the stream pointer when the result is a stream. These rules are buried in the documentation, so I'm going to call them out louder.

    Let's look at IData­Object::Get­Data first.

    If IData­Object::Get­Data returns a stream, then the stream pointer must be positioned at the end of the stream before the stream is returned. In other words, the last thing you do before returning the stream is seek it to the end. The contents of the data object are assumed to extend from the start of the stream to the stream's position as returned by IData­Object::Get­Data. (And no, I don't know why this rule exists.)

    I messed this up in my demonstration of how to drag a stream. Let's fix it.

        pmed->tymed = TYMED_ISTREAM;
        pmed->pstm = SHOpenRegStream(HKEY_LOCAL_MACHINE,
           TEXT("Hardware\\Description\\System\\CentralProcessor\\0"),
           TEXT("~MHz"), STGM_READ);
        if (pmed->pstm) {
          // Leave the stream pointer at the end of the data
          LARGE_INTEGER liZero = { 0, 0 };
          pmed->pstm->Seek(liZero, STREAM_SEEK_END, NULL);
        }
        return pmed->pstm ? S_OK : E_FAIL;

    But what if you don't know the stream size? For example, what if the stream is coming from a live download? What if the stream doesn't support seeking? What if the stream is infinite? In those cases, you don't really have a choice. You just leave the stream pointer at the beginning and hope for the best. (Fortunately, at least in the case of virtual file content, the shell is okay with people who leave the stream pointer at the start of the stream. Probably for reasons like this.)

    There is a similar detail with IData­Object::Get­Data­Here: If you are asked to produce the data into an existing stream, you should write the data starting at the stream's current position and leave the stream pointer at the end of the data you just wrote.

  • The Old New Thing

    The citizenship test is pass/fail; there's no blue ribbon for acing it


    The civics portion of the United States citizenship test is an oral exam wherein you must correctly answer six out of ten questions. One of my friends studiously prepared for his examination, going so far as buying a CD with the questions and answers and listening to it every day during his commute to and from work.

    At last, the day arrived, and my friend went in to take his citizenship examination. The examiner led him to an office, and the two of them sat down for the test.

    "Who was President during World War II?"

    — Franklin D. Roosevelt.

    "Correct. How many justices are there on the Supreme Court?"

    — Nine.


    And so on. Question 3, correct.

    Question 4, correct.

    Question 5, correct.

    Question 6, correct.

    And at that point, the examiner said, "Congratulations. You passed. There is a naturalization ceremony in two hours. Can you make it?"

    My friend was kind of surprised. Wasn't this a ten-question test? What about the other four questions?

    And then he realized: You only have to get six right. He got six right. How well he does on the remaining four questions is immaterial.

    My friend was hoping to get a perfect score of 10/10 on the test, or at least to find out whether he could get all ten right, just as a point of personal satisfaction, but of course the examiner doesn't care whether this guy can get all ten right. There's no blue ribbon for acing your citizenship test. It's pass/fail.

    Bonus chatter: My friend hung around for two hours and was naturalized that same day. He said that for something that could have been purely perfunctory (seeing as the people who work there have done this hundreds if not thousands of times), the ceremony was quite well done and was an emotionally touching experience.

    In case you hadn't noticed, today is Constitution Day, also known as Citizenship Day. One of the odd clauses in the legislation establishing the day of observance is that all schools which receive federal funding must "hold an educational program" on the United States Constitution on that day. This is why students at massage therapy schools and beauty schools have to watch a video of two Supreme Court justices.
