• The Old New Thing

    Why does misdetected Unicode text tend to show up as Chinese characters?


    If you take an ASCII string and cast it to Unicode,¹ the results are usually nonsense Chinese. Why does ASCII→Unicode mojibake result in Chinese? Why not Hebrew or French?

    The Latin alphabet in ASCII lives in the range 0x41 through 0x7A. If this gets misinterpreted as UTF-16LE, the resulting characters are of the form U+XXYY where XX and YY are in the range 0x41 through 0x7A. Generously speaking, this means that the results are in the range U+4141 through U+7A7A. This overlaps the following Unicode character ranges:

    • CJK Unified Ideographs Extension A (U+3400 through U+4DBF)
    • Yijing Hexagram Symbols (U+4DC0 through U+4DFF)
    • CJK Unified Ideographs (U+4E00 through U+9FFF)

    But you never see the Yijing hexagram symbols because that would require YY to be in the range 0xC0 through 0xFF, which is not valid ASCII. That leaves only CJK Unified Ideographs of one sort of another.

    That's why ASCII misinterpreted as Unicode tends to result in nonsense Chinese.

    The CJK Unified Ideographs are by far the largest single block of Unicode characters in the BMP, so just by purely probabilistic arguments, a random character in BMP is most likely to be Chinese. If you look at a graphic representation of what languages occupy what parts of the BMP, you'll see that it's a sea of pink (CJK) and red (East Asian), occasionally punctuated by other scripts.

    It just so happens that the placement of the CJK ideographs in the BMP effectively guarantees it.

    Now, ASCII text is not all just Latin letters. There are space and punctuation marks, too, so you may see an occasional character from another Unicode range. But most of the time, it's a Latin letter, which means that most of the time, your mojibake results in Chinese.

    ¹ Remember, in the context of Windows, "Unicode" is generally taken to be shorthand for UTF-16LE.

  • The Old New Thing

    The grand ambition of giving your project the code name Highlander


    Code name reuse is common at Microsoft, and there have been several projects at code-named Highlander, the movie with the tag line There can be only one. (Which makes the whole thing kind of ironic.)

    I was able to find a few of these projects. There are probably more that I couldn't find any records of.

    Two of the projects I found did not appear to be named Highlander for any reason beyond the fact that the person who chose the name was a fan of the movie.

    Another project code named Highlander was an internal IT effort to simplify the way it did something really full of acronyms that I don't understand. ("Reduce the architectural footprint of the XYZZY QX Extranet.") There used to be something like five different systems for doing this thing (whatever it is), and they wanted to consolidate them down to one.

    The project code named Highlander that people outside Microsoft will recognize is the one now known as Microsoft Account, but which started out as Passport.¹ Its goal was to provide single sign-on capability, so that you need to remember "only one" password.

    The last example is kind of complicated. There was a conflict between two teams. Team A was responsible for a client/server product and developed both the server back-end software as well as the client. Meanwhile, Team 1 wrote an alternative client with what they believed was a more user-friendly interface. A battle ensued between the two teams to write the better client, and management decided to go with Team 1's version.

    But Team A did not go down without a fight. Rather than putting their project to rest, Team A doubled down and tried to make an even more awesome client, which they code-named Highlander. The idea was that their project was engaged in an epic battle with Team 1, and the tag line There can be only one reflected their belief that the battle was to the death, and that their project would emerge victorious. This being back in the day when playing music on your Web page was cool, they even set up their internal Web site so that it played the Highlander theme music when you visited.

    They were correct in that there was ultimately only one.

    The bad news for them was that Team 1 was the winner of the second battle as well.

    To me, the moral of the story is to keep your project code name humble.

    Reminder: The ground rules for this site prohibits trying to guess the identity of a program whose name I intentionally did not reveal.

    ¹ The Wikipedia entry for Microsoft Account erroneously claims that the project was once known as Microsoft Wallet. That claim isn't even supported by the Web site they cite as a reference. The Web site says, "Microsoft Wallet has been updated to use Microsoft Passport technology." In other words, "Wallet now uses Passport for authentication." This is like seeing the sentence "Microsoft Active Directory uses Kerberos for authentication" and concluding "Kerberos was once named Microsoft Active Directory."

  • The Old New Thing

    Poor man's comments: Inserting text that has no effect into a configuration file


    Consider a program which has a configuration file, but the configuration file format does not have provisions for comments. Maybe the program has a "list of authorized users", where each line takes the form allow x or deny x, where x is a group or user. For example, suppose we have access_list that goes like this:

    allow payroll_department
    deny alice
    allow personnel_department
    allow bob

    This is the sort of file that can really use comments because people are going to want to know things like "Why does Bob have access?"

    One way of doing this is to embed the comments in the configuration file in a way that has no net effect. You can do this to add separator lines, too.

    deny !____________________________________________________________
    allow payroll_department
    deny !alice_is_an_intern_and_does_not_need_access_to_this_database
    deny alice
    deny !____________________________________________________________
    allow personnel_department
    deny !____________________________________________________________
    deny !temporary_access_for_auditor
    deny !see_service_request_31415
    deny !access_expires_on_2001_12_31
    allow bob

    Assuming that you don't have any users whose names begin with an exclamation point, the extra deny !... lines have no effect: They tell the system to deny access to a nonexistent user.

    Sometimes finding the format of a line that has no effect can take some creativity. For example, if you have a firewall configuration file, you might use URLs that correspond to no valid site.

    allow nobody http://example.com/PAYROLL_DEPARTMENT/--------------------
    allow alice http://contoso.com/payroll/
    allow nobody http://example.com/PURCHASING_DEPARTMENT/-----------------
    allow bob http://contoso.com/purchasing/
    allow nobody http://example.com/SPECIAL_REQUEST/-----------------------
    allow ceo http://www.youtube.com/

    Of course, these extra lines create work for the program, since it will sit there evaluating rules that will never apply. You may have to craft them in a way so that they have minimum cost. In the example above, we assigned the comments to a user called nobody which presumably will never try to access the Internet. We definitely didn't want to write the comment like

    allow * http://example.com/PAYROLL_DEPARTMENT/-------------------------

    because that would evaluate the dummy rule for every user.

    If you are willing to add a layer of process, you can tell everybody to stop editing the configuration files directly and instead edit an alternate file that gets preprocessed into a configuration file. For example, we might have access_list.commented that goes

    allow payroll_deparment
    deny alice // payroll intern does not need access to this database.
    allow personnel_department
    allow bob // Temporary access for auditor, see SR 31415. Expires 2001/12/31.

    Everybody agrees to edit the access_list.commented file, and after each edit they run a script that sends the file through the C++ preprocessor and puts the result in the access_list file. By using the C++ preprocessor, you enable features like #include directives and #define macros.

  • The Old New Thing

    A lie repeated often enough becomes the truth: The continuing saga of the Windows 3.1 blue screen of death (which, by the way, was never called that)


    HN has been the only major site to report the history of the Windows 3.1 Ctrl+Alt+Del dialog correctly. But it may have lost that title due to this comment thread.

    I read here that Steve Ballmer wrote part of [the blue screen of death] too.

    The comment linked to one of may articles that erroneously reported that Steve wrote the blue screen of death.

    Somebody replied,


    linking back to my article where I set the record straight.

    Undeterred, the original commenter wrote,

    LOL! so far only MSDN has been refuting the claim.

    and linked to two technology sites which reported the story incorrectly.

    Just goes to show that a lie repeated often enough becomes the truth.

    Oh, and by the way, the phrase "blue screen of death" did not really apply to the blue screen messages in Windows 3.1 or Windows 95. As we saw earlier, the Windows 3.1 fatal error message was a black screen of death, and in Windows 95, the blue screen message was more a screen of profound injury rather than death. (Windows 95 was sort of like the Black Knight, trying very hard to continue the fight despite having lost all of its limbs.)

    The phrase "blue screen of death" was generally attributed to the blue screen fatal error message of Windows NT. In Windows 95, we just called them "blue screen messages", without the "of death".

    I didn't expect this to become "blue screen week", though that's sort of what it turned into.

  • The Old New Thing

    It's a trap! Employment documents that require you to violate company policy


    One of my colleagues had a previous job involving tuning spam filters and removing objectionable content. Before he could start, he was told that he had to sign a special release. The form said basically, "I understand that my job may require me to see pornography or other objectionable material, and I promise not to sue."

    He asked, "So where is the part that says I'm not going to be fired for doing that?"

    "What do you mean?"

    He explained, "This document protects the company from me. But where is the part that protects me from the company?"

    "I don't know what you're talking about."

    He spelled it out: "Company policy says that watching pornography at work is grounds for termination. This document does not actually say that it's okay for me to do so if it is done in the course of my job duties."

    "Look, you can either sign the release form or not, but you can't work until you sign it."

    My colleague sighed as he signed the form. "Whatever. Nevermind."

  • The Old New Thing

    If you're looking for the code that displays a particular dialog box, the most directly way to find it is to look for the dialog box


    Suppose you are working in a large or unfamiliar code base and you want to know where the code is that displays a particular dialog box or message box or something. Probably the most direct way of figuring this out is to look for the strings.

    Say there is a message box that asks for user confirmation. "Are you sure you want to frobulate the flux capacitor?" Search for that string in your source code. It will probably be in a resource file.

    resource.rc:IDS_CONFIRM­FROBULATE "Are you sure you want to frobulate the flux capacitor?"

    Great, now you have the string ID for that message. You can perform a second search for that ID.

    resource.h:#define IDS_CONFIRM­FROBULATE 1024
    resource.rc:IDS_CONFIRM­FROBULATE "Are you sure you want to frobulate the flux capacitor?"
    maintenance.cpp:   strPrompt.LoadString(IDS_CONFIRM­FROBULATE);

    If the thing you are searching for is a dialog box or menu item, then be aware that there may be an accelerator in the string, so a straight grep won't find it.

    No matches for "Enter the new name of the frobulator:"

    For a dialog box, you can tap the Alt key to make the accelerator show up, so you can search for the right string. For a menu, you invoke the menu via the keyboard. Or in either case, you can disable the Hide underlined letters for keyboard navigation setting.

    resource.rc:  LTEXT "Enter the ne&w name of the frobulator:",

    I tend to be lazy and instead of using any of those tricks to make the underlines show up, I just search for a shorter string and hope that the accelerator isn't in it.

    resource.rc:  LTEXT "Enter the ne&w name of the frobulator:",

    "But Raymond, hitting the Alt is just a quick tap on the keyboard. Surely you can't be that lazy!"

    Right. If the dialog box were right in front of me, then I could tap the Alt and be done. But usually, when I am investigating this sort of thing, it's because somebody has sent a screen shot and asks, "Where is the code that displays this?" Tapping Alt on a screen shot doesn't usually get you very far.

    Once you find the code that displays the dialog box or message box or whatever, you can then study the code to answer follow-up questions like "What are the conditions under which this dialog will appear?" or "Is there a setting to suppress this dialog?"

  • The Old New Thing

    The latest technologies in plaintext editing: NotepadConf


    On November 13, 2014 November 14, 2014, Saint Paul, Minnesota will be the home to NotepadConf, which bills itself as "the premier technology conference for Notepad.exe users and text enthusiasts."

    I'm still not sure whom Microsoft will send to the conference, but maybe that person could give a talk on how you can use Notepad to take down the entire Internet.

    Update: The conference has been rescheduled to Friday the 14th.

  • The Old New Thing

    Microspeak: 1 – 1 is not zero


    In his reddit AMA, Joe Belfiore wrote

    i have regular 1-1 meetings with my counterparts in Office, Skype, Xbox.

    The little bit of jargon there is 1-1 meeting. This is an abbreviation for one-on-one meeting, a common business practice wherein two people, typically a manager and a direct report, have a face-to-face meeting with no one else present. In the case Joe used, the meeting is not between a manager and a direct report but between two peers.

    The term is also abbreviated 1:1, which like 1 − 1 also looks like a bit of mathematical nonsense. But it's not zero or one. It's just an abbreviation for business jargon.

    Of course, Microspeak isn't happy with the abbreviation as-is. Microspeak takes it further and nounifies the abbreviation. "I'll bring that up in my 1–1 with Bob tomorrow" means "I'll bring that up in my one-on-one meeting with Bob tomorrow."

    Note that the abbreviation OOO is not used to mean one-on-one. That abbreviation is used to mean Out of the office, although it is more commonly abbreviated OOF at Microsoft.

  • The Old New Thing

    Microspeak: Tell Mode / Ask Mode


    As a product nears release, the rate of change slows down, and along the way, the ship room goes through stages known as Tell Mode and Ask Mode.

    In Tell Mode, any changes to the product do not require prior approval, but you are required to present your changes to the next ship room meeting and be prepared to explain and defend them. The purpose of this exercise is to get teams accustomed to the idea of having to present their changes to the ship room as a warm-up for Ask Mode. There is also the psychological aspect: If you have to present and defend your changes, you are going to be more careful about deciding which changes to make, how you will go about making them, and how thoroughly you're going to validate those changes. For example, if a bug could be fixed by applying a targeted fix or by rewriting the entire class, you are probably not going to choose to rewrite. (In theory, the ship room may reject your changes after the fact, and then you have to go back them out. But this is rare in practice. The ship room usually lets you off with a warning unless your transgression was particularly severe.)

    The next stage of scrutiny is known as Ask Mode. In this stage, any proposed changes to the product must be presented to the ship room before they can be submitted. Rejection is more frequent here. Time has passed and the bug bar has gone up, and because it is easier to get forgiveness than permission.

    Here is a more detailed explanation of how one team implements the two modes.

    Note that there can be multiple levels of ship room. There may be a local feature team ship room, then a group-wide ship room, then a product-wide ship room, and it is not uncommon for each ship room to be in a different mode. For example, the local feature team ship room may be in Ask Mode, the group-wide ship room is in Tell Mode, and the product-wide ship room isn't looking at individual bugs yet. This means that when you want to make a change, you need to get permission from your local feature team, and then after you commit the change, you need to get forgiveness from the group ship room.

  • The Old New Thing

    Why are only some of the Windows 7 system notification icons colorless?


    André decided to play "gotcha" by noting that not all of the notification icons went colorless and wondered what the criteria was for deciding which ones to make colorless and which ones to make colorful.

    It's very simple: The design team generated colorless versions of the most commonly-seen notification icons. They didn't have the time to generate colorless versions of all notification icons, so they focused on the ones that gave them the most benefit.

    This is the standard tradeoff you have to make whenever you have finite resources. Eventually the marginal benefit of redrawing one more icon exceeds its marginal cost, at which point you stop. The marginal cost is measured not only in actual resources (designers can redraw only so many icons per day, and you have money to hire only so many designers) but also in opportunity cost (time spent redrawing icons to be colorless is time not spent on other design tasks).

    This is the same reason that not all icons in Windows XP were given the full-color perspective-view treatment. For example, nearly all of the icons in the Administrative Tools section are the old Windows 2000-style 16-color flat (or isometric) icons. The design team focused on the 100ish most commonly used icons and went to the effort of redrawing them in the Windows XP style. The more rarely-used icons were left in the old style because the cost of converting them did not merit the benefit.

    The same thing happened in Windows Vista, when the icon design changed yet again. The style became less stylized and more realistic, but not quite photorealistic, and the angle of presentation changed. The design team had the resources to convert the most commonly used icons, and the rest were left as they were.

    It's the Pareto Principle again. If you have finite resources (and who doesn't) you may find that you can get 80% of the benefit by doing only 20% of the work. And that leaves 80% of your capacity available to address some other problem.

Page 1 of 93 (930 items) 12345»