• The Old New Thing

    Even in computing, simultaneity is relative

    • 31 Comments

    Einstein discovered that simultaneity is relative. This is also true of computing.

    People will ask, "Is it okay to do X on one thread and Y on another thread simultaneously?" Here are some examples:

    • X = "close a handle" and Y = "use that handle".
    • X = "call UnregisterWaitForSingleObject on a handle", Y = "call UnregisterWaitForSingleObject on that same handle".

    You can answer this question knowing nothing about the internal behavior of those operations. All you need to know are some physics and the answers to much simpler questions about what is valid sequential code.

    Let's do a thought experiment with simultaneity.

    Since simultaneity is relative, any code that does X and Y simultaneously can be observed to have performed X before Y or Y before X, depending on your frame of reference. That's how the universe works.

    So if it were okay to do them simultaneously, then it must also be okay to do them one after the other, since they do occur one after the other if you walk past the computer in the correct direction.

    Is it okay to use a handle after closing it? Is it okay to unregister a wait event twice?

    The answer to both questions is "No," and therefore it isn't okay to do them simultaneously either.

    If you don't like using physics to solve this problem, you can also do it from a purely technical perspective.

    Invoking a function is not an atomic operation. You prepare the parameters, you call the entry point, the function does some work, it returns. Even if you somehow manage to get both threads to reach the function entry point simultaneously (even though as we know from physics there is no such thing as true simultaneity), there's always the possibility that one thread will get pre-empted immediately after the "call" instruction has transferred control to the first instruction of the target function, while the other thread continues to completion. After the second thread runs to completion, the pre-empted thread gets scheduled and begins execution of the function body.

    In this situation, you effectively called the two functions one after the other, despite all your efforts to call them simultaneously. Since you can't prevent this scenario from occurring, you have to code with the possibility that it might actually happen.
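
    The reasoning can be turned into a mechanical check: try both sequential orders, and if either one is invalid, the simultaneous version is invalid too. Here is a minimal sketch. The ToyHandleTable class and the helper functions are hypothetical stand-ins invented for illustration; they are not Windows APIs.

```cpp
#include <cassert>
#include <set>

// Hypothetical stand-in for a handle table; not a real Windows API.
class ToyHandleTable {
public:
    int Open() { int h = next_++; open_.insert(h); return h; }
    // Returns false if the handle is not currently open.
    bool Close(int h) { return open_.erase(h) == 1; }
    // Returns false if the handle is not currently open.
    bool Use(int h) const { return open_.count(h) == 1; }
private:
    std::set<int> open_;
    int next_ = 1;
};

// Order 1: use the handle, then close it. True if both steps were valid.
inline bool UseThenCloseIsValid() {
    ToyHandleTable table;
    int h = table.Open();
    bool used = table.Use(h);
    bool closed = table.Close(h);
    return used && closed;
}

// Order 2: close the handle, then use it. True if both steps were valid.
inline bool CloseThenUseIsValid() {
    ToyHandleTable table;
    int h = table.Open();
    bool closed = table.Close(h);
    bool used = table.Use(h);
    return closed && used;
}

// Doing both "simultaneously" can be okay only if both sequential orders are.
inline bool SafeToRunSimultaneously() {
    return UseThenCloseIsValid() && CloseThenUseIsValid();
}
```

    In this toy model, the second order fails, so SafeToRunSimultaneously() reports false without our having to know anything about how the operations work internally.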

    Hopefully this second explanation will satisfy the people who don't believe in the power of physics. Personally, I prefer using physics.

  • The Old New Thing

    Why does Windows keep your BIOS clock on local time?

    • 44 Comments

    Even though Windows NT uses UTC internally, the BIOS clock stays on local time. Why is that?

    There are a few reasons. One is a chain of backwards compatibility.

    In the early days, people often dual-booted between Windows NT and MS-DOS/Windows 3.1. MS-DOS and Windows 3.1 operate on local time, so Windows NT followed suit so that you wouldn't have to keep changing your clock each time you changed operating systems.

    As people upgraded from Windows NT to Windows 2000 to Windows XP, this choice of time zone had to be preserved so that people could dual-boot between their previous operating system and the new operating system.

    Another reason for keeping the BIOS clock on local time is to avoid confusing people who set their time via the BIOS itself. If you hit the magic key during the power-on self-test, the BIOS will go into its configuration mode, and one of the things you can configure here is the time. Imagine how confusing it would be if you set the time to 3pm, and then when you started Windows, the clock read 11am.

    "Stupid computer. Why did it even ask me to change the time if it's going to screw it up and make me change it a second time?"

    And if you explain to them, "No, you see, that time was UTC, not local time," the response is likely to be "What kind of totally propeller-headed nonsense is that? You're telling me that when the computer asks me what time it is, I have to tell it what time it is in London? (Except during the summer in the northern hemisphere, when I have to tell it what time it is in Reykjavik!?) Why do I have to remember my time zone and manually subtract four hours? Or is it five during the summer? Or maybe I have to add. Why do I even have to think about this? Stupid Microsoft. My watch says three o'clock. I type three o'clock. End of story."

    (What's more, some BIOSes have alarm clocks built in, where you can program them to have the computer turn itself on at a particular time. Do you want to have to convert all those times to UTC each time you want to set a wake-up call?)

  • The Old New Thing

    How to find the Internet Explorer binary

    • 45 Comments

    For some reason, some people go to enormous lengths to locate the Internet Explorer binary so they can launch it with some options.

    The way to do this is not to do it.

    If you just pass "IEXPLORE.EXE" to the ShellExecute function [link fixed 9:41am], it will go find Internet Explorer and run it.

    ShellExecute(NULL, "open", "iexplore.exe",
                 "http://www.microsoft.com", NULL,
                 SW_SHOWNORMAL);
    

    The ShellExecute function gets its hands dirty so you don't have to.

    (Note: If you just want to launch the URL generically, you should use

    ShellExecute(NULL, "open", "http://www.microsoft.com",
                 NULL, NULL, SW_SHOWNORMAL);
    

    so that the web page opens in the user's preferred web browser. Forcing Internet Explorer should be avoided under normal circumstances; we are forcing it here because the action is presumably being taken in response to an explicit request to open the web page specifically in Internet Explorer.)

    If you want to get your hands dirty, you can of course do it yourself. It involves reading the specification from the other side, this time the specification on how to register your program's name and path ("Registering Application Path Information").

    The document describes how a program should enter its properties into the registry so that the shell can launch it. To read it backwards, then, interpret this as a list of properties you (the launcher) need to read from the registry.

    In this case, the way to run Internet Explorer (or any other program) the same way ShellExecute does is to look in HKEY_LOCAL_MACHINE\Software\Microsoft\Windows\CurrentVersion\App Paths\IEXPLORE.EXE (substituting the name of the program if it's not Internet Explorer you're after). The default value is the full path to the program, and the "Path" value specifies a custom path that you should prepend to the environment before launching the target program.

    When you do this, don't forget to call the ExpandEnvironmentStrings function if the registry value's type is REG_EXPAND_SZ. (Lots of people forget about REG_EXPAND_SZ.)

    Of course, my opinion is that it's much easier just to let ShellExecute do the work for you.

  • The Old New Thing

    Reading a contract from the other side: Application publishers

    • 7 Comments

    In an earlier article, I gave an example of reading a contract from the other side. Here's another example of how you can read a specification and play the role of the operating system.

    I chose this particular example because somebody wanted to do this and didn't realize that everything they needed was already documented; they just needed to look at the documentation in a different light.

    The goal today is to mimic the list of programs that appears on the "Add New Programs" page of the Add or Remove Programs control panel. Note that in order for this list to appear at all, you need to be on a domain, and for the list to be nonempty, your domain controller needs to publish applications for domain members to install. Consequently, I suspect many of my readers won't get to see any interesting results from this exercise, but then again, the point of the exercise is not the result but merely the doing of it.

    The documentation for the IAppPublisher interface describes how a publisher can register itself to appear in the list of programs that can be installed. All you have to do is read the documentation from the other side: Instead of reading documentation about a method and asking, "How would I implement this method?", read it and ask, "How would I call this method?"

    The documentation says that an app publisher registers its CLSID under the specified registry key. Therefore, if you want to find all the app publishers, you need to enumerate that key.

    #define REGSTR_PATH_PUBLISHERS  \
        L"Software\\Microsoft"      \
        L"\\Windows\\CurrentVersion\\App Management\\Publishers"
    
        HKEY hkPub;
        if (RegOpenKeyExW(HKEY_LOCAL_MACHINE, REGSTR_PATH_PUBLISHERS,
                          0, KEY_READ, &hkPub) == ERROR_SUCCESS) {
          WCHAR szKey[MAX_PATH];
          for (DWORD dwIndex = 0;
               RegEnumKeyW(hkPub, dwIndex, szKey, MAX_PATH) == ERROR_SUCCESS;
               dwIndex++) {
              ...
          }
          RegCloseKey(hkPub);
        }
    

    The documentation says that the subkeys have the CLSID in REG_SZ format, so that's what we read out.

            WCHAR szCLSID[MAX_PATH];
            LONG l = sizeof(szCLSID) - sizeof(WCHAR);
            if (RegQueryValueW(hkPub, szKey, szCLSID, &l) == ERROR_SUCCESS) {
              szCLSID[l/sizeof(WCHAR)] = 0;
              CLSID clsid;
              if (SUCCEEDED(CLSIDFromString(szCLSID, &clsid))) {
                ...
              }
            }
    

    Notice the extra care we take to avoid the problem of registry strings that aren't null-terminated, as discussed in an earlier entry.

    The documentation quite explicitly states how this CLSID is used.

    Add/Remove Programs creates an instance of your object by calling CoCreateInstance for your object and requests the approprite [sic] IAppPublisher interface when the Add New Programs view is populated.

    Not much choice, now, is there. So we do what it says.

                IAppPublisher *ppub;
                if (SUCCEEDED(CoCreateInstance(clsid, NULL,
                                CLSCTX_ALL, IID_IAppPublisher,
                              (void**)&ppub))) {
                  ...
                  ppub->Release();
                }
    

    Okay, now that we have an app publisher, we can invoke the various methods on it to get information from that publisher. If we were more ambitious, we could ask for the categories but today we're just going to be happy with enumerating the programs so we can print their names.

                  IEnumPublishedApps *penum;
                  if (SUCCEEDED(ppub->EnumApps(NULL, &penum))) {
                    IPublishedApp *papp;
                    while (penum->Next(&papp) == S_OK) {
                      ...
                      papp->Release();
                    }
                    penum->Release();
                  }
    

    The enumerator gives us an application interface, and we can use that interface to get information about the application and print it out.

                      APPINFODATA info = { sizeof(info) };
                      info.dwMask = AIM_DISPLAYNAME;
                      if (SUCCEEDED(papp->GetAppInfo(&info)) &&
                          (info.dwMask & AIM_DISPLAYNAME)) {
                        wprintf(L"%ls\n", info.pszDisplayName);
                        CoTaskMemFree(info.pszDisplayName);
                      }
    

    We ask only for the display name, since that's all we're interested in today. In a more complicated program, we may ask for other data and would probably not release the IPublishedApp interface immediately, but rather hang onto it so we could invoke some other more interesting method like IPublishedApp::Install.

    (Note that we have to use the correct memory allocator to free the memory.)

    Okay, let's assemble all this into a simple console program.

    #include <windows.h>
    #include <ole2.h>
    #include <shappmgr.h>
    #include <stdio.h>
    
    #define REGSTR_PATH_PUBLISHERS  \
        L"Software\\Microsoft"      \
        L"\\Windows\\CurrentVersion\\App Management\\Publishers"
    
    int __cdecl main(int argc, char **argv)
    {
      if (SUCCEEDED(CoInitialize(NULL))) {
        HKEY hkPub;
        if (RegOpenKeyExW(HKEY_LOCAL_MACHINE, REGSTR_PATH_PUBLISHERS,
                          0, KEY_READ, &hkPub) == ERROR_SUCCESS) {
          WCHAR szKey[MAX_PATH];
          for (DWORD dwIndex = 0;
               RegEnumKeyW(hkPub, dwIndex, szKey, MAX_PATH) == ERROR_SUCCESS;
               dwIndex++) {
            WCHAR szCLSID[MAX_PATH];
            LONG l = sizeof(szCLSID) - sizeof(WCHAR);
            if (RegQueryValueW(hkPub, szKey, szCLSID, &l) == ERROR_SUCCESS) {
              szCLSID[l/sizeof(WCHAR)] = 0;
              CLSID clsid;
              if (SUCCEEDED(CLSIDFromString(szCLSID, &clsid))) {
                IAppPublisher *ppub;
                if (SUCCEEDED(CoCreateInstance(clsid, NULL,
                                CLSCTX_ALL, IID_IAppPublisher,
                              (void**)&ppub))) {
                  IEnumPublishedApps *penum;
                  if (SUCCEEDED(ppub->EnumApps(NULL, &penum))) {
                    IPublishedApp *papp;
                    while (penum->Next(&papp) == S_OK) {
                      APPINFODATA info = { sizeof(info) };
                      info.dwMask = AIM_DISPLAYNAME;
                      if (SUCCEEDED(papp->GetAppInfo(&info)) &&
                          (info.dwMask & AIM_DISPLAYNAME)) {
                        wprintf(L"%ls\n", info.pszDisplayName);
                        CoTaskMemFree(info.pszDisplayName);
                      }
                      papp->Release();
                    }
                    penum->Release();
                  }
                  ppub->Release();
                }
              }
            }
          }
          RegCloseKey(hkPub);
        }
        CoUninitialize();
      }
      return 0;
    }
    

    When you run this program, a list of all programs published by your domain controller should go scrolling past. (As I noted at the beginning of this entry, you won't see much if your computer is not on a domain or if your domain controller doesn't publish any programs.)

    Yes, this program is not very pretty, because prettiness was not my goal. In real life, a lot of the mess would be moved out into helper functions, and you can clean it up even more by using a smart pointer library, but the goal here was not to write a pretty program; it was to show how something could be done by reading the specification from the other side.

    (Why don't I use a smart pointer library? Because I try to write in "raw" C++ in order to avoid arguments about whose smart pointer library is best, or why smart pointers are evil... It's easy to convert "raw" C++ to use a smart pointer library, but it's harder to convert from one smart pointer library to another.)

  • The Old New Thing

    Importance of alignment even on x86 machines, part 2

    • 7 Comments

    The various Interlocked functions (InterlockedIncrement, and so on) require that the variable being updated be properly aligned, even on x86, a platform where the CPU silently fixes up unaligned memory accesses.

    If you pass an unaligned pointer to one of the Interlocked functions, the operation will still succeed, but the result won't be atomic. Another processor may see a partially-completed update.

    This is a particularly insidious bug since it happens only on multiprocessor machines under very tight timing conditions. You will be hard-pressed to reproduce this in the laboratory.
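
    Since the bug is so hard to reproduce, it is cheaper to catch the bad pointer up front. Here is a minimal sketch of an alignment check you could assert on in debug builds before handing a pointer to an Interlocked function. The helper name IsAlignedFor is made up for this example, and it assumes that the natural alignment of the type is the alignment the operation needs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true if p is suitably aligned for an object of type T.
template <typename T>
bool IsAlignedFor(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % alignof(T) == 0;
}
```

    For example, a caller might write assert(IsAlignedFor<std::int32_t>(&counter)) just before the interlocked update of a 32-bit counter.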

    (A commenter stole my thunder and remarked on it yesterday.)

    Moral of the story: Same as yesterday. Mind your alignment.

  • The Old New Thing

    The sociology of the mobile phone

    • 9 Comments

    It has become obvious by now that the mobile phone has changed the way people interact. These two papers were forwarded to me by a colleague, from whose summary I am shamelessly and heavily lifting.

    First is a short paper titled Exploring the implications for social identity of the new sociology of the mobile phone.

    The much more fascinating (and much longer) one is The Effects of Mobile Telephones on Social and Individual Life [PDF]. Read about Flight, Suspension and Persistence—the three ways people deal with incoming calls. Learn how to tell an Innie from an Outie. Set up an Approximeeting. Are you a swift, an owl, a dove, a sparrow, a starling, or a peacock? Pick up some UK mobile phone lingo:

    • "Gooseberry call": calling someone during their date to get a progress report.
    • "Shagbile": the mobile phone your partner doesn't know about, used for affairs.
    • "Sad": derogatory term for people who show off their phones. Enormous ritualization surrounds this.

    "Shagbile" is the best. So Austin Powers.

  • The Old New Thing

    Importance of alignment even on x86 machines

    • 17 Comments

    Sometimes unaligned memory access will hang the machine.

    Some video cards do not let you access all the video memory at one go. Instead, you are given a window into which you can select which subset of video memory ("bank") you want to see. For example, the EGA video card had 256K of memory, split into four 64K banks. If you wanted to access memory in the first 64K, you had to select bank zero into the window, but if you wanted to access memory in the second 64K, then you had to select bank one.

    Bank-switching makes memory access much more complicated. For example, if you want to copy a block of memory into bank-switched memory, you have to check when you are going to cross a bank boundary and break the copy up into pieces. If you are doing something that requires non-sequential access (say, drawing a diagonal line), you have to check when your line is going to cross into another bank.
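
    To give a feel for the bookkeeping, here is a minimal sketch of how a copy might be broken into pieces that each stay inside one bank. The names (BankChunk, SplitAtBankBoundaries) are invented for this example, and the 64K bank size is taken from the EGA description above. It only computes the pieces; selecting the bank and doing the actual copy are left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kBankSize = 64 * 1024;  // 64K banks, as in the EGA example

// One piece of a copy that fits entirely inside a single bank.
struct BankChunk {
    std::size_t bank;    // which bank to select into the window
    std::size_t offset;  // offset within that bank
    std::size_t length;  // number of bytes to copy in this piece
};

// Breaks the linear range [start, start + length) into per-bank pieces.
std::vector<BankChunk> SplitAtBankBoundaries(std::size_t start,
                                             std::size_t length) {
    std::vector<BankChunk> chunks;
    while (length > 0) {
        std::size_t bank = start / kBankSize;
        std::size_t offset = start % kBankSize;
        // Copy up to the end of the current bank, but no more than remains.
        std::size_t piece = std::min(length, kBankSize - offset);
        chunks.push_back({bank, offset, piece});
        start += piece;
        length -= piece;
    }
    return chunks;
}
```

    A 10-byte copy starting 6 bytes before the end of bank zero comes back as two pieces: 6 bytes in bank zero and 4 bytes at the start of bank one.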

    To simplify matters, Windows 95 had a driver called VFLATD that made bank-switched memory look flat to the rest of the system. Flattening the bank-switched memory model was also crucial for DirectDraw support; in particular, the IDirectDrawSurface::Lock method gave you direct access to a (seemingly) flat expanse of video memory. For example, if the application wanted to see a 256K surface and accessed memory in the first 64K of memory, the VFLATD driver would select bank zero and map the 64K physical memory window into the first 64K of the virtual 256K memory window.

    This worked great as long as everybody used only aligned memory accesses. But if you access unaligned memory, you can send VFLATD into an infinite loop and hang the machine.

    Suppose you make an unaligned memory access that straddles two banks. This memory access can never be satisfied. A page fault is taken on the lower portion of the unaligned access, and VFLATD maps the lower bank into memory. Then a page fault is taken on the higher portion of the unaligned access, and VFLATD now has to map the upper bank; this unmaps the lower bank, since the video card is bank-switched and only one bank can be mapped at a time. Now a page fault is taken on the lower portion, and the infinite loop continues.
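
    The condition that triggers the loop is easy to state in code. Here is a minimal sketch of a check for whether a single access touches two banks. The function name and the 64K bank size are assumptions made for illustration; this is not how VFLATD itself was written.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kBankBytes = 64 * 1024;  // assumed bank size

// True if an access of `size` bytes at linear address `address`
// touches more than one bank.
bool StraddlesBankBoundary(std::size_t address, std::size_t size) {
    if (size == 0) return false;
    std::size_t firstBank = address / kBankBytes;
    std::size_t lastBank = (address + size - 1) / kBankBytes;
    return firstBank != lastBank;
}
```

    A four-byte access at address 65534 straddles banks zero and one, whereas the aligned accesses at 65532 and 65536 do not. More generally, a naturally aligned access whose size is a power of two no larger than the bank size can never straddle, which is why aligned code never hit the problem.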

    Moral of the story: Keep those memory accesses aligned, even on the x86, a platform most people consider "safe" for violating alignment rules.

    Next time, another example of how misaligned data access can create bugs on x86.

  • The Old New Thing

    Why do some structures end with an array of size 1?

    • 41 Comments

    Some Windows structures are variable-sized, beginning with a fixed header, followed by a variable-sized array. When these structures are declared, they often declare an array of size 1 where the variable-sized array should be. For example:

    typedef struct _TOKEN_GROUPS {
        DWORD GroupCount;
        SID_AND_ATTRIBUTES Groups[ANYSIZE_ARRAY];
    } TOKEN_GROUPS, *PTOKEN_GROUPS;
    

    If you look in the header files, you'll see that ANYSIZE_ARRAY is #define'd to 1, so this declares a structure with a trailing array of size one.

    With this declaration, you would allocate memory for one such variable-sized TOKEN_GROUPS structure like this:

    PTOKEN_GROUPS TokenGroups =
       malloc(FIELD_OFFSET(TOKEN_GROUPS, Groups[NumberOfGroups]));
    

    and you would initialize the structure like this:

    TokenGroups->GroupCount = NumberOfGroups;
    for (DWORD Index = 0; Index < NumberOfGroups; Index++) {
      TokenGroups->Groups[Index] = ...;
    }
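
    If you want to experiment with the size calculation away from the Windows headers, here is a minimal portable sketch. The types SidAndAttributes and TokenGroups are stand-ins I made up to mirror the shape of the real structures, and TokenGroupsSize computes the same quantity as the FIELD_OFFSET expression above by adding the array offset to the total size of the elements.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Portable stand-ins for the Windows types (hypothetical names).
struct SidAndAttributes {
    void* Sid;
    std::uint32_t Attributes;
};

struct TokenGroups {
    std::uint32_t GroupCount;
    SidAndAttributes Groups[1];  // trailing array declared with size 1
};

// Bytes needed for a header followed by `count` array elements.
// Equivalent in spirit to FIELD_OFFSET(TOKEN_GROUPS, Groups[count]).
std::size_t TokenGroupsSize(std::size_t count) {
    return offsetof(TokenGroups, Groups) + count * sizeof(SidAndAttributes);
}

// Allocates the variable-sized structure and records the element count.
// The caller releases the memory with std::free.
TokenGroups* AllocTokenGroups(std::uint32_t count) {
    auto* groups = static_cast<TokenGroups*>(std::malloc(TokenGroupsSize(count)));
    if (groups != nullptr) {
        groups->GroupCount = count;
    }
    return groups;
}
```

    Because the offset of Groups already includes any padding the compiler inserted after GroupCount, the array elements land on a properly aligned boundary without any further effort.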
    

    Many people think it should have been declared like this:

    typedef struct _TOKEN_GROUPS {
        DWORD GroupCount;
    } TOKEN_GROUPS, *PTOKEN_GROUPS;
    

    (In this article, code that is wrong or hypothetical will be italicized.)

    The code that does the allocation would then go like this:

    PTOKEN_GROUPS TokenGroups =
       malloc(sizeof(TOKEN_GROUPS) +
              NumberOfGroups * sizeof(SID_AND_ATTRIBUTES));
    

    This alternative has two disadvantages, one cosmetic and one fatal.

    First, the cosmetic disadvantage: It makes it harder to access the variable-sized data. Initializing the TOKEN_GROUPS just allocated would go like this:

    TokenGroups->GroupCount = NumberOfGroups;
    for (DWORD Index = 0; Index < NumberOfGroups; Index++) {
      ((SID_AND_ATTRIBUTES *)(TokenGroups + 1))[Index] = ...;
    }
    

    The real disadvantage is fatal. The above code crashes on 64-bit Windows. The SID_AND_ATTRIBUTES structure looks like this:

    typedef struct _SID_AND_ATTRIBUTES {
        PSID Sid;
        DWORD Attributes;
        } SID_AND_ATTRIBUTES, * PSID_AND_ATTRIBUTES;
    

    Observe that the first member of this structure is a pointer, PSID. The SID_AND_ATTRIBUTES structure requires pointer alignment, which on 64-bit Windows is 8-byte alignment. On the other hand, the proposed TOKEN_GROUPS structure consists of just a DWORD and therefore requires only 4-byte alignment. sizeof(TOKEN_GROUPS) is four.

    I hope you see where this is going.

    Under the proposed structure definition, the array of SID_AND_ATTRIBUTES structures will not be placed on an 8-byte boundary but only on a 4-byte boundary. The necessary padding between the GroupCount and the first SID_AND_ATTRIBUTES is missing. The attempt to access the elements of the array will crash with a STATUS_DATATYPE_MISALIGNMENT exception.
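
    You can see the missing padding by comparing where the array starts under each layout. This is a minimal sketch using portable stand-in types (all the names here are invented for the example); the numbers in the comments assume a platform with 8-byte pointers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Portable stand-ins for the Windows types (hypothetical names).
struct SidAndAttr {
    void* Sid;                 // pointer member drives the alignment requirement
    std::uint32_t Attributes;
};

struct HeaderOnlyGroups {      // the proposed declaration: header only
    std::uint32_t GroupCount;
};

struct TrailingArrayGroups {   // the actual style: trailing array of size 1
    std::uint32_t GroupCount;
    SidAndAttr Groups[1];
};

// Where the array begins under each layout.
constexpr std::size_t kProposedArrayOffset = sizeof(HeaderOnlyGroups);  // 4
constexpr std::size_t kActualArrayOffset =
    offsetof(TrailingArrayGroups, Groups);  // 8 with 8-byte pointers

// True if an array of SidAndAttr may legally start at this offset.
constexpr bool IsSuitablyAligned(std::size_t offset) {
    return offset % alignof(SidAndAttr) == 0;
}
```

    With 8-byte pointers, IsSuitablyAligned(kActualArrayOffset) is true and IsSuitablyAligned(kProposedArrayOffset) is false. With 4-byte pointers both come out true, which is why the problem stays hidden on 32-bit systems.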

    Okay, you may say, then why not use a zero-length array instead of a 1-length array?

    Because time travel has yet to be perfected.

    Flexible array members (a trailing array declared with no size, the standard counterpart of a zero-length array) did not become legal Standard C until 1999. Since Windows was around long before then, it could not take advantage of that functionality in the C language.

  • The Old New Thing

    Why can't you treat a FILETIME as an __int64?

    • 27 Comments

    The FILETIME structure represents a 64-bit value in two parts:

    typedef struct _FILETIME {
      DWORD dwLowDateTime;
      DWORD dwHighDateTime;
    } FILETIME, *PFILETIME;
    

    You may be tempted to take the entire FILETIME structure and access it directly as if it were an __int64. After all, its memory layout exactly matches that of a 64-bit (little-endian) integer. Some people have written sample code that does exactly this:

    pi = (__int64*)&ft; // WRONG
    (*pi) += (__int64)num*datepart; // WRONG
    

    Why is this wrong?

    Alignment.

    Since a FILETIME is a structure containing two DWORDs, it requires only 4-byte alignment, since that is sufficient to put each DWORD on a valid DWORD boundary. There is no need for the first DWORD to reside on an 8-byte boundary. And in fact, you've probably already used a structure where it doesn't: The WIN32_FIND_DATA structure.

    typedef struct _WIN32_FIND_DATA {
        DWORD dwFileAttributes;
        FILETIME ftCreationTime;
        FILETIME ftLastAccessTime;
        FILETIME ftLastWriteTime;
        DWORD nFileSizeHigh;
        DWORD nFileSizeLow;
        DWORD dwReserved0;
        DWORD dwReserved1;
        TCHAR  cFileName[ MAX_PATH ];
        TCHAR  cAlternateFileName[ 14 ];
    } WIN32_FIND_DATA, *PWIN32_FIND_DATA, *LPWIN32_FIND_DATA;
    

    Observe that the three FILETIME structures appear at offsets 4, 12, and 20 from the beginning of the structure. They have been thrown off 8-byte alignment by the dwFileAttributes member.

    Casting a FILETIME to an __int64 therefore can (and in the WIN32_FIND_DATA case, will) create a misaligned pointer. Accessing a misaligned pointer will raise a STATUS_DATATYPE_MISALIGNMENT exception on architectures which require alignment.
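
    The safe alternative is to assemble the 64-bit value from the two halves, do the arithmetic on the integer, and split it back. Here is a minimal portable sketch. The FileTime structure mirrors the layout of FILETIME, and the helper names are made up for this example.

```cpp
#include <cassert>
#include <cstdint>

// Portable mirror of FILETIME (hypothetical name).
struct FileTime {
    std::uint32_t dwLowDateTime;
    std::uint32_t dwHighDateTime;
};

// Combines the two halves by value; no pointer cast, so no alignment issue.
std::uint64_t FileTimeToUint64(const FileTime& ft) {
    return (static_cast<std::uint64_t>(ft.dwHighDateTime) << 32) |
           ft.dwLowDateTime;
}

// Splits a 64-bit value back into the two 32-bit halves.
FileTime Uint64ToFileTime(std::uint64_t value) {
    FileTime ft;
    ft.dwLowDateTime = static_cast<std::uint32_t>(value & 0xFFFFFFFFu);
    ft.dwHighDateTime = static_cast<std::uint32_t>(value >> 32);
    return ft;
}

// Adds a signed delta; unsigned wraparound handles negative deltas.
FileTime AddToFileTime(const FileTime& ft, std::int64_t delta) {
    return Uint64ToFileTime(FileTimeToUint64(ft) +
                            static_cast<std::uint64_t>(delta));
}
```

    Since the members are read and written individually, this works no matter where the structure happens to sit in memory, including inside a WIN32_FIND_DATA.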

    Even if you are on a forgiving platform that performs automatic alignment fixups, you can still run into trouble. More on this and other consequences of alignment in the next few entries.

    Exercise: Why are the LARGE_INTEGER and ULARGE_INTEGER structures not affected?

  • The Old New Thing

    Beware of non-null-terminated registry strings

    • 25 Comments

    Even though a value is stored in the registry as REG_SZ, this doesn't mean that the value actually ends with a proper null terminator. At the bottom, the registry is just a hierarchically-organized name/value database.

    And you can lie and get away with it.

    Lots of people lie about their registry data. You'll find lots of things that should be REG_DWORD stored as a four-byte REG_BINARY. (This is in part a holdover from Windows 95's registry, which didn't support REG_DWORD.)

    One of the most insidious lies is to lie about the length of a string you're writing to the registry. Consider the following program:

    #include <windows.h>
    #include <stdio.h>
    
    int __cdecl main(int argc, char **argv)
    {
        RegSetValueExW(HKEY_CURRENT_USER, L"Scratch",
                       0, REG_SZ, (BYTE*)L"12", 2);
    
        DWORD cb = 0;
        RegQueryValueExW(HKEY_CURRENT_USER, L"Scratch",
                         NULL, NULL, NULL, &cb);
        printf("Size is %d bytes\n", cb);
    
        WCHAR sz[2];
        sz[0] = 0xFFFF;
        sz[1] = 0xFFFF;
        cb = sizeof(sz[0]);
        DWORD dwRc = RegQueryValueExW(HKEY_CURRENT_USER, L"Scratch",
                                      NULL, NULL, (BYTE*)sz, &cb);
        printf("RegQueryValueExW requesting %d bytes => %d\n",
               (int)sizeof(sz[0]), dwRc);
        printf("%d bytes required\n", cb);
        if (dwRc == ERROR_SUCCESS) {
            printf("sz[0] = %d\n", sz[0]);
            printf("sz[1] = %d\n", sz[1]);
        }
    
        RegDeleteValueW(HKEY_CURRENT_USER, L"Scratch");
    
        return 0;
    }
    

    If you run this program, you get this:

    Size is 2 bytes
    RegQueryValueExW requesting 2 bytes => 0
    2 bytes required
    sz[0] = 49
    sz[1] = 65535
    

    What happened?

    First, observe that the call to RegSetValueExW lies about the length of the string, claiming that it is two bytes long when in fact it is six! (Two WCHARs plus a terminator.)

    The registry dutifully records this lie and reports it back to subsequent callers.

    The first call to RegQueryValueExW asks how big the string is, and the registry reports the value 2, since that's the value it was given when the value was originally stored.

    To show that there really is no null terminator, we ask the registry to read those two bytes of data into our buffer, pre-filling the buffer with sentinel values so we can see what got updated and what didn't.

    Lo and behold, the values were read from the registry and only two bytes were read. sz[0] contains the character '1', and sz[1] was left untouched: it still holds the sentinel value 0xFFFF (65535).

    This has security implications.

    If your program assumes that strings in the registry are always null-terminated, then you can be tricked into a buffer overflow if you happen across a non-null-terminated string. (For example, if you use strcpy to copy it around.)
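
    One defensive approach is to trust only the byte count the registry reported and never read past it looking for a terminator. Here is a minimal portable sketch. The function name is invented for this example, and char16_t stands in for WCHAR so that the code does not depend on the Windows headers.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Builds a string from raw registry data without assuming a terminator.
// `data` must point to at least `cb` bytes; `cb` is the reported byte count.
std::u16string StringFromRegistryData(const char16_t* data, std::size_t cb) {
    // Count only whole characters; a stray odd byte at the end is ignored.
    std::size_t cch = cb / sizeof(char16_t);
    std::size_t length = 0;
    // Stop at a null if there is one, but never look beyond cch characters.
    while (length < cch && data[length] != u'\0') {
        ++length;
    }
    return std::u16string(data, length);
}
```

    The result is always a properly terminated string object, whether or not the data in the registry was.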

    (Note: I'm not going to get into whether it should have been possible to get into this state in the first place. I didn't design the registry. Arguing about the past isn't going to change the present, and the present is that this is how it works so you'd better be ready for it.)

    Exercise: Change the last parameter of RegSetValueExW to 3 and run the program again. Explain the results and discuss its consequences.
