January, 2004

  • The Old New Thing

    "Section 419" scammers arrested in Netherlands; Danish flag flies proudly

    • 9 Comments
    Dutch police have arrested 52 people suspected of defrauding gullible Internet users in one of the largest busts of the infamous "Nigerian e-mail" scam.
    Hooray for the Dutch police. Their next target: Web sites that illustrate a Dutch article with the Danish flag.

    (I must sheepishly admit that I too mistakenly identified the home of Ikea as Denmark rather than the Netherlands.)

  • The Old New Thing

    The format of string resources

    • 34 Comments
    Unlike the other resource formats, where the resource identifier is the same as the value listed in the *.rc file, string resources are packaged in "bundles". There is a rather terse description of this in Knowledge Base article Q196774. Today we're going to expand that terse description into actual code.

    The strings listed in the *.rc file are grouped together in bundles of sixteen. So the first bundle contains strings 0 through 15, the second bundle contains strings 16 through 31, and so on. In general, bundle N contains strings (N-1)*16 through (N-1)*16+15.

    The strings in each bundle are stored as counted UNICODE strings, not null-terminated strings. If there are gaps in the numbering, null strings are used. So for example if your string table had only strings 16 and 31, there would be one bundle (number 2), which consists of string 16, fourteen null strings, then string 31.

    (Note that this means there is no way to tell the difference between "string 20 is a string that has length zero" and "string 20 doesn't exist".)
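    The bundle arithmetic described above can be sketched in a few lines. The helper names here are hypothetical (they are not part of any Windows header); only the arithmetic is taken from the text.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers illustrating the bundle arithmetic; not a Windows API.

// Bundle numbers are 1-based: strings 0..15 live in bundle 1.
constexpr std::uint32_t BundleNumberFromStringId(std::uint32_t id)
{
    return id / 16 + 1;
}

// Position of the string inside its bundle (0..15).
constexpr std::uint32_t IndexWithinBundle(std::uint32_t id)
{
    return id & 15;
}

// First string ID stored in bundle N, that is, (N-1)*16.
constexpr std::uint32_t FirstStringIdInBundle(std::uint32_t bundle)
{
    return (bundle - 1) * 16;
}
```

    With these, strings 16 and 31 both map to bundle 2, at positions 0 and 15 respectively.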

    The LoadString function is rather limiting in a few ways:

    • You can't pass a language ID. If your resources are multilingual, you can't load strings from a nondefault language.
    • You can't query the length of a resource string.

    Let's write some functions that remove these limitations.

    LPCWSTR FindStringResourceEx(HINSTANCE hinst,
     UINT uId, UINT langId)
    {
     // Convert the string ID into a bundle number
     LPCWSTR pwsz = NULL;
     HRSRC hrsrc = FindResourceEx(hinst, RT_STRING,
                         MAKEINTRESOURCE(uId / 16 + 1),
                         langId);
     if (hrsrc) {
      HGLOBAL hglob = LoadResource(hinst, hrsrc);
      if (hglob) {
       pwsz = reinterpret_cast<LPCWSTR>
                  (LockResource(hglob));
       if (pwsz) {
        // okay now walk the string table
        for (UINT i = 0; i < (uId & 15); i++) {
         pwsz += 1 + (UINT)*pwsz;
        }
        UnlockResource(pwsz);
       }
       FreeResource(hglob);
      }
     }
     return pwsz;
    }
    

    After converting the string ID into a bundle number, we find the bundle, load it, and lock it. (That's an awful lot of paperwork just to access a resource. It's a throwback to the Windows 3.1 way of managing resources; more on that in a future entry.)

    We then walk through the table skipping over the desired number of strings until we find the one we want. The first WCHAR in each string entry is the length of the string, so adding 1 skips over the count and adding the count skips over the string.

    When we finish walking, pwsz is left pointing to the counted string.
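    To see the walk in isolation, here is a portable sketch that builds a bundle in ordinary memory and walks it the same way. It assumes char16_t as a stand-in for WCHAR, and the names MakeBundle, WalkBundle and ExtractString are made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Build an in-memory bundle: for each string, a length word followed by
// the characters, with no null terminator. (Hypothetical test helper.)
std::vector<char16_t> MakeBundle(const std::vector<std::u16string>& strings)
{
    std::vector<char16_t> bundle;
    for (const std::u16string& s : strings) {
        bundle.push_back(static_cast<char16_t>(s.size()));
        bundle.insert(bundle.end(), s.begin(), s.end());
    }
    return bundle;
}

// Skip over `index` counted strings: adding 1 skips the count,
// adding the count skips the characters.
const char16_t* WalkBundle(const char16_t* p, unsigned index)
{
    for (unsigned i = 0; i < index; i++) {
        p += 1 + static_cast<unsigned>(*p);
    }
    return p;
}

// Copy the counted string for a given string ID out of its bundle.
std::u16string ExtractString(const std::vector<char16_t>& bundle, unsigned id)
{
    const char16_t* p = WalkBundle(bundle.data(), id & 15);
    return std::u16string(p + 1, static_cast<std::size_t>(*p));
}
```

    A bundle holding only strings 16 and 31 occupies 16 length words plus the characters of the two strings; the fourteen gaps cost one zero word each.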

    With this basic function we can create fancier functions.

    The function FindStringResource is a simple wrapper that searches for the string in the default thread language.

    LPCWSTR FindStringResource(HINSTANCE hinst, UINT uId)
    {
     return FindStringResourceEx(hinst, uId,
         MAKELANGID(LANG_NEUTRAL, SUBLANG_NEUTRAL));
    }
    

    The function GetStringResourceLengthEx returns the length of the corresponding string, including the null terminator.

    UINT GetStringResourceLengthEx(HINSTANCE hinst,
     UINT uId, UINT langId)
    {
     LPCWSTR pwsz = FindStringResourceEx
                           (hinst, uId, langId);
     return 1 + (pwsz ? *pwsz : 0);
    }
    

    And the function AllocStringFromResourceEx loads the entire string resource into a heap-allocated memory block.

    LPWSTR AllocStringFromResourceEx(HINSTANCE hinst,
     UINT uId, UINT langId)
    {
     LPCWSTR pwszRes = FindStringResourceEx
                           (hinst, uId, langId);
     if (!pwszRes) pwszRes = L"";
     LPWSTR pwsz = new WCHAR[(UINT)*pwszRes+1];
     if (pwsz) {
       pwsz[(UINT)*pwszRes] = L'\0';
       CopyMemory(pwsz, pwszRes+1,
                  *pwszRes * sizeof(WCHAR));
     }
     return pwsz;
    }
    

    (Writing the non-Ex functions GetStringResourceLength and AllocStringFromResource is left as an exercise.)

    Note that we must explicitly null-terminate the string since the string in the resource is not null-terminated. Note also that the string returned by AllocStringFromResourceEx must be freed with delete[]. For example:

    LPWSTR pwsz = AllocStringFromResource(hinst, uId);
    if (pwsz) {
      ... use pwsz ...
      delete[] pwsz;
    }
    

    Mismatching vector "new[]" and scalar "delete" is an error I'll talk about in a future entry.

    Exercise: Discuss how the /n flag to rc.exe affects these functions.
  • The Old New Thing

    How do we decide what features make it into a product?

    • 0 Comments
    David Lemson has an excellent article titled How do we decide what features make it into Exchange?. Although he's talking about Exchange specifically, the general principles apply to many products.
  • The Old New Thing

    Integer overflow in the new[] operator

    • 21 Comments
    Integer overflows are becoming a new security attack vector. Mike Howard's article discusses some of the ways you can protect yourself against integer overflow attacks.

    One attack vector he neglects to mention is integer overflow in the new[] operator. This operator performs an implicit multiplication that is unchecked:

    int *allocate_integers(int howmany)
    {
        return new int[howmany];
    }
    

    If you study the code generation for this, it comes out to

      mov  eax, [esp+4] ; eax = howmany
      shl  eax, 2       ; eax = howmany * sizeof(int)
      push eax
      call operator new ; allocate that many bytes
      pop  ecx
      retd 4
    

    Notice that the multiplication by sizeof(int) is not checked for overflow. Somebody can trick you into under-allocating memory by passing a value like howmany = 0x40000001. For larger structures, multiplication overflow happens sooner.
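    The wraparound is easy to reproduce without looking at generated code. This sketch models the 32-bit multiply explicitly so that it behaves the same on any platform; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// The byte count you actually wanted, computed in 64 bits.
constexpr std::uint64_t TrueByteCount(std::uint32_t count,
                                      std::uint32_t elementSize)
{
    return static_cast<std::uint64_t>(count) * elementSize;
}

// The byte count an unchecked 32-bit multiply produces:
// the true product with the upper 32 bits thrown away.
constexpr std::uint32_t UncheckedByteCount32(std::uint32_t count,
                                             std::uint32_t elementSize)
{
    return static_cast<std::uint32_t>(TrueByteCount(count, elementSize));
}
```

    For howmany = 0x40000001 and a 4-byte int, the true size is 0x100000004 bytes, but the truncated value handed to the allocator is just 4.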

    Let's look at a slightly longer example:

    class MyClass {
    public:
      MyClass(); // constructor
      int stuff[256];
    };
    
    MyClass *allocate_myclass(int howmany)
    {
      return new MyClass[howmany];
    }
    

    This class also contains a constructor, so allocating an array of them involves two steps: allocate the memory, then construct each object. The allocate_myclass function compiles to this:

      mov  eax, [esp+4] ; howmany
      shl  eax, 10      ; howmany * sizeof(MyClass)
      push esi
      push eax
      call operator new ; allocate that many bytes
      mov  esi, eax
      test esi, esi
      pop  ecx
      je   fail
      push OFFSET MyClass::MyClass
      push [esp+12]     ; howmany
      push 1024         ; sizeof(MyClass)
      push esi          ; memory block
      call `vector constructor iterator`
      mov  eax, esi
      jmp  done
    fail:
      xor  eax, eax
    done:
      pop  esi
      retd 4
    

    This function does an unchecked multiplication of the size, then tries to allocate that many bytes, then tells the vector constructor iterator to call the constructor (MyClass::MyClass) that many times.

    If somebody tricks you into calling allocate_myclass(0x400001), the multiplication overflows (0x400001 * 1024 = 0x100000400, which truncates to 0x400) and only 1024 bytes are allocated. This allocation succeeds, and then the vector constructor tries to initialize 0x400001 of those items, even though in reality only one of them got allocated. So you walk off the end of the memory block and start corrupting memory.

    That's a bad thing.

    To protect against this, you can wrap an integer overflow check around the array allocation.

    template<typename T>
    T* NewArray(size_t n)
    {
      if (n <= (size_t)-1 / sizeof(T))
        return new T[n];
    
      // n is too large - act as if we
      // ran out of memory
      return NULL;
    }
    

    Note: If you use a throwing "new", then replace the "return NULL" with an appropriate throw.
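    As a sketch of that variation, here is one way the throwing form might look. The name NewArrayOrThrow is made up, and throwing std::bad_alloc is just one reasonable choice of "appropriate throw", chosen so that callers see the same exception they would get from a real out-of-memory condition.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <new>

// Hypothetical throwing counterpart to NewArray.
template<typename T>
T* NewArrayOrThrow(std::size_t n)
{
    // Divide rather than multiply so the check itself cannot overflow.
    if (n > std::numeric_limits<std::size_t>::max() / sizeof(T)) {
        // n is too large - act as if we ran out of memory
        throw std::bad_alloc();
    }
    return new T[n];
}
```

    The result is still released with delete[], exactly as with a plain new[].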

    You can now use this template to allocate arrays in an overflow-safe manner.

    MyClass *allocate_myclass(int howmany)
    {
      return NewArray<MyClass>(howmany);
    }
    

    This generates the following code:

      push edi
      mov  edi, [esp+8] ; howmany
      cmp  edi, 4194303 ; overflow?
      ja   overflow
    
      mov  eax, edi
      shl  eax, 10
      push esi
      push eax
      call operator new
      mov  esi, eax
      test esi, esi
      pop  ecx
      je   failed
      push OFFSET MyClass::MyClass
      push edi
      push 1024
      push esi
      call `vector constructor iterator`
      mov  eax, esi
      jmp  done
    
    failed:
      xor  eax, eax
    done:
      pop  esi
      jmp  exit
    
    overflow:
      xor     eax, eax
    exit:
      pop  edi
      retd 4
    

    Notice the new code that checks for a possible integer multiplication overflow.

    But how could you get tricked into an overflow situation?

    The most common way of doing this is by reading the value out of a file or some other storage location. For example, if your code is parsing a file that has a section whose format is "length followed by data", somebody could intentionally put an overflow-inducing value into the "length" field, then get somebody else to try to load the file.

    This is particularly dangerous if the filetype is something that is generally considered "not dangerous", like a JPG.
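    Here is a minimal sketch of a parser for such a "length followed by data" section that refuses a hostile length before allocating anything. The record layout (a 4-byte little-endian element count followed by that many 32-bit little-endian values) and the function names are invented for this example, not taken from any real file format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Read a 32-bit little-endian value from a byte buffer.
static std::uint32_t ReadLE32(const std::uint8_t* p)
{
    return static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}

// Hypothetical layout: count, then `count` 32-bit values.
// Returns false (leaving `out` empty) if the count cannot be satisfied
// by the bytes that are actually present.
bool ParseCountedInts(const std::uint8_t* data, std::size_t size,
                      std::vector<std::uint32_t>& out)
{
    out.clear();
    if (size < 4) return false;

    const std::uint32_t count = ReadLE32(data);
    const std::size_t remaining = size - 4;

    // Compare against remaining / 4 instead of computing count * 4,
    // so the validation cannot overflow.
    if (count > remaining / 4) return false;

    out.reserve(count);
    for (std::uint32_t i = 0; i < count; i++) {
        out.push_back(ReadLE32(data + 4 + 4 * static_cast<std::size_t>(i)));
    }
    return true;
}
```

    Checking the claimed count against the bytes actually available rejects the overflow-inducing value as a side effect, since no real file is large enough to back it up.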

  • The Old New Thing

    Ikea walk-through

    • 29 Comments
    Jeff Davis tipped me off to this Ikea walk-through. Frustratingly, the walkthrough doesn't include any cheat codes.

    Even though Ikea was founded by a Swede, its company colors match the Swedish national colors, all its product names are Swedish, and it is clearly associated with Sweden in the minds of everyone, it is in fact headquartered in Denmark. Probably for tax reasons.

    The name of the plastic stool Förby has provided me much entertainment. At first, it was amusingly similar to the name of the annoying toy Furby. And it turns out it's also amusingly similar to the Swedish word "förbi" which means "gone past". (German "vorbei".)

    During an evening of sore hands trying to drive screws into wood using a tiny little hex wrench, some friends and I came up with new Ikea product names. Here are a few:

    • Frusträt
    • Flims
    • Instabil
    • Brøkken (okay fine this is Norwegian, not Swedish; I hadn't started studying Swedish yet)
    • Kräp

    And, of course, the complimentary twine for tying your packages to the top of your car:

    • Strink

    Some of these might even be better than the actual name they came up with for a children's bed.

  • The Old New Thing

    Another reason not to do anything scary in your DllMain: Inadvertent deadlock

    • 17 Comments

    Your DllMain function runs inside the loader lock, one of the few times the OS lets you run code while one of its internal locks is held. This means that you must be extra careful not to violate a lock hierarchy in your DllMain; otherwise, you are asking for a deadlock.

    (You do have a lock hierarchy in your DLL, right?)

    The loader lock is taken by any function that needs to access the list of DLLs loaded into the process. This includes functions like GetModuleHandle and GetModuleFileName. If your DllMain enters a critical section or waits on a synchronization object, and that critical section or synchronization object is owned by some code that is in turn waiting for the loader lock, you just created a deadlock:

    // global variable
    CRITICAL_SECTION g_csGlobal;
    
    // some code somewhere
    EnterCriticalSection(&g_csGlobal);
    ... GetModuleFileName(MyInstance, ..);
    LeaveCriticalSection(&g_csGlobal);
    
    BOOL WINAPI
    DllMain(HINSTANCE hinstDLL, DWORD fdwReason,
            LPVOID lpvReserved)
    {
      switch (fdwReason) {
      ...
      case DLL_THREAD_DETACH:
       EnterCriticalSection(&g_csGlobal);
       ...
      }
      ...
    }
    

    Now imagine that some thread is happily executing the first code fragment and enters g_csGlobal, then gets pre-empted. During this time, another thread exits. This enters the loader lock and sends out DLL_THREAD_DETACH messages while the loader lock is still held.

    You receive the DLL_THREAD_DETACH and attempt to enter your DLL's g_csGlobal. This blocks on the first thread, who owns the critical section. That thread then resumes execution and calls GetModuleFileName. This function requires the loader lock (since it's accessing the list of DLLs loaded into the process), so it blocks, since the loader lock is owned by somebody else.

    Now you have a deadlock:

    • g_csGlobal owned by first thread, waiting on loader lock.
    • Loader lock owned by second thread, waiting on g_csGlobal.

    I have seen this happen. It's not pretty.

    Moral of the story: Respect the loader lock. Include it in your lock hierarchy rules if you take any locks in your DllMain.
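    If you want a lock hierarchy to be more than a comment in a design document, one option is to have the locks check it themselves. What follows is only an analogy in portable C++: you cannot wrap the real loader lock this way, the RankedMutex class is hypothetical, and it assumes each thread releases its locks in the reverse order it took them.

```cpp
#include <cassert>
#include <mutex>
#include <stdexcept>
#include <vector>

// A mutex with a rank. A thread may only acquire locks in strictly
// increasing rank order; anything else is reported as a violation
// instead of being left to deadlock some day in production.
class RankedMutex {
public:
    explicit RankedMutex(int rank) : rank_(rank) {}

    void lock()
    {
        std::vector<int>& held = HeldRanks();
        if (!held.empty() && held.back() >= rank_) {
            throw std::logic_error("lock hierarchy violation");
        }
        mutex_.lock();
        held.push_back(rank_);
    }

    void unlock()
    {
        HeldRanks().pop_back(); // assumes LIFO release order
        mutex_.unlock();
    }

private:
    // Ranks of the locks currently held by the calling thread.
    static std::vector<int>& HeldRanks()
    {
        thread_local std::vector<int> ranks;
        return ranks;
    }

    std::mutex mutex_;
    int rank_;
};
```

    In terms of the example above, the loader lock would get the lowest rank, since DllMain is entered with it already held, and g_csGlobal a higher one. The first code fragment, which holds g_csGlobal and then calls a function that takes the loader lock, is exactly the out-of-order acquisition this check would flag.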
  • The Old New Thing

    Passenger announcements in the airport

    • 4 Comments
    While in Seattle-Tacoma International Airport yesterday, waiting for my flight to eventually be cancelled due to weather, then waiting for a replacement itinerary (um, the weather is the same at the destination; doesn't matter which plane you take), then waiting for the replacement to be cancelled also (wow imagine that), I heard an announcement on the public address system: "Will passenger Michael Kinsley please pick up a white courtesy phone."

    Probably the closest I will get to Michael Kinsley.

    Last year, while waiting in I think it was LAX or possibly DEN, I heard the announcement "Will passenger Larry Ellison please pick up a white courtesy phone."
  • The Old New Thing

    Some reasons not to do anything scary in your DllMain

    • 24 Comments
    As everybody knows by now, you're not supposed to do anything even remotely interesting in your DllMain function. Oleg Lvovitch has written two very good articles about this, one about how things work, and one about what goes wrong when they don't work.

    Here's another reason not to do anything remotely interesting in your DllMain: It's common to load a library without actual intent to invoke its full functionality. For example, somebody might load your library like this:

    // error checking deleted for expository purposes
    hinst = LoadLibrary(you);
    hicon = LoadIcon(hinst, MAKEINTRESOURCE(5));
    FreeLibrary(hinst);
    

    This code just wants your icon. It would be very surprised (and perhaps even upset) if your DLL did something heavy like starting up a timer or a thread.

    (Yes, this could be avoided by using LoadLibraryEx and LOAD_LIBRARY_AS_DATAFILE, but that's not my point.)

    Another case where your library gets loaded even though no code is going to be run is when it gets tugged along as a dependency for some other DLL. Suppose "middle" is the name of some intermediate DLL that is linked to your DLL.

    hinst = LoadLibrary(middle);
    pfn = GetProcAddress(hinst, "SomeFunction");
    pfn(...);
    FreeLibrary(hinst);
    

    When "middle" is loaded, your DLL will get loaded and initialized, too. So your initialization runs even if "SomeFunction" doesn't use your DLL.

    This "intermediate DLL loaded for a brief time" scenario is actually quite common. For example, if somebody does "Regsvr32 middle.dll", that will load the middle DLL to call its DllRegisterServer function, which typically doesn't do much other than install some registry keys. It almost certainly doesn't call into your helper DLL.

    Another example is the opening of the Control Panel folder. The Control Panel folder loads every *.cpl file so it can call its CPlApplet function to determine what icon to display. Again, this typically will not call into your helper DLL.
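    One common way to stay out of trouble in these load-and-unload scenarios, sketched here as an assumption rather than a prescription, is to leave DllMain nearly empty and defer the heavy setup until the first call that really needs it. The function names below are hypothetical and the counter merely stands in for expensive state such as threads or timers.

```cpp
#include <cassert>
#include <mutex>

namespace {
std::once_flag g_initOnce;
int g_initCount = 0; // stand-in for expensive state

// The work you were tempted to do in DLL_PROCESS_ATTACH.
void ExpensiveInit()
{
    ++g_initCount;
}
}

// Hypothetical exported function: initialization happens on first use,
// so a caller that only wants an icon never pays for it.
int DoRealWork(int x)
{
    std::call_once(g_initOnce, ExpensiveInit);
    return x * 2;
}

// Exposed only so the behavior can be observed in this sketch.
int InitCountForTesting()
{
    return g_initCount;
}
```

    Somebody who loads the library, grabs a resource and unloads it never triggers ExpensiveInit at all, and the initialization runs on a thread that is making a real call rather than inside the loader lock.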

    And under no circumstances should you create any objects with thread affinity in your DLL_PROCESS_ATTACH handler. You have no control over which thread will send the DLL_PROCESS_ATTACH message, nor which thread will send the DLL_PROCESS_DETACH message. The thread that sends the DLL_PROCESS_ATTACH message might terminate immediately after it loads your DLL. Any object with thread-affinity will then stop working since its owner thread is gone.

    And even if that thread survives, there is no guarantee that the thread that calls FreeLibrary is the same one that called LoadLibrary. So you can't clean up those objects with thread affinity in DLL_PROCESS_DETACH since you're on the wrong thread.

    And absolutely under no circumstances should you be doing anything as crazy as creating a window inside your DLL_PROCESS_ATTACH. In addition to the thread affinity issues, there's the problem of global hooks. Hooks running inside the loader lock are a recipe for disaster. Don't be surprised if your machine deadlocks.

    Even more examples to come tomorrow.
  • The Old New Thing

    Undermining your own proclamation

    • 4 Comments
    I'm pulling for the Mars rovers as much as the next geek, but you still have to scratch your head at the following statement:
    Charles Elachi, the JPL director, said: "I am completely confident, without any hesitation, that I think we will get that rover back to full operation."
    So he's absolutely sure that he "thinks" something.
  • The Old New Thing

    Blog going on autopilot for a while

    • 3 Comments
    I will be out of town for a few weeks, so I have set my blog on autopilot. There will still be an article every weekday at 7am Pacific time (assuming the autopilot machine doesn't suffer a power outage or something), but I won't be around (much) to respond to comments.