May, 2006

  • The Old New Thing

    That mysterious J

    • 33 Comments

    In e-mail from Microsoft employees, you may find a stray J like this one at the end of a message from Rico Mariani. Some of you might see it; others might not. What's the deal with the J?

    The J started out its life as a smiley-face. The WingDings font puts a smiley face where the letter J goes. Here, let me try: <FONT FACE=WingDings>J</FONT> results in J. As the message travels from machine to machine, the font formatting may get lost or mangled, resulting in the letter J appearing when a smiley face was intended. (Note that this is not the same as the smiling face incorporated into Unicode as U+263A, which looks like this: ☺. Some of you might see it; others might not.)

    I recall a story (possibly apocryphal) of somebody who regularly exchanged a lot of e-mail with Microsoft employees and who as a result started signing their own messages with a J, figuring this was some sort of Microsoft slang. The Microsoft employees who got the J-messages scratched their heads until they were able to figure out how their correspondent arrived at this fabulous deduction.

    And now, the mysterious J has come full circle, because some people use it ironically, intentionally just writing a J without setting the font, in the same way people making fun of "leet" writing may "accidentally" type "1"s (or even more absurdly, the word "one") into a row of exclamation points.

  • The Old New Thing

    Doing quick arithmetic from the command prompt

    • 64 Comments

    The command processor CMD.EXE comes with a mini-calculator that can perform simple arithmetic on 32-bit signed integers:

    C:\>set /a 2+2
    4
    C:\>set /a 2*(9/2)
    8
    C:\>set /a (2*9)/2
    9
    C:\>set /a "31>>2"
    7
    

    Note that we had to quote the shift operator since it would otherwise be misinterpreted as a "redirect stdout and append" operator.

    For more information, type set /? at the command prompt.

  • The Old New Thing

    How do I write a regular expression that matches an IPv4 dotted address?

    • 54 Comments

    Writing a regular expression that matches an IPv4 dotted address is either easy or hard, depending on how good a job you want to do. In fact, to make things easier, let's match only the decimal dotted notation, leaving out the hexadecimal variant, as well as the non-dotted variants.

    For the purpose of this discussion, I'll restrict myself to the common subset of the regular expression languages shared by perl, JScript, and the .NET Framework, and I'll assume ECMA mode, wherein \d matches only the characters 0 through 9. (By default, in the .NET Framework, \d matches any decimal digit, not just 0 through 9.)

    The easiest version is just to take any string of four decimal numbers separated by periods.

    /^\d+\.\d+\.\d+\.\d+$/
    

    This is nice as far as it goes, but it erroneously accepts strings like "448.90210.0.65535". A proper decimal dotted address has no value larger than 255. But writing a regular expression that matches the integers 0 through 255 is hard work because regular expressions don't understand arithmetic; they operate purely textually. Therefore, you have to describe the integers 0 through 255 in purely textual means.

    • Any single digit is valid (representing 0 through 9).
    • Any nonzero digit followed by another digit is valid (representing 10 through 99).
    • A "1" followed by two digits is valid (100 through 199).
    • A "2" followed by "0" through "4" followed by another digit is valid (200 through 249).
    • A "25" followed by "0" through "5" is valid (250 throuth 255).

    Given this textual breakdown of the integers 0 through 255, your first try would be something like this:

     /^(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])$/
    

    This can be shrunk a bit by recognizing that the first two rules above could be combined into

    • Any digit, optionally preceded by a nonzero digit, is valid.

    yielding

     /^([1-9]?\d|1\d\d|2[0-4]\d|25[0-5])$/
    

    Now we just have to do this four times with periods in between:

    /^([1-9]?\d|1\d\d|2[0-4]\d|25[0-5])\.([1-9]?\d|1\d\d|2[0-4]\d|25[0-5])\.([1-9]?\d|1\d\d|2[0-4]\d|25[0-5])\.([1-9]?\d|1\d\d|2[0-4]\d|25[0-5])$/
    

    Congratulations, we have just taken a simple description of the dotted decimal notation in words and converted it into a monstrous regular expression that is basically unreadable. Imagine you were maintaining a program and stumbled across this regular expression. How long would it take you to figure out what it did?

    Oh, and it might not be right yet, because some parsers accept leading zeroes in front of each decimal value without affecting it. (For example, 127.0.0.001 is the same as 127.0.0.1. On the other hand, some parsers treat a leading zero as an octal prefix.) Updating our regular expression to accept leading decimal zeroes means that we now have

    /^0*([1-9]?\d|1\d\d|2[0-4]\d|25[0-5])\.0*([1-9]?\d|1\d\d|2[0-4]\d|25[0-5])\.0*([1-9]?\d|1\d\d|2[0-4]\d|25[0-5])\.0*([1-9]?\d|1\d\d|2[0-4]\d|25[0-5])$/
    

    This is why I both love and hate regular expressions. They are a great way to express simple patterns. And they are a horrific way to express complicated ones. Regular expressions are probably the world's most popular write-only language.

    Aha, but you see, all this time diving into regular expressions was a mistake. Because we failed to figure out what the actual problem was. This was a case of somebody "solving" half of their problem and then asking for help with the other half: "I have a string and I want to check whether it is a dotted decimal IPv4 address. I know, I'll write a regular expression! Hey, can anybody help me write this regular expression?"

    The real problem was not "How do I write a regular expression to recognize a dotted decimal IPv4 address." The real problem was simply "How do I recognize a dotted decimal IPv4 address." And with this broader goal in mind, you recognize that limiting yourself to a regular expression only made the problem harder.

    function isDottedIPv4(s)
    {
     var match = s.match(/^(\d+)\.(\d+)\.(\d+)\.(\d+)$/);
     return match != null &&
            match[1] <= 255 && match[2] <= 255 &&
            match[3] <= 255 && match[4] <= 255;
    }
    
    WScript.StdOut.WriteLine(isDottedIPv4("127.0.0.001"));
    WScript.StdOut.WriteLine(isDottedIPv4("448.90210.0.65535"));
    WScript.StdOut.WriteLine(isDottedIPv4("microsoft.com"));
    

    And this was just a simple dotted decimal IPv4 address. Woe unto you if you decide you want to parse e-mail addresses.

    Don't make regular expressions do what they're not good at. If you want to match a simple pattern, then match a simple pattern. If you want to do math, then do math. As commenter Maurits put it, "The trick is not to spend time developing a combination hammer/screwdriver, but just use a hammer and a screwdriver."

  • The Old New Thing

    Beware the C++ implicit conversion

    • 47 Comments

    Today's topic was inspired by a question from a customer:

    I am working on a stack overflow bug. To reduce the size of the stack frame, I removed as many local variables as I could, but there's still a lot of stack space that I can't account for. What else lives on the stack aside from local variables, parameters, saved registers, and the return address?

    Well, there's also structured exception handling information, but that's typically not too much and therefore wouldn't be the source of "a lot" of mysterious stack usage.

    My guess is that the code is generating lots of large C++ temporaries. Consider the following program fragment:

    class BigBuffer
    {
    public:
     BigBuffer(int initialValue)
       { memset(buffer, initialValue, sizeof(buffer)); }
    private:
     char buffer[65536];
    };
    
    extern void Foo(const BigBuffer& o);
    
    void oops()
    {
     Foo(3);
    }
    

    "How does this code even compile? The function Foo wants a BigBuffer, not an integer!" Yet compile it does.

    That's because the compiler is using the BigBuffer constructor as a converter. In other words, the compiler inserted the following temporary variable:

    void oops()
    {
     BigBuffer temp(3);
     Foo(temp);
    }
    

    It did this because a constructor that takes exactly one argument serves two purposes: It can be used as a traditional constructor (as we saw with BigBuffer temp(3)) or it can be used to provide an implicit conversion from the argument type to the constructed type. In this case, the BigBuffer(int) constructor is being used as a conversion from int to BigBuffer.

    To prevent this from happening, use the explicit keyword:

    class BigBuffer
    {
    public:
     explicit BigBuffer(int initialValue)
       { memset(buffer, initialValue, sizeof(buffer)); }
    private:
     char buffer[65536];
    };
    

    With this change, the call to Foo(3) raises a compiler error:

    sample.cpp: error C2664: 'Foo' : cannot convert parameter 1 from
         'int' to 'const BigBuffer &'
         Reason: cannot convert from 'int' to 'const BigBuffer'
         Constructor for class 'BigBuffer' is declared 'explicit'
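     
    Note, by the way, that marking the constructor explicit doesn't make the 64KB temporary any cheaper; it merely forces the caller to spell out the conversion, so the stack cost is visible at the call site. A minimal sketch (my own illustration, not part of the original example):

     void oopsFixed()
     {
      // The temporary is still 64KB, but now it appears in the source
      // instead of being silently inserted by the compiler.
      Foo(BigBuffer(3));
     }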
    
  • The Old New Thing

    It rather involved being on the other side of this airtight hatchway

    • 47 Comments

    Not every code injection bug is a security hole.

    Yes, a code injection bug is a serious one indeed. But it doesn't become a security hole until it actually allows someone to do something they normally wouldn't be able to.

    For example, suppose there's a bug where if you type a really long file name into a particular edit control and click "Save", the program overflows a buffer. With enough work, you might be able to turn this into a code injection bug, by entering a carefully-crafted file name. But that's not a security hole yet. All you've found so far is a serious bug. (Yes, it's odd that I'm underplaying a serious bug, but only because I'm comparing it to a security hole.)

    Look at what you were able to do: You were able to get a program to execute code of your choosing. Big deal. You can already do that without having to go through all this effort. If you wanted to execute code of your own choosing, then you can just put it in a program and run it!

    The hard way

    1. Write the code that you want to inject, compile it to native machine code.
    2. Analyze the failure, develop a special string whose binary representation results in the overwriting of a return address, choosing the value so that it points back into the stack.
    3. Write an encoder that takes the code you wrote in step 1 and converts it into a string with no embedded zeros. (Because an embedded zero will be treated as a string terminator.)
    4. Write a decoder that itself contains no embedded-zeros.
    5. Append the encoded result from step 3 to the decoder you wrote in step 4 and combine it with the binary representation you developed in step 2.
    6. Type the resulting string into the program.
    7. Watch your code run.

    The easy way
    1. Write the code that you want to inject. (You can use any language; it doesn't have to compile to native code.)
    2. Run it.

    It's like saying that somebody's home windows are insecure because a burglar could get into the house by merely unlocking and opening the windows from the inside. (But if the burglar has to get inside in order to unlock the windows...)

    Code injection doesn't become a security hole until you have elevation of privilege; in other words, until attackers gain the ability to do something they normally wouldn't be able to do. If the attack vector requires setting a registry key, then the attacker must already have obtained the ability to run enough code to set a registry key, in which case they can forget about "unlocking the window from the inside" and just replace the code that sets the registry key with the full-on exploit. The alleged attack vector is a red herring. The burglar is already inside the house.

    Or suppose you found a technique to cause an application to log sensitive information, triggered by a setting that only administrators can enable. Therefore, in order to "exploit" this hole, you need to gain administrator privileges, in which case why stop at logging? Since you have administrator privileges, you can just replace the application with a hacked version that does whatever you want.

    Of course, code injection can indeed be a security hole if it permits elevation of privilege. For example, if you can inject code into a program running at a different security level, then you have the opportunity to elevate. This is why extreme care must be taken when writing unix root-setuid programs and Windows services: These programs run with elevated privileges and therefore any code injection bug becomes a fatal security hole.

    A common starting point from which to evaluate elevation of privilege is the Internet hacker. If some hacker on the Internet can inject code onto your computer, then they have successfully elevated their privileges, because that hacker didn't have the ability to execute arbitrary code on your machine prior to the exploit. Next time, we'll look at some perhaps-unexpected places your program can become vulnerable to an Internet attack, even if you think your program isn't network-facing.

  • The Old New Thing

    Every discipline has its crackpots: Stories of mathematics

    • 31 Comments

    I'm sure every discipline has its share of crackpots. I suspect the physicists get it the worst, starting with the old standbys of perpetual motion machines and faster-than-light travel, then tossing in quantum mechanics and nuclear physics, and stirring with large quantities of long rambling text that makes no sense. But my experience is with mathematics.

    When I was in college, the mathematics department would post "interesting letters" onto the bulletin board. The ones I remember:

    • A claimed proof of Fermat's Last Theorem by means of music theory. It started with 3² + 4² = 5², then somehow converted this into musical notation, transposed it into another key, and then concluded that the proof was complete. There were many "proofs" of Fermat's Last Theorem on the bulletin board.
    • A letter from an inmate at a correctional facility who had developed a system for winning the lottery and merely needed a printout of all possible ways of choosing 6 numbers from a pool of 46.

    A number theorist working on factorization explained to me that he would frequently receive "manuscripts" from people who claim to have found a high-speed algorithm for factoring large numbers. At first, he would take the time to study these manuscripts, and each time he would determine that the algorithm boiled down to trial division, often cleverly-disguised trial division, but trial division nevertheless. (Though when this was pointed out, the authors often rejected his analysis.) Eventually, he realized that he could separate the wheat from the chaff very easily by simply replying with the following message:

    Thank you for your fascinating manuscript on the factorization of large numbers. There are some numbers that have been giving me difficulty of late. I would be most appreciative if you could use your technique to factor this one for me.

    Upon which he would include a number whose factorization would take years on generally-available computational hardware at the current state of understanding. He never heard back from these people.

    Another of my professors told a story of one "correspondent" who was convinced that the speed of light could be overcome. (Yes, that's actually a physics question, not a mathematics question, but it was sent to the math department anyway.) The professor started by trying to explain the principles of special relativity to his new pen-pal but quickly realized that wasn't going to lead anywhere. The correspondence was quite pleasant; the other person was a retired gentleman who gardened and enjoyed going for walks when he wasn't working on pushing the envelope of modern physics. At one point, the correspondent wrote back a multi-page letter consisting of crayon drawings that proved that the speed of light could be exceeded. It went something like this:

    The first page consisted of a drawing of the earth with a little rocket ship in orbit around it.

    Consider a rocket ship that circles the earth once a day. This rocket is travelling at 463.831019 meters per second.

    Say what you will, but these people never suffer from the problem of too few significant digits.

    Now imagine that each day, the time it takes to circle the earth SHRINKS IN HALF.
    So that on the SECOND DAY it takes only twelve hours to circle the earth at a speed of 927.662037 METERS PER SECOND.

    The rocket ship on the second day has a few extra zoom-lines on it.

    By the THIRD DAY it takes only SIX HOURS to circle the earth at a speed of 1855.32407 METERS PER SECOND.

    The rocket ship on the third day is going a little faster.

    Each page consisted of a daily status report on our little rocket ship, illustrated in glorious crayon, each drawing more elaborate than the last. As the rocket ship goes faster and faster, the report on its speed gets bigger and bigger. I'll skip ahead a bit.

    On DAY TWENTY
    the rocket ship circles the earth in 164.794922 MILLISECONDS
    at a speed of 243,181,037 METERS PER SECOND.

    Finally, the hammer falls:

    On DAY TWENTY-ONE
    the rocket ship circles the earth
    in only 82.3974609 MILLISECONDS
    at a speed of 486,362,075 METERS PER SECOND.
    IT HAS BROKEN THE LIGHT BARRIER.

    The professor realized the jig was up. He wrote back, "Yes, it looks like you've done it."

  • The Old New Thing

    Raymond makes a psychic prediction for 2006

    • 99 Comments

    I have gazed into my crystal ball and emerged with a prediction for 2006. Revealing my prediction now may influence the event itself, so I will post only the hash for the prediction. I will post the actual prediction at the end of the year.

    using System.Security.Cryptography;
    using System;
    
    class Prediction {
    
     static void DoHash(byte[] bytes, string name, HashAlgorithm hash)
     {
      Console.WriteLine(name);
      hash.ComputeHash(bytes);
      byte[] result = hash.Hash;
      for (int i = 0; i < result.Length; i++) {
       Console.Write("{0:X2}", result[i]);
       if (i % 32 == 31) Console.WriteLine();
      }
      if (result.Length % 32 != 0) Console.WriteLine();
     }
    
     static void Main()
     {
      string msg = "prediction goes here";
      Console.WriteLine("length {0}", msg.Length);
      byte[] bytes = (new System.Text.ASCIIEncoding()).GetBytes(msg);
      DoHash(bytes, "MD5", MD5.Create());
      DoHash(bytes, "SHA1", SHA1.Create());
      DoHash(bytes, "SHA256", SHA256.Create());
      DoHash(bytes, "SHA384", SHA384.Create());
      DoHash(bytes, "SHA512", SHA512.Create());
     }
    }
    

    The output of this program (after you replace "prediction goes here" with the actual prediction, of course) is as follows:

    length 45
    MD5
    6D915EC203DF0C918D13B63C4FF7C1EE
    SHA1
    49A2E2B22D27D450890E30D0A34EBA53B454925E
    SHA256
    2C928DC82E133B0FAD5DAA64BC373BE400C700B124749072816B7053EECC9A82
    SHA384
    080BDBB804B8F9B28731E6E17F872C6BE6F8F08B6670CA3424726295DE58A8DE
    2FE9EA43D724B7AA2ED3366CA9A80631
    SHA512
    D0A84D8B1B330F101D115044C9C7146605C951199BC2F036EE677C690D5151A9
    3F78FDFD8E6FF147EE2DB517A96642B24ED17D2306A772B953281CB4C0BEEDF1
    
  • The Old New Thing

    A cache with a bad policy is another name for a memory leak

    • 37 Comments

    A common performance trick is to reduce time spent in the heap manager by caching the last item freed (or maybe the last few) so that a subsequent allocation can just re-use the item rather than having to go make a new one. But you need to be careful how you do this or you can end up making things worse rather than better. Here's an example motivated by an actual problem the Windows performance team researched.

    Consider a cache of variable-sized buffers. I will use only a one-entry cache for simplicity. In real life, the cache would be more complicated: People tend to have a deeper cache of four to ten entries, and you would have to ensure that only one thread used the cache at a time; typically this is done by associating the cache with something that has thread affinity. Furthermore, you probably would keep the size of the cached buffer in a member variable instead of calling LocalSize all the time. I've left out all these complications to keep the presentation simple.

    class BufferCache {
    public:
     BufferCache() : m_pCache(NULL) { }
     ~BufferCache() { LocalFree(m_pCache); }
    
     void *GetBuffer(SIZE_T cb);
     void ReturnBuffer(void *p);
    
    private:
     void *m_pCache;
    };
    

    If a request for a memory buffer arrives and it can be satisfied from the cache, then the cached buffer is returned. Otherwise, a brand new buffer is allocated.

    void *BufferCache::GetBuffer(SIZE_T cb)
    {
     // Satisfy from cache if possible
     if (m_pCache && LocalSize(m_pCache) >= cb) {
      void *p = m_pCache;
      m_pCache = NULL;
      return p;
     }
     return LocalAlloc(LMEM_FIXED, cb);
    }
    

    When a buffer is returned to the cache, we compare it against the item already in the cache and keep the bigger one, since that is more likely to satisfy a GetBuffer in the future. (In the general case of a multiple-entry cache, we would free the smallest entry.)

    // Flawed design - see discussion
    void BufferCache::ReturnBuffer(void *p)
    {
     SIZE_T cb = LocalSize(p);
     if (!m_pCache || cb > LocalSize(m_pCache)) {
      // Returned buffer is bigger than the cache:
      // Keep the returned buffer
      LocalFree(m_pCache);
      m_pCache = p;
     } else {
      // Returned buffer is smaller than the cache:
      // Keep the cache
      LocalFree(p);
     }
    }
    

    Why is this a flawed design? I'll let you think about this for a while.

    No really, I want you to think about it.

    Are you thinking? Take your time; I'll be here when you're done.

    Okay, since I know you haven't actually thought about it but are just sitting there waiting for me to tell you, I'll give you a bit of a nudge.

    The distribution of buffer sizes is rarely uniform. The most common distribution is that small buffers are popular, with larger and larger buffers being required less and less often. Let's write a sample program that allocates and frees memory according to this pattern. To make the bad behavior easier to spot in a short run, I'm going to use a somewhat flat distribution and say that half of the buffers are small, with larger buffers becoming less popular according to exponential decay. In practice, the decay curve is usually much, much steeper.

     #include <windows.h>  // LocalAlloc, LocalSize, LocalFree
     #include <cstdlib>    // rand
     #include <vector>
     #include <iostream>
     // (BufferCache class as defined above)
    
    // Since this is just a quick test, we're going to be sloppy
    using namespace std; //  sloppy
    int __cdecl main(int argc, char **argv)
    {
     BufferCache b;
    
     // seeding the random number generator is not important here
    
     vector<void *> v; // keeps track of allocated memory
     for (;;) {
      // randomly allocate and free
      if (v.size() == 0 || (rand() & 1)) { // allocate
       SIZE_T cb = 100;
       while (cb < 1024 * 1024 && (rand() & 1)) {
        cb *= 2; // exponential decay distribution up to 1MB
       }
       void* p = b.GetBuffer(cb);
       if (p) {
        cout << " A" << LocalSize(p) << "/" << cb;
        v.push_back(p);
       }
      } else { // free
       int victim = rand() % v.size(); // choose one at random
       cout << " F" << LocalSize(v[victim]);
       b.ReturnBuffer(v[victim]); // free it
       v[victim] = v.back();
       v.pop_back();
      }
     }
    }
    

    This short program randomly allocates and frees memory from the buffer cache, printing (rather cryptically) the size of the blocks allocated and freed. When memory is allocated, it prints "A1/2" where "1" is the size of the block actually allocated and "2" is the size requested. When freeing memory, it prints "F3" where "3" is the size of the block allocated. Run this program, let it do its thing for maybe ten, fifteen seconds, then pause the output and study it. I'll wait. If you're too lazy to actually compile and run the program, I've included some sample output for you to study:

    F102400 A102400/400 F800 F200 A800/100 A200/200 A400/400
    A400/400 A200/200 F1600 A1600/100 F100 F800 F25600 A25600/200
    F12800 A12800/200 F200 F400 A400/100 F200 A200/100 A200/200
    A100/100 F200 F3200 A3200/400 A200/200 F51200 F800 F25600
    F1600 F1600 A51200/100 F100 A100/100 F3200 F200 F409600 F100
    A409600/400 A100/100 F200 F3200 A3200/800 A400/400 F800 F3200
    F200 F12800 A12800/200 A100/100 F200 F25600 F400 F6400
    A25600/100 F100 F200 F400 F200 F800 F400 A800/800 A100/100
    

    Still waiting.

    Okay, maybe you don't see it. Let's make the effect even more obvious by printing some statistics periodically. Of course, to generate the statistics, we need to keep track of them, so we'll have to remember how big the requested buffer was (which we'll do in the buffer itself):

    int __cdecl main(int argc, char **argv)
    {
     BufferCache b;
    
     // seeding the random number generator is not important here
    
     vector<void *> v; // keeps track of allocated memory
     SIZE_T cbAlloc = 0, cbNeeded = 0;
     for (int count = 0; ; count++) {
      // randomly allocate and free
      if (v.size() == 0 || (rand() & 1)) { // allocate
       SIZE_T cb = 100;
       while (cb < 1024 * 1024 && !(rand() % 4)) {
        cb *= 2; // exponential decay distribution up to 1MB
       }
       void* p = b.GetBuffer(cb);
       if (p) {
        *(SIZE_T*)p = cb;
        cbAlloc += LocalSize(p);
        cbNeeded += cb;
        v.push_back(p);
       }
      } else { // free
       int victim = rand() % v.size(); // choose one at random
       cbAlloc -= LocalSize(v[victim]);
       cbNeeded -= *(SIZE_T*)v[victim];
       b.ReturnBuffer(v[victim]); // free it
       v[victim] = v.back();
       v.pop_back();
      }
      if (count % 100 == 0) {
       cout << count << ": " << v.size() << " buffers, "
            << cbNeeded << "/" << cbAlloc << "="
            << cbNeeded * 100.0 / cbAlloc << "% used" << endl;
      }
     }
    }
    

    This new version keeps track of how many bytes were allocated as opposed to how many were actually needed, and prints a summary of those statistics every hundred allocations. Since I know you aren't actually going to run it yourself, I've run it for you. Here is some sample output:

    0: 1 buffers, 400/400=100% used
    100: 7 buffers, 4300/106600=4.03377% used
    200: 5 buffers, 1800/103800=1.7341% used
    300: 19 buffers, 9800/115800=8.46287% used
    400: 13 buffers, 5100/114000=4.47368% used
    500: 7 buffers, 2500/28100=8.8968% used
    ...
    37200: 65 buffers, 129000/2097100=6.15135% used
    37300: 55 buffers, 18100/2031400=0.891011% used
    37400: 35 buffers, 10400/2015800=0.515924% used
    37500: 43 buffers, 10700/1869100=0.572468% used
    37600: 49 buffers, 17200/1874000=0.917823% used
    37700: 75 buffers, 26000/1889900=1.37573% used
    37800: 89 buffers, 30300/1903100=1.59214% used
    37900: 91 buffers, 29600/1911900=1.5482% used
    

    By this point, the problem should be obvious: We're wasting insane quantities of memory. For example, after step 37900, we've allocated 1.8MB of memory when we needed only 30KB, for a waste of over 98%.

    How did we go horribly wrong?

    Recall that most of the time, the buffer being allocated is a small buffer, and most of the time, a small buffer is freed. But it's the rare case of a large buffer that messes up everything. The first time a large buffer is requested, it can't come from the cache, since the cache has only small buffers, so it must be allocated. And when it is returned, it is kept, since the cache keeps the largest buffer.

    The next allocation comes in, and it's probably one of the common-case small buffers, and it is given the cached buffer—which is big. You're wasting a big buffer on something that needs only 100 bytes. Some time later, another rare big buffer request comes in, and since that other big buffer got wasted on a small allocation, you have to allocate a new big buffer. You allocated two big buffers even though you need only one. Since big buffers are rare, it is unlikely that a big buffer will be given to a caller that actually needs a big buffer; it is much more likely to be given to a caller that needs a small buffer.

    Bad effect 1: Big buffers get wasted on small callers.

    Notice that once a big buffer enters the system, it is hard to get rid of, since a returned big buffer will be compared against what is likely to be a small buffer, and the small buffer will lose.

    Bad effect 2: Big buffers rarely go away.

    The only way a big buffer can get freed is if the buffer in the cache is itself already a big buffer. If instead of a one-entry cache like we have here, you keep, say, ten buffers in your buffer cache, then in order to free a big buffer, you have to have eleven consecutive ReturnBuffer calls, all of which pass a big buffer.

    Bad effect 3: The more efficient you try to make your cache, the more wasteful it gets!

    What's more, when that eleventh call to ReturnBuffer is made with a big buffer, it is only the smallest of the big buffers that gets freed. The biggest buffers stay.

    Bad effect 4: When a big buffer does go away, it's only because you are keeping an even bigger buffer!
    Corollary: The biggest buffer never gets freed.

    What started out as an "obvious" decision in choosing which buffer to keep has turned into a performance disaster. By favoring big buffers, you allowed them to "poison" the cache, and the longer you let the system run, the more allocations end up being big "poisoned" buffers. It doesn't matter how rare those big blocks are; you will eventually end up in this state. It's just a matter of time.

    When the performance team tries to explain this problem to people, many of them get the mistaken impression that the problem is merely that there is wasted space in the cache. But look at our example: Our cache has only one entry and we are still wasting over 90% of the memory. That's because the waste is not in the memory being held by the cache, but rather is in the memory that the cache hands out. (It's sort of like that scene in It's a Wonderful Life where George Bailey is explaining where all the money is. It's not in the bank; it's in all the places that got money from the bank.)

    My recommendations:

    • Instrument your cache and understand what your program's memory allocation patterns are.
    • Use that information to pick a size cutoff point beyond which you simply will not use the cache at all. This ensures that big buffers never get into the cache in the first place. Choosing this cutoff point is usually extremely easy once you look at the allocation histogram.
    • Although you've taken the big buffers out of the picture, you will still have the problem that the small buffers will gradually grow up to your cutoff size. (I.e., you still have the same problem, just in miniature.) Therefore, if the cache is full, you should just free the most recently returned buffer regardless of its size.
    • Do not use the cached buffer if the waste is too great. You might decide to use multiple "buckets" of cached entries, say one for buffers below 100 bytes, another for buffers between 100 and 200 bytes, and so on. That way, the waste per allocation is never more than 100 bytes. (A rough sketch of this bucketed approach appears after the results below.)
    • Finally, reinstrument your cache to ensure that you're not suffering from yet some other pathological behavior that I haven't taken into account.

    Here's a new ReturnBuffer implementation that takes some of the above advice into account. Instrumentation shows that three quarters of the allocations are in the 100–200 byte range, so let's cap our cache at 200 bytes.

    void BufferCache::ReturnBuffer(void *p)
    {
     if (m_pCache == NULL && LocalSize(p) <= 200) {
      m_pCache = p;
     } else {
      LocalFree(p);
     }
    }
    

    With this one seemingly-minor change, our efficiency stays above 90% and occasionally even gets close to 100%:

    0: 1 buffers, 400/400=100% used
    100: 7 buffers, 4300/4400=97.7273% used
    200: 5 buffers, 1800/1800=100% used
    300: 19 buffers, 9800/9800=100% used
    400: 13 buffers, 5100/5100=100% used
    500: 7 buffers, 2500/2600=96.1538% used
    ...
    37200: 65 buffers, 129000/130100=99.1545% used
    37300: 55 buffers, 18100/18700=96.7914% used
    37400: 35 buffers, 10400/11000=94.5455% used
    37500: 43 buffers, 10700/11000=97.2727% used
    37600: 49 buffers, 17200/18000=95.5556% used
    37700: 75 buffers, 26000/26800=97.0149% used
    37800: 89 buffers, 30300/31900=94.9843% used
    37900: 91 buffers, 29600/30600=96.732% used
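     
    The "buckets" recommendation above isn't implemented in this article's code, but a rough sketch of the idea might look like the following. This is my own illustration (the class name, the 100-byte bucket width, and the four-bucket cap are all hypothetical), with the same simplifications as before: one entry per bucket, no thread safety, and no instrumentation.

     // Rough sketch of the "buckets" recommendation: one cached entry
     // per 100-byte size class, so the waste in any buffer handed out
     // is roughly bounded by the bucket width.
     class BucketedBufferCache {
     public:
      BucketedBufferCache()
       { for (int i = 0; i < kBuckets; i++) m_pCache[i] = NULL; }
      ~BucketedBufferCache()
       { for (int i = 0; i < kBuckets; i++) LocalFree(m_pCache[i]); }
     
      void *GetBuffer(SIZE_T cb)
      {
       int i = Bucket(cb);
       if (i >= 0 && m_pCache[i] && LocalSize(m_pCache[i]) >= cb) {
        void *p = m_pCache[i];
        m_pCache[i] = NULL;
        return p;
       }
       return LocalAlloc(LMEM_FIXED, cb);
      }
     
      void ReturnBuffer(void *p)
      {
       int i = Bucket(LocalSize(p));
       if (i >= 0 && m_pCache[i] == NULL) {
        m_pCache[i] = p;   // keep it in its size class
       } else {
        LocalFree(p);      // bucket occupied, or buffer too big to cache
       }
      }
     
     private:
      // Buffers larger than kBuckets * 100 bytes are never cached.
      static const int kBuckets = 4;
      static int Bucket(SIZE_T cb)
       { return (cb > 0 && cb <= kBuckets * 100) ? (int)((cb - 1) / 100) : -1; }
      void *m_pCache[kBuckets];
     };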
    

    Don't forget to check out performance guru Rico Mariani's reminder that Caching implies Policy. As he explained to me, "Cache policy is everything so you must be dead certain that your policy is working as you intended. A cache with a bad policy is another name for a memory leak."

  • The Old New Thing

    What can I do with the HINSTANCE returned by the ShellExecute function?

    • 17 Comments

    As we saw earlier, in 16-bit Windows, the HINSTANCE identified a program. The Win32 kernel is a complete redesign from the 16-bit kernel, introducing such concepts as "kernel objects" and "security descriptors". In particular, 16-bit Windows didn't have "process IDs"; the instance handle served that purpose. That is why the WinExec and ShellExecute functions returned an HINSTANCE. But in the 32-bit world, an HINSTANCE does not uniquely identify a running program, since it is merely the base address of the executable. Since each program runs in its own address space, that value is hardly unique across the entire system.

    So what can you do with the HINSTANCE returned by the ShellExecute function? You can check whether it is greater than 32, which indicates that the call was successful. If the value is 32 or less, then it is an error code. The precise value of the HINSTANCE in the greater-than-32 case is meaningless.
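
    For illustration, here's a minimal sketch of that check (my own example; the document path is hypothetical):

     HINSTANCE hinst = ShellExecute(NULL, TEXT("open"),
                                    TEXT("C:\\path\\to\\document.txt"),
                                    NULL, NULL, SW_SHOWNORMAL);
     if ((INT_PTR)hinst <= 32) {
      // The call failed; the value is an error code (e.g., SE_ERR_FNF).
      // A value greater than 32 tells you only that the call succeeded.
     }
     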

    Why am I bothering to tell you things that are already covered in MSDN? Because people still have trouble putting two and two together. I keep seeing people who take the HINSTANCE returned by the ShellExecute function and hunt through all the windows in the system looking for a window with a matching GWLP_HINSTANCE (or GWL_HINSTANCE if you're still living in the unenlightened non-64-bit-compatible world). This doesn't work for the two reasons I described above. First, the precise value of the HINSTANCE you get back is meaningless, and even if it were meaningful, it wouldn't do you any good since the HINSTANCE is not unique. (In fact, the HINSTANCE for a process is nearly always 0x00400000, since that is the default address most linkers assign to program executables.)

    The most common reason people want to pull this sort of trick in the first place is that they want to do something with the program that was just launched, typically, wait for it to exit, indicating that the user has closed the document. Unfortunately, this plan comes with its own pitfalls.

    First, as we noted, the HINSTANCE that you get from the ShellExecute function is useless. You have to use the ShellExecuteEx function and set the SEE_MASK_NOCLOSEPROCESS flag in the SHELLEXECUTEINFO structure, at which point a handle to the process is returned in the hProcess member. But that still doesn't work.
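
    In code, that approach looks something like the following minimal sketch (my own; the document path is hypothetical and error handling is omitted):

     SHELLEXECUTEINFO sei = { sizeof(sei) };
     sei.fMask = SEE_MASK_NOCLOSEPROCESS;
     sei.lpVerb = TEXT("open");
     sei.lpFile = TEXT("C:\\path\\to\\document.txt");
     sei.nShow = SW_SHOWNORMAL;
     if (ShellExecuteEx(&sei) && sei.hProcess) {
      // Waiting for the process to exit is not the same as waiting for
      // the user to close the document, as explained below.
      WaitForSingleObject(sei.hProcess, INFINITE);
      CloseHandle(sei.hProcess);
     }
     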

    A document can be executed with no new process being created. The most common case (but hardly the only one) in which you will encounter this is if the registered handler for the document type requested a DDE conversation. In that case, an existing instance of the program has accepted responsibility for the document. Waiting for the process to exit is not the same as waiting for the user to close the document, because closing the document doesn't exit the process.

    Just because the user closes the document doesn't mean that the process exits. Most programs will let you open a new document from the "File" menu. Once that new document is opened, the user can close the old one. (Single-document programs implicitly close the old document when the new one is opened.) What's more, closing all open windows associated with the document need not result in the program exiting. Some programs keep running in the background even after you've closed all their windows, either to provide some sort of continuing service or because they anticipate that the user will run the program again soon, so they delay the final exit for a few minutes to see whether they will be needed.

    Just because the process exits doesn't mean that the document is closed. Some programs detect a previous instance and hand off the document to that instance. Other programs are stubs that launch another process to do the real work. In either case, the newly-created process exits quickly, but the document is still open, since the responsibility for the document has been handed off to another process.

    There is no uniform way to detect that a document has been closed. Each program handles it differently. If you're lucky, the program exposes properties that allow you to monitor the status of an open document. As we saw earlier, Internet Explorer exposes properties of its open windows through the ShellWindows object. I understand that Microsoft Office also exposes a rather elaborate set of automation interfaces for its component programs.

  • The Old New Thing

    Beware of digits before the redirection operator

    • 29 Comments

    If you want to put the string "Meet at 2" into the file "schedule", you might be tempted to use

    echo Meet at 2>schedule
    

    If you try this, however, you'll see the string "Meet at" on the screen, and the "schedule" file will be blank.

    What happened?

    A digit immediately before a redirection operator modifies which stream the redirection operator applies to. If you're going to redirect an alternate output stream, it'll nearly always be the standard error stream, or stream 2. To put the error output into a file, you would write something like this:

    sort /invalidswitch 2>errorfile
    

    There is also the operator ">&" that reopens a stream as another stream. The idiom

    some-command >output 2>&1
    

    says, "Put the normal output into the file output, and then change the error output stream (2) to refer to the normal output stream (1)." The result is that both the regular output and error output end up in the output file.

    But what if you really want to put the string "Meet at 2" into the file "schedule"?

    You can insert a space between the "2" and the ">". This works for most programs, since they ignore trailing spaces on their command line, but this was a trick question: The echo command is one of the few commands that actually pays attention to trailing spaces. As a result, the "schedule" file contains "Meet at 2<space><cr><lf>". Maybe this is close enough for you, in which case you can skip the next paragraph.

    But what if you don't want that trailing space? For that, you can use the metacharacter escape character, the ^:

    echo Meet at ^2>schedule
    

    The last gotcha is that the pesky "2" might come from environment variable expansion.

    set message=Meet at 2
    echo %message%>schedule
    

    The trailing "2" in %message% interacts with the greater-than sign, leading to an unintended redirection. For this, you can insert a space before the greater-than sign, assuming you are in a scenario where that space is not going to cause you any problems. (And if you're in a scenario where that space will cause a problem, you can use a trick we'll look at next time.)

    Mind you, if you're going to take an environment variable whose contents you do not control and expand it onto your command line unquoted, you have much worse problems than a trailing digit messing up your file redirection. Somebody might have decided that the message should be "&format C: /y". Inserting this into the command line unquoted would yield "echo &format C: /y>schedule" which is a pretty good way to ruin somebody's day. (Well, okay, you can't format a drive with an active pagefile, but you get the idea.)
