May, 2006

  • The Old New Thing

    Raymond makes a psychic prediction for 2006

    • 99 Comments

    I have gazed into my crystal ball and emerged with a prediction for 2006. Revealing my prediction now may influence the event itself, so I will post only the hash for the prediction. I will post the actual prediction at the end of the year.

    using System.Security.Cryptography;
    using System;
    
    class Prediction {
    
     static void DoHash(byte[] bytes, string name, HashAlgorithm hash)
     {
      Console.WriteLine(name);
      hash.ComputeHash(bytes);
      byte[] result = hash.Hash;
      for (int i = 0; i < result.Length; i++) {
       Console.Write("{0:X2}", result[i]);
       if (i % 32 == 31) Console.WriteLine();
      }
      if (result.Length % 32 != 0) Console.WriteLine();
     }
    
     static void Main()
     {
      string msg = "prediction goes here";
      Console.WriteLine("length {0}", msg.Length);
      byte[] bytes = (new System.Text.ASCIIEncoding()).GetBytes(msg);
      DoHash(bytes, "MD5", MD5.Create());
      DoHash(bytes, "SHA1", SHA1.Create());
      DoHash(bytes, "SHA256", SHA256.Create());
      DoHash(bytes, "SHA384", SHA384.Create());
      DoHash(bytes, "SHA512", SHA512.Create());
     }
    }
    

    The output of this program (after you replace "prediction goes here" with the actual prediction, of course) is as follows:

    length 45
    MD5
    6D915EC203DF0C918D13B63C4FF7C1EE
    SHA1
    49A2E2B22D27D450890E30D0A34EBA53B454925E
    SHA256
    2C928DC82E133B0FAD5DAA64BC373BE400C700B124749072816B7053EECC9A82
    SHA384
    080BDBB804B8F9B28731E6E17F872C6BE6F8F08B6670CA3424726295DE58A8DE
    2FE9EA43D724B7AA2ED3366CA9A80631
    SHA512
    D0A84D8B1B330F101D115044C9C7146605C951199BC2F036EE677C690D5151A9
    3F78FDFD8E6FF147EE2DB517A96642B24ED17D2306A772B953281CB4C0BEEDF1
    
  • The Old New Thing

    Doing quick arithmetic from the command prompt

    • 64 Comments

    The command processor CMD.EXE comes with a mini-calculator that can perform simple arithmetic on 32-bit signed integers:

    C:\>set /a 2+2
    4
    C:\>set /a 2*(9/2)
    8
    C:\>set /a (2*9)/2
    9
    C:\>set /a "31>>2"
    7
    

    Note that we had to quote the shift operator since it would otherwise be misinterpreted as a "redirect stdout and append" operator.

    For more information, type set /? at the command prompt.

  • The Old New Thing

    Assaulting users with dialog box after dialog box

    • 61 Comments

    Increasingly, I'm seeing solving problems by adding more dialog boxes. Asking the user too much is as bad as not asking enough.

    "You clicked on the Notepad icon. Do you wish to run Notepad?"

    Okay, nobody would write a dialog box that stupid, would they? But the following dialog boxes don't really help much either:

    "You clicked on an mp3 file. Do you want to open it with that program you just installed?"

    "You clicked on an avi file. Do you want to open it with that program you just installed?"

    "You clicked on an mpg file. Do you want to open it with that program you just installed?"

    "You clicked on a wmv file. Do you want to open it with that program you just installed?"

    "You clicked on a wma file. Do you want to open it with that program you just installed?"

    A phenomenon known as "Dialog box fatigue" quickly sets in (with a big dose of "You stupid computer" thrown in for good measure). The system today is already filled with so many dialog boxes that users are conditioned just to click "Yes" to everything. If you click "Yes", then the computer does the thing you told it do. If you click "No", then nothing happens. In other words, users have already learned to interpret these dialog boxes as saying "Click Yes to do the thing you want to do or No to give up and cancel." Why would anybody click "No"?

    Moving to the specific case of programs that display media content: I am led to believe that all the media players include a page in their install wizard that list the file types they are capable of playing and give you a chance to check/uncheck the ones you want them to assume responsibility for. I believe this was the armistice that resulted after a bunch of industry wrangling wherein companies were busy accusing each other of sabotaging each other's file types. The upshot is that all reputable media players have already gotten your permission to take over playback of those media types. Why ask the user to confirm something they have already confirmed? "Stupid computer."

    Okay, so those dialog boxes are unnecessary for reputable media players. What about the disreputable ones? Well, once you've established that you're dealing with disreputable media players, they're just going to go in and hack the settings to say, "Oh, the user already confirmed this take-over, don't bother asking. It's all good."

    So you now have a dialog box that (1) is redundant with reputable media players, and (2) has no effect for disreputable media players. What did this dialog box accomplish again?

  • The Old New Thing

    How do I write a regular expression that matches an IPv4 dotted address?

    • 54 Comments

    Writing a regular expression that matches an IPv4 dotted address is either easy or hard, depending on how good a job you want to do. In fact, to make things easier, let's match only the decimal dotted notation, leaving out the hexadecimal variant, as well as the non-dotted variants.

    For the purpose of this discussion, I'll restrict myself to the common subset of the regular expression languages shared by perl, JScript, and the .NET Framework, and I'll assume ECMA mode, wherein \d matches only the characters 0 through 9. (By default, in the .NET Framework, \d matches any decimal digit, not just 0 through 9.)

    The easiest version is just to take any string of four decimal numbers separated by periods.

    /^\d+\.\d+\.\d+\.\d+$/
    

    This is nice as far as it goes, but it erroneously accepts strings like "448.90210.0.65535". A proper decimal dotted address has no value larger than 255. But writing a regular expression that matches the integers 0 through 255 is hard work because regular expressions don't understand arithmetic; they operate purely textually. Therefore, you have to describe the integers 0 through 255 in purely textual means.

    • Any single digit is valid (representing 0 through 9).
    • Any nonzero digit followed by another digit is valid (representing 10 through 99).
    • A "1" followed by two digits is valid (100 through 199).
    • A "2" followed by "0" through "4" followed by another digit is valid (200 through 249).
    • A "25" followed by "0" through "5" is valid (250 throuth 255).

    Given this textual breakdown of the integers 0 through 255, your first try would be something like this:

    /^\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5]$/
    

    This can be shrunk a bit by recognizing that the first two rules above could be combined into

    • Any digit, optionally preceded by a nonzero digit, is valid.

    yielding

    /^[1-9]?\d|1\d\d|2[0-4]\d|25[0-5]$/
    

    Now we just have to do this four times with periods in between:

    /^([1-9]?\d|1\d\d|2[0-4]\d|25[0-5])\.([1-9]?\d|1\d\d|2[0-4]\d|25[0-5])\.([1-9]?\d|1\d\d|2[0-4]\d|25[0-5])\.([1-9]?\d|1\d\d|2[0-4]\d|25[0-5])$/
    

    Congratulations, we have just taken a simple description of the dotted decimal notation in words and converted into a monstrous regular expression that is basically unreadable. Imagine you were maintaining a program and stumbled across this regular expression. How long would it take you to figure out what it did?

    Oh, and it might not be right yet, because some parsers accept leading zeroes in front of each decimal value without affecting it. (For example, 127.0.0.001 is the same as 127.0.0.1. On the other hand, some parsers treat a leading zero as an octal prefix.) Updating our regular expression to accept leading decimal zeroes means that we now have

    /^0*([1-9]?\d|1\d\d|2[0-4]\d|25[0-5])\.0*([1-9]?\d|1\d\d|2[0-4]\d|25[0-5])\.0*([1-9]?\d|1\d\d|2[0-4]\d|25[0-5])\.0*([1-9]?\d|1\d\d|2[0-4]\d|25[0-5])$/
    

    This is why I both love and hate regular expressions. They are a great way to express simple patterns. And they are a horrific way to express complicated ones. Regular expressions are probably the world's most popular write-only language.

    Aha, but you see, all this time diving into regular expressions was a mistake. Because we failed to figure out what the actual problem was. This was a case of somebody "solving" half of their problem and then asking for help with the other half: "I have a string and I want to check whether it is a dotted decimal IPv4 address. I know, I'll write a regular expression! Hey, can anybody help me write this regular expression?"

    The real problem was not "How do I write a regular expression to recognize a dotted decimal IPv4 address." The real problem was simply "How do I recognize a dotted decimal IPv4 address." And with this broader goal in mind, you recognize that limiting yourself to a regular expression only made the problem harder.

    function isDottedIPv4(s)
    {
     var match = s.match(/^(\d+)\.(\d+)\.(\d+)\.(\d+)$/);
     return match != null &&
            match[1] <= 255 && match[2] <= 255 &&
            match[3] <= 255 && match[4] <= 255;
    }
    
    WScript.StdOut.WriteLine(isDottedIPv4("127.0.0.001"));
    WScript.StdOut.WriteLine(isDottedIPv4("448.90210.0.65535"));
    WScript.StdOut.WriteLine(isDottedIPv4("microsoft.com"));
    

    And this was just a simple dotted decimal IPv4 address. Woe unto you if you decide you want to parse e-mail addresses.

    Don't make regular expressions do what they're not good at. If you want to match a simple pattern, then match a simple pattern. If you want to do math, then do math. As commenter Maurits put it, "The trick is not to spend time developing a combination hammer/screwdriver, but just use a hammer and a screwdriver.

  • The Old New Thing

    Why doesn't Ethan Hunt have to wear identification?

    • 51 Comments

    Whenever there was a scene in Mission: Impossible III that took place at the agency offices, I was repeatedly bothered by the fact that all the people in the building are wearing their identification badges clipped to their jackets or shirts. Except Ethan Hunt. He gets to walk through the halls like a cologne advertisement.

    Why doesn't he have to wear identification? His boss has to wear identification. His boss's boss has to wear identification. But Ethan Hunt gets to just wander around in black looking cool without any unsightly identification tag that would ruin the look of whole outfit.

    I was also somewhat off-put, as was Bob Mondello, that the producers thought it necessary to identify cities on-screen as "Berlin, Germany", "Rome, Italy", and "Shanghai, China". Do they think we're so stupid that we don't know where Berlin is?

    (And keep an eye out for the American-style fire alarm during the chase through Shanghai just as Ethan Hunt turns a corner. At first glance I thought it said "REEB", but upon further reflection I believe the last two letters are more likely to be "FD"—"fire department". I don't know what the first two letters stand for, or even if I remembered them correctly.)

  • The Old New Thing

    Solutions that don't actually solve anything

    • 48 Comments

    If changing a setting requires administrator privileges in the first place, then any behavior that results cannot be considered a security hole because in order to alter the setting, attackers must already have gained administrative privileges on the machine, at which point you've already lost the game. If attackers have administrative privileges, they're not going to waste his time fiddling with some setting and leveraging it to gain even more privileges on the system. They're already the administrator; why go to more work to get what they already have?

    One reaction to this is to try to "secure" the feature by asking, "Well, can we make it harder to change that setting?" For example, in response to the Image File Execution Options key, Norman Diamond suggested "only allowing the launching of known debuggers." But this solution doesn't actually solve anything. What would a "known debugger" be?

    • "The operating system contain a hard-coded list of known debuggers. On that list are ntsd.exe, cdb.exe, and maybe windbg.exe." Personally, I would be okay with that, but that's because I do all my debugging in assembly language anyway. Most developers would want to use devenv.exe or bds.exe or even gdb.exe. If somebody comes up with a new debugger, they would have to petition Microsoft to add it to the hard-coded list of "known debuggers" and then wait for the next service pack for it to get broad distribution. And even before the ink was dry on the electrons, I'm sure somebody somewhere will already have filed an anti-competitive-behavior lawsuit. ("Microsoft is unlawfully raising the barrier to entry to competing debugging products!")
    • "Okay, then the program just needs to be digitally signed in order to be considered a 'known debugger'." Some people would balk at the $500/year cost of a code signing certificate. And should the operating system ask the user whether or not they trust the signing authority before running the debugger? (What if the debugger is being invoked on a service or a remote computer? There is nowhere to display the UI!) Actually, these were all trick questions. It doesn't matter whether the operating system prompts or not, because the attackers would just mark their signing certificate as a "trusted" certificate. And in fact the $500/year wouldn't stop the attackers, since they would just create their own certificate and install it as a "trusted root". Congratulations, the only people who have to pay the $500/year are the honest ones. The bad guys just slip past with their self-signed trusted-root certificate.
    • "Okay, forget the digital signature thing, just have a registry key that lists all the 'known debuggers'. If you're on the list, then you can be used in Image File Execution Options." Well, in that case, the attackers would just update the registry key directly and set themselves as a "known debugger". That "known debuggers" registry key didn't slow them done one second.
    • "Okay, then not a registry key, but some other setting that's hard to find." Oh, now you're advocating security through obscurity?

    Besides, it doesn't matter how much you do to make the Image File Execution Options key resistant to unwanted tampering. If the attacker has administrative privileges on your machine, they won't bother with Image File Execution Options anyway. They'll just install a rootkit and celebrate the addition of another machine to their robot army.

    Thus is the futility of trying to stop someone who already has obtained administrative privileges. You're just closing the barn door after the horse has bolted.

  • The Old New Thing

    It rather involved being on the other side of this airtight hatchway

    • 47 Comments

    Not every code injection bug is a security hole.

    Yes, a code injection bug is a serious one indeed. But it doesn't become a security hole until it actually allows someone to do something they normally wouldn't be able to.

    For example, suppose there's a bug where if you type a really long file name into a particular edit control and click "Save", the program overflows a buffer. With enough work, you might be able to turn this into a code injection bug, by entering a carefully-crafted file name. But that's not a security hole yet. All you've found so far is a serious bug. (Yes, it's odd that I'm underplaying a serious bug, but only because I'm comparing it to a security hole.)

    Look at what you were able to do: You were able to get a program to execute code of your choosing. Big deal. You can already do that without having to go through all this effort. If you wanted to execute code of your own choosing, then you can just put it in a program and run it!

    The hard way

    1. Write the code that you want to inject, compile it to native machine code.
    2. Analyze the failure, develop a special string whose binary representation results in the overwriting of a return address, choosing the value so that it points back into the stack.
    3. Write an encoder that takes the code you wrote in step 1 and converts it into a string with no embedded zeros. (Because an embedded zero will be treated as a string terminator.)
    4. Write a decoder that itself contains no embedded-zeros.
    5. Append the encoded result from step 3 to the decoder you wrote in step 4 and combine it with the binary representation you developed in step 2.
    6. Type the resulting string into the program.
    7. Watch your code run.

    The easy way
    1. Write the code that you want to inject. (You can use any language, doesn't have to compile to native code.)
    2. Run it.

    It's like saying that somebody's home windows are insecure because a burglar could get into the house by merely unlocking and opening the windows from the inside. (But if the burglar has to get inside in order to unlock the windows...)

    Code injection doesn't become a security hole until you have elevation of privilege. In other words, if attackers gains the ability to do something they normally wouldn't. If the attack vector requires setting a registry key, then the attacker must already have obtained the ability to run enough code to set a registry key, in which case they can just forget about "unlocking the window from the inside" and just replace the code that sets the registry with the full-on exploit. The alleged attack vector is a red herring. The burglar is already inside the house.

    Or suppose you found a technique to cause an application to log sensitive information, triggered by a setting that only administrators can enable. Therefore, in order to "exploit" this hole, you need to gain administrator privileges, in which case why stop at logging? Since you have administrator privileges, you can just replace the application with a hacked version that does whatever you want.

    Of course, code injection can indeed be a security hole if it permits elevation of privilege. For example, if you can inject code into a program running at a different security level, then you have the opportunity to elevate. This is why extreme care must be taken when writing unix root-setuid programs and Windows services: These programs run with elevated privileges and therefore any code injection bug becomes a fatal security hole.

    A common starting point from which to evaluate elevation of privilege is the Internet hacker. If some hacker on the Internet can inject code onto your computer, then they have successfully elevated their privileges, because that hacker didn't have the ability to execute arbitrary code on your machine prior to the exploit. Next time, we'll look at some perhaps-unexpected places your program can become vulnerable to an Internet attack, even if you think your program isn't network-facing.

  • The Old New Thing

    Beware the C++ implicit conversion

    • 47 Comments

    Today's topic was inspired by a question from a customer:

    I am working on a stack overflow bug. To reduce the size of the stack frame, I removed as many local variables as I could, but there's still a a lot of stack space that I can't account for. What else lives on the stack aside from local variables, parameters, saved registers, and the return address?

    Well, there's also structured exception handling information, but that's typically not too much and therefore wouldn't be the source of "a lot" of mysterious stack usage.

    My guess is that the code is generating lots of large C++ temporaries. Consider the following program fragment:

    class BigBuffer
    {
    public:
     BigBuffer(int initialValue)
       { memset(buffer, initialValue, sizeof(buffer)); }
    private:
     char buffer[65536];
    };
    
    extern void Foo(const BigBuffer& o);
    
    void oops()
    {
     Foo(3);
    }
    

    "How does this code even compile? The function Foo wants a BigBuffer, not an integer!" Yet compile it does.

    That's because the compiler is using the BigBuffer constructor as a converter. In other words, the compiler inserted the following temporary variable:

    void oops()
    {
     BigBuffer temp(3);
     Foo(temp);
    }
    

    It did this because a constructor that takes exactly one argument serves two purposes: It can be used as a traditional constructor (as we saw with BigBuffer temp(3)) or it can be used to provide an implicit conversion from the argument type to the constructed type. In this case, the BigBuffer(int) constructor is being used as a conversion from int to BigBuffer.

    To prevent this from happening, use the explicit keyword:

    class BigBuffer
    {
    public:
     explicit BigBuffer(int initialValue)
       { memset(buffer, initialValue, sizeof(buffer)); }
    private:
     char buffer[65536];
    };
    

    With this change, the call to Foo(3) raises a compiler error:

    sample.cpp: error C2664: 'Foo' : cannot convert parameter 1 from
         'int' to 'const BigBuffer &'
         Reason: cannot convert from 'int' to 'const BigBuffer'
         Constructor for class 'BigBuffer' is declared 'explicit'
    
  • The Old New Thing

    On languages and spelling

    • 43 Comments

    When I brought up the topic of spelling bees earlier this year, it triggered several comments on how various languages deal with the issue of spelling. Here are some thoughts on the topics that were brought up:

    German spelling is only partly phonetic. Given the spelling of a word, one can, after applying a rather large set of rules, determine its pronunciation with very high accuracy. On the other hand, given the pronunciation of a word, the spelling is not obvious. For example, do you write "Feber" or "Vehber" or possibly "Phäber"? "Ist" or "isst"? "Quelle" or "Kwälle"? The fact that Germany is undergoing controversial spelling reform proves that German spelling is not entirely predictable. After all, if spelling were completely phonetic, there would be no need for reform!

    And all those pronunciation rules. Sometimes a "d" is pronounced like "t"; sometimes a "t" is pronounced like "z"; sometimes a "g" is pronounced like "ch"; sometimes "st" is pronounced like "scht". One would think that a truly "phonetically-spelled" language would have a one-to-one correspondence between sounds and letters. (I'm led to believe that many Eastern European languages are phonetic in this way.) Furthermore, given a word's spelling, it's not always obvious where the stress lies. For example, you just have to know that the accent in Krawatte goes on the second syllable. The spelling gives you no help.

    Swedish is like German in this respect: Given the spelling of a word, you can (again, after the application of a rather large set of rules) determine its pronunciation with a high degree of confidence. But going in the other direction can be a nightmare. The tricky "sj" sound goes by many spellings: "sj", "stj", "stj" , "sk", "ch", and sometimes even "g" (in French-derived words). Depending on the regional accent, the pronunciation of a leading "s" can vary depending on the ending of the previous word. (Though I suspect most Swedes don't even hear the difference themselves.)

    At least in English, we're honest about the fact that our spelling is complicated. English spelling only starts to become intuitive once you've learned French, German, Middle English, Greek, Latin, and a handful of other languages, learned British history (so you know who conquered whom when and ransacked their language for new words), and learned how the precursor-languages to modern English were pronounced at the time the words were imported.

    That last point is a problem common to many languages. The spelling of a word tends to change much more slowly than its pronunciation. English retains the original spelling long after the pronunciation has moved on. Many Chinese characters are puzzling until you realize that the word was pronounced differently a few thousand years ago. (Yes, there is a phonetic component to Chinese characters, believe it or not.) Resistance to spelling reform in Germany is just another manifestation of spelling inertia.

    One thing I thought was interesting was the types of competitions different languages use to promote correct spelling and/or grammar. In the United States, spelling competitions (known as "spelling bees") are the most common way of accomplishing this. Students are each given a word to spell, which must be done from memory. Spell it correctly and you survive to the next round; spell it incorrectly and you are eliminated.

    It is my understanding that in Taiwan, the analogous competition is the "dictionary look-up". I'm hazy on the details, but I think the way it works is that a character is shown to the class, and the students race to look it up in the dictionary. Since dictionaries are typically arrange phonetically, a student who already knows how the character is pronounced has an advantage over a student who has to count strokes and perform radical decomposition in order to look it up.

    I was not previous aware of dictation competitions, but they appear to be particularly popular in Poland. This allows greater emphasis to be placed on the complexity of Polish grammar. A former colleague of mine who grew up in Poland told me that when she goes back to visit relatives, it takes her a while to "regain her tongue" and stop making grammatical errors. You know you've got a complicated language when even a native speaker has to get back up to speed.

  • The Old New Thing

    A cache with a bad policy is another name for a memory leak

    • 37 Comments

    A common performance trick is to reduce time spent in the heap manager by caching the last item freed (or maybe the last few) so that a subsequent allocation can just re-use the item rather than having to go make a new one. But you need to be careful how you do this or you can end up making things worse rather than better. Here's an example motivated by an actual problem the Windows performance team researched.

    Consider a cache of variable-sized buffers. I will use only a one-entry cache for simplicity. In real life, the cache would be more complicated: People tend to have a deeper cache of four to ten entries, and you would have to ensure that only one thread used the cache at a time; typically this is done by associating the cache with something that has thread affinity. Furthermore, you probably would keep the size of the cached buffer in a member variable instead of calling LocalSize all the time. I've left out all these complications to keep the presentation simple.

    class BufferCache {
    public:
     BufferCache() : m_pCache(NULL) { }
     ~BufferCache() { LocalFree(m_pCache); }
    
     void *GetBuffer(SIZE_T cb);
     void ReturnBuffer(void *p);
    
    private:
     void *m_pCache;
    };
    

    If a request for a memory buffer arrives and it can be satisfied from the cache, then the cached buffer is returned. Otherwise, a brand new buffer is allocated.

    void *BufferCache::GetBuffer(SIZE_T cb)
    {
     // Satisfy from cache if possible
     if (m_pCache && LocalSize(m_pCache) >= cb) {
      void *p = m_pCache;
      m_pCache = NULL;
      return p;
     }
     return LocalAlloc(LMEM_FIXED, cb);
    }
    

    When a buffer is returned to the cache, we compare it against the item already in the cache and keep the bigger one, since that is more likely to satisfy a GetBuffer in the future. (In the general case of a multiple-entry cache, we would free the smallest entry.)

    // Flawed design - see discussion
    void BufferCache::ReturnBuffer(void *p)
    {
     SIZE_T cb = LocalSize(p);
     if (!m_pCache || cb > LocalSize(m_pCache)) {
      // Returned buffer is bigger than the cache:
      // Keep the returned buffer
      LocalFree(m_pCache);
      m_pCache = p;
     } else {
      // Returned buffer is smaller than the cache:
      // Keep the cache
      LocalFree(p);
     }
    }
    

    Why is this a flawed design? I'll let you think about this for a while.

    No really, I want you to think about it.

    Are you thinking? Take your time; I'll be here when you're done.

    Okay, since I know you haven't actually thought about it but are just sitting there waiting for me to tell you, I'll give you a bit of a nudge.

    The distribution of buffer sizes is rarely uniform. The most common distribution is that small buffers are popular, with larger and larger buffers being required less and less often. Let's write a sample program that allocates and frees memory according to this pattern. To make the bad behavior easier to spot in a short run, I'm going to use a somewhat flat distribution and say that half of the buffers are small, with larger buffers becoming less popular according to exponential decay. In practice, the decay curve is usually much, much steeper.

    #include <vector>
    #include <iostream>
    
    // Since this is just a quick test, we're going to be sloppy
    using namespace std; //  sloppy
    int __cdecl main(int argc, char **argv)
    {
     BufferCache b;
    
     // seeding the random number generator is not important here
    
     vector<void *> v; // keeps track of allocated memory
     for (;;) {
      // randomly allocate and free
      if (v.size() == 0 || (rand() & 1)) { // allocate
       SIZE_T cb = 100;
       while (cb < 1024 * 1024 && (rand() & 1)) {
        cb *= 2; // exponential decay distribution up to 1MB
       }
       void* p = b.GetBuffer(cb);
       if (p) {
        cout << " A" << LocalSize(p) << "/" << cb;
        v.push_back(p);
       }
      } else { // free
       int victim = rand() % v.size(); // choose one at random
       cout << " F" << LocalSize(v[victim]);
       b.ReturnBuffer(v[victim]); // free it
       v[victim] = v.back();
       v.pop_back();
      }
     }
    }
    

    This short program randomly allocates and frees memory from the buffer cache, printing (rather cryptically) the size of the blocks allocated and freed. When memory is allocated, it prints "A1/2" where "1" is the size of the block actually allocated and "2" is the size requested. When freeing memory, it prints "F3" where "3" is the size of the block allocated. Run this program, let it do its thing for maybe ten, fifteen seconds, then pause the output and study it. I'll wait. If you're too lazy to actually compile and run the program, I've included some sample output for you to study:

    F102400 A102400/400 F800 F200 A800/100 A200/200 A400/400
    A400/400 A200/200 F1600 A1600/100 F100 F800 F25600 A25600/200
    F12800 A12800/200 F200 F400 A400/100 F200 A200/100 A200/200
    A100/100 F200 F3200 A3200/400 A200/200 F51200 F800 F25600
    F1600 F1600 A51200/100 F100 A100/100 F3200 F200 F409600 F100
    A409600/400 A100/100 F200 F3200 A3200/800 A400/400 F800 F3200
    F200 F12800 A12800/200 A100/100 F200 F25600 F400 F6400
    A25600/100 F100 F200 F400 F200 F800 F400 A800/800 A100/100
    

    Still waiting.

    Okay, maybe you don't see it. Let's make the effect even more obvious by printing some statistics periodically. Of course, to generate the statistics, we need to keep track of them, so we'll have to remember how big the requested buffer was (which we'll do in the buffer itself):

    int __cdecl main(int argc, char **argv)
    {
     BufferCache b;
    
     // seeding the random number generator is not important here
    
     vector<void *> v; // keeps track of allocated memory
     SIZE_T cbAlloc = 0, cbNeeded = 0;
     for (int count = 0; ; count++) {
      // randomly allocate and free
      if (v.size() == 0 || (rand() & 1)) { // allocate
       SIZE_T cb = 100;
       while (cb < 1024 * 1024 && !(rand() % 4)) {
        cb *= 2; // exponential decay distribution up to 1MB
       }
       void* p = b.GetBuffer(cb);
       if (p) {
        *(SIZE_T*)p = cb;
        cbAlloc += LocalSize(p);
        cbNeeded += cb;
        v.push_back(p);
       }
      } else { // free
       int victim = rand() % v.size(); // choose one at random
       cbAlloc -= LocalSize(v[victim]);
       cbNeeded -= *(SIZE_T*)v[victim];
       b.ReturnBuffer(v[victim]); // free it
       v[victim] = v.back();
       v.pop_back();
      }
      if (count % 100 == 0) {
       cout << count << ": " << v.size() << " buffers, "
            << cbNeeded << "/" << cbAlloc << "="
            << cbNeeded * 100.0 / cbAlloc << "% used" << endl;
      }
     }
    }
    

    This new version keeps track of how many bytes were allocated as opposed to how many were actually needed, and prints a summary of those statistics every hundred allocations. Since I know you aren't actually going to run it yourself, I've run it for you. Here is some sample output:

    0: 1 buffers, 400/400=100% used
    100: 7 buffers, 4300/106600=4.03377% used
    200: 5 buffers, 1800/103800=1.7341% used
    300: 19 buffers, 9800/115800=8.46287% used
    400: 13 buffers, 5100/114000=4.47368% used
    500: 7 buffers, 2500/28100=8.8968% used
    ...
    37200: 65 buffers, 129000/2097100=6.15135% used
    37300: 55 buffers, 18100/2031400=0.891011% used
    37400: 35 buffers, 10400/2015800=0.515924% used
    37500: 43 buffers, 10700/1869100=0.572468% used
    37600: 49 buffers, 17200/1874000=0.917823% used
    37700: 75 buffers, 26000/1889900=1.37573% used
    37800: 89 buffers, 30300/1903100=1.59214% used
    37900: 91 buffers, 29600/1911900=1.5482% used
    

    By this point, the problem should be obvious: We're wasting insane quantities of memory. For example, after step 37900, we've allocated 1.8MB of memory when we needed only 30KB, for a waste of over 98%.

    How did we go horribly wrong?

    Recall that most of the time, the buffer being allocated is a small buffer, and most of the time, a small buffer is freed. But it's the rare case of a large buffer that messes up everything. The first time a large buffer is requested, it can't come from the cache, since the cache has only small buffers, so it must be allocated. And when it is returned, it is kept, since the cache keeps the largest buffer.

    The next allocation comes in, and it's probably one of the common-case small buffers, and it is given the cached buffer—which is big. You're wasting a big buffer on something that needs only 100 bytes. Some time later, another rare big buffer request comes in, and since that other big buffer got wasted on a small allocation, you have to allocate a new big buffer. You allocated two big buffers even though you need only one. Since big buffers are rare, it is unlikely that a big buffer will be given to a caller that actually needs a big buffer; it is much more likely to be given to a caller that needs a small buffer.

    Bad effect 1: Big buffers get wasted on small callers.

    Notice that once a big buffer enters the system, it is hard to get rid of, since a returned big buffer will be compared against what is likely to be a small buffer, and the small buffer will lose.

    Bad effect 2: Big buffers rarely go away.

    The only way a big buffer can get freed is if the buffer in the cache is itself already a big buffer. If instead of a one-entry cache like we have here, you keep, say, ten buffers in your buffer cache, then in order to free a big buffer, you have to have eleven consecutive ReturnBuffer calls, all of which pass a big buffer.

    Bad effect 3: The more efficient you try to make your cache, the more wasteful it gets!

    What's more, when that eleventh call to ReturnBuffer is made with a big buffer, it is only the smallest of the big buffers that gets freed. The biggest buffers stay.

    Bad effect 4: When a big buffer does go away, it's only because you are keeping an even bigger buffer!
    Corollary: The biggest buffer never gets freed.

    What started out as an "obvious" decision in choosing which buffer to keep has turned into a performance disaster. By favoring big buffers, you allowed them to "poison" the cache, and the longer you let the system run, the more allocations end up being big "poisoned" buffers. It doesn't matter how rare those big blocks are; you will eventually end up in this state. It's just a matter of time.

    When the performance team tries to explain this problem to people, many of them get the mistaken impression that the problem is merely that there is wasted space in the cache. But look at our example: Our cache has only one entry and we are still wasting over 90% of the memory. That's because the waste is not in the memory being held by the cache, but rather is in the memory that the cache hands out. (It's sort of like that scene in It's a Wonderful Life where George Bailey is explaining where all the money is. It's not in the bank; it's in all the places that got money from the bank.)

    My recommendations:

    • Instrument your cache and understand what your program's memory allocation patterns are.
    • Use that information to pick a size cutoff point beyond which you simply will not use the cache at all. This ensures that big buffers never get into the cache in the first place. Choosing this cutoff point is usually extremely easy once you look at then allocation histogram.
    • Although you've taken the big buffers out of the picture, you will still have the problem that the small buffers will gradually grow up to your cutoff size. (I.e., you still have the same problem, just in miniature.) Therefore, if the cache is full, you should just free the most recently returned buffer regardless of its size.
    • Do not use the cached buffer if the waste is too great. You might decide to use multiple "buckets" of cached entries, say one for buffers below 100 bytes, another for buffers between 100 and 200 bytes, and so on. That way, the waste per allocation is never more than 100 bytes.
    • Finally, reinstrument your cache to ensure that you're not suffering from yet some other pathological behavior that I haven't taken into account.

    Here's a new ReturnBuffer implementation that takes some of the above advice into account. Instrumentation shows that three quarters of the allocations are in the 100–200 byte range, so let's cap our cache at 200 bytes.

    void BufferCache::ReturnBuffer(void *p)
    {
     if (m_pCache == NULL && LocalSize(p) <= 200) {
      m_pCache = p;
     } else {
      LocalFree(p);
     }
    }
    

    With this one seemingly-minor change, our efficiency stays above 90% and occasionally even gets close to 100%:

    0: 1 buffers, 400/400=100% used
    100: 7 buffers, 4300/4400=97.7273% used
    200: 5 buffers, 1800/1800=100% used
    300: 19 buffers, 9800/9800=100% used
    400: 13 buffers, 5100/5100=100% used
    500: 7 buffers, 2500/2600=96.1538% used
    ...
    37200: 65 buffers, 129000/130100=99.1545% used
    37300: 55 buffers, 18100/18700=96.7914% used
    37400: 35 buffers, 10400/11000=94.5455% used
    37500: 43 buffers, 10700/11000=97.2727% used
    37600: 49 buffers, 17200/18000=95.5556% used
    37700: 75 buffers, 26000/26800=97.0149% used
    37800: 89 buffers, 30300/31900=94.9843% used
    37900: 91 buffers, 29600/30600=96.732% used
    

    Don't forget to check out performance guru Rico Mariani's reminder that Caching implies Policy. As he explained to me, "Cache policy is everything so you must be dead certain that your policy is working as you intended. A cache with a bad policy is another name for a memory leak."

Page 1 of 4 (35 items) 1234