Have you read http://support.microsoft.com/?kbid=307340? If not, I suggest you do so. If you need convincing, or simply want to know why this problem occurs, keep reading.

The problem described in the article above can make your application's CPU time, memory usage or both spike, which can leave your application hanging or even crash it. The cause is one of the most basic operations there is, and one that we perform all the time.


The problem

In a nutshell, the problem is that string concatenation is bad. I don't mean bad as in "You shouldn't eat that extra piece of chocolate". I mean bad as in "You really shouldn't put your head in wet concrete". If you read my previous post on memory management, you might already be able to figure out why string concatenation can be such a hazard. Consider the following scenario:

You're getting ready to go grocery shopping. You take a sheet of paper and a pen and start writing down what you need to buy. The first item that comes to mind is milk, so you write it down. Then you remember that you're out of bread as well, so you throw away the first sheet, take a new one and write down milk and bread. You throw away that sheet too and write milk, bread and apples on a third one. You throw away the third sheet and repeat until the list is finally done. If you have 20 items on your list, you now have 19 discarded sheets lying on the floor waiting for the "garbage collector".

This is exactly how regular string concatenation works in .NET. Strings are immutable, so an existing string can never be extended in place. Every time you concatenate two strings, the framework has to allocate a new block of memory large enough to hold the result and copy both parts into it. The old string is no longer referenced, so it becomes garbage waiting to be collected, and as we all know, garbage collection costs CPU and uncollected garbage costs memory.
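To see the immutability part for yourself, here's a minimal console sketch (the class name ImmutableStringDemo is just for illustration). It keeps a second reference to the original string, appends to the string with +=, and then checks that the original is untouched and that the result is a different object:

```csharp
using System;

public static class ImmutableStringDemo
{
    public static void Main()
    {
        string list = "milk";
        string original = list;        // a second reference to the same "milk" object

        list += ", bread";             // allocates a brand new string; "milk" is not modified

        Console.WriteLine(original);                               // prints "milk"
        Console.WriteLine(list);                                   // prints "milk, bread"
        Console.WriteLine(object.ReferenceEquals(original, list)); // prints "False": two distinct objects
    }
}
```

The += operator only makes it look like the string grew. What really happened is that list now points to a new object, and the old one is left behind for the garbage collector, just like the discarded shopping lists.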

 

A real example

So if you thought that 19 sheets of paper were a waste, consider the following piece of code:

private String buildTable()
{
   String sReturn;
   int i, j;
   sReturn = "<table>";
   for (i = 0; i < 35; i++) // Rows
   {
      sReturn += "<tr>";
      for (j = 0; j < 15; j++) // Columns
      {
         sReturn += "<td>";
         sReturn += i.ToString() + " : " + j.ToString(); // every += allocates a new, longer string
         sReturn += "</td>";
      }
      sReturn += "</tr>";
   }
   sReturn += "</table>";
   return sReturn;
}

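All in all, the loops append to sReturn 1646 times (not counting the smaller concatenations that build each cell's text), and every one of those appends allocates a new string and copies the entire table built so far into it. The allocations get steadily bigger, so the last ones are almost as big as the finished table, which is 8230 characters. Needless to say, this consumes a lot of unnecessary memory. Also, if this table were built for a "real" application, it would probably contain a lot more data in each cell, not to mention custom styles and classes. Imagine what happens when the string grows past the magic 85,000-byte threshold above which objects are allocated on the large object heap. Since .NET stores each character of a string in two bytes (UTF-16), that's only about 42,500 characters, and the large object heap is only cleaned up during full (generation 2) collections. On a larger scale, this has a severe impact on performance.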
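If you'd rather count than take my word for it, the sketch below replays the += sequence from buildTable() and tallies the number of appends to the growing string, the final length, and the total number of characters copied into all those intermediate strings. The names ConcatCostDemo and Measure are made up for this example, and it assumes C# 7 or later for the tuple return and the local function:

```csharp
using System;

public static class ConcatCostDemo
{
    // Replays buildTable()'s appends to sReturn and measures what they cost.
    public static (int Appends, long CharsCopied, int FinalLength) Measure(int rows, int cols)
    {
        string s = "<table>";
        int appends = 0;
        long charsCopied = 0;

        // Stands in for each "sReturn += piece" line in buildTable().
        void Add(string piece)
        {
            s += piece;               // new string of s.Length + piece.Length characters
            appends++;
            charsCopied += s.Length;  // the whole string so far is copied into it
        }

        for (int i = 0; i < rows; i++)
        {
            Add("<tr>");
            for (int j = 0; j < cols; j++)
            {
                Add("<td>");
                Add(i.ToString() + " : " + j.ToString());
                Add("</td>");
            }
            Add("</tr>");
        }
        Add("</table>");
        return (appends, charsCopied, s.Length);
    }

    public static void Main()
    {
        var (appends, copied, length) = Measure(35, 15);
        Console.WriteLine($"{appends} appends, final length {length} chars, {copied} chars copied in total");
    }
}
```

For the 35 × 15 table it should report 1646 appends and a final length of 8230 characters. Compare the third number with that final length to see how much copying the += approach does to produce one small table.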

 

This is really weird behavior. How can that be?!

It's not that odd, really. If we extend the grocery shopping analogy a little, we can see the logic behind it: You know that you want to write something down, but you have a limited amount of paper. So what do you do? You cut out a piece of paper just large enough for what you need to write. When you need to add another item to the list, you have to discard the old piece and cut out a new one. Not very practical, agreed, but you're trying to minimize your paper usage.

As the clever beings that we are, we normally make a rough estimate of how big a piece of paper we need and then start writing. The framework, however, has no way of estimating how big a string is going to be. It doesn't know whether you're going to concatenate it zero or ten thousand times, so that's something you'll have to estimate yourself.

 

So what should I do instead?

One word: StringBuilder

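The StringBuilder class is made especially for this type of operation. Instead of creating a new string for every append, it keeps an internal, resizable character buffer and only allocates more space when that buffer runs out. The final string is created once, when you call ToString(). Using a StringBuilder, our sample above would look something like this: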

private String buildTableSB()
{
   StringBuilder sReturn = new StringBuilder(); // StringBuilder lives in System.Text
   sReturn.Capacity = 8300; // Optional, but may be very efficient: the finished table is 8230 characters
   int i, j;
   sReturn.Append("<table>");
   for (i = 0; i < 35; i++) // Rows
   {
      sReturn.Append("<tr>");
      for (j = 0; j < 15; j++) // Columns
      {
         sReturn.Append("<td>");
         sReturn.Append(i.ToString());
         sReturn.Append(" : ");
         sReturn.Append(j.ToString());
         sReturn.Append("</td>");
      }
      sReturn.Append("</tr>");
   }
   sReturn.Append("</table>");
   return sReturn.ToString();
}
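The Capacity line is where the rough estimate from the grocery list comes in. The sketch below shows one way to make that estimate; EstimateTableLength is a hypothetical helper that adds up the fixed markup per table, row and cell plus a guessed average length of each cell's text, and passes the result to the StringBuilder(int capacity) constructor:

```csharp
using System;
using System.Text;

public static class CapacityEstimateDemo
{
    // Hypothetical helper: back-of-the-envelope estimate of the finished HTML length.
    public static int EstimateTableLength(int rows, int cols, int avgCellTextLength)
    {
        const int tableTags = 15; // "<table>" + "</table>"
        const int rowTags = 9;    // "<tr>" + "</tr>"
        const int cellTags = 9;   // "<td>" + "</td>"
        return tableTags + rows * (rowTags + cols * (cellTags + avgCellTextLength));
    }

    public static void Main()
    {
        // Cells look like "12 : 3", so 7 characters is a slightly generous guess.
        int estimate = EstimateTableLength(35, 15, 7);

        var presized = new StringBuilder(estimate); // reserve the space up front
        var unsized = new StringBuilder();          // starts with the default capacity

        Console.WriteLine($"Estimate: {estimate} chars");
        Console.WriteLine($"Presized capacity: {presized.Capacity}");
        Console.WriteLine($"Default capacity: {unsized.Capacity}");
    }
}
```

With 7 characters per cell, the estimate for the 35 × 15 table is 8730, a little above the real 8230, while a default StringBuilder starts out with room for only 16 characters. Erring on the high side wastes a little memory; erring low just means the builder has to grow once or twice, which is still far cheaper than copying the whole string on every append.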

The difference in performance for a larger operation is enormous. I highly recommend trying out these samples: copy the functions into a WinForms or Web Forms application, increase the number of rows to 500 and run both versions. The StringBuilder version will finish within milliseconds, and the "classic" version will not... For additional tidbits, open Performance Monitor, watch the collection counters under .NET CLR Memory (# Gen 0 Collections, # Gen 1 Collections and # Gen 2 Collections) and see how many garbage collections each version needs to complete the operation.
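If you want to automate that experiment, here's a minimal console sketch (the class and method names are illustrative, and it assumes C# 6 or later for the interpolated strings). It builds the same table both ways with a configurable row count, times each version with Stopwatch, and uses GC.CollectionCount(0) to count the generation 0 collections that happened while each version ran. Timings vary with machine and runtime version, and the collection count is process-wide, so treat the numbers as a rough comparison:

```csharp
using System;
using System.Diagnostics;
using System.Text;

public static class ConcatBenchmark
{
    // Same table as buildTable(), built with +=.
    public static string BuildWithConcat(int rows, int cols)
    {
        string s = "<table>";
        for (int i = 0; i < rows; i++)
        {
            s += "<tr>";
            for (int j = 0; j < cols; j++)
            {
                s += "<td>";
                s += i.ToString() + " : " + j.ToString();
                s += "</td>";
            }
            s += "</tr>";
        }
        return s + "</table>";
    }

    // Same table as buildTableSB(), built with a StringBuilder.
    public static string BuildWithBuilder(int rows, int cols)
    {
        var sb = new StringBuilder();
        sb.Append("<table>");
        for (int i = 0; i < rows; i++)
        {
            sb.Append("<tr>");
            for (int j = 0; j < cols; j++)
            {
                // Append returns the builder, so the pieces of a cell can be chained.
                sb.Append("<td>").Append(i).Append(" : ").Append(j).Append("</td>");
            }
            sb.Append("</tr>");
        }
        sb.Append("</table>");
        return sb.ToString();
    }

    // Runs one builder and prints elapsed time and gen 0 collections during the run.
    static string Time(string label, Func<string> build)
    {
        int gen0Before = GC.CollectionCount(0);
        var watch = Stopwatch.StartNew();
        string result = build();
        watch.Stop();
        int gen0 = GC.CollectionCount(0) - gen0Before;
        Console.WriteLine($"{label}: {watch.ElapsedMilliseconds} ms, {gen0} gen 0 collections");
        return result;
    }

    public static void Main()
    {
        const int rows = 500, cols = 15;
        string a = Time("+= concatenation", () => BuildWithConcat(rows, cols));
        string b = Time("StringBuilder", () => BuildWithBuilder(rows, cols));
        Console.WriteLine(a == b ? "Both versions produced identical HTML." : "Output mismatch!");
    }
}
```

The Append(int) overload formats the number directly, which saves the explicit ToString() calls from the original sample without changing the output for these non-negative values.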

 

Bye for now / Johan