Who has ever met that guy who insists on writing every piece of code to be as fast as possible? We'll call him the "speed freak". Thankfully, speed freak is less of an influence in languages such as Java or C# because there is little low-level access to the CPU. Your code is compiled to bytecode and executed by the runtime environment, which means you are pretty much at the mercy of the compiler and the runtime's efficiency; it's very hard to tweak the code past the high-level structures.

But in C or C++, speed freak can be a nightmare. It can be hard to argue against someone saying "If something can be written faster, then why not do it?" It makes perfect sense; we all want programs to be faster. If tweaking the code makes it even 1% faster, is that 1% more time the user has?

Nope. At least, most of the time it is not. Ignoring the risk of making the change, optimization is not usually necessary for two reasons:

  • The code you are working on runs infrequently so the user will never notice the difference.
  • Your change does not make it any faster (and probably makes it worse).

And yet, the speed freak will rebut with a variety of arguments.

We should always write faster code!

If we focused all our time on making sure every code path we wrote was as fast as possible, then we would have wasted a great deal more time than the customer would potentially have saved with the speed improvements. The reason is that programs usually don't need to be optimized in every single place. For example, why spend hours and hours optimizing a feature whose use is exceedingly rare? If a feature already exists, chances are you will break some portion of its functionality by trying to optimize it, and if it already exists in the product and no one has complained about its speed, why spend time (which is the company's money to pay you) to improve it? If the customer experiences no noticeable benefit, then you have poorly invested your resources.

Customers are complaining that our product is slow so we need to fix this part of the code!

Customers may complain that a feature is slow, and it is probably to your benefit to make it faster. But do you really know why the feature is slow? The answer is that you probably have no clue. Even the best programmers will be hard pressed to pick out the slow points in a million-line program. The spot that seems like the obvious bottleneck could be the real one, or it could be way off target. Here is where precision counts: pick the right spot and the benefits could be great, but pick the wrong one and nothing will change.

This is why profilers exist. A profiler lets you see which portions of your program are being used while you do something. It should tell you which functions had the most time spent in them; note that this is not always the function that gets called the most. I have found it is usually that innocuous-looking system call that ends up requiring a disk read, which means you just spent a couple of milliseconds (read: millions of CPU cycles) getting some information off the hard drive.

My code is faster!

Really? Really? How do you know? Are you basing that on the compiler generating fewer instructions, or on your perception that it takes less time to run? You can settle this by profiling the original code and the new code in succession and seeing what the difference is. If there is a notable difference, say 50%, chances are you have hit the nail on the head. If there's a meager 5%, run the test a few times to make sure you have actually done something and are not the victim of some random anomaly.
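As a rough illustration of "run the test a few times", here is a minimal C++ sketch of a repeat-and-compare timer. The helper name median_nanoseconds is hypothetical, and taking the median of the samples is my own assumption about a reasonable way to damp a one-off anomaly; it is not a substitute for a real profiler.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: runs `work` `runs` times and returns the median
// wall-clock duration in nanoseconds. The median is less sensitive to a
// single slow outlier than the mean.
template <typename Work>
std::int64_t median_nanoseconds(Work work, int runs) {
    assert(runs > 0);  // at least one sample is required

    std::vector<std::int64_t> samples;
    samples.reserve(static_cast<std::size_t>(runs));

    for (int i = 0; i < runs; ++i) {
        const auto start = std::chrono::steady_clock::now();
        work();
        const auto stop = std::chrono::steady_clock::now();
        samples.push_back(
            std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count());
    }

    std::sort(samples.begin(), samples.end());
    return samples[samples.size() / 2];
}
```

Call it once with the original code and once with the "faster" code, using the same input and the same number of runs, and compare the two medians. If they differ by only a few percent, treat the result as noise until repeated measurements say otherwise.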

Number of CPU cycles is only one measure and misses some other very important factors. Are you sure this will be faster on ALL target processors? AMD and Intel are both x86, but they are built completely differently.

To illustrate the subtlety here, consider the difference between 32-bit x86 and x64.

static DataClass staticData[100] = {};
unsigned int arrayToData[10];
Initialize( arrayToData ); // Fill arrayToData with indexes into staticData
for( int i = 0; i < 10; i++ )
  DoSomething( staticData[ arrayToData[ i ] ] );

So, why store an index in arrayToData? Would it not be faster to store a pointer to the data and never have to index into staticData in the loop?

static DataClass staticData[100] = {};
DataClass* pointerToData[10];
Initialize( pointerToData ); // Fill pointerToData with pointers into staticData
for( int i = 0; i < 10; i++ )
  DoSomething( *pointerToData[ i ] );

Hooray! We've just made the program slightly faster…on x86. As soon as you migrate this to x64, you may lose your great work. On 32-bit x86 a pointer and an unsigned int typically both take 4 bytes, but on x64 pointers take 8 bytes, so pointerToData now takes up twice as much room. That can leave the processor cache able to hold less of your other data, and your program runs more slowly. You have just traded memory for cycles and may have accomplished nothing. This is the danger of micro-optimizing without considering the whole picture.
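The size difference is easy to check for yourself. The sketch below is only an illustration: the DataClass body is a placeholder I made up, and the helper names index_table_bytes and pointer_table_bytes are hypothetical.

```cpp
#include <cstddef>

// Placeholder standing in for the article's DataClass; its contents are an assumption.
struct DataClass {
    int value;
};

// Bytes used by a lookup table that stores unsigned int indexes.
constexpr std::size_t index_table_bytes(std::size_t entries) {
    return entries * sizeof(unsigned int);
}

// Bytes used by a lookup table that stores pointers to DataClass.
constexpr std::size_t pointer_table_bytes(std::size_t entries) {
    return entries * sizeof(DataClass*);
}
```

On a typical 64-bit build, index_table_bytes(10) is 40 and pointer_table_bytes(10) is 80; on a typical 32-bit build both are 40. Ten entries will not trouble any cache, but the same doubling applies to a table with millions of entries.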

Did you consider branch prediction, cache size, page faults, memory alignment, etc.?

Software is among the most complicated man-made structures on Earth, and its optimization is a black art. Just make sure you are aware of the potential complications your micro-optimizations may cause.

Write Macros!

Macros are just advanced find and replace, and using them is like using duct tape to repair an airliner. It may work initially, but macros are hard to maintain and will likely break later on. Inlined functions have many advantages over their macro equivalents, including being easier to write and maintain while still offering the same code-expansion functionality. Macros have their place, but they should not be used to try to improve run time speed. The following two snippets compute the same square, but one is easier to write, read, maintain and step into with a debugger:

#define POW2( x ) ((x)*(x))

inline double pow2( double x ) {
   return x * x;
}
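One concrete way the macro version bites: it pastes its argument into the expression twice, so an argument with a side effect runs twice. The sketch below shows this with a call counter; next_sample, calls_via_macro and calls_via_function are hypothetical names invented for the illustration, and the standard inline keyword is used for the function.

```cpp
#include <cassert>

#define POW2( x ) ((x)*(x))

inline double pow2( double x ) {
    return x * x;
}

// Hypothetical helper with a visible side effect: it counts its own calls.
static int g_calls = 0;

inline double next_sample() {
    ++g_calls;
    return 3.0;
}

// Squares next_sample() through the macro and reports how often it ran.
inline int calls_via_macro() {
    g_calls = 0;
    const double r = POW2( next_sample() );  // expands to ((next_sample())*(next_sample()))
    assert(r == 9.0);
    return g_calls;
}

// Squares next_sample() through the function and reports how often it ran.
inline int calls_via_function() {
    g_calls = 0;
    const double r = pow2( next_sample() );  // argument is evaluated once, then passed
    assert(r == 9.0);
    return g_calls;
}
```

Both paths produce 9.0, but calls_via_macro() reports two calls while calls_via_function() reports one. If the argument were expensive, or were something like i++, the macro would be slower or simply wrong.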

Write it in Assembly!

Yes, you can achieve faster code by writing it in assembly, but I doubt you'll be able to beat the compiler and still match the productivity of the guy next door who is writing in C++. Another fact about assembly is that just because you can write it does not mean you are doing it very well. I don't mean functionally; your assembly code will do the exact same thing that the C source code does. But writing assembly well takes a long time to learn. You need to understand how a processor works internally, and you need to understand it exceedingly well to justify using assembly. Optimizing for a specific processor's branch prediction, organizing code to allow for pipelining while managing value dependencies, and arranging instructions to fit best into "very long instruction words" is a logistical nightmare on large applications. When speed is not the absolute highest priority for a section of code, let the compiler do the heavy lifting for you.

Finally, writing anything in assembly usually means writing it in C as well and then #ifdef-ing around it depending on which processor flavor you are building for. It may be faster, but is it also worth maintaining two or more code paths that (are supposed to) do the same thing?

Love the Compiler

The compiler is smarter and more patient than you when it comes to micro-optimization. It will micro-optimize your code everywhere, converting all those 2*x to x+x and x = 0 to x = x^x where that is faster. It comes back to learning to love the wheel: the tool is given to you and will do the job well 99% of the time, so don't spend all your time trying to beat it until you know you have to.