People who can measure their application data's locality of reference using the CPU's L2 cache scare me :)

Personally, I find the biggest single performance improvement tactic is KISS (Keep It Simple, Stupid). That includes using existing .NET library classes instead of writing your own (I am constantly surprised by how many people rewrite things just because they aren't familiar with the .NET BCL), and aiming to write the least amount of code possible: more code means more maintenance, more debugging, and often more CPU cycles in execution. Finding the simplest solution takes a bit more effort (since simple != obvious) but usually results in a smaller, faster and more stable system. And even if you still find yourself in CLRProfiler, at least you may have shortened your stay there!

Oh yeah, there is also a new release of the CLRProfiler, which includes a ~100-page tutorial. Coolness!