Hazim Shafi has written a highly relevant case study that illustrates how strongly memory usage patterns can affect the speedup of parallel applications. Have you ever parallelized an application only to see marginal (or no) speedup? His entry may explain why.