As blogged by Harry Pierson and Larry Osterman, Herb Sutter has written a great article on how major processor manufacturers and architectures have run out of room with most of their traditional approaches to boosting CPU performance. This is a must read!
As pointed out by Herb, the great growth in CPU clock speeds that we have enjoyed over the last 25–30 years has reached a lull in the last 2 years. In essence, the exponential growth in CPU clock speed, often loosely attributed to Moore’s law (which strictly speaking describes transistor counts, not clock speed), has hit a number of stumbling blocks (such as heat dissipation). As a result, processor manufacturers are focusing on a whole suite of new approaches to increasing the computational power of CPUs, and the traditional approach of increasing clock speed is going to take a back seat for a while, with the focus moving towards getting more work done per clock cycle.
Herb mentions some of the exciting new innovations that we are going to see in CPUs in the coming year, some of which I will blog about soon. One of the most interesting is multiple cores on a processor die. This and many of the other new CPU features have caused some confusion as to how processing power is evaluated. Some of these include:
- 64-bit CPUs
- Hyper-Threading
- Multiple cores
These features are sometimes viewed as having a multiplier effect on performance. Take 64-bit CPUs, for example: a 64-bit processor is not twice as fast as its 32-bit equivalent. This may be surprising, but it is partly because 64-bit code has more data to move around (pointers, for instance, are twice as wide). The same caution applies to Hyper-Threading and multiple cores. Hyper-Threading at best can only give you a performance boost of around 10–15%, and even that can be very hard to obtain. With multiple cores, you will have difficulty getting the performance benefit if your application is not designed to handle its workload concurrently.
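To put a rough number on the "more data to move around" point, here is a small Python sketch. It is only an illustration: the 64-byte cache line is an assumption (a common size, not something taken from Herb's article), and the helper names are made up for this post.

```python
import struct

# Assumption: 64 bytes is a common cache-line size; real parts may differ.
CACHE_LINE_BYTES = 64


def pointers_per_cache_line(pointer_bytes, line_bytes=CACHE_LINE_BYTES):
    """Return how many pointers of the given width fit in one cache line."""
    return line_bytes // pointer_bytes


def pointer_array_bytes(count, pointer_bytes):
    """Return the memory needed for an array of `count` pointers."""
    return count * pointer_bytes


if __name__ == "__main__":
    # struct.calcsize("P") is the pointer width of the running interpreter.
    native = struct.calcsize("P")
    print(f"this interpreter uses {native * 8}-bit pointers")

    for width in (4, 8):  # 32-bit and 64-bit pointer widths, in bytes
        fit = pointers_per_cache_line(width)
        total = pointer_array_bytes(1_000_000, width)
        print(f"{width * 8}-bit: {fit} pointers per line, "
              f"{total} bytes for a million pointers")
```

The same pointer-heavy data structure takes up twice the room in cache and memory on a 64-bit build, which is one reason the wider registers do not translate into double the speed.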
One of the areas that probably causes the most confusion is multiple cores on a processor die. Having two or more cores on a die can easily lead to the conclusion that its processing power is a multiple of the clock speed! But this is not the case. Of all of the features that I have listed, multiple cores probably comes closest to providing this type of performance, but that can only happen if the application is multi-threaded and designed to handle concurrency and parallelism very well. This is where things are going to get very interesting for us as developers and architects! Most applications today are either single-threaded or designed to map a logical thread of execution to an OS thread, so the shift in focus to CPU throughput per cycle may cause issues for some server applications, which may not fully leverage the CPUs. This is especially the case where an application is very CPU-intensive and uses a single execution context to perform batch-type work. Relying on a single execution context (for simplicity) to perform batch work is no longer going to be a prudent approach, as there is no longer a promise of faster clock speeds to get work done quicker as workload increases.
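One way to see why extra cores are not a simple multiplier is Amdahl's law, a standard idealized model that the discussion above does not name but that fits it well. The sketch below is mine, and the function name and parameters are purely illustrative: if only a fraction of the work can run in parallel, the serial remainder caps the speedup no matter how many cores you add.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Idealized speedup from Amdahl's law.

    parallel_fraction: share of the work (0.0 to 1.0) that can run in parallel.
    cores: number of cores available to the parallel part.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part takes the same time as before; only the parallel
    # part is divided across the cores.
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    for fraction in (0.0, 0.5, 0.9, 1.0):
        for cores in (2, 4):
            speedup = amdahl_speedup(fraction, cores)
            print(f"parallel={fraction:.0%} cores={cores} "
                  f"speedup={speedup:.2f}x")
```

A purely single-threaded batch job (a parallel fraction of zero) gets no benefit at all from a second core, and only a fully parallel workload reaches the full 2x. The model ignores synchronization and memory contention, so real numbers will usually be lower.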
For years now, silicon has been cheaper than protein! If a server application is sluggish, we tend to go for a (faster) processor upgrade (i.e. silicon) rather than redesign the application (i.e. use protein to solve the problem)! We no longer have this luxury! If we are going to develop truly scalable server applications, we will have to focus on concurrency management more and more in the design and have it factored in from the start. This is especially the case for applications that execute big workloads per request, as a logical execution flow (i.e. a request) will have to be broken down into discrete units and executed in parallel where possible.
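As a rough sketch of what "broken down into discrete units and executed in parallel" can look like, here is a minimal Python example. Everything in it is hypothetical: `process_unit` stands in for whatever work a real request does, and the unit size is arbitrary. It uses a thread pool to keep the example simple. In CPython, threads will not speed up CPU-bound work because of the global interpreter lock, so for that case you would pass in a `concurrent.futures.ProcessPoolExecutor` instead. The structure is the same either way.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_units(items, unit_size):
    """Break one large workload into independent, fixed-size units."""
    if unit_size < 1:
        raise ValueError("unit_size must be at least 1")
    return [items[i:i + unit_size] for i in range(0, len(items), unit_size)]


def process_unit(unit):
    """Hypothetical per-unit work: here, just a sum of squares."""
    return sum(x * x for x in unit)


def handle_request(items, executor, unit_size=1000):
    """Process one logical request as parallel units and combine the results."""
    units = split_into_units(items, unit_size)
    # executor.map runs process_unit on each unit using the pool's workers
    # and yields the results in the original order.
    partial_results = executor.map(process_unit, units)
    return sum(partial_results)


if __name__ == "__main__":
    data = list(range(10_000))
    with ThreadPoolExecutor(max_workers=4) as pool:
        print(handle_request(data, pool))
```

The important design point is that the units share no state, so they need no locking. The only coordination is the final combine step, and keeping that step small is what lets the request scale with the number of cores.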
So philosophers, grab your chopsticks while you can and when you can!
2005 is going to be a feast of asynchronous programming, concurrency management and multi-threading.