The 64-bit versions of Windows 7 and Windows Server 2008 R2 support more than 64 logical processors (LPs) on a single computer. New processors are now appearing that use non-uniform memory access (NUMA) architectures. In the near future, a system with 4 CPU sockets, 8 cores per socket, and two-way Simultaneous Multi-Threading (SMT) enabled on each core will reach 64 logical processors (4 × 8 × 2). Many server-class solutions will need to be designed with NUMA awareness to achieve linear performance scaling on systems with 64 or more LPs.

Scalable application design requires NUMA awareness from several perspectives; Herb Sutter summarizes the goal as "Maximize Locality, Minimize Contention." Consider, for example, the processor load required to service interrupts from a modern 10 Gb/s network card. Ideally, the interrupt processing and any resulting Deferred Procedure Calls (DPCs) run on processors local to the network device. For a detailed analysis, read the work of Windows performance expert Mark Friedman. NUMA locality can be applied to processes, threads, devices, interrupts, and memory.

Read more about this topic and download example code at MSDN Code Gallery...