We are pleased to announce the new book C++ AMP: Accelerated Massive Parallelism with Microsoft Visual C++ by Kate Gregory and Ade Miller. C++ AMP is a code library that lets you harness the fast GPUs in today’s computers, bringing massive parallelism to your C++ projects. Experienced C++ developers will learn parallel programming fundamentals with C++ AMP through detailed examples, code snippets, and case studies.

The case studies include:

· An “N-body” case study that presents several implementations of the classic n-body problem, which models the movement of particles under gravity. It shows how to use C++ AMP to get the most out of your GPU hardware in a compute-intensive application.

· A “Cartoonizer” case study that demonstrates braided parallelism, using both the available cores on the CPU and any available GPU(s). This project processes video into simpler “cartoonized” images, using two different approaches to solve the problem.

· A “Reduction” case study that shows twelve different implementations of the reduce algorithm. The book discusses the performance characteristics and trade-offs of each implementation.

You’ll discover how to:

· Gain large performance improvements in your code by using graphics processing units (GPUs)

· Choose accelerators that enable you to write code for GPUs

· Write code that runs on the Microsoft DirectX platform

· Apply thread tiles, tile barriers, and tile static memory

· Debug C++ AMP code with Microsoft Visual Studio

· Use profiling tools to track the performance of your code

The full table of contents appears below. You can purchase the book from O’Reilly here:

You can find more information on CodePlex:

Author Kate Gregory also maintains a page for this book:

Chapter 1 Overview and C++ AMP Approach

Why GPGPU? What Is Heterogeneous Computing?
Technologies for CPU Parallelism
The C++ AMP Approach

Chapter 2 NBody Case Study

Prerequisites for Running the Example
Running the NBody Sample
Structure of the Example
CPU Calculations

Chapter 3 C++ AMP Fundamentals

array<T, N>
accelerator and accelerator_view
array_view<T, N>
Functions Marked with restrict(amp)
Copying between CPU and GPU
Math Library Functions

Chapter 4 Tiling

Purpose and Benefit of Tiling
tile_static Memory
tiled_index<N1, N2, N3>
Modifying a Simple Algorithm into a Tiled One
Effects of Tile Size
Choosing Tile Size

Chapter 5 Tiled NBody Case Study

How Much Does Tiling Boost Performance for NBody?
Tiling the n-body Algorithm
Using the Concurrency Visualizer
Choosing Tile Size

Chapter 6 Debugging

First Steps
GPU Debugging Basics
Seeing Threads
Taking More Control

Chapter 7 Optimization

An Approach to Performance Optimization
Analyzing Performance
Optimizing Memory Access Patterns
Optimizing Computation

Chapter 8 Performance Case Study—Reduction

The Problem
Case Study Structure
CPU Algorithms
C++ AMP Algorithms

Chapter 9 Working with Multiple Accelerators

Choosing Accelerators
Using More Than One GPU
Swapping Data among Accelerators
Dynamic Load Balancing
Braided Parallelism
Falling Back to the CPU

Chapter 10 Cartoonizer Case Study

Running the Sample
Structure of the Sample
The Pipeline
The Pipeline Cartoonizing Stage
Using Multiple C++ AMP Accelerators
Cartoonizer Performance

Chapter 11 Graphics Interop

Using Textures and Short Vectors
HLSL Intrinsic Functions
DirectX Interop

Chapter 12 Tips, Tricks, and Best Practices

Dealing with Tile Size Mismatches
Initializing Arrays
Function Objects vs. Lambdas
Atomic Operations
Additional C++ AMP Features on Windows 8
Time-Out Detection and Recovery
Double-Precision Support
Debugging on Windows 7
Additional Debugging Functions
C++ AMP and Windows 8 Windows Store Apps
Using C++ AMP from Managed Code

Appendix Other Resources

More from the Authors
Microsoft Online Resources
Download C++ AMP Guides
Code and Support