Parallel Programming in Native Code

Parallel programming using C++ AMP, PPL and Agents libraries.

Random Traveler Sample in C++ AMP

Random Traveler Sample in C++ AMP

In this blog post I am sharing a sample named Random Traveler, which demonstrates a graph based algorithm.

Using the sample

The random traveler starts at an airport, takes an outgoing flight, reaches another airport, again takes an outgoing flight from this airport and so on. The sample solves the following problem: given the starting airport (that you specify in the Starting Airport textbox) and the number of flights the traveler takes (that you specify in the Number of Flights textbox), and a geographic region (that you specify in the Map option), find the cities where the traveler is most likely to be at the end of journey. The calculation can take place serially on the CPU, using multiple CPU cores through TPL, or using C++ AMP.

The following screenshot shows the application when you first start it:

On clicking the ‘Calculate’ button, the computation is kicked off and results are displayed, as shown in the following screenshots.

The box at the top of the window shows the results; list of the cities in order of decreasing probability. The map at the bottom shows red circles on top of cities. The size of the circle is proportional to the probability associated with city. Cities with smaller probability will have a smaller circle.

The code

The sample is mostly written in C# and uses C++ AMP for bulk of computation to gain speed up. It comprises of a graph with each node representing an airport and each edge representing a flight connection between two airports. Each edge has a weight associated with it which represents the frequency of flight.

The sample uses XAML based WPF in C# for GUI. When the application first starts, the graph is initialized by loading data from file (see function LoadData() in file MainWindow.xaml.cs under folder RandomTraveler\RandomTraveler). When the ‘Calculate’ button is clicked, the sample first computes the largest strongly connected component (SCC) containing the starting airport (see function LargestStronglyConnectedComponent() in file FlightGraph.cs under folder RandomTraveler\RandomTraveler).

The matrix representation of this SCC is then computed with each edge having weight relative to peer edges (i.e. the outgoing edges from a given node). Relative weight is calculated by dividing the original weight of the edge by the sum of weights of the peer edges (see function GenerateMatrix() in file FlightGraph.cs under folder RandomTraveler\RandomTraveler).

The Exponentiation of this matrix is then calculated with an exponent equal to the number of flights taken (see function MatrixExp() in file MatrixOperations.cs under folder RandomTraveler\RandomTraveler). The exponentiation of the matrix is computed using the matrix multiplication as the core computational algorithm. The CPU version of matrix multiplication (both single threaded and TPL) is implemented in C# (see function MultiplyMatrixCpu() and MultiplyMatrixTpl() in file MatrixOperations.cs under folder RandomTraveler\RandomTraveler). The accelerated or GPU version of matrix multiplication is implemented using C++ AMP (see file MatrixOperationsCppAmp.cpp under folder RandomTraveler\MatrixOperationsCppAmp). The resulting matrix after exponentiation contains the probabilities for each airport which is then used to update the UI.

To achieve C#/C++ interop, the sample uses P/Invoke technique to call from C# into C++ AMP native dll containing the matrix multiplication code. You can read about P/Invoke and see a simpler sample in detail on this blog post.

This sample also bubbles up the exceptions occurred in native C++ AMP code by marshaling the exception message from native to managed code and then re-throwing the exception on managed side. The C++ AMP function returns char* which points to the buffer containing exception message. The P/Invoke wrapper in C# code returns a string. In this case, the CLR interop automatically attempts to free the returned pointer using CoTaskMemFree() function, hence the buffer in C++ AMP code in allocated using CoTaskMemAlloc() function. You can read in detail about data marshaling with P/Invoke on MSDN.

You can download the zipped file containing the sample and play with it. Please note that the attached sample code in the “.zip” is released under MS-RSL license. As always, you will need Visual Studio 2012 to build the sample. I would love to hear your comments below.

Attachment: RandomTraveler.zip
Blog - Comment List MSDN TechNet