How to use C++ AMP from C# using WinRT

[Updated 5/17/2012 for Visual Studio 11 Beta]

In a previous article, How to use C++ AMP from C#, we described how you can use P/Invoke to call into C++ AMP and accelerate your C# apps on GPUs and other heterogeneous hardware. In this post, we'll take a look at how the same task becomes easier in Windows 8 using WinRT.

Before attempting to call C++ AMP from C#, make sure that you have C++ AMP working on your machine. For example, please verify that you can run the C++ AMP "Hello, World" example.

The short story

Once you have C++ AMP working on your machine, the easiest way to start using it from C# via WinRT is to open this sample project in Visual Studio 11 and begin experimenting with the code.

The long story

If you have an existing Metro style app that you’d like to modify to use C++ AMP, or you’d like to understand how the sample is set up, follow the steps below. In summary:

  • Step 1: Open or create a C# Metro style project in Visual Studio 11
    • Set the platform target to X86 (if you plan to write 32-bit C++ AMP code).
  • Step 2: Create a C++ WinRT Component DLL project
    • DO NOT build the project before completing step 3
  • Step 3: Add the C++ project as a reference to the C# project.
  • Step 4: Write the C++ AMP and the C# code

Step 1: Open or create a C# Metro style project

First, you need to open or create a C# Metro style application project. The rest of the article assumes that the project is named HelloWorldCSharpWinRT.

Also, set the "Platform target" of the project to "X86".

Step 2: Create a C++ WinRT Component DLL project

Now, you can add a Visual C++ WinRT component that will contain the C++ AMP code. Simply create a project named "HelloWorldLib" from the "WinRT Component DLL" template.

WARNING: Do not build the project yet! Due to a bug in the Visual Studio 11 Developer Preview, building the project before completing step 3 will cause problems, and you may be stuck having to delete and recreate the C++ WinRT project.

Step 3: Add reference from HelloWorldCSharpWinRT to HelloWorldLib

With WinRT, you can simply add HelloWorldLib as a reference to HelloWorldCSharpWinRT. No more manual editing of the csproj file is necessary, as it was with P/Invoke! Just right-click HelloWorldCSharpWinRT, choose "Add Reference..." and select the HelloWorldLib project.

Step 4: Write the C++ AMP and the C# code

Now, we just need to write the C++ AMP code and call it from C#.

Since a C++ AMP kernel may take a long time to execute, the WinRT guidelines state that the kernel should be exposed as an asynchronous operation. A convenient way to expose asynchronous operations in C++ is via create_async, currently available in the PPL Sample Pack (for details on how this works, see Try It Now: Use PPL to Produce Windows 8 Asynchronous Operations).

Delete WinRTComponent.h.

Modify WinRTComponent.cpp as follows:

#include "pch.h"
#include <amp.h>
#include <ppltasks.h>
#include <collection.h>
#include <vector>

using namespace concurrency;
using namespace Windows::Foundation;
using namespace Windows::Foundation::Collections;

namespace HelloWorldLib
{
    public ref class WinRTComponent sealed
    {
    public:
        IAsyncOperation<IVectorView<float>^>^ square_array_async(
            IVectorView<float>^ input)
        {
            // Synchronously copy input data from host to device
            int size = input->Size;
            array<float, 1> *dataPt = new array<float, 1>(
                size, begin(input), end(input));

            // Asynchronously perform the computation on the GPU
            return create_async( [=]() -> IVectorView<float>^
            {
                // Array objects can only be captured by reference
                array<float,1> &arr = *dataPt;

                // Run the kernel on the GPU
                parallel_for_each(arr.extent, [&arr] (index<1> idx) mutable restrict(amp)
                {
                    arr[idx] = arr[idx] * arr[idx];
                });

                // Copy outputs from device to host
                std::vector<float> vec(size);
                copy(*dataPt, vec.begin());
                delete dataPt;

                // Return the outputs as a VectorView<float>
                return ref new Platform::Collections::VectorView<float>(vec);
            });
        }
    };
}

Notice that the square-array operation is exposed via an asynchronous API. In WinRT, operations that may be long-running should be exposed via asynchronous APIs, and GPU operations may take a relatively long time to execute.
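For readers who want to experiment with the shape of this pattern without a WinRT toolchain, here is a minimal sketch in portable standard C++. It is an analogy only, not the WinRT API: std::async stands in for create_async, a plain CPU loop stands in for the parallel_for_each kernel, and the function name square_array_async_cpu is hypothetical.

```cpp
#include <cstddef>
#include <future>
#include <utility>
#include <vector>

// Hypothetical portable analog of square_array_async (not the WinRT API):
// std::async stands in for create_async, and a CPU loop stands in for
// the C++ AMP kernel.
std::future<std::vector<float>> square_array_async_cpu(std::vector<float> input)
{
    // std::launch::async runs the work on another thread, so the caller
    // receives the future immediately instead of running the loop itself.
    return std::async(std::launch::async,
        [data = std::move(input)]() mutable {
            // The lambda owns the data, so no manual new/delete is needed.
            for (std::size_t i = 0; i < data.size(); ++i) {
                data[i] = data[i] * data[i];
            }
            return std::move(data);
        });
}
```

Calling get() on the returned future plays roughly the role that await plays on the C# side: the caller is free to do other work until it actually needs the result.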

That is all that we need on the C++ side. Now, we'll add a button to the C# project. Modify MainPage.xaml as follows, keeping the xmlns declarations that the project template generated on the root element (not shown here):

<UserControl x:Class="HelloWorldCSharpWinRT.MainPage"
    d:DesignHeight="768" d:DesignWidth="1366">
    <Grid x:Name="LayoutRoot" Background="#FF0C0C0C">
        <Button x:Name="Button_Example" Content="Click" Click="Button_Example_Click" HorizontalAlignment="Center"/>
    </Grid>
</UserControl>

When the user clicks the button, we'll call into C++ AMP. Modify MainPage.xaml.cs as follows:

using System;
using System.Collections.Generic;
using Windows.UI.Popups;
using Windows.UI.Xaml;
using HelloWorldLib;

namespace HelloWorldCSharpWinRT
{
    partial class MainPage
    {
        public MainPage()
        {
            InitializeComponent();
        }

        private async void Button_Example_Click(
            object sender, RoutedEventArgs e)
        {
            // Disable the button while the GPU operation is in flight
            Button_Example.IsEnabled = false;
            var arr = new [] { 1.0f, 2.0f, 3.0f, 4.0f };
            List<float> inputs = new List<float>(arr);

            // IVectorView<float> is projected into C# as IReadOnlyList<float>
            IReadOnlyList<float> outputs =
                await new WinRTComponent()
                    .square_array_async(inputs);

            await new MessageDialog(string.Join(",", outputs)).ShowAsync();
            Button_Example.IsEnabled = true;
        }
    }
}

… and that’s it!

Note that this is a very simple example that demonstrates how to call a C++ AMP function from C#. The example is too naïve to demonstrate speedup: it does too little work per data element, and too little work overall, to benefit from GPU acceleration. An example of a workload that does demonstrate speedup is matrix multiplication; see the C++ AMP code for Matrix Multiplication.
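To put rough numbers on "work per data element", the sketch below compares the two workloads under a deliberately simple cost model. The model is an assumption made for illustration, not a measurement: it counts only floating-point operations and floats copied between host and device, and it ignores kernel launch overhead, caching, and everything else. Both function names are hypothetical.

```cpp
#include <cstddef>

// Illustrative cost model only: floating-point operations per float copied
// between host and device. Both functions assume n > 0.

// Squaring n elements: n multiplies; n floats copied in and n copied out.
double square_ops_per_float(std::size_t n)
{
    double flops = static_cast<double>(n);
    double moved = 2.0 * static_cast<double>(n);
    return flops / moved;
}

// Textbook n-by-n matrix multiplication: about 2*n^3 operations (one multiply
// and one add per inner-loop step); 3*n^2 floats moved (two inputs, one output).
double matmul_ops_per_float(std::size_t n)
{
    double d = static_cast<double>(n);
    double flops = 2.0 * d * d * d;
    double moved = 3.0 * d * d;
    return flops / moved;
}
```

Under this model the squaring example stays at half an operation per float moved no matter how large the array is, while matrix multiplication grows as 2n/3, which is why the copy cost dominates in the former and can be amortized in the latter.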


Comments
  • Can't you just use Accelerator (MS research)?

  • dr. Venkam, you cannot use Accelerator from MSR in a commercial project (no commercial license). Also C++ AMP offers a more flexible API, an API that is supported in Visual Studio, will be evolved in future releases, has an open specification, and also represents the recommended way forward. That said, if you are happy with Accelerator within those constraints, please keep using it.

  • How is this easier than the example shown in the previous article?  Are there advantages to doing it this way instead?

  • @Doc Sewell  this is targeted for Windows Store apps.  The previous article explains implementation for "Desktop" apps.

  • So using AMP to accelerate processing should be done asynchronously because it's relatively slow? Does anyone else see the problem with that?

  • @Keiren - it's not that it's slower, but depending on the amount of data thrown at it, the parallel processing can take some time....much less time than a single processor if done correctly, but still, some time.

  • I have one huge problem with this solution. I cannot debug my C++ amp kernel this way. The WinRT component (outputting a dll) does not have the debug tab under configuration properties to control the debugger and the store app doesn't have any options for gpu debugging either.

    I can easily enough enable mixed mode debugging and debug from app to component but I cannot make it break or step inside a parallel_for_each.

    Is there a sensible way to set this up?
