Parallel Programming in Native Code

Parallel programming using C++ AMP, PPL and Agents libraries.

Synchronizing array_view in C++ AMP

Hi there, in a previous blog post I talked about the data transfer APIs provided by C++ AMP. In this post I will look at some interesting data transfer scenarios that come up when working with array_view. I will describe the operations on an array_view that can trigger an implicit data transfer between accelerators, depending on where the array_view is used.

In the following code snippet, we have a std::vector on host memory and we wrap it with an array_view. We capture the array_view in a parallel_for_each and modify it on the accelerator.

1.  // Create a vector of size 10 and with elements set to 5
2.  std::vector<int> data(10, 5);
3.  array_view<int, 1> my_av(10, data);

4.  // Gets the handle to an accelerator
5.  accelerator gpuDevice = GetGpuDevice();

6.  // Modify the array_view on accelerator
7.  parallel_for_each(gpuDevice.default_view, my_av.extent, [=] (index<1> idx) restrict(amp) {
8.      my_av[idx] = my_av[idx] + 2;
9.  });

Prior to the parallel_for_each call on line 7, the array_view data is present only on the host, so it must be transferred to the target accelerator_view when parallel_for_each is invoked. The general rule is that if the target accelerator_view does not hold the latest copy of the array_view's data, an implicit copy takes place from the accelerator_view that holds the latest copy to the target accelerator_view.
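To make the rule concrete, here is a minimal sketch in plain standard C++ (no amp.h) of the bookkeeping it implies. This is a toy model and not how the C++ AMP runtime is implemented: the TrackedBuffer class, its add_on_device and synchronize methods, and the transfer counter are hypothetical names invented for illustration, and the "device" is simply a second std::vector.

```cpp
#include <cassert>
#include <vector>

// Toy model (an assumption, NOT the C++ AMP runtime): remembers which
// location holds the latest data and copies only when a stale one is used.
class TrackedBuffer {
public:
    explicit TrackedBuffer(std::vector<int>& host) : host_(host) {}

    // Models a kernel launch: copy to the "device" only if its copy is
    // stale, modify the data there, then mark the host copy as stale.
    void add_on_device(int delta) {
        if (!device_valid_) {
            device_ = host_;          // implicit host -> device copy
            device_valid_ = true;
            ++transfers_;
        }
        for (int& v : device_) v += delta;
        host_valid_ = false;
    }

    // Models an explicit synchronization: refresh the host copy if stale.
    void synchronize() {
        if (host_valid_) return;      // host already current: no transfer
        assert(device_valid_);        // some location must hold the latest data
        host_ = device_;              // device -> host copy
        host_valid_ = true;
        ++transfers_;
    }

    int transfers() const { return transfers_; }

private:
    std::vector<int>& host_;          // the wrapped host container
    std::vector<int> device_;         // stand-in for accelerator memory
    bool host_valid_ = true;
    bool device_valid_ = false;
    int transfers_ = 0;
};
```

In this model, two back-to-back add_on_device calls cause only one host-to-device transfer, because the second call finds the latest copy already on the device. The host vector keeps its old contents until synchronize is called. The model is deterministic on that last point, whereas with a real array_view the state of the host data before synchronization should not be relied on.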

If you access the underlying data on the host after the parallel_for_each computation has completed, you may find that it still contains the old value.

std::cout << data[0]; // may still print 5

However, this is not guaranteed, so it is not advisable to rely on it. The underlying data source that an array_view instance wraps is guaranteed to be updated on the host when one of the following actions occurs, each of which triggers a transfer of the latest data from accelerator memory to host memory:

A) Calling synchronize or synchronize_async on the array_view. The synchronize_async function exists for the same reasons as the copy_async functions. The name sounds like an oxymoron, but it gives the user an alternative to blocking the current thread while waiting for synchronization to complete.

my_av.synchronize();

// or you can do it asynchronously
completion_future w = my_av.synchronize_async();

/***** Do some other operation here ******/

// wait for synchronization to complete
w.wait();

B) Accessing the array_view on the host through indexing, either as an l-value or as an r-value.

int x = my_av[idx];    // blocks until all of the data is copied from the accelerator

C) Calling the data() member function, which is available only on one-dimensional array_views.

my_av.data();

D) Destroying the last array_view that wraps the data. The same data can be wrapped by multiple array_view objects. When the last array_view wrapping the underlying data is destroyed (either when it goes out of scope or when the destructor is explicitly called), the data in accelerator memory is copied out to update the host data. Relying on this is discouraged, since any exceptions thrown at the synchronization point would not be observed by user code.
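The warning in D) is easier to see with a small model. The sketch below is plain standard C++ and not C++ AMP: WriteBackView, its fail_copy flag, and the two helper functions are hypothetical names invented for illustration. It assumes that a write-back performed from a destructor has to swallow any failure (letting an exception escape a destructor would call std::terminate), whereas an explicit synchronize call can report the failure to the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for an array_view-like wrapper; not C++ AMP itself.
class WriteBackView {
public:
    // fail_copy simulates a copy-out that fails (assumption for the demo).
    WriteBackView(std::vector<int>& host, bool fail_copy)
        : host_(host), staged_(host), fail_copy_(fail_copy) {}

    // Destructor write-back: a failure is swallowed here, so the caller
    // never learns that the host data was not updated.
    ~WriteBackView() {
        try { synchronize(); } catch (...) { /* error is lost */ }
    }

    void set(std::size_t i, int v) {
        assert(i < staged_.size());
        staged_[i] = v;
        dirty_ = true;
    }

    // Explicit write-back: a failure propagates to the caller.
    void synchronize() {
        if (!dirty_) return;
        if (fail_copy_) throw std::runtime_error("copy-out failed");
        host_ = staged_;
        dirty_ = false;
    }

private:
    std::vector<int>& host_;     // the wrapped host container
    std::vector<int> staged_;    // stand-in for accelerator memory
    bool fail_copy_;
    bool dirty_ = false;
};

// Returns true if the caller was able to observe the failure.
bool explicit_sync_reports_failure() {
    std::vector<int> data(4, 0);
    WriteBackView v(data, /*fail_copy=*/true);
    v.set(0, 1);
    try { v.synchronize(); } catch (const std::runtime_error&) { return true; }
    return false;
}

// Returns true if the host data was updated when the view went out of scope.
bool destructor_sync_updates_host(bool fail_copy) {
    std::vector<int> data(4, 0);
    {
        WriteBackView v(data, fail_copy);
        v.set(0, 1);
    }   // write-back is attempted here, inside the destructor
    return data[0] == 1;
}
```

With the explicit call, the failure surfaces as an exception that can be caught and handled. With the destructor path, the only symptom of a failed write-back is stale host data, which is why calling synchronize yourself at a point where you can handle errors is the safer habit.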

I hope you now have a better understanding of how array_view synchronizes its underlying data across accelerators, and of the uses of array_view that can trigger it. As usual, I would love to read your comments below or in our MSDN forum.
