Parallel STL - Democratizing Parallelism in C++

Only a few years ago, writing parallel code in C++ was the domain of experts. Nowadays, this field is becoming more and more accessible to regular developers thanks to advances in libraries and tools such as the PPL and C++ AMP from Microsoft, Intel's Threading Building Blocks, OpenMP or OpenACC if you prefer a pragma-style approach, OpenCL for low-level access to heterogeneous hardware, CUDA and Thrust for programming NVidia devices, and so on.

The C++ Standard is catching up too, giving us the fundamentals such as the precisely defined memory model, and the basic primitives like threads, mutexes, and condition variables. This allows us to understand how atomics, fences and threads interact with the underlying hardware, reason about data races and so on. This is extremely important, but now we also need higher level algorithms that are commonly found in many of the popular parallel libraries.

Over the last few years, a group of software engineers from Intel, Microsoft and NVidia have worked together on a proposal for the ISO C++ Standard known as the "Parallel STL".

This proposal builds on the experience of these three companies building parallel libraries for their platforms -- the Threading Building Blocks (Intel), PPL and C++ AMP (Microsoft) and Thrust (NVidia). All these libraries have a common trait -- they allow developers to perform common parallel operations on generic containers. Naturally, this aligns very well with the goals of the C++ Standard Template Library.

All three companies are working on their implementations of the proposal. Today, we're pleased to announce that Microsoft has made the prototype of the proposal available as an open source project at

We encourage everyone to head over to our CodePlex site and check it out.

The proposal has been approved to be the foundation for the "Parallelism Technical Specification" by the ISO C++ Standards Committee, meaning that enough people on the Committee are interested in incorporating this proposal into the next major version of the C++ Standard. Needless to say, this set of people includes the representatives of Intel, Microsoft and NVidia, all of whom are active members of the Committee.

For those familiar with the STL, using Parallel STL should be easy. Consider an example of sorting a container, data, using the STL function std::sort:

sort(data.begin(), data.end());

Parallelizing this code is as easy as adding the parallel execution policy as the first parameter to the call:

sort(par, data.begin(), data.end());

Obviously, there is a little more to it than meets the eye. The parallel version of sort and the execution policy are defined in a separate namespace, std::experimental::parallel, so you will need to either qualify the names explicitly or bring them in with a using directive (it is expected that the names in this namespace will be promoted to std once this becomes part of Standard C++).
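To make the role of the policy argument more concrete, here is a minimal sketch of how a tag passed as the first parameter can select a different overload at compile time. This is not the prototype's implementation and not the proposal's API: the namespace sketch, the types sequential_policy and parallel_policy, and the seq object are hypothetical names made up for illustration, and the "parallel" overload simply sorts two halves concurrently and merges them, which a real library would refine considerably.

```cpp
#include <algorithm>
#include <cassert>
#include <future>
#include <iterator>
#include <system_error>
#include <vector>

namespace sketch {  // hypothetical namespace, not std::experimental::parallel

// Hypothetical tag types standing in for the proposal's execution policies.
struct sequential_policy {};
struct parallel_policy {};
constexpr sequential_policy seq{};
constexpr parallel_policy par{};

// Sequential overload: simply forwards to std::sort.
template <typename RandomIt>
void sort(sequential_policy, RandomIt first, RandomIt last) {
    std::sort(first, last);
}

// Parallel overload: sort the left half on another thread while this
// thread sorts the right half, then merge the two sorted halves.
template <typename RandomIt>
void sort(parallel_policy, RandomIt first, RandomIt last) {
    const auto n = std::distance(first, last);
    if (n < 2) return;
    RandomIt mid = first + n / 2;

    std::future<void> left;
    try {
        left = std::async(std::launch::async,
                          [first, mid] { std::sort(first, mid); });
    } catch (const std::system_error&) {
        // No thread could be started: sort the left half on this thread.
        std::sort(first, mid);
    }
    std::sort(mid, last);
    if (left.valid()) left.get();  // wait for the left half, rethrow errors
    std::inplace_merge(first, mid, last);
}

}  // namespace sketch

// Usage: only the first argument differs between the two calls.
inline bool demo() {
    std::vector<int> a{5, 3, 9, 1, 7, 2, 8};
    std::vector<int> b = a;
    sketch::sort(sketch::seq, a.begin(), a.end());
    sketch::sort(sketch::par, b.begin(), b.end());
    return a == b && std::is_sorted(b.begin(), b.end());
}
```

Because the policy is an ordinary function argument, the same container can be sorted sequentially in one place and in parallel in another. The comparison used inside the parallel overload runs on two threads at once, so it must be safe to call concurrently.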

As is always the case with parallelization, not every program will benefit from using the Parallel STL, so don't just go sprinkling your STL code with par willy-nilly. You still need to find a bottleneck in your program that's worth parallelizing. In some cases, your program will need to be rewritten to become amenable to parallelism.

Where do we go from here?

As mentioned above, the project is still experimental. While the effort is driven by three major companies, and there is strong interest from the ISO C++ Committee and the C++ community in general, we still have a way to go before Parallel STL becomes part of the C++ Standard. We expect that the draft will undergo changes during the standardization process, so keep this in mind when working with the prototype.

Your feedback is important, and there are a number of ways to get engaged. You can leave a comment below, send email to or head over to and start a discussion.

Artur Laksberg
Visual C++ Team, Microsoft

  • Great news, looking forward to playing with the code!

    BTW, I just have to ask -- does `#include <meow>` in N3960 imply STL's (the person) involvement? :-)

  • That wasn't my doing, although I wholeheartedly approve :->

  • We don't need more STL garbage to write parallel code.  If you want to convert C++ into a YAPTL (yet another pizza-topper's language) do the decent thing and call it something else.  It may be good for selling books and self-promotion tours but C++ is fast becoming totally useless as an everyday language that gets the job done quickly and accurately with the minimum of fuss.  It should not be a perpetual relearning exercise based on the whims and fancies of the sloppy and untidy who inhabit the ivory towers of academia.  There is no need for any of this crud.  Returning to writing OOP in pure C, you guys are getting way out of hand.

  • @Bondi Beach I guess you must be joking. I have been using C++ since 2002 uninterrupted and the additions in C++11 and C++14 are invaluable. This is a step towards getting the most out of the machine, as C++ philosophy states. I don't see in which way this would make C++ *less* useful.

    P.S.: I still have a hope that modules will be in in 2017. I checked clang modules and work seems to go too slow :(

  • @German Diago  No I am not joking.  I have been using C++ since the year dot and C before that when it was still Lattice before Microsoft's came along.  Of late it has migrated into a useless pattern driven monstrosity that has left the tenets of decent programming far behind.  You are welcome to it.  Going back to C as stated.  You can keep your pizza-topper abstraction with its template driven cascade errors and 'type safety' in the mind's eye only, boost and all the rest.  Good luck with that!  In a few years' time it'll all be as redundant as ATL and MFC and you'll be starting all over again with some new paradigm from hell.  Keep It Simple Stupid.

  • At first glance, without thinking too deeply: why should I select parallel or regular functionality? As a programmer I prefer my sort to finish as quickly as possible. For example, most implementations today have at least 3 different algorithms for sorting, chosen depending on various checks of the data itself before the sorting begins. It means that as a programmer you are not required to call 'QuickSort' or 'BubbleSort' or any other specific algorithm; it is the sort() function that selects the best implementation for you. I think that if this methodology can be kept with enhanced hardware abilities, why should the programmer choose? Or, in other words, what inputs does the programmer have to choose between different implementations? Isn't it best that the algorithm itself, while exploring the available resources, chooses the best implementation?

  • @Ran: There is definitely a need for selecting parallel or serial functionality.

    * STL may be called in an inner loop which is already parallelized.

    * The comparison function may not be thread-safe.

  • @petter S: as always, I don't want to limit the options, I just don't want to choose when someone else can do it better for me. E.g. The best sort algorithm.

    @Bondi Beach: I too agree with you that there are a few things that do make people go back to the old, C-style programming. For instance, the extensive use of function<>

  • Although not an active proposal, I think it's unfortunate that `sort(par, v.begin(), v.end())` conflicts with something like "uniform function call syntax" (UFCS) that is in the D programming language. For example, with the range library on the way, `sort(v.begin(), v.end())` becomes `sort(v)`. If you then introduce UFCS to the language, you could naturally call it like `v.sort()`. But with the parallel sort, `sort(par, v)` would not be callable using UFCS. If, however, the `par` parameter comes after the range parameters, then it is compatible and the user can call either `v.sort()` or `v.sort(par)`.

  • @STL, is there a chance we get C99's snprintf in VS this year? I saw you mentioned in comments back in Jul 2013 that you guys ran out of time: Please share some sort of ETA.


  • James McNellis has implemented snprintf() and it will ship in the next major version of VC. (It will not appear in any Updates to VC 2013.) Note that I'm not allowed to talk about release dates before they've been publicly announced.

  • We have been using OMPTL (Parallel STL algorithms over OpenMP) since 2009; it looks much better so far

  • Is there any possibility of attaching the par parameter to the Container (as a template parameter or member) rather than to the algorithms? That would make it easier to update existing code, because there are fewer Container templates than algorithms.

  • @leolee82: the policy really needs to apply to the algorithm, not the container. The same container can be processed sequentially or in parallel, depending on the algorithm.

  • @Ran.  Check this out for a laugh ...

    STL is a pack of cheap diapers too many.
