STL Performance

Visual CPP Team

June 23rd, 20090 0

Hello, I am Mohammad Usman, a Software Design Engineer in Test on the Visual C++ Libraries team. I recently joined the team. In VS 2008 SP1, we invested heavily in our Standard Template Library (STL) implementation by adding TR1 extensions such as shared_ptr, including performance optimizations for things like vectors of shared_ptrs. In this post, I will be talking about how we take this investment further by improving our STL performance in VS2010. We have done some re-writing of our libraries to make them more conformant to the C++0x standard, more flexible and extendible, and to improve performance in a few areas. For the first two items we already have a few blog posts and a Channel 9 video, this post will mainly cover the performance part.

Since we have re-written quite a bit of STL code in order to improve flexibility and increase conformance, it was important to make sure that this does not negatively affect the performance. We have taken special care of that. Most of our performance tests are to ensure this. And our tests are showing good results. In fact there are some areas where VS2010 is faster than VS2008. Some of those areas where we improved performance significantly are:

– Overall deque performance

o The improvement is evident when used with algorithms such as std::reverse(), std::remove(), std::sort() and std::unique().

– std::unique()

o An optimization change in the loops of the function has given a significant performance boost when used with different containers such as list, deque and vector.

– Vector reallocations

o Vector is one of the most commonly used containers because of its attributes such as contiguous memory, random access, fast traversals, and its ability to grow dynamically etc. But it does have some limitations. To maintain contiguous memory, it has to allocate memory in a chunk and when that chunk is full, it has to go through something called “reallocation”. This is a potentially costly process causing the vector to copy all the contents to a new location. Now the good news is that we have made very significant improvements there in VS2010 with the help of rvalue references. This will come into action when you use vectors with STL objects in VS2010 and you can also take advantage of this when using vectors of your own custom objects by writing simple move constructors, move assignment operators for your class. A significant performance gain will be evident here when used with large objects containing strings. This is explained in more detail below.

Vector Reallocation (UDTs – User Defined Types)

Introduction of rvalue references in STL is one of my favorite enhancements for VS2010. “Rvalue references” is quite a complicated topic. Stephan (our STL developer) recently wrote a very good comprehensive blog post explaining rvalue references, which can be found here. For the sake of this post, I will touch only the parts which impact the performance of STL.

By using rvalue references, we are able to move the contents of objects rather than copying and destroying the original where we don’t have to. We actually “steal” the resources by copying the pointers rather than the whole object when the object we are dealing with is an rvalue. This is most helpful when move constructors and move assignment operators can be used. You can use the move machinery to write your move constructors and move assignment operators. The following simple test shows that we can get an order of magnitude performance boost when we take advantage of rvalue references in large UDTs.

#include <iostream>

#include <ostream>

#include <ios>

#include <stddef.h>

#include <string>

#include <vector>

#include <windows.h>

#include <utility>

//***************************** START: <Helper Code> ***********************************

long long counter() {

LARGE_INTEGER li;

QueryPerformanceCounter(&li);

return li.QuadPart;

}

long long frequency() {

LARGE_INTEGER li;

QueryPerformanceFrequency(&li);

return li.QuadPart;

}

// to confine the test to run on a single processor in order to get consistent results for all tests.

void perf_startup() {

SetThreadAffinityMask(GetCurrentThread(), 1);

SetThreadIdealProcessor(GetCurrentThread(), 0);

Sleep(1);

}

//***************************** END: <Helper Code> ***********************************

// User Defined Type

class myUserObject

{

public:

std::string name;

std::string address;

std::string telephone;

std::string name2;

std::string address2;

std::string telephone2;

// default constructor

myUserObject()

{}

// Copy Constructor

myUserObject(const myUserObject& myObj)

{

name = myObj.name;

telephone = myObj.telephone;

address = myObj.address;

name2 = myObj.name2;

telephone2 = myObj.telephone2;

address2 = myObj.address2;

}

// copy assignment operator

myUserObject& operator=(const myUserObject& myObj)

{

name = myObj.name;

telephone = myObj.telephone;

address = myObj.address;

name2 = myObj.name2;

telephone2 = myObj.telephone2;

address2 = myObj.address2;

return *this;

}

// As VS2008 does not have rvalue reference support

#if _MSC_VER >= 1600

// move constructor

// This is where are getting our performance gain.

// The move() machinery is made available in <utility>

myUserObject(myUserObject&& myObj)

{

name = std::move(myObj.name);

telephone = std::move(myObj.telephone);

address = std::move(myObj.address);

name2 = std::move(myObj.name2);

telephone2 = std::move(myObj.telephone2);

address2 = std::move(myObj.address2);

}

// move assignment operator

myUserObject& operator=(myUserObject&& myObj)

{

name = std::move(myObj.name);

telephone = std::move(myObj.telephone);

address = std::move(myObj.address);

name2 = std::move(myObj.name2);

telephone2 = std::move(myObj.telephone2);

address2 = std::move(myObj.address2);

return *this;

}

#endif

};

int main()

{

STL Performance

Visual CPP Team

Read next

0 comments