STL Performance

STL Performance

  • Comments 24

Hello, I am Mohammad Usman, a Software Design Engineer in Test on the Visual C++ Libraries team. I recently joined the team. In VS 2008 SP1, we invested heavily in our Standard Template Library (STL) implementation by adding TR1 extensions such as shared_ptr, including performance optimizations for things like vectors of shared_ptrs.  In this post, I will be talking about how we take this investment further by improving our STL performance in VS2010. We have done some re-writing of our libraries to make them more conformant to the C++0x standard, more flexible and extendible, and to improve performance in a few areas. For the first two items we already have a few blog posts and a Channel 9 video, this post will mainly cover the performance part.

Since we have re-written quite a bit of STL code in order to improve flexibility and increase conformance, it was important to make sure that this does not negatively affect the performance. We have taken special care of that. Most of our performance tests are to ensure this. And our tests are showing good results. In fact there are some areas where VS2010 is faster than VS2008. Some of those areas where we improved performance significantly are:

-          Overall deque performance

o   The improvement is evident when used with algorithms such as std::reverse(), std::remove(), std::sort() and std::unique().

-          std::unique() 

o   An optimization change in the loops of the function has given a significant performance boost when used with different containers such as list, deque and vector.

-          Vector reallocations

o   Vector is one of the most commonly used containers because of its attributes such as contiguous memory, random access, fast traversals, and its ability to grow dynamically etc. But it does have some limitations. To maintain contiguous memory, it has to allocate memory in a chunk and when that chunk is full, it has to go through something called “reallocation”. This is a potentially costly process causing the vector to copy all the contents to a new location. Now the good news is that we have made very significant improvements there in VS2010 with the help of rvalue references.  This will come into action when you use vectors with STL objects in VS2010 and you can also take advantage of this when using vectors of your own custom objects by writing simple move constructors, move assignment operators for your class. A significant performance gain will be evident here when used with large objects containing strings. This is explained in more detail below.

Vector Reallocation (UDTs - User Defined Types)

Introduction of rvalue references in STL is one of my favorite enhancements for VS2010. “Rvalue references” is quite a complicated topic. Stephan (our STL developer) recently wrote a very good comprehensive blog post explaining rvalue references, which can be found here. For the sake of this post, I will touch only the parts which impact the performance of STL.

By using rvalue references, we are able to move the contents of objects rather than copying and destroying the original where we don’t have to. We actually “steal” the resources by copying the pointers rather than the whole object when the object we are dealing with is an rvalue. This is most helpful when move constructors and move assignment operators can be used. You can use the move machinery to write your move constructors and move assignment operators. The following simple test shows that we can get an order of magnitude performance boost when we take advantage of rvalue references in large UDTs.

 

#include <iostream>

#include <ostream>

#include <ios>

#include <stddef.h>

#include <string>

#include <vector>

#include <windows.h>

#include <utility>

 

 

//***************************** START: <Helper Code> ***********************************

 

long long counter() {

    LARGE_INTEGER li;

    QueryPerformanceCounter(&li);

    return li.QuadPart;

}

 

long long frequency() {

    LARGE_INTEGER li;

    QueryPerformanceFrequency(&li);

    return li.QuadPart;

}

 

// to confine the test to run on a single processor in order to get consistent results for all tests.

void perf_startup() {

    SetThreadAffinityMask(GetCurrentThread(), 1);

    SetThreadIdealProcessor(GetCurrentThread(), 0);

    Sleep(1);

}

 

 

//***************************** END: <Helper Code> ***********************************

 

// User Defined Type

class myUserObject

{

public:

       std::string name;

       std::string address;

       std::string telephone;

       std::string name2;

       std::string address2;

       std::string telephone2;

 

       // default constructor

       myUserObject()

       {}

      

       // Copy Constructor

       myUserObject(const myUserObject& myObj)

       {

              name = myObj.name;

              telephone = myObj.telephone;

              address = myObj.address;

              name2 = myObj.name2;

              telephone2 = myObj.telephone2;

              address2 = myObj.address2;

       }

 

// copy assignment operator

       myUserObject& operator=(const myUserObject& myObj)

       {

              name = myObj.name;

              telephone = myObj.telephone;

              address = myObj.address;

              name2 = myObj.name2;

              telephone2 = myObj.telephone2;

              address2 = myObj.address2;

             

              return *this;

       }

      

       // As VS2008 does not have rvalue reference support

       #if _MSC_VER >= 1600

      

       // move constructor

// This is where are getting our performance gain.

// The move() machinery is made available in <utility>

       myUserObject(myUserObject&& myObj)

       {

              name = std::move(myObj.name);

              telephone = std::move(myObj.telephone);

              address = std::move(myObj.address);

              name2 = std::move(myObj.name2);

              telephone2 = std::move(myObj.telephone2);

              address2 = std::move(myObj.address2);          

       }

      

       // move assignment operator

       myUserObject& operator=(myUserObject&& myObj)

       {

              name = std::move(myObj.name);

              telephone = std::move(myObj.telephone);

              address = std::move(myObj.address);

              name2 = std::move(myObj.name2);

              telephone2 = std::move(myObj.telephone2);

              address2 = std::move(myObj.address2);

             

              return *this;

       }

 

       #endif

      

};

      

 

int main()

{

       perf_startup();

 

       int total_copy_move_count = 0;

int total_vector_resizes = 0;

double total_pushback_time = 0.0;

 

       std::vector<myUserObject> vec;

       myUserObject obj;

       obj.name = "Stephan T. Lavavej Stephan T. Lavavej Stephan T. Lavavej";

obj.telephone = "314159265 314159265 314159265 314159265 314159265";

       obj.address = "127.0.0.0 127.0.0.0 127.0.0.0 127.0.0.0 127.0.0.0 127.0.0.0";

obj.name2 = "Mohammad Usman. Mohammad Usman. Mohammad Usman. ";

       obj.telephone2 = "1234567890 1234567890 1234567890 1234567890 1234567890";

       obj.address2 = "Republik Of mancunia. Republik Of mancunia Republik Of mancunia";

 

       const long long start = counter();

       for (int i = 0; i<1050000; ++i)

       {            

      

              size_t capacity = vec.capacity();

              size_t size = vec.size();

             

              vec.push_back(obj);

 

              if (capacity == size) // indication that vector reallocation has occurred.

              {

                     ++total_vector_resizes;

                     total_copy_move_count += size;   

              }

       }

       const long long finish = counter();

       total_pushback_time = (finish-start)*1.0 / frequency();

 

       // print result

std::cout << std::endl;

       std::cout << std::fixed;

       std::cout << "Total Pushback time: \t" << total_pushback_time << " sec" << std::endl;

       std::cout << "Total number of vector reallocations: \t" << total_vector_resizes << std::endl;

       std::cout << "Total number of objects copied/moved: \t" << total_copy_move_count << std::endl;

}

 

 

VS2008:

c:\TEMP\STLPerf>cl /EHsc /nologo /W4 /O2 /GL /D_SECURE_SCL=0 rvalue.cpp

rvalue.cpp

Generating code

Finished generating code

 

c:\TEMP\STLPerf> rvalue.exe

 

Total Pushback time:    8.309241 sec

Total number of vector reallocations:   36

Total number of objects copied/moved:   3149622

 

VS2010:

c:\TEMP\STLPerf>cl /EHsc /nologo /W4 /O2 /GL /D_SECURE_SCL=0 rvalue.cpp

rvalue.cpp

Generating code

Finished generating code

 

c:\TEMP\STLPerf> rvalue.exe

 

Total Pushback time:    2.335225 sec
Total number of vector reallocations:   36
Total number of objects copied/moved:   3149622

As you can see that in this specific scenario we are getting more than 3 times performance boost. This gain is due to the fact that the move machinery is used during vector reallocations, which is cheaper as compared to the copy machinery (copy construction and copy assignment). And the bigger your object is and the larger your vector is, the more performance gain you will get. So those of you who use vectors on a big scale, this is something you will surely love.

Vector Reallocation (STL Types)

Now the interesting part is that similar to the sample code above, we have implemented move semantics (move constructors and move assignment operators) for our STL types in VS2010. It is worth mentioning here that for some types, in order to improve performance especially in situations where the copy was taking a lot of time such as vector reallocation, we used a trick called “swaptimization” before VS2010. (see pt.#16 in this blog post) Now with the arrival of rvalue references we don’t need that anymore and we have replaced those swaptimization tricks with the rvalue reference machinery and, guess what, we are now even faster. The following example shows one such scenario where we used swaptimization earlier.

#include <iostream>

#include <ostream>

#include <ios>

#include <stddef.h>

#include <string>

#include <vector>

#include <windows.h>

 

 

//***************************** START: <Helper Code> ***********************************

 

long long counter() {

    LARGE_INTEGER li;

    QueryPerformanceCounter(&li);

    return li.QuadPart;

}

 

long long frequency() {

    LARGE_INTEGER li;

    QueryPerformanceFrequency(&li);

    return li.QuadPart;

}

 

// to confine the test to run on a single processor in order to get consistent results for all tests.

void perf_startup() {

    SetThreadAffinityMask(GetCurrentThread(), 1);

    SetThreadIdealProcessor(GetCurrentThread(), 0);

    Sleep(1);

}

 

//***************************** END: <Helper Code> ***********************************

 

int main()

{

       perf_startup();

 

       long long total_copy_move_count = 0;

long long total_vector_resizes = 0;

double total_pushback_time = 0.0;

 

       std::vector<std::string> v_str;

 

       long long start = counter();

       for (int i = 0; i<12000000; ++i)

       {

              size_t size = v_str.size();

              size_t capacity = v_str.capacity();

              v_str.push_back("I think, therefore I am");

             

              if (size == capacity) // indication that vector reallocation has occurred.

              {

                     ++total_vector_resizes;

                     total_copy_move_count += size;          

              }     

       }     

       long long finish = counter();

       total_pushback_time = ((finish - start)*1.0 / frequency());

      

       std::cout << std::fixed;

       std::cout << std::endl;

       std::cout << "Total Pushback time: \t" << total_pushback_time << " sec" << std::endl;

       std::cout << "Total number of vector reallocations: \t" << total_vector_resizes << std::endl;

       std::cout << "Total number of objects copied/moved: \t" << total_copy_move_count << std::endl;

 

}

 

 

VS2008:

c:\TEMP\STLPerf>cl /EHsc /nologo /W4 /O2 /GL /D_SECURE_SCL=0 swaptimization.cpp

swaptimization.cpp

Generating code

Finished generating code

 

c:\TEMP\STLPerf> swaptimization.exe

 

Total Pushback time:    6.717094 sec

Total number of vector reallocations:   42

Total number of objects copied/moved:   35875989

 

VS2010:

c:\TEMP\STLPerf>cl /EHsc /nologo /W4 /O2 /GL /D_SECURE_SCL=0 swaptimization.cpp

swaptimization.cpp

Generating code

Finished generating code

 

c:\TEMP\STLPerf> swaptimization.exe

 

Total Pushback time:    3.780286 sec

Total number of vector reallocations:   42

Total number of objects copied/moved:   35875989

 

You can see that moving to rvalue references hasn’t impacted the performance. In fact, in this particular example, we are almost twice as fast as VS2008. The credit for this gain also goes to rvalue references, not specifically during vector reallocation, but during push_back() instead. As you can see that we are passing a literal string (an rvalue- see footnote) to the function. So the move machinery comes into play here again, giving us another performance boost.   

This is all what I have to share in my blog. If you have any questions about performance of STL or in general about STL, please feel free to write to me at: Mohammad.Usman at microsoft.com.

Thanks!

 

-Usman

 

* String literals are actually lvalues (C++03 5.1/2), whereas all other literals are rvalues.  However, since we’ve got a vector<string>, push_back() takes const string&.  This constructs a temporary std::string, which is an rvalue.  That’s where move semantics comes into play.

 

  • > I think there are/were exceptions, e.g. when the memory overhead of a reference count per object is significant and the vector is a private member of a class which accesses it via methods that take care of the new/delete (inc. the class's destructor, of course).

    There's more to it. Keep in mind that your class constructor might add some items to that vector and then throw, in which case your class destructor won't even run.

  • The memory overrun when I ran the last codes. sigh!

  • Please blog about improvements in speed for machine code.  These basic improvements, like generating much better code for multi-core CPUs would greatly help us as we do lots of complex multi-nested loop array processing (i.e., image processing).  

    Data structure handling improvements ghelp slightly but are less than 5% of our application's run time.

  • The current VS2008 provides compiler generated copy constructors and assignment operators if they're not written.

    Will there be support for compiler generated move constructors and move assignment operators?

    I've occasionally had problems in my own code where I'd forgotten one member in a hand-written copy constructor, resulting in unpleasant effects.

  • [Mick]

    > Will there be support for compiler generated move

    > constructors and move assignment operators?

    No, because the current C++0x Working Paper says that move constructors and move assignment operators are never implicitly defined by the compiler.

    The Standardization Committee is currently talking about such a mechanism, but we can't implement anything until it's set in stone.

  • I still get confused, are compilers going to generate move constructors like copy constructors? I hope so otherwise I'd not use them for the same reason I don't manually write copy constructors.

  • hello,guys!

    I'm a newer!

    I try those code under VS.2003,winXP, it was complied successly, and runs out of memory in my first pc(celeron 2.26 512RAM),stopped at _heap_alloc_bdg() of dbfheap.c.

    and I run it on my another pc(p4,2.8,1G RAM),it runed 1276.593236 sec.

    Then I try to complie it under minGW 4.3. After changed the windows.h to time.h,it works fine,and just run 1.547000 sec. in my first pc(celeron), and 0.947000sec. in my P4.

    I complied it with time.h under Vs.2003 again, the result is same: celeron is out of memory,

    P4 is 4896.203000 sec.

    So I want to know which complier is better?

    is it really to run pushback() in minGw? and what kind pc does Usman use.

    hope someone test it and help me!

  • Move constructors and move assignment operators will not be automatically generated by the VS 2010 compiler, in accordance with the current C++0x Working Paper (N2914).  This avoids changing the semantics of existing code and potentially breaking it. The C++ Standardization Committee is currently considering a proposal by Bjarne Stroustrup to automatically generate move constructors and move assignment operators without breaking existing code; see http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2009/n2904.pdf . The new compiler supports rvalue reference and move machinery.  We write our move constructors to get advantage of that machinery. Now the interesting part is that in VS 2010 STL, we provide move constructors and move assignment operators for all STL classes whenever possible. So when you use containers of STL objects, you don’t have to do anything extra to take advantage of the move machinery. And you can also benefit from this performance gain even in your own classes. But, yes, in that case you will have to write move constructors and assignment operators yourself. And given the amount of performance gain in large containers of big user defined types (which most of the times is the case with enterprise products) it’s definitely worth it :)

  • Due to an internal error the below two posts were deleted. Sorry for any confusion.

    Thank you,

    Visual C++ Team

    Obviously that increases the risk of some code forgetting to delete ...    That's why in this case, without C++0x, boost::ptr_vector (and friends) is better than boost::vector<boost::shared_ptr<T>>. It gives the same behavior as if it were possible to have an STL container of auto_ptrs (i.e. the container has strict ownership semantics and no per-object refcounting overhead), with the additional constraint that none of its elements can be NULL. Will Microsoft's new STL have something similar?

    POSTED BY: Michel Boto

    Just got back from the C++ committee meetting in Frankfurt and the idea of generating move constructors was indeed a hot topic (complete with fiery animations!)    It looks likely that C++0x will generate move constructors and move assignment operators automatically 'when it is safe' which essentially means when the compiler can guarantee the generated constructor would never throw an exception.  If there is any chance of throwing, you have to define the constructor yourself, and probably provide a hint in the move constructor signature that this is the case.    Discussion of the syntax was a classic bicycle-shed problem where no clear favourites emerge, but we are trying to find the version that is 'least disliked'.  The original proposal uses new attribute syntax, other ideas involve abusing operators, abusing existing keywords and inventing new keywords.  The problem with new keywords is that just about any variant of 'move' is already in wide use somewhere in the games industry!    Expect to hear more about this after the next meeing in October, hopefully we will agree a final solution then.

    POSTED BY:  AlisdairM

Page 2 of 2 (24 items) 12