TR1 Fixes In VC9 SP1


STL enjoys speaking in the third person and also enjoys bringing you this exclusive news:

 

Visual Studio 2008 Service Pack 1 (VC9 SP1) contains the TR1 and MFC library extensions that were originally released in the Visual C++ 2008 Feature Pack Refresh.  But wait!  VC9 SP1 also contains many delicious fixes for TR1 and MFC.  (For TR1, "many" is 16; for MFC, "many" is 60 or more!)

 

As the current maintainer of the C++ Standard Library and TR1 implementations within Microsoft, STL has compiled an exhaustive list of the TR1 fixes in VC9 SP1.  But first!  Shout-outs must be sent to P.J. Plauger and Christopher J. Walker of Dinkumware, who tirelessly worked to implement most of these fixes, and Microsoft's compiler front-end ninja Jonathan Caves, who fixed nasty compiler bugs that were exposed by TR1.

 

 

The three most significant fixes are:

 

* A massive performance improvement in regex matching.

 

* An across-the-board performance improvement in STL containers of TR1 objects (e.g. vector<shared_ptr<T> >).

 

* A fix allowing tr1::function to store function objects with non-const function call operators.

 

 

This is the exhaustive list:

 

1. The STL algorithm search_n() no longer spuriously triggers _HAS_ITERATOR_DEBUGGING assertions.  (search_n() isn't part of TR1; it's mentioned here because a severe search_n() bug, a regression from VC8 SP1 to VC9 RTM in which the predicate version attempted to use operator==(), was fixed in the VC9 Feature Pack Refresh.  However, that fix contained this less severe flaw, which was noticed in time to be truly fixed in VC9 SP1.)

 

2. The random distribution uniform_int<unsigned long long> no longer infinitely loops.

 

3. The <random> header has been overhauled, incorporating many C++0x improvements.

 

4. The copy constructors of the pseudorandom number generators now behave correctly.  (This was a subtle mistake where a template constructor provided a better match than a copy constructor during overload resolution.)

 

5. enable_shared_from_this's copy constructor and copy assignment operator have been corrected.  (This affected classes deriving from enable_shared_from_this, and did not affect other uses of shared_ptr.)

 

6. shared_ptr<const T> can now be constructed from const T *.  (This is relatively unusual.)

 

7. tr1::function can now store function objects with non-const function call operators.  (This was a severe problem.)

 

8. The performance of regex matching has been massively improved.  (In general, TR1 Regex is as fast as or faster than Boost.Regex 1.35.0.  TR1 Regex is still slower than Boost for some regexes, such as those dominated by alternations like "cute|fluffy|kittens", but their performance has also improved significantly compared to the Feature Pack Refresh.  Further performance improvements are being investigated for VC10.)

 

9. The performance of unordered_set (etc.) and hash_set (etc.) has been significantly improved.  (tr1::unordered_set and stdext::hash_set share the same implementation.  erase() still presents performance problems, which are being investigated for VC10.)

 

10. result_of now accepts references to functions.

 

11. is_function and is_member_function_pointer now work correctly with variadic arguments.

 

12. is_polymorphic now works correctly.  (It previously gave bogus answers for classes like std::iostream.  This was a compiler bug fixed by Jonathan Caves.)

 

13. The <memory> header's declarations of the _InterlockedIncrement (etc.) intrinsics can now be suppressed by defining _DO_NOT_DECLARE_INTERLOCKED_INTRINSICS_IN_MEMORY .  This is for compatibility with <intrin.h> and <winbase.h>, which contain conflicting declarations for certain configurations of certain platforms.  (This workaround has been eliminated in VC10, which contains a comprehensive fix.)

 

14. mem_fn() now works with abstract base classes.  (This surprising bug was caused by library and compiler bugs, both fixed.  Other code may benefit from this compiler fix.)

 

15. is_pod and has_trivial_constructor now work correctly.  (They previously gave bogus answers for certain uncommon classes.)

 

16. The Swaptimization (previously mentioned in this VCBlog post) is now actually used for TR1 objects.
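
Fixes #4 and #7 are easier to see in code.  What follows is a minimal sketch, not the actual library source: toy_engine and counter are hypothetical types invented for illustration, and the sketch assumes a compiler that spells the wrapper std::function in <functional> (under VC9 SP1 it would be std::tr1::function).

```cpp
#include <functional>

// Hypothetical stand-in for a pseudorandom number generator (fix #4).
struct toy_engine {
    int made_by;  // 0 = default ctor, 1 = copy ctor, 2 = template ctor

    toy_engine() : made_by(0) {}
    toy_engine(const toy_engine&) : made_by(1) {}

    // Meant for seeding from some other generator, but it also accepts a
    // non-const toy_engine, and binds to it better than const toy_engine& does.
    template <typename Gen> explicit toy_engine(Gen&) : made_by(2) {}
};

// Copying a non-const engine: the template constructor wins overload resolution.
inline int copy_nonconst_engine() {
    toy_engine a;
    toy_engine b(a);
    return b.made_by;
}

// Copying a const engine: both candidates take const toy_engine&, and the
// non-template copy constructor is preferred.
inline int copy_const_engine() {
    const toy_engine a;
    toy_engine b(a);
    return b.made_by;
}

// Hypothetical function object with a non-const function call operator (fix #7).
struct counter {
    int n;
    counter() : n(0) {}
    int operator()() { return ++n; }
};

// The wrapper stores its own copy of the counter and calls it as non-const.
inline int call_counter_twice() {
    std::function<int()> f = counter();
    f();
    return f();
}
```

With a conforming compiler, copy_nonconst_engine() returns 2 (the template constructor hijacked the copy), copy_const_engine() returns 1, and call_counter_twice() returns 2.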

 

 

#16 deserves some explanation.

 

STL containers (vector, deque, list, set, multiset, map, multimap) are "fast-swappable"; x.swap(y) and swap(x, y) are constant-time, nofail, non-iterator-invalidating.  (You might remember that VC8's Swap Bug broke that last, very important part.)  std::string swap() is also constant-time and nofail, although it can invalidate iterators due to the small string optimization.

 

Constant-time means that swapping STL containers is implemented by swapping their guts (their pointers, whether to a vector's block of memory, a list's doubly-linked nodes, or a map's binary tree nodes).  That's far superior to all of the element-copying that "Container temp = a; a = b; b = temp;" would involve.
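
The difference between swapping guts and the three-step dance can be observed directly.  This is a small sketch with made-up function names; it relies only on the guarantee that vector::swap() doesn't invalidate pointers to elements.

```cpp
#include <vector>

// Returns true if swapping two vectors exchanged their memory blocks
// instead of copying any elements.
inline bool swap_exchanges_blocks() {
    std::vector<int> a(1000, 1);
    std::vector<int> b(10, 2);
    const int* block_a = &a[0];
    const int* block_b = &b[0];

    a.swap(b);  // constant-time: only the internal pointers change hands

    return &a[0] == block_b && &b[0] == block_a
        && a.size() == 10 && b.size() == 1000;
}

// The three-step dance, for contrast: the final values are the same,
// but every step copies a whole container's worth of elements.
inline bool copy_dance_gives_same_values() {
    std::vector<int> a(1000, 1);
    std::vector<int> b(10, 2);
    std::vector<int> temp = a;
    a = b;
    b = temp;
    return a.size() == 10 && b.size() == 1000 && a[0] == 2 && b[0] == 1;
}
```

Both functions return true; only the first one avoids touching the elements.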

 

Now, when a vector undergoes reallocation, it has to copy its current elements from the old memory block to the new memory block.  When you have a vector of STL containers (vector<vector<int> >, vector<set<int> >, etc.), that would be slow, copying the sub-containers and doing lots of dynamic memory allocation and deallocation.

 

The Swaptimization, which was present in VC8 (possibly earlier, but STL hasn't checked), is a bit of template magic whereby STL containers are marked as having fast swaps.  When a vector<T> undergoes reallocation, it detects whether the T has a fast swap, and if so, will swap elements from the old memory block to the new memory block.  This is super fast!  (Why doesn't it always swap?  For vector<int>, simply copying ints from the old memory block to the new memory block is faster - nothing needs to be swapped back into the old memory block.)
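
The "template magic" is essentially tag dispatch.  Here's a rough sketch of the idea, not VC's actual code: the names move_category, copy_move_tag, swap_move_tag, and relocate are hypothetical, and for simplicity the destination elements are assumed to be already default-constructed (a real vector works with raw memory).

```cpp
#include <vector>

// Hypothetical tags and trait; a real implementation uses reserved names.
struct copy_move_tag {};
struct swap_move_tag {};

// By default, elements are relocated by copying.
template <typename T> struct move_category { typedef copy_move_tag type; };

// Containers are marked as having a fast swap.
template <typename T, typename A>
struct move_category<std::vector<T, A> > { typedef swap_move_tag type; };

template <typename T> void relocate_one(T& from, T& to, copy_move_tag) { to = from; }
template <typename T> void relocate_one(T& from, T& to, swap_move_tag) { from.swap(to); }

// Moves [first, last) into the block starting at dest, picking the strategy per type.
template <typename T>
void relocate(T* first, T* last, T* dest) {
    for (; first != last; ++first, ++dest) {
        relocate_one(*first, *dest, typename move_category<T>::type());
    }
}

// Sub-vectors are swapped across: the new element owns the old memory block.
inline bool relocating_vectors_swaps_blocks() {
    std::vector<int> old_block[2];
    old_block[0].assign(100, 7);
    old_block[1].assign(50, 9);
    const int* guts = &old_block[0][0];
    std::vector<int> new_block[2];
    relocate(old_block, old_block + 2, new_block);
    return &new_block[0][0] == guts && old_block[0].empty() && new_block[1].size() == 50;
}

// Plain ints are simply copied; nothing is swapped back into the old block.
inline bool relocating_ints_copies() {
    int old_block[3] = { 1, 2, 3 };
    int new_block[3] = { 0, 0, 0 };
    relocate(old_block, old_block + 3, new_block);
    return new_block[2] == 3 && old_block[2] == 3;
}
```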

 

Many TR1 objects, such as unordered_set and match_results and shared_ptr, also have fast swaps.  This is because they're either containers (like unordered_set) or they wrap containers (like match_results), or they're not containers but they have the same pointers-to-stuff structure (like shared_ptr).  So, Dinkumware and STL worked really hard to give all TR1 objects fast swaps, and mark them appropriately to be picked up by the Swaptimization.

 

Unfortunately, this was simply broken in the Feature Pack Refresh!  The problem was subtle.  The STL provides the swap() free function in namespace std, which performs the three-step dance that works for anything copyable and assignable (but might be slow).  The idea is that users with fast-swappable classes, in addition to providing member swap(), can fully specialize swap() in namespace std for their classes.  (Generally, users *aren't* supposed to add anything to namespace std.  However, Standard templates may be fully or partially specialized for user classes; this is paragraph 17.4.3.1/1 of the Standard.)  This works well enough.

 

However, users with fast-swappable class *templates* have to do something different.  A not-commonly-understood fact about C++ is that there are no such things as partial specializations of function templates.  Anything that looks like a partial specialization is actually an overload.  (Only classes can be partially specialized.)  In practice, overloads behave similarly enough to how users think partial specializations of function templates would work, and everyone gets along happily.
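
A small illustration of the distinction, using hypothetical names (box, describe, describer) that have nothing to do with the library:

```cpp
template <typename T> struct box { T value; };

// Primary function template.
template <typename T> int describe(const T&) { return 0; }

// Full (explicit) specialization of the primary template: allowed.
template <> inline int describe<int>(const int&) { return 1; }

// This looks like a partial specialization, but it's a second,
// overloaded function template.
template <typename T> int describe(const box<T>&) { return 2; }

// An actual partial specialization would be spelled like this, and is ill-formed:
// template <typename T> int describe<box<T> >(const box<T>&) { return 3; }

// Class templates, by contrast, can be partially specialized.
template <typename T> struct describer { static int value() { return 0; } };
template <typename T> struct describer<box<T> > { static int value() { return 2; } };
```

Calling describe(3.5) uses the primary template, describe(7) uses the full specialization, and describe(box<char>()) picks the overload because it's the more specialized of the two viable function templates.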

 

But this means that you can't add overloads of swap() to namespace std, even if you think they look like those mythical partial specializations.  So, if you have a fast-swappable class template, you need to provide a swap() free function in the same namespace as your class template, to be found through Argument-Dependent Lookup (ADL, formerly and less descriptively called Koenig Lookup).
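
In code, the recipe looks something like this.  It's a sketch under stated assumptions: feline::basket is a hypothetical class template that wraps a vector, and the fast_swaps counter exists only so the effect can be observed.

```cpp
#include <cstddef>
#include <vector>

namespace feline {
    int fast_swaps = 0;  // counts calls to basket's member swap (for demonstration only)

    template <typename T>
    class basket {
    public:
        void add(const T& item) { items_.push_back(item); }
        std::size_t size() const { return items_.size(); }

        // Member swap: constant-time, because it swaps the wrapped vector's guts.
        void swap(basket& other) {
            items_.swap(other.items_);
            ++fast_swaps;
        }

    private:
        std::vector<T> items_;
    };

    // Overload in the class template's own namespace, where ADL can find it.
    template <typename T>
    void swap(basket<T>& a, basket<T>& b) { a.swap(b); }
}

// Not allowed: this would add a new overload to namespace std.
// namespace std {
//     template <typename T>
//     void swap(feline::basket<T>& a, feline::basket<T>& b) { a.swap(b); }
// }

// Returns how many fast swaps an unqualified swap() call performed.
inline int fast_swaps_from_unqualified_call() {
    feline::basket<int> a;
    feline::basket<int> b;
    a.add(1);
    const int before = feline::fast_swaps;
    swap(a, b);  // unqualified: ADL looks in namespace feline
    return (a.size() == 0 && b.size() == 1) ? feline::fast_swaps - before : -1;
}
```

fast_swaps_from_unqualified_call() returns 1: the call found feline::swap() without any help from namespace std.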

 

(No, this isn't really ideal.  Swapping is such a fundamental operation that it really ought to be recognized in the Core Language like copying - but it's far too late to change that.)

 

So, TR1, which is almost but not quite part of the Standard, came along and provided a bunch of fast-swappable stuff in namespace std::tr1, and defined overloads of swap(), again within std::tr1.  The TR1 classes were fast-swappable, annotated as such, and detected as such by the existing Swaptimization machinery within vector and a few other places.  So why didn't it work?

 

It turned out that the Swaptimization was being performed with a call to qualified std::swap().  Most people don't do C++ name lookup in their heads for fun, but the important thing to know is this: Qualified Name Lookup disables ADL.  Uh oh!  So, the general implementation of std::swap() (doing the slow three-step dance of copying) was chosen, instead of the specific implementations of std::tr1::swap() for shared_ptr<T>, unordered_set<T>, and so forth.  Disaster.  STL was deeply mortified.
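
The failure mode can be reproduced in a few lines.  This is an illustrative sketch only; feline::kitty and the custom_swaps counter are hypothetical, standing in for a TR1 class and its fast swap.

```cpp
#include <algorithm>  // std::swap in C++03
#include <utility>    // std::swap in C++11 and later

namespace feline {
    struct kitty { int id; };

    int custom_swaps = 0;  // counts calls to the "fast" swap below

    inline void swap(kitty& a, kitty& b) {
        ++custom_swaps;
        const int t = a.id;
        a.id = b.id;
        b.id = t;
    }
}

// Qualified call: ADL is disabled, so only the general std::swap() is considered.
inline int custom_swaps_from_qualified_call() {
    feline::kitty a = { 1 };
    feline::kitty b = { 2 };
    const int before = feline::custom_swaps;
    std::swap(a, b);
    return feline::custom_swaps - before;
}

// Unqualified call: ADL adds feline::swap() to the overload set, and as a
// non-template exact match it beats the std::swap() template.
inline int custom_swaps_from_unqualified_call() {
    feline::kitty a = { 1 };
    feline::kitty b = { 2 };
    const int before = feline::custom_swaps;
    using std::swap;
    swap(a, b);
    return feline::custom_swaps - before;
}
```

The qualified version reports 0 custom swaps and the unqualified version reports 1, even though both leave the kitties correctly exchanged.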

 

(Why was a qualified call being used?  Another name lookup subtlety - unqualified calls like swap() activate both Unqualified Name Lookup (the usual, which everyone is familiar with) and ADL (which everyone really should be familiar with).  Ordinarily, the union of the sets of functions they find is then used for overload resolution (picking what actually gets called).  But if Unqualified Name Lookup finds a member function, ADL is bypassed; this is paragraph 3.4.2/2a of the Standard.  So, within a class that defines a member swap(), calling unqualified swap() doesn't activate ADL, nor does it even find std::swap() - it finds the member swap().  Thus, qualified calls became conventional within VC's Standard Library implementation.)
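
Here's a sketch of that trap, again with hypothetical names (feline::toy, feline::crate).  The unqualified call is left commented out because it wouldn't compile; the qualified call compiles, but silently skips the custom swap.

```cpp
#include <algorithm>  // std::swap in C++03
#include <utility>    // std::swap in C++11 and later

namespace feline {
    struct toy { int id; };

    int toy_swaps = 0;  // counts calls to the custom swap below

    inline void swap(toy& a, toy& b) {
        ++toy_swaps;
        const int t = a.id;
        a.id = b.id;
        b.id = t;
    }

    struct crate {
        toy t;

        void swap(crate& other) {
            // swap(t, other.t);
            // Would not compile: Unqualified Name Lookup stops at the member
            // crate::swap(crate&), so ADL is bypassed and neither
            // feline::swap(toy&, toy&) nor std::swap() is considered.

            std::swap(t, other.t);  // compiles, but always the general std::swap()
        }
    };
}

// Returns how many custom toy swaps happened while swapping two crates.
inline int toy_swaps_from_crate_swap() {
    feline::crate a = { { 1 } };
    feline::crate b = { { 2 } };
    const int before = feline::toy_swaps;
    a.swap(b);
    return (a.t.id == 2 && b.t.id == 1) ? feline::toy_swaps - before : -1;
}
```

toy_swaps_from_crate_swap() returns 0: the crates end up swapped, but feline::swap(toy&, toy&) was never called.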

 

The fix was to define a wrapper, std::_Swap_adl(), that calls unqualified swap(), activating ADL properly.  The STL now calls std::_Swap_adl() whenever ADL is desired.  (It continues to call std::swap() whenever ADL is unnecessary, such as when swapping builtins.)  As with all _Leading_underscore_capital names, users shouldn't call std::_Swap_adl() themselves (it may change or disappear in future versions).  You can perform the exact same trick by defining your own wrapper in your own namespace of choice, with the wrapper containing a "using namespace std;" and an unqualified call to swap().  (If you look at std::_Swap_adl()'s implementation, it lacks a using-directive - it already lives in namespace std, unlike anything you can define.)
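
A user-side version of the wrapper might look like the following.  This is only a sketch: util::swap_adl, feline::toy, and feline::crate are hypothetical names, and it isn't a copy of std::_Swap_adl()'s implementation.

```cpp
#include <algorithm>  // std::swap in C++03
#include <utility>    // std::swap in C++11 and later

namespace util {
    // Lives outside any class, so no member swap() can get in the way.
    template <typename T>
    void swap_adl(T& a, T& b) {
        using namespace std;  // makes std::swap() visible for builtins and the like
        swap(a, b);           // unqualified: ADL is active
    }
}

namespace feline {
    struct toy { int id; };

    int toy_swaps = 0;  // counts calls to the custom swap below

    inline void swap(toy& a, toy& b) {
        ++toy_swaps;
        const int t = a.id;
        a.id = b.id;
        b.id = t;
    }

    struct crate {
        toy t;
        int weight;

        void swap(crate& other) {
            util::swap_adl(t, other.t);            // finds feline::swap(toy&, toy&)
            util::swap_adl(weight, other.weight);  // falls back to std::swap()
        }
    };
}

// Returns how many custom toy swaps happened while swapping two crates.
inline int toy_swaps_via_wrapper() {
    feline::crate a = { { 1 }, 10 };
    feline::crate b = { { 2 }, 20 };
    const int before = feline::toy_swaps;
    a.swap(b);
    const bool swapped = a.t.id == 2 && b.t.id == 1 && a.weight == 20 && b.weight == 10;
    return swapped ? feline::toy_swaps - before : -1;
}
```

toy_swaps_via_wrapper() returns 1: the member swap() reaches the custom toy swap through the wrapper, while the int members still go through std::swap().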

 

The end result is that when a vector<shared_ptr<T> > undergoes reallocation, absolutely no reference counts are incremented or decremented - which is a significant performance win.  Woot!

 

Note that none of these fixes were in the SP1 Beta, which branched for release as the Feature Pack was being finished.

 

Stephan T. Lavavej

Visual C++ Libraries Developer

 

  • Out of curiosity, was any performance testing of tr1::regex done comparing to boost::xpressive rather than boost::regex? Given the introduction of boost::xpressive, boost::regex strikes me as a legacy library, only useful for a mostly-TR1-compatible API.

  • Remember that <regex> is part of the C++0x Standard Library, so I'd hardly call it "legacy".

    I wasn't familiar with Boost.Xpressive, although from glancing at its documentation it appears to be extremely powerful. I'll look at it for future performance comparisons, thanks.

  • STL/CLR is embarrassingly slow compared to BCL. Has this been addressed?

  • All good stuff. I wish the trait framework used to mark classes for swap optimization was public and documented, though, so we could use it for our own (unportable, obviously - though nothing an #ifdef can't alleviate - but sometimes performance is more important).

  • SP1's unordered_set is horribly slow compared to boost::unordered_set or even std::set.

    And worse, its destructor is unbelievably slow.

    But if I call clear() before the destructor is called, it is not so slow.

  • Does anyone have a sense of how TR1 regex compares to the existing ATL regex implementation (CAtlRegExp)? From a performance and/or functional perspective?

  • Hi,

    I've run the Boost regression tests with SP1 final and a lot of the TR1 failures I was seeing before have been fixed, but there are still a couple left.

    For example, there is a problem with a missing result_type definition in Bind (http://tinyurl.com/5myz9f). You previously mentioned (http://blogs.msdn.com/vcblog/archive/2008/04/07/visual-c-2008-feature-pack-released.aspx#8405685) that this was a known bug, so did it not get fixed in SP1?

    Thanks,

    Richard Webb

  • Re: boost::regex -- by "legacy" I mean, if you're using Boost, then presumably you want something better than what's in the standard; because <regex> is standard in C++0x/TR1, I'd call boost::regex legacy since it doesn't offer much beyond what the standard already has. boost::xpressive, on the other hand, does have quite an advantage over <regex> in terms of design and performance, especially in regards to static expressions.

  • Your post lists the 16 fixes for TR1, but what about the 60 fixes for MFC?  Is there a post or article somewhere about these?

  • I went to the official page for Microsoft Visual Studio 2008 Service Pack 1 (exe), which is: http://www.microsoft.com/downloads/details.aspx?FamilyId=FBEE1648-7106-44A7-9649-6D9F6D58056E&displaylang=en

    At the bottom, within section [Related Resources], I clicked on:

    2. Fixes included in this Service Packs (KB945140)

    This gave me [Visual Studio 2008 Service Pack 1 Beta release notes and a list of fixed issues], which was published April 29, 2008.

    Hopefully the actual fix list is posted sometime.  Info on the MFC fixes would be great too.

  • Hi all,

    I've just posted a summary of the bugs that we fixed in MFC for VS2008 SP1.  See it here: http://blogs.msdn.com/vcblog/archive/2008/08/12/bugs-fixed-in-mfc-in-visual-studio-2008-sp1.aspx

    Pat Brenner

    Visual C++ Libraries Development

  • [int19h]

    > I wish the trait framework used to mark classes for swap optimization

    > was public and documented, though, so we could use it for our own

    > (unportable, obviously - though nothing an #ifdef can't alleviate - but

    > sometimes performance is more important).

    This is probably as documented as it'll get.  See below for my utterly unsupported example.

    [hito]

    > SP1's unordered_set is horribly slow compared to boost::unordered_set or even std::set.

    > And worse, its destructor is unbelievably slow.

    > But if I call clear() before the destructor is called, it is not so slow.

    Please file bugs with specific test cases through Microsoft Connect. We've identified one major performance problem in tr1::unordered_set; erase() doesn't achieve its required complexity.

    (I thought Boost 1.35.0 hadn't implemented the unordered containers yet; see http://www.boost.org/doc/libs/1_35_0/doc/html/boost_tr1.html .)

    [MR]

    > Does anyone have a sense of how TR1 regex compares to the existing ATL regex implementation (CAtlRegExp)?

    > From a performance and/or functional perspective?

    CAtlRegExp, along with the rest of ATL Server, was removed from VC9.

    Hopefully, TR1 Regex outperforms CAtlRegExp, but we haven't done any performance comparisons there.  From a usability perspective, TR1 Regex wins hands down.

    [Richard Webb]

    > I've run the Boost regression tests with SP1 final and a lot of the TR1

    > failures i was seeing before have been fixed, but there are still a couple left.

    > For example, there is a problem with a missing result_type definition in Bind

    > (http://tinyurl.com/5myz9f). You previously mentioned

    > (http://blogs.msdn.com/vcblog/archive/2008/04/07/visual-c-2008-feature-pack-released.aspx#8405685)

    > that this was a known bug, so did it not get fixed in SP1?

    Correct; this was postponed to VC10.

    And here's that example for int19h:

    C:\Temp>type meow.cpp
    #include <iostream>
    #include <ostream>
    #include <vector>
    using namespace std;

    namespace feline {
       struct kitty {
           kitty() { cout << "default ctor" << endl; }
           kitty(const kitty&) { cout << "copy ctor" << endl; }
           kitty& operator=(const kitty&) { cout << "assign" << endl; return *this; }
           ~kitty() { cout << "dtor" << endl; }
           void swap(kitty&) { cout << "swap" << endl; }
       };

       void swap(kitty& a, kitty& b) { a.swap(b); }
    }

    // Specific to VC9 SP1 and above.
    // This machinery was broken for classes outside namespace std in VC9 RTM and below.
    // This machinery will not be present in VC10 and above.
    #if defined(_MSC_VER) && _MSC_VER == 1500 && _MSC_FULL_VER >= 150030729
       #include <xutility>

       namespace std {
           template <> class _Move_operation_category<feline::kitty> {
           public:
               typedef _Swap_move_tag _Move_cat;
           };
       }
    #endif

    int main() {
       vector<feline::kitty> v;

       cout << "*** Constructing cat." << endl;
       feline::kitty cat;

       cout << "*** Pushing back cat #1." << endl;
       v.push_back(cat);

       cout << "*** Pushing back cat #2." << endl;
       v.push_back(cat);

       cout << "*** Destroying cat and v." << endl;
    }

    C:\Temp>cl /EHsc /nologo /W4 meow.cpp
    meow.cpp

    C:\Temp>meow
    *** Constructing cat.
    default ctor
    *** Pushing back cat #1.
    copy ctor
    default ctor
    dtor
    default ctor
    dtor
    *** Pushing back cat #2.
    copy ctor
    default ctor
    copy ctor
    swap
    dtor
    default ctor
    dtor
    dtor
    *** Destroying cat and v.
    dtor
    dtor
    dtor

    Note that the Swaptimization requires a default ctor.

  • Thank you.

    Hopefully, VC10 will see C++0x finalized and implemented, with its move constructors and all, so we won't need this hack by that time.
