Van's House

I'm a developer on the Visual C++ Shanghai team. I'm interested in everything related to C++.

  • Inline or Not Inline

    What is inline? The keyword mainly asks the compiler to perform inline substitution of the function body at the point of call. Like 'register', it is only a suggestion to the compiler. Modern compilers decide on inlining with advanced heuristics and usually reach a better size / performance balance without any hint from the developer. So most of the time, it is unnecessary to use this keyword for that purpose.

    Then what else? Here is the whole story of inline.

    Besides being a suggestion to the compiler, the inline keyword also has some side effects.

    Assume I have defined a function 'f' in a header file and included the header in two different files, a.cpp and b.cpp:

    void f() {}

    You'll get the following linker error when building (cl a.cpp b.cpp):

    b.obj : error LNK2005: "void __cdecl f(void)" (?f@@YAXXZ) already defined in a.obj
    a.exe : fatal error LNK1169: one or more multiply defined symbols found

    However, if you change the definition of f to:

    inline void f() {}

    The compilation will succeed.

    Here are the words from the standard:

    Every program shall contain exactly one definition of every non-inline function or object that is used in that program. An inline function shall be defined in every translation unit in which it is used.

    This is the fundamental difference between using and not using inline. An inline function can be defined in multiple translation units (provided the definition is exactly the same in every case), but this can also cause the so-called "inline explosion", because the definition has to be duplicated in every translation unit that uses it.

    A function defined within a class definition is implicitly an inline function. If you don't want to duplicate the definition everywhere, it is better to move the definition into a separate cpp file. /LTCG (link-time code generation) can still inline it if appropriate.
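    The rules above can be summarized in a small sketch. The file names and the names Square and Counter are made up for illustration; the point is only which definitions may live in a header and which one is moved to a single cpp file.

```cpp
#include <cassert>

// --- contents of a hypothetical header, e.g. "counter.h" ---

// Defined in the header, so it must be inline: every .cpp file that
// includes the header gets its own (identical) definition.
inline int Square(int x) { return x * x; }

class Counter
{
public:
    Counter() : m_count(0) {}

    // Defined inside the class definition: implicitly inline.
    void Increment() { ++m_count; }

    // Only declared here. The definition lives in exactly one .cpp file.
    int Get() const;

private:
    int m_count;
};

// --- contents of a hypothetical "counter.cpp" ---

// Non-inline definition: must appear in exactly one translation unit.
int Counter::Get() const { return m_count; }
```

    Dropping the inline keyword from Square while keeping it in the header would bring back the LNK2005 error as soon as two cpp files include it.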

  • Visual C++ Shanghai Team Blog is Online

    I’m glad that our team blog is now online (http://blogs.msdn.com/vcshblog/). It is in Chinese and is targeted to Chinese developers.

    We will write technical articles related to our work in Shanghai and also translate articles on VCBlog into Chinese.

    Here are articles posted by our team on VCBlog:

  • Output Text in Unicode

    The C standard only supports text output in MBCS (multibyte character strings):

    n1124.pdf, 7.19.3/12

    The wide character output functions convert wide characters to multibyte characters and write them to the stream as if they were written by successive calls to the fputwc function. Each conversion occurs as if by a call to the wcrtomb function, with the conversion state described by the stream’s own mbstate_t object. The byte output functions write characters to the stream as if by successive calls to the fputc function.

    However, MBCS doesn't support mixing characters from different code pages. For example, you can't use both Chinese and Japanese in the same output.

    VC provides an extension that allows you to output text in Unicode:

    #include <cstdio>
    #include <locale.h>

    #include <io.h>
    #include <fcntl.h>

    void OutputMBCS(FILE *f)
    {
        // ".936" is the code page for Simplified Chinese
        // However, you can't use ".1200" (code page for Unicode) to output text in Unicode
        setlocale(LC_CTYPE, ".936");
        // My name in Chinese: "范翔"
        fwprintf(f, L"%s", L"\x8303\x7FD4");

        // The text in the file is encoded in GBK
    }

    void OutputUnicode(FILE *f)
    {
        _setmode(_fileno(f), _O_U16TEXT);
        fwprintf(f, L"%s", L"\x8303\x7FD4");

        // The text in the file is encoded in Unicode
    }

    For more information, please check the following post: http://blogs.msdn.com/michkap/archive/2008/03/18/8306597.aspx
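    To make "encoded in Unicode" concrete, here is a portable sketch of the byte layout for the two characters used above. It assumes that _O_U16TEXT writes UTF-16 code units in little-endian order without a byte order mark; ToUtf16LeBytes is a hypothetical helper, not a CRT function.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Serializes UTF-16 code units as little-endian bytes (no BOM).
// Assumption: this matches what _O_U16TEXT emits on Windows.
std::vector<unsigned char> ToUtf16LeBytes(const std::u16string& text)
{
    std::vector<unsigned char> bytes;
    bytes.reserve(text.size() * 2);
    for (std::size_t i = 0; i < text.size(); ++i) {
        const unsigned int unit = text[i];
        bytes.push_back(static_cast<unsigned char>(unit & 0xFF));        // low byte first
        bytes.push_back(static_cast<unsigned char>((unit >> 8) & 0xFF)); // then high byte
    }
    return bytes;
}
```

    For u"\u8303\u7FD4" this yields the four bytes 03 83 D4 7F, whereas the MBCS version depends on the code page selected with setlocale.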

  • IDA Pro 5.5 is released

    IDA Pro 5.5 ships 12th of June 2009.

    The change list is here:

    http://www.hex-rays.com/idapro/55/index.htm

  • Detect Shift Overflow

    This is an intellectual exercise: when shifting a 32-bit unsigned integer left in C++, how can you efficiently detect whether the calculation overflows?

    Here is the function prototype. shl_overflow returns true if v << cl overflows (cl is between 0 and 31, and we assume that sizeof(unsigned long) == 4 and sizeof(unsigned long long) == 8).

    bool shl_overflow(unsigned long v, int cl)

    The most natural way to implement this function is to extend v to a 64-bit integer:

    bool shl_overflow(unsigned long v, int cl)
    {
        unsigned long long vl = v;
        return (vl << cl >> 32) != 0;
    }

    Now, let’s dig into the assembly world. We’ll limit the discussion to x86.

    mov     eax, DWORD PTR _v$[esp-4]
    mov     ecx, DWORD PTR _cl$[esp-4]
    xor     edx, edx
    call    __allshl
    xor     eax, eax
    or      eax, edx
    jne     overflow

    The implementation has to use three specific registers: eax, edx and ecx. And there is an expensive external function call.

    If you step into __allshl in the debugger, you will find that it uses shld to shift the 64-bit integer. VC provides some intrinsics which map to CPU instructions. For example, __ll_lshift maps to shld.

    Because the high dword of vl is 0, we can simplify our code:

    bool shl_overflow(unsigned long v, int cl)
    {
        unsigned long long vl = __ll_lshift(v, cl);
        return (static_cast<unsigned long>(vl >> 32)) != 0;
    }

    The assembly looks like:

    mov     eax, DWORD PTR _v$[esp-4]
    mov     ecx, DWORD PTR _cl$[esp-4]
    xor     edx, edx
    shld    edx, eax, cl
    test    edx, edx
    jne     overflow

    Much better now.

    Another approach is based on bit representation.

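    bool shl_overflow(unsigned long v, int cl)
    {
        // After the rotation, the bits that the shift would discard are the low cl bits
        v = _rotl(v, cl);
        unsigned long index;
        // Overflow if the lowest set bit is one of those low cl bits
        return _BitScanForward(&index, v) ? index < static_cast<unsigned long>(cl) : false;
    }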

    The idea is simple. If v << cl overflows, the most significant cl bits of v must contain a "1".

    There are two ways to test that.

    1. Scan v from the most significant bit down, and test the index of the first set bit against 32 - cl. However, we have to handle the case when cl = 0.

    2. Rotate v cl bits left first, so the most significant cl bits will be the least significant cl bits. Then we can scan and test the index against cl directly.

    Notice that the scan may fail if v is 0. The second way is simpler and more efficient.

    The assembly looks like:

    mov     ecx, DWORD PTR _cl$[esp-4]
    mov     eax, DWORD PTR _v$[esp-4]
    rol     eax, cl
    bsf     eax, eax
    je      notoverflow
    cmp     eax, ecx
    jl      overflow

    It only uses two registers, and it can also be extended to handle 64-bit shifts. One drawback is an extra conditional jump. (The extra jump could be replaced by "cmovz eax, ecx", but there is no way to ask the compiler to generate that.)
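    For readers without the VC intrinsics, both ideas can be written in portable C++. This is a sketch only: the function names are made up, std::uint32_t stands in for the 32-bit unsigned long assumed above, and no claim is made about the code a particular compiler generates for it.

```cpp
#include <cassert>
#include <cstdint>

// Reference version: widen to 64 bits and look at the upper half.
bool ShlOverflowWide(std::uint32_t v, int cl)
{
    const std::uint64_t wide = static_cast<std::uint64_t>(v) << cl;
    return (wide >> 32) != 0;
}

// Rotate-based version without intrinsics: after rotating left by cl,
// the bits that the shift would discard sit in the low cl bits.
bool ShlOverflowRotate(std::uint32_t v, int cl)
{
    if (cl == 0) return false; // nothing is shifted out; also avoids a shift by 32 below
    const std::uint32_t rotated = (v << cl) | (v >> (32 - cl));
    const std::uint32_t lowMask = (std::uint32_t(1) << cl) - 1;
    return (rotated & lowMask) != 0;
}
```

    Comparing the two functions over many inputs is a cheap way to check any hand-optimized variant against the obvious one.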

  • Recursive Algorithm in C++

    Many recursive algorithms have initial parameters. For example, the Fibonacci numbers are defined as F(n) = F(n-1) + F(n-2), with F(1) = F(2) = 1.

    By giving different values to F(1) and F(2), we can generate different sequences of numbers.

    1. If we implement the algorithm using functions, we have to either define these parameters as global variables or pass them in each recursive iteration.

    int Fib(int n, int f1, int f2)
    {
        if (n < 1) return 0;

        if (n >= 3) {
            return Fib(n - 1, f1, f2) + Fib(n - 2, f1, f2);
        } else if (n == 2) {
            return f2;
        } else {
            return f1;
        }
    }

    2. A recursive functor can store the initial parameters as class data members. Here is an example.

    struct FibFunctor
    {
        FibFunctor(int f1, int f2) : m_f1(f1), m_f2(f2) {}
        int m_f1, m_f2;

        int operator()(int n) const {
            if (n < 1) return 0;

            if (n >= 3) {
                return (*this)(n - 1) + (*this)(n - 2);
            } else if (n == 2) {
                return m_f2;
            } else {
                return m_f1;
            }
        }
    };

    int Fib(int n, int f1, int f2)
    {
        FibFunctor f(f1, f2);
        return f(n);
    }

    3. Lambda is a new core language feature introduced in C++0x to define an implicit function object. However, "this" cannot be used inside a lambda expression to refer to the lambda's own function object.

    The post "Stupid Lambda Tricks" on the Visual C++ Team Blog shows how to write a recursive lambda. So the above functor can be rewritten as:

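    #include <functional>

    int Fib(int n, int f1, int f2)
    {
        // std::function gives f a known type, so the lambda can capture and call it;
        // a variable declared with auto cannot be used in its own initializer
        std::function<int(int)> f = [&](int n) -> int {
            if (n < 1) return 0;

            if (n >= 3) {
                return f(n - 1) + f(n - 2);
            } else if (n == 2) {
                return f2;
            } else {
                return f1;
            }
        };
        return f(n);
    }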

    Here, [&] is equivalent to [&f, &f1, &f2].

    (According to C++03 3.3.1/1: "The point of declaration for a name is immediately after its complete declarator and before its initializer (if any).", it is legal to capture f in the lambda expression. That means:

    struct A {
        A(A*);
    };

    int main()
    {
        A a = A(&a); // OK
    })
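    One practical reason to prefer the functor from approach 2 is that it can carry more state than just the two seeds. The sketch below is an illustration only (MemoFibFunctor and its cache are hypothetical, not part of the original code): it adds a memo table so that each value is computed once instead of exponentially often.

```cpp
#include <cassert>
#include <map>

// Hypothetical variant of FibFunctor: same seeds, plus a cache as extra state.
struct MemoFibFunctor
{
    MemoFibFunctor(long long f1, long long f2) : m_f1(f1), m_f2(f2) {}

    long long operator()(int n)
    {
        if (n < 1) return 0;
        if (n == 1) return m_f1;
        if (n == 2) return m_f2;

        // Reuse a previously computed value if there is one.
        std::map<int, long long>::const_iterator it = m_cache.find(n);
        if (it != m_cache.end()) return it->second;

        const long long value = (*this)(n - 1) + (*this)(n - 2);
        m_cache[n] = value;
        return value;
    }

    long long m_f1, m_f2;
    std::map<int, long long> m_cache;
};
```

    With seeds (1, 1) this produces the usual Fibonacci numbers; with seeds (2, 1) it produces a different sequence from the same recurrence.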

  • Measure Initialization Time of Global Variables

    NOTICE: The technique described in this article may not be supported in future releases of VC. You should not use it in production code.

    There are two kinds of initialization in C++: static initialization and dynamic initialization.

    According to the standard, static initialization shall always be performed before any dynamic initialization takes place.

    In VC, static initialization is done at compile time. The value is stored in the data section of the generated executable. On the other hand, dynamic initialization happens at runtime. Before entering main, CRT will call the dynamic initializers of the global variables.

    Sometimes you may need to measure the startup time of your program. However, dynamic initialization happens before main, so your measurement will not include it.

    The article "CRT Initialization" describes in detail how dynamic initialization works in VC.

    VC provides the init_seg pragma for fine control over the initialization process. It places the dynamic initializers in a specific section. Besides the predefined compiler, lib and user (the corresponding section names are ".CRT$XCC", ".CRT$XCL" and ".CRT$XCU"), you can also specify a section name explicitly.

    With this knowledge, we can use the following code to measure the initialization time of global variables.

    Let’s create two files: InitTime_Start.cpp and InitTime_End.cpp (we have to use two files because one file can only have one init_seg).

    //InitTime_Start.cpp

    #pragma warning(disable : 4075)

    #pragma init_seg(".CRT$XCB")

    Timer gInitTimer;

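    //InitTime_End.cpp

    #pragma warning(disable : 4075)

    #pragma init_seg(".CRT$XCY")

    // Defined in InitTime_Start.cpp; the Timer class must be visible in both files
    extern Timer gInitTimer;

    double gInitTime = gInitTimer.GetTime();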

    Here, "Timer" is a class that starts timing in its constructor and returns the elapsed time from its member function GetTime. Because ".CRT$XCB" is placed before ".CRT$XCC", the dynamic initializer of the timer is called before the dynamic initializers of any compiler-generated global variables. Similarly, because ".CRT$XCY" is placed after ".CRT$XCU", gInitTimer.GetTime is called after all the dynamic initializers of user-defined global variables. "gInitTime" then contains the initialization time of all global variables, including those generated by the compiler, the libraries and the user.
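    The Timer class itself is not shown in the article. A minimal sketch of what it could look like follows; it uses std::chrono from C++11 purely as an assumption for portability, while code written for the VC versions discussed here would more likely wrap a Windows API such as QueryPerformanceCounter.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical Timer: starts timing on construction,
// GetTime() returns the elapsed time in seconds.
class Timer
{
public:
    Timer() : m_start(std::chrono::steady_clock::now()) {}

    double GetTime() const
    {
        const std::chrono::duration<double> elapsed =
            std::chrono::steady_clock::now() - m_start;
        return elapsed.count();
    }

private:
    std::chrono::steady_clock::time_point m_start;
};
```

    A monotonic clock is chosen so that adjustments to the system time during startup cannot produce a negative measurement.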

  • Optimize Your Code: Matrix Multiplication

    Matrix multiplication is common and the algorithm is easy to implement. Here is one example:

    Version 1:

    template<typename T>
    void SeqMatrixMult1(int size, T** m1, T** m2, T** result)
    {
        for (int i = 0; i < size; i++) {
            for (int j = 0; j < size; j++) {
                result[i][j] = 0;
                for (int k = 0; k < size; k++) {
                    result[i][j] += m1[i][k] * m2[k][j];
                }
            }
        }
    }

    This implementation is straightforward, and you can find it in textbooks and many online samples.

    Version 2:

    template<typename T>
    void SeqMatrixMult2(int size, T** m1, T** m2, T** result)
    {
        for (int i = 0; i < size; i++) {
            for (int j = 0; j < size; j++) {
                T c = 0;
                for (int k = 0; k < size; k++) {
                    c += m1[i][k] * m2[k][j];
                }
                result[i][j] = c;
            }
        }
    }

    This version uses a temporary to store the intermediate result, which saves a lot of unnecessary memory writes. Notice that the optimizer cannot do this for us, because it doesn't know whether "result" is an alias of "m1" or "m2".

    Version 3:

    #include <algorithm> // std::swap

    template<typename T>
    void Transpose(int size, T** m)
    {
        for (int i = 0; i < size; i++) {
            for (int j = i + 1; j < size; j++) {
                std::swap(m[i][j], m[j][i]);
            }
        }
    }
    template<typename T>
    void SeqMatrixMult3(int size, T** m1, T** m2, T** result)
    {
        Transpose(size, m2);
        for (int i = 0; i < size; i++) {
            for (int j = 0; j < size; j++) {
                T c = 0;
                for (int k = 0; k < size; k++) {
                    c += m1[i][k] * m2[j][k];
                }
                result[i][j] = c;
            }
        }
        Transpose(size, m2);
    }

    This optimization is tricky. If you profile the function, you'll find a lot of data cache misses: the inner loop reads m2[k][j] column by column, so consecutive reads touch different rows. We transpose the matrix so that both m1[i] and m2[j] can be accessed sequentially. This can greatly improve the memory read performance.
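    Since Version 3 changes the index order in the inner loop, it is worth checking that it still computes the same product. The sketch below does that on a small case; Matrix, MultiplyNaive and MultiplyTransposed are hypothetical names, and std::vector is used instead of T** only to keep the example self-contained. Unlike Version 3, it transposes a copy instead of transposing m2 in place twice.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

typedef std::vector<std::vector<int> > Matrix; // hypothetical square, row-major matrix

// Textbook triple loop: the inner loop walks b column-wise.
Matrix MultiplyNaive(const Matrix& a, const Matrix& b)
{
    const std::size_t n = a.size();
    Matrix result(n, std::vector<int>(n, 0));
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            int c = 0;
            for (std::size_t k = 0; k < n; ++k) {
                c += a[i][k] * b[k][j];
            }
            result[i][j] = c;
        }
    }
    return result;
}

// Same product, but b is transposed first so the inner loop walks two rows.
Matrix MultiplyTransposed(const Matrix& a, Matrix b) // b is copied on purpose
{
    const std::size_t n = a.size();
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = i + 1; j < n; ++j) {
            std::swap(b[i][j], b[j][i]);
        }
    }

    Matrix result(n, std::vector<int>(n, 0));
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            int c = 0;
            for (std::size_t k = 0; k < n; ++k) {
                c += a[i][k] * b[j][k];
            }
            result[i][j] = c;
        }
    }
    return result;
}
```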

    Version 4:

    #include <pmmintrin.h> // SSE3 intrinsics (_mm_hadd_pd, _mm_hadd_ps)

    template<typename T>
    void SeqMatrixMult4(int size, T** m1, T** m2, T** result);
    // assume size % 2 == 0
    // assume m1[i] and m2[i] are 16-byte aligned
    // require SSE3 (haddpd)
    template<>
    void SeqMatrixMult4(int size, double** m1, double** m2, double** result)
    {
        Transpose(size, m2);
        for (int i = 0; i < size; i++) {
            for (int j = 0; j < size; j++) {
                __m128d c = _mm_setzero_pd();

                for (int k = 0; k < size; k += 2) {
                    c = _mm_add_pd(c, _mm_mul_pd(_mm_load_pd(&m1[i][k]), _mm_load_pd(&m2[j][k])));
                }
                c = _mm_hadd_pd(c, c);
                _mm_store_sd(&result[i][j], c);
            }
        }
        Transpose(size, m2);
    }
    // assume size % 4 == 0
    // assume m1[i] and m2[i] are 16-byte aligned
    // require SSE3 (haddps)
    template<>
    void SeqMatrixMult4(int size, float** m1, float** m2, float** result)
    {
        Transpose(size, m2);
        for (int i = 0; i < size; i++) {
            for (int j = 0; j < size; j++) {
                __m128 c = _mm_setzero_ps();

                for (int k = 0; k < size; k += 4) {
                    c = _mm_add_ps(c, _mm_mul_ps(_mm_load_ps(&m1[i][k]), _mm_load_ps(&m2[j][k])));
                }
                c = _mm_hadd_ps(c, c);
                c = _mm_hadd_ps(c, c);
                _mm_store_ss(&result[i][j], c);
            }
        }
        Transpose(size, m2);
    }

    For floating-point types, we can use the SIMD instruction set to process the data in parallel.

    Parallel version using PPL (Parallel Patterns Library) and lambda in VC2010 CTP:

    template<typename T>
    void ParMatrixMult1(int size, T** m1, T** m2, T** result)
    {
        using namespace Concurrency;
        for (int i = 0; i < size; i++) {
            parallel_for(0, size, 1, [&](int j) {
                result[i][j] = 0;
                for (int k = 0; k < size; k++) {
                    result[i][j] += m1[i][k] * m2[k][j];
                }
            });
        }
    }

    Result

    Here are the test results (what really matters is the relative time between different version):

    Matrix size = 500 (Intel Core 2 Duo T7250, 2 cores, L2 cache 2MB)

                       int         long long   float       double
    Version 1          0.931119s   2.945134s   0.774894s   0.984585s
    Version 2          0.571003s   2.310568s   0.724161s   0.929064s
    Version 3          0.239538s   0.823095s   0.570772s   0.241691s
    Version 4          N/A         N/A         0.063196s   0.187614s
    Version 1 + PPL    0.847534s   1.683765s   0.589513s   0.994161s
    Version 2 + PPL    0.380174s   1.190713s   0.409321s   0.594859s
    Version 3 + PPL    0.135760s   0.495152s   0.370499s   0.185800s
    Version 4 + PPL    N/A         N/A         0.041959s   0.157932s

    Matrix size = 500 (Intel Xeon E5430, 4 cores, L2 cache 12MB)

                       int         long long   float       double
    Version 1          0.514330s   1.434509s   0.455168s   0.608127s
    Version 2          0.314554s   1.231696s   0.447607s   0.593517s
    Version 3          0.180176s   0.591002s   0.432129s   0.149511s
    Version 4          N/A         N/A         0.042900s   0.083286s
    Version 1 + PPL    0.308766s   0.482934s   0.175585s   0.309159s
    Version 2 + PPL    0.105717s   0.325413s   0.124862s   0.164156s
    Version 3 + PPL    0.073418s   0.193824s   0.116971s   0.061268s
    Version 4 + PPL    N/A         N/A         0.017891s   0.031734s

    From the results, you can see that:

    • Parallelism only helps if you carefully tune your code to maximize its effect (Version 1)
    • Eliminating unnecessary memory writes (Version 2) helps the parallelism
    • Data cache misses can be a big issue when there are lots of memory accesses (Version 3)
    • Using SIMD instead of the FPU on aligned data is beneficial (Version 4)
    • Different data types, data sizes and host architectures may have different kinds of bottlenecks

  • C++ Template Trick: Detecting Object Slicing

    Object slicing often happens when you pass an object by value. The compiler will implicitly convert from derived to base for you without any warning message.

    If you want to detect object slicing, you're on your own. However, templates can help you.

    Because object slicing will call copy constructor of base class, what you can do is to "hijack" it. The magic looks like:

    #include <type_traits>
    #include <boost/static_assert.hpp>

    template<bool>
    struct SliceHelper
    {
    };
    template<>
    struct SliceHelper<false>
    {
        typedef void type;
    };
    #define DETECTSLICE(NAME,SIZE)\
        enum {_SizeOfClass=SIZE};\
        void _SizeValidation() {BOOST_STATIC_ASSERT(sizeof(NAME)==_SizeOfClass);}\
        template <typename T> NAME(const T &,typename SliceHelper<sizeof(T)==_SizeOfClass || !std::tr1::is_base_of<NAME,T>::value>::type * = 0)\
        {\
            typedef typename T::sliced type;\
        }
    struct A
    {
        int a;
        A() {}
        DETECTSLICE(A,4)
    };
    struct B:public A
    {
    };
    struct C:public A
    {
        int b;
    };
    struct D
    {
        int a;
    };
    struct E
    {
        int a,b;
    };
    void f(A) {}
    int main()
    {
        B b;
        C c;
        D d;
        E e;
        A a;
        A a0(a);
        A a1(b);
        //A a2(c); //error
        //A a3(d); //error
        //A a4(e); //error
        f(a);
        f(b);
        //f(c); //error
        //f(d); //error
        //f(e); //error
    }

    Notice:

    1. The template constructor is not a copy constructor. According to the standard, copy constructor should be non-template. But the template constructor can still be chosen to copy construct the object :-)

    2. You can not use sizeof in the member function declaration because the class is incomplete at that point. You can only use sizeof inside the member function definition. That is why we have to specify the size of the class explicitly.
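    With C++11 features the same "hijack the copy" idea can be sketched more directly. This is a stricter variant, not a rewrite of DETECTSLICE: it rejects every copy from a derived class, including derived classes of the same size, which also sidesteps the sizeof problem from note 2. Base and Wider are hypothetical names.

```cpp
#include <cassert>
#include <type_traits>

struct Base
{
    int a;
    Base() : a(0) {}
    Base(const Base&) = default;

    // For an argument of a class derived from Base, this template is an exact
    // match and therefore beats the copy constructor, which would need a
    // derived-to-base conversion. Because it is deleted, the copy is rejected
    // at compile time instead of slicing.
    template<typename T,
             typename = typename std::enable_if<
                 std::is_base_of<Base, T>::value &&
                 !std::is_same<Base, T>::value>::type>
    Base(const T&) = delete;
};

struct Wider : Base
{
    int b;
};
```

    Copying a Base from a Base still works, while "Base x(wider);" or passing a Wider to a function that takes a Base by value no longer compiles.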

  • C++ Template Trick: Detecting the Existence of Class Member at Compile Time

    C++0x will provide a full set of type traits helpers to ease generic programming. However, there is no support for the detection of class members. The general problem is hard. Here we will try to tackle the more specific version: detecting the class member with given name and type.

    In C++, function overloading is one of the most widely used techniques to implement type traits. However, function overloading only cares about types. Default arguments and access modifiers are only considered after overload resolution. What we want here is to find out whether a specific member exists, so we have to turn the member into a type. Fortunately, templates support non-type arguments. And here is the magic:

    namespace van {
        namespace type_traits {
            namespace detail {
                typedef char Small;
                struct Big {char dummy[2];};

                template<typename Type,Type Ptr>
                struct MemberHelperClass;

                template<typename T,typename Type>
                Small MemberHelper_f(MemberHelperClass<Type,&T::f> *);
                template<typename T,typename Type>
                Big MemberHelper_f(...);
            }

            template<typename T,typename Type>
            struct has_member_f
            {
                enum {value=sizeof(detail::MemberHelper_f<T,Type>(0))==sizeof(detail::Small)};
            };
        }
    }

    struct A
    {
        static void f();
    };
    struct B
    {
    };

    #include <iostream>
    using namespace std;

    int main()
    {
        cout<<boolalpha;
        cout<<van::type_traits::has_member_f<A,void (*)()>::value<<endl;
        cout<<van::type_traits::has_member_f<B,void (*)()>::value<<endl;
    }

    If the member "f" is missing, the non-type (&T::f) to type (MemberHelperClass) conversion will be invalid, so the va-arg version will be chosen. Otherwise, the former will be chosen because the va-arg version is always the least preferable. Then has_member_f distinguishes these two cases by checking the size of the return type of the chosen MemberHelper_f function.

    The above code supports both static and non-static members, and both data members and member functions. However, it has one drawback: if the detected member is non-public, there will be a compiler error. That is because access control is applied after overload resolution.
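    For comparison, here is a sketch of how a similar check can be written with C++11's decltype. It is a simplification rather than an equivalent: has_member_named_f (a made-up name) only tests whether &T::f is a valid expression and does not take the expected member type as a parameter.

```cpp
#include <cassert>
#include <type_traits>

// Primary template: chosen when &T::f is not a valid expression.
template<typename T, typename = void>
struct has_member_named_f : std::false_type {};

// Partial specialization: only viable when &T::f is well-formed.
// The decltype yields void, which matches the default argument above.
template<typename T>
struct has_member_named_f<T, decltype(static_cast<void>(&T::f))> : std::true_type {};

struct WithStaticF { static void f(); };
struct WithDataF   { int f; };
struct WithoutF    {};
```

    As in the original version, the member name is baked into the trait, so a macro would still be needed to generate one trait per name.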

    Because the member name itself cannot be used as a template argument, we have to use it explicitly in our helper. To prevent the redundant work, we can take advantage of macro to get a more general version:

    #define DEFINEHASMEMBER(Name)\
    namespace van {\
        namespace type_traits {\
            namespace detail {\
                template<typename T,typename Type>\
                Small MemberHelper_##Name(MemberHelperClass<Type,&T::Name> *);\
                template<typename T,typename Type>\
                Big MemberHelper_##Name(...);\
            }\
    \
            template<typename T,typename Type>\
            struct has_member_##Name\
            {\
                enum {value=sizeof(detail::MemberHelper_##Name<T,Type>(0))==sizeof(detail::Small)};\
            };\
        }\
    }

    One use of this type trait is to simplify a dispatcher. For example, we may want to provide different implementations for different architectures to get better performance. Instead of dispatching the code manually, we can automate the work using the member detection trait.

    First, we group the different implementations into one helper class.

    struct MemoryCopyHelper
    {
        typedef void (*FunctionType)(void *lpDest, const void *lpSrc, size_t n);
        static void Default(void *lpDest, const void *lpSrc, size_t n){}
        static void MMX(void *lpDest, const void *lpSrc, size_t n){}
    };

    Second, we create the array to store the address of each implementation. If the implementation for some architecture is missing, we can use the default one instead (assume the default one is always present).

    DEFINEHASMEMBER(Default)
    DEFINEHASMEMBER(MMX)
    DEFINEHASMEMBER(SSE2)

    #define DEFINESELECTSTATICMEMBER(MemberName)\
        template<typename T,typename FunType,bool = van::type_traits::has_member_##MemberName<T,FunType>::value>\
        struct select_member_##MemberName;\
        template<typename T,typename FunType>\
        struct select_member_##MemberName<T,FunType,true> {static const FunType value;};\
        template<typename T,typename FunType>\
        struct select_member_##MemberName<T,FunType,false> {static const FunType value;};\
        template<typename T,typename FunType>\
        const FunType select_member_##MemberName<T,FunType,true>::value=&T::MemberName;\
        template<typename T,typename FunType>\
        const FunType select_member_##MemberName<T,FunType,false>::value=&T::Default;

    DEFINESELECTSTATICMEMBER(Default)
    DEFINESELECTSTATICMEMBER(MMX)
    DEFINESELECTSTATICMEMBER(SSE2)

    MemoryCopyHelper::FunctionType gDispatchArray_MemoryCopy[]={
        select_member_Default<MemoryCopyHelper, MemoryCopyHelper::FunctionType>::value,
        select_member_MMX<MemoryCopyHelper, MemoryCopyHelper::FunctionType>::value,
        select_member_SSE2<MemoryCopyHelper, MemoryCopyHelper::FunctionType>::value,
    };

    Then you can focus on the implementation. You can update the helper class to add the optimized version for the missing architecture in the future. The array will be automatically updated. (Notice: the above array will be initialized dynamically before entering into main)

    BTW: The above code may fail on some old compilers. It is OK with VC8, VC9, gcc 3.4.5.

  • Standard library changes in C++0x

    C++0x will be released in the near future. Do you know how the standard library will change? Here is a list of changes that I've collected (minor behavior changes and changes related to concepts are not included).

    1. New stuff

    system_error:
        new header
    array, vector, deque, list, string, map, set, unordered_map, unordered_set:
        new member function: cbegin, cend, crbegin, crend
    vector, deque, string:
        new member function: shrink_to_fit
    map, unordered_map:
        new member function: at
    type_traits:
        new traits:
            is_lvalue_reference, is_rvalue_reference
            has_trivial_default_constructor, has_trivial_copy_constructor, has_nothrow_default_constructor, has_nothrow_copy_constructor
    string:
        new type: u16string, u32string
        new function:
            stoi, stol, stoul, stoll, stoull, stof, stod, stold
            to_string, to_wstring
    algorithm:
        new function:
            all_of, any_of, none_of, find_if_not, copy_if, partition_copy, is_partitioned, partition_point
            minmax_element
            is_heap_until, is_heap
            is_sorted_until, is_sorted
            next,prev
            min,max,minmax
            copy_n
    random:
        new class: seed_seq and many new distributions
        new function: generate_canonical
        new type:
            ranlux24_base, ranlux48_base, ranlux24, ranlux48, knuth_b
            default_random_engine
        For classes linear_congruential, subtract_with_carry, mersenne_twister, discard_block, xor_combine:
            add ctor which accepts seed_seq, member function "seed" and the corresponding _engine class
    memory:
        new class: default_delete,unique_ptr
        new function: allocate_shared,make_shared
    functional:
        new functor: bit_and, bit_or, bit_xor
    numeric:
        new function: iota
    iomanip:
        new function:
            get_money, put_money
            get_time, put_time
    ios:
        new function: defaultfloat
    iosfwd:
        new type: char_traits<char16_t>, char_traits<char32_t>
    limits:
        new member function: max_digits10, lowest

    2. From tr1

    array
    unordered_map
    unordered_set
    regex
    random
    type_traits
    tuple
    functional
    utility
        get
    ios
        hexfloat

    3. Other

    set, map, unordered_map, unordered_set:
        member function "erase" will have return value (compatible with sequential container)
    string:
        pop_back, front, back (compatible with vector)
    bitset:
        ctor: unsigned long -> unsigned long long
    fstream:
        ctor/open accept string & wstring as the type of filename (C++03 only supports raw string pointer)
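    A few of the additions listed above in use, as a small sketch. The function names are made up for illustration, and the code needs a compiler that implements the final C++11 library.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <string>
#include <vector>

// Fills a vector with 1..count using std::iota (new in <numeric>).
std::vector<int> MakeSequence(int count)
{
    std::vector<int> values(count);
    std::iota(values.begin(), values.end(), 1);
    return values;
}

// all_of is one of the new algorithms; cbegin/cend are new member functions.
bool AllPositive(const std::vector<int>& values)
{
    return std::all_of(values.cbegin(), values.cend(),
                       [](int v) { return v > 0; });
}

// stoi and to_string convert between strings and numbers.
std::string Increment(const std::string& number)
{
    return std::to_string(std::stoi(number) + 1);
}
```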

  • Enable syntax highlighting for TR1 headers in VS2008 SP1

    Unfortunately, VS2008 SP1 doesn't recognize the C++ TR1 headers. That means there is no syntax highlighting and no IntelliSense for these files.

    This is a bug, but you can fix it yourself. The trick is in the registry. VS maintains a list of extensionless files and treats them as cpp files.

    It is under: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\VisualStudio\9.0\Languages\Extensionless Files\{B2F072B0-ABC1-11D0-9D62-00C04FD9DFD9}

    Just add the TR1 headers to it and you'll get full support for the new TR1 features. Here is the reg file:

    Windows Registry Editor Version 5.00

    [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\VisualStudio\9.0\Languages\Extensionless Files\{B2F072B0-ABC1-11D0-9D62-00C04FD9DFD9}]
    "array"=""
    "random"=""
    "regex"=""
    "tuple"=""
    "type_traits"=""
    "unordered_map"=""
    "unordered_set"=""
    "xawrap"=""
    "xawrap0"=""
    "xawrap1"=""
    "xawrap2"=""
    "xfwrap"=""
    "xfwrap1"=""
    "xrefwrap"=""
    "xtr1common"=""
    "xxbind0"=""
    "xxbind1"=""
    "xxcallfun"=""
    "xxcallobj"=""
    "xxcallpmf"=""
    "xxcallwrap"=""
    "xxfunction"=""
    "xxmem_fn"=""
    "xxpmfcaller"=""
    "xxrefwrap"=""
    "xxresult"=""
    "xxtuple0"=""
    "xxtuple1"=""
    "xxtype_traits"=""

    Posted Saturday, December 13, 2008 3:39 AM by xiangfan

    Attachment(s): tr1.reg

  • VC extensions list

    MSDN has a page describing various VC extensions. But it is far from complete. 

    I've collected a list of nonstandard extensions provided by VC; some of them are evil. If you want to write standard-conformant C++ code, you'd better be aware of these extensions, which are on by default. Some commonly (mis-)used extensions are in bold.

    W4001: nonstandard extension 'single line comment' was used
    W4152: nonstandard extension, function/data pointer conversion in expression
    W4200: nonstandard extension used : zero-sized array in struct/union
    W4201: nonstandard extension used : nameless struct/union
    W4202: nonstandard extension used : '...': prototype parameter in name list illegal
    W4203: nonstandard extension used : union with static member variable
    W4204: nonstandard extension used : non-constant aggregate initializer
    W4205: nonstandard extension used : static function declaration in function scope
    W4206: nonstandard extension used : translation unit is empty
    W4207: nonstandard extension used : extended initializer form
    W4208: nonstandard extension used : delete [exp] - exp evaluated but ignored
    W4210: nonstandard extension used : function given file scope
    W4211: nonstandard extension used : redefined extern to static
    W4212: nonstandard extension used : function declaration used ellipsis
    W4213: nonstandard extension used : cast on l-value
    W4214: nonstandard extension used : bit field types other than int
    W4215: nonstandard extension used : long float
    W4216: nonstandard extension used : float long
    W4218: nonstandard extension used : must specify at least a storage class or a type
    W4221: nonstandard extension used : 'identifier' : cannot be initialized using address of automatic variable
    W4223: nonstandard extension used : non-lvalue array converted to pointer
    W4224: nonstandard extension used : formal parameter 'identifier' was previously defined as a type
    W4226: nonstandard extension used : 'keyword' is an obsolete keyword
    W4228: nonstandard extension used : qualifiers after comma in declarator list are ignored
    W4231: nonstandard extension used : 'identifier' before template explicit instantiation
    W4232: nonstandard extension used : 'identifier' : address of dllimport 'dllimport' is not static, identity not guaranteed
    W4233: nonstandard extension used : 'keyword' keyword only supported in C++, not C
    W4234: nonstandard extension used : 'keyword' keyword reserved for future use
    W4235: nonstandard extension used : 'keyword' keyword not supported on this architecture
    W4238: nonstandard extension used : class rvalue used as lvalue
    W4239: nonstandard extension used : 'token' : conversion from 'type' to 'type'
    W4240: nonstandard extension used : access to 'classname' now defined to be 'access specifier', previously it was defined to be 'access specifier'
    W4288: nonstandard extension used : 'var' : loop control variable declared in the for-loop is used outside the for-loop scope; it conflicts with the declaration in the outer scope
    W4289: nonstandard extension used : 'var' : loop control variable declared in the for-loop is used outside the for-loop scope
    W4353: nonstandard extension used: constant 0 as function expression.  Use '__noop' function intrinsic instead
    W4480: nonstandard extension used: specifying underlying type for enum 'enum'
    W4481: nonstandard extension used: override specifier 'keyword'
    W4482: nonstandard extension used: enum 'enum' used in qualified name
    W4509: nonstandard extension used: 'function' uses SEH and 'object' has destructor
    W4836: nonstandard extension used : 'type' : local types or unnamed types cannot be used as template arguments

    C2599: 'enum' : forward declaration of enum type is not allowed

  • Contiguousness of STL containers and string

    In C++, it is well known that the elements of a vector are stored contiguously. To be more specific, here is the quotation from the standard (C++03, 23.2.4/1):

    The elements of a vector are stored contiguously, meaning that if v is a vector<T, Allocator> where T is some type other than bool, then it obeys the identity &v[n] == &v[0] + n for all 0 <= n < v.size().

    Two points are worth noting. First, vector<bool> is special: it is optimized for size, and the bools are packed. Second, &v[0] is only valid if v.size() > 0.
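    The identity quoted above can be checked directly. The following is only a minimal sketch; is_contiguous is a hypothetical helper name, not something from the standard, and it is not meant to be used with vector<bool>, which the guarantee excludes.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: returns true if every element of v sits at &v[0] + n,
// which is the identity from C++03 23.2.4/1.
// Not intended for std::vector<bool>, which is excluded from the guarantee.
template <typename T>
bool is_contiguous(const std::vector<T>& v)
{
    for (std::size_t n = 0; n < v.size(); ++n)
    {
        if (&v[n] != &v[0] + n)
            return false;
    }
    // For an empty vector the loop body never runs, so &v[0] is never evaluated.
    return true;
}
```

    Note that the loop only touches &v[0] when v.size() > 0, which matches the second point above.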

    (BTW: The above statement doesn't exist in C++98. Here is the link to the history: http://www.open-std.org/jtc1/sc22/wg21/docs/lwg-defects.html#69)

    vector is designed to be an advanced version of a raw array, so the contiguity guarantee is convenient when a raw pointer must be passed to a low-level API. We can simply use v.empty() ? NULL : &v[0] (assuming the default allocator, which implies that "operator []" returns a real reference, not a proxy).
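    As a rough illustration of that expression in use, here is a sketch. fill_buffer is a hypothetical stand-in for a low-level API and raw_pointer is a hypothetical helper; neither name comes from a real library.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a low-level C API that writes into a raw buffer.
void fill_buffer(char* dst, std::size_t n)
{
    if (dst != NULL && n > 0)
        std::memset(dst, 'x', n);
}

// Hypothetical helper: returns a pointer to the vector's storage, or NULL
// when the vector is empty, because &v[0] is only valid if v.size() > 0.
char* raw_pointer(std::vector<char>& v)
{
    return v.empty() ? NULL : &v[0];
}
```

    A caller would then write fill_buffer(raw_pointer(v), v.size()); the empty case degrades to a NULL pointer with a length of zero.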

    C++0x adds a "data" member function for the same purpose.

    In contrast, the other STL containers don't store their data contiguously.

    string is a little different. It is not an STL container, and the standard doesn't explicitly say whether its data is contiguous. It does, however, provide a "data" member function that returns "const charT *". Here is the definition in C++03:

    If size() is nonzero, the member returns a pointer to the initial element of an array whose first size() elements equal the corresponding elements of the string controlled by *this
    The program shall not alter any of the values stored in the array.

    On the other hand, the standard also says of "operator []":

    If pos < size(), returns data()[pos].

    Because "operator []" returns a reference, for a string s we have &s[0] == s.data(). However, the array returned by data() must not be modified, according to the second sentence of the definition above. It is confusing, then, that we can obtain a non-const reference to an element of that array.

    There is an issue about whether string data is contiguous: http://www.open-std.org/jtc1/sc22/wg21/docs/lwg-defects.html#530

    Fortunately, it will be explicitly stated in C++0x; see n2798.pdf 21.3.1/3:

    The char-like objects in a basic_string object shall be stored contiguously. That is, for any basic_string object s, the identity &*(s.begin() + n) == &*s.begin() + n shall hold for all values of n such that 0 <= n < s.size().

    The C++0x draft also changes "operator []" so that it no longer relies on "data" (in fact, "data" will behave the same as c_str in C++0x):

    If pos < size(), returns *(begin() + pos).

    So, in C++0x, the data of a string is guaranteed to be contiguous, and, just as with vector, you can pass &s[0] to a low-level API (if s.size() > 0 and you use the default allocator). Cheers!
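    Here is a small sketch of that pattern, assuming the contiguity guarantee described above. get_name is a hypothetical stand-in for a C-style API that writes into a caller-supplied buffer, and read_name is a hypothetical wrapper around it.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for a C-style API: writes at most n chars into buf
// and returns the number of chars written.
std::size_t get_name(char* buf, std::size_t n)
{
    static const char name[] = "van";
    std::size_t len = sizeof(name) - 1;   // length without the terminating '\0'
    if (len > n)
        len = n;
    if (buf != NULL && len > 0)
        std::memcpy(buf, name, len);
    return len;
}

// Hypothetical wrapper: lets the API write straight into the string's storage.
std::string read_name()
{
    std::string s(16, '\0');              // reserve room for the API to fill
    // &s[0] is only taken when the string is not empty.
    std::size_t written = s.empty() ? 0 : get_name(&s[0], s.size());
    s.resize(written);                    // trim to what was actually written
    return s;
}
```

    Sizing the string first and trimming it afterwards avoids a temporary char array and an extra copy.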

  • Protected or Private?

    As the designer of a base class, you may hesitate over whether to use private or protected access control. Let's try the following examples:

    1. Call protected member function

    #include <cstdio>

    class A
    {
    protected:
        void b() {printf("Oops!\n");}
    };

    void f(A* a)
    {
        class A_hack:public A
        {
            friend void f(A*);
        };
        static_cast<A_hack *>(a)->b();
    }

    class B
    {
    public:
        void f(A* a)
        {
            class A_hack:public A
            {
                friend B;
            };
            static_cast<A_hack *>(a)->b();
        }
    };

    int main()
    {
        f(NULL);
        B().f(NULL);
    }

    Although the result of the cast is undefined according to the standard, the code will break the access control as long as no this-pointer adjustment happens and the layout of A is the same as that of A_hack, which is normally the case.

    For the evil user, the purpose is achieved.
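    For comparison, the same effect can apparently be reached without the undefined cast by forming a pointer to member through the derived class. This is a sketch of a different technique from the one above, not a replacement for it: call_b and call_protected are hypothetical names, and b returns a string here (instead of printing) only so that the result is easy to check.

```cpp
#include <string>

class A
{
protected:
    std::string b() { return "Oops!"; }
};

// Hypothetical helper class, used only to name the protected member.
struct A_hack : A
{
    static std::string call_b(A& a)
    {
        // Naming the member as A_hack::b satisfies the protected-access
        // check, yet the resulting pointer has type std::string (A::*)(),
        // so it can be applied to a plain A object.
        std::string (A::*pb)() = &A_hack::b;
        return (a.*pb)();
    }
};

// Hypothetical free function that any user code could call.
std::string call_protected(A& a)
{
    return A_hack::call_b(a);
}
```

    No object of type A_hack is ever created or cast to, so the layout assumptions mentioned above do not come into play.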

    2. Call pure virtual function

    class A
    {
    protected:
        virtual void Fun() =0;
    };

    class B:public A
    {
    public:
        B() {Dummy();}

    private:
        void Dummy() {Fun();}
    };

    class C:public B
    {
    public:
        virtual void Fun() {}
    };

    int main()
    {
        C c;
    }

    Have you ever wondered how to call a pure virtual function? Would you like to see what _purecall in VC does? Just try the above code.

    The problem here is caused by the protected access of "Fun" in the base class. Of course, you should not call virtual functions in a ctor (they will not behave polymorphically there), but when you call Dummy in the ctor, you may not realize that Dummy calls a virtual function internally. Then the disaster happens.
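    The non-polymorphic behavior in a ctor can be observed safely when the virtual function is not pure. The sketch below uses hypothetical names (Base, Derived, Helper, Who, seen) and is separate from the example above; it records which override ran instead of crashing.

```cpp
#include <string>

class Base
{
public:
    Base() { Helper(); }            // indirectly calls a virtual function
    virtual ~Base() {}

    std::string seen;               // records which override actually ran

private:
    void Helper() { seen = Who(); } // looks innocent, but dispatches virtually
    virtual std::string Who() { return "Base"; }
};

class Derived : public Base
{
private:
    virtual std::string Who() { return "Derived"; }
};

// Constructs a Derived and reports which Who() ran during construction.
std::string who_ran_during_construction()
{
    Derived d;
    return d.seen;
}
```

    While the Base ctor runs, the object is still a Base, so Base::Who is the one that gets called. If Who were pure virtual, as Fun is above, there would be no such version to call.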

    If all you want from virtual is to let the derived class provide some specific behavior, you'd better declare the function as private.
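    A private virtual function might look like the following sketch. The names Base, Derived, Run, DoRun and run are hypothetical. A derived class can still override DoRun, but only Base itself can call it.

```cpp
#include <string>

class Base
{
public:
    virtual ~Base() {}

    // Public non-virtual entry point: the only place the hook is called.
    std::string Run() { return "[" + DoRun() + "]"; }

private:
    // Derived classes may override this hook, but they cannot call it.
    virtual std::string DoRun() = 0;
};

class Derived : public Base
{
private:
    virtual std::string DoRun() { return "derived"; }
};

// Calls the hook through the public interface of the base class.
std::string run(Base& b)
{
    return b.Run();
}
```

    With this arrangement an intermediate class cannot call the hook from its own ctor, whether directly or through a helper like Dummy. The base class still has to avoid calling Run from its own ctor.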

    Although both code samples above are nonconformant, they at least show that your users can do more than you expect with protected.

    If you don't even have confidence in private (think of the evil #define private public, which definitely violates the One Definition Rule), you'd better use the PImpl idiom to implement your interface.
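    A minimal sketch of the PImpl idiom follows, with a hypothetical Widget class. The header and the source file are shown in one block for brevity; in practice the second half would live in a separate cpp, so the private data never appears in the header at all.

```cpp
#include <string>

// --- widget.h (hypothetical header) ---
class Widget
{
public:
    Widget();
    ~Widget();
    std::string Name() const;

private:
    Widget(const Widget&);             // copying disabled to keep the sketch short
    Widget& operator=(const Widget&);

    struct Impl;                       // only declared here; defined in the cpp
    Impl* impl_;
};

// --- widget.cpp (hypothetical source file) ---
struct Widget::Impl
{
    std::string name;                  // the real private state lives here
};

Widget::Widget() : impl_(new Impl)
{
    impl_->name = "widget";
}

Widget::~Widget()
{
    delete impl_;
}

std::string Widget::Name() const
{
    return impl_->name;
}
```

    Even with #define private public, user code sees only an opaque pointer, because the definition of Impl is never visible to it.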


© 2009 Microsoft Corporation. All rights reserved. Terms of Use  |  Trademarks  |  Privacy Statement