A more flexible array template [and the evolution of the idea]

A more flexible array template [and the evolution of the idea]

  • Comments 4

{Disclaimer - I started with a bunch of code from boost::array - it's a great implementation of the functionality it provides}

While my day job doesn't generally allow me to goof around with advanced C++ features, I still managed to find time to do so after the kids are in bed.  Mark & I were talking about  a new feature in C99: variable length arrays.  They're incredibly useful for certain scenarios, and will, by most compilers I imagine, just get turned into an _alloca call.  This led to a discussion of the value of variable sized arrays, and to categorize them.  Here's the final result:

Type 1: Fixed size arrays:  Easy to do in C, boost::array<T, N> does it for C++ with all the bells & whistles of an STL container [tr1 has something awfully similar]
Type 2: Variable length arrays:    Hard to do in C, std::vector<T> does it for C++
Type 3: Variable sized arrays at construction time:  Easy to do in C (malloc or _alloca always produce these), doesn't exist currently in either boost or STL

When we started comparing these things, we came to the conclusion that C99 style variable length arrays would be very useful.  Using __forceinline and _alloca, you can produce an implementation that works for any scenario that stays in a single function scope.  As soon as you start working across scopes, or try to implement something as a global, it falls to pieces.  It also fails miserably if the compiler doesn't actually inline forceinline functions [which is the case for -Od, and anything with -Ob0].  And I don't have a very reliable way of detecting this scenario :-(

Ignoring the weaknesses, let's assume that the programmer is very intelligent and all-knowing, and will never accidentally abuse these routines.  Here's an initial implementation (just the part that matters) that supports stack-based allocation of both Type 1 and Type 3 arrays:

template<typename T, std::size_t N = 0>
class array : public impl::_common_array<T, array<T,N> > {
protected:
    friend impl::_common_array<T, array<T,N> >;
    T elems[N];    // fixed-size array of elements of type T

public:
    array(std::size_t n = N) { BOOST_ASSERT(n == N); }
};
 
template<typename T>
class array<T, 0> : public impl::_common_array<T, array<T,0> > {
protected:
    friend impl::_common_array<T, array<T,0> >;
    T *elems;
public:
    __forceinline array(std::size_t s) : sz(s), elems((T*)_alloca(sizeof(T)*s)
    {
        for (size_t i = 0; i < s; i++)
            T *temp = new(&elems[i]) T;
    }
    ~array()
    {
        for (size_t i = 0; i < sz; i++)
            elems[i].~T();          
    }
};

But I don't like allowing a programmer to shoot themselves in the foot accidentally (on purpose is fine :-)).  First, I added some code so that we don't try stack-allocation for DEBUG code, since the inliner is rarely enabled for that scenario.  Next I promoted a warning to an error so that if the routine ever doesn't get inlined, you'll fail.  Finally, I declared a private operator new, so that you can't construct one of these things on the heap and really screw up the world:

template<class T>
class array<T, 0> : public impl::_common_array<T, array<T,0> > {
private:
    void *operator new(std::size_t sz);
protected:
    friend impl::_common_array<T, array<T,0> >;
    T *elems;
public:
#if defined(_DEBUG) || defined(DEBUG)
 
#pragma message("Using lame implementation")
 
    array(std::size_t sz) : sz(sz), elems(new T[sz])
    {}
    ~array() {delete[] elems;}
 
#elif !defined(NODEBUG) && !defined(NDEBUG) && !defined(_NODEBUG)
 
#pragma message("Be sure that your optimized build is NOT using -Ob0 - this completely disables inlining")
#pragma error("Please define DEBUG or _DEBUG for your debug build, and NODEBUG, NDEBUG or _NODEBUG for your optimizing build, otherwise, this code won't work!")
 
#else
 
#pragma message("Using good implementation")
#pragma warning(error:4714)
#pragma optimize("b1", on)
#pragma auto_inline(on)
 
    // If this thing doesn't get inlined, the code won't work - you'll need to use the previous blob of code instead
    __forceinline array(std::size_t sz) : sz(sz), elems((T*)_alloca(sizeof(T)*sz))
    {
        for (size_t i = 0; i < sz; i++)
        {
            T* temp = new(&elems[i]) T;
        }
    }
    ~array()
    {
        for (size_t i = 0; i < sz; i++)
            elems[i].~T();          
    }
};

But I'd really like to allow these things to be a bit more general purpose!  Fact is, a tricky optimizer could turn a malloc/free into an alloca, and a fixed-size alloca into a zero-cost part of the stack frame adjustment.  But getting the compiler to decide that a variable sized malloc should be turned into an alloca seems error-prone.  What if I want an array of 4Mb?
There goes my stack - poof - in one fell swoop.

So, now, I'm headed to some sort of policy based memory allocation where the programmer could indicate that stack allocation should be used.

It's still unsafe, but not automatically so.  The programmer is the one who should understand the implications of the malloc/free -> alloca conversion.  If there is no allocated memory, we'll just new & delete the memory manually.  So, instead of auto-allocating memory on the stack, we'll allow the programmer to allocate memory that they wish to keep track of manually, or, if no memory is provided, we'll allocate it from the heap, and free it when the object is destroyed.

namespace kbf
{
    namespace impl
    {
        template <typename T, class CHILD>
        class _common_array
        {
        protected:
            __forceinline CHILD *downcast() { return (CHILD *)this; }
            __forceinline const CHILD *downcast() const { return (const CHILD *)this; }
        public:
            // type definitions
            typedef T              value_type;
            typedef T*             iterator;
            typedef const T*       const_iterator;
            typedef T&             reference;
            typedef const T&       const_reference;
            typedef std::size_t    size_type;
            typedef std::ptrdiff_t difference_type;
            typedef std::reverse_iterator<iterator> reverse_iterator;
            typedef std::reverse_iterator<const_iterator> const_reverse_iterator;

            // iterator support
            iterator                begin()        { return downcast()->elems; }
            const_iterator          begin()  const { return downcast()->elems; }
            iterator                end()          { return downcast()->elems+downcast()->size(); }
            const_iterator          end()    const { return downcast()->elems+downcast()->size(); }
            reverse_iterator        rbegin()       { return reverse_iterator(downcast()->end()); }
            const_reverse_iterator  rbegin() const { return const_reverse_iterator(downcast()->end()); }
            reverse_iterator        rend()         { return reverse_iterator(begin()); }
            const_reverse_iterator  rend()   const { return const_reverse_iterator(begin()); }

            // at() with range check
            reference       at(size_type i)       { rangecheck(i); return downcast()->elems[i]; }
            const_reference at(size_type i) const { rangecheck(i); return downcast()->elems[i]; }

            // front() and back()
            reference       front()       { return c_array()[0]; }
            const_reference front() const { return data()[0]; }
            reference       back()        { return c_array()[downcast()->size()-1]; }
            const_reference back()  const { return data()[downcast()->size()-1]; }

            // direct access to data (read-only)
            const T* data() const { return downcast()->elems; }

            // use array as C array (direct read/write access to data)
            T* c_array() { return downcast()->elems; }

            // assign one value to all elements
            void assign (const T& value) { std::fill_n(begin(),downcast()->size(),value); }

            // check range (may be private because it is static)
            void rangecheck (size_type i) const {
                if (i >= downcast()->size()) {
                    throw std::range_error("array<>: index out of range");
                }
            }

            // operator[]
            reference operator[](size_type i)
            {
                BOOST_ASSERT( i < downcast()->size() && "out of range" );
                return c_array()[i];
            }

            const_reference operator[](size_type i) const
            {    
                BOOST_ASSERT( i < downcast()->size() && "out of range" );
                return data()[i];
            }
        };
    }

    template<typename T, std::size_t N = 0>
    class array : public impl::_common_array<T, array<T,N> > {
    protected:
        friend impl::_common_array<T, array<T,N> >;
        T elems[N];    // fixed-size array of elements of type T
    public:
        array(std::size_t n = N) { BOOST_ASSERT(n == N); }
        // size is constant
        static size_type size() { return N; }
        static bool empty() { return false; }
        static size_type max_size() { return N; }

        // swap (note: linear complexity)
        void swap (array<T,N>& y)
        {
            std::swap_ranges(begin(),end(),y.begin());
        }

        // assignment with type conversion
        template <typename T2>
        array<T,N>& operator= (const array<T2,N>& rhs)
        {
            std::copy(rhs.begin(),rhs.end(), begin());
            return *this;
        }

    };

    template<typename T>
    class array<T, 0> : public impl::_common_array<T, array<T,0> > {
        // TODO:  This needs to be noncopyable, probably, or deep-copyable...
    private:
        void *operator new(size_t);
    protected:
        friend impl::_common_array<T, array<T,0> >;

        std::size_t sz:sizeof(std::size_t) * 8 - 1;
        unsigned int releaseMem : 1;
        T *elems;

    public:
        array(std::size_t s, void *memory = NULL) : sz(s), releaseMem(!memory), elems((T*)memory)
        {
            if (releaseMem)
            {
                memory = malloc(s * sizeof(T));
                elems = (T*)memory;
            }
            for (size_t i = 0; i < s; i++)
                T *temp = new(&elems[i]) T;
        }
        array(std::size_t s, const T &copyVal, void *memory = NULL) : sz(s), releaseMem(!memory), elems((T*)memory)
        {
            if (releaseMem)
            {
                memory = malloc(s * sizeof(T));
                elems = (T*)memory;
            }
            for (size_t i = 0; i < s; i++)
                T *temp = new(&elems[i]) T(copyVal);
        }
        ~array()
        {
            for (size_t i = 0; i < sz; i++)
                elems[i].~T();          
            if (releaseMem)
                free(elems);
        }

        // size is constant
        size_type size() const { return sz; }
        bool empty() const { return sz != 0; }
        size_type max_size() const { return sz; }

        // swap (note: linear complexity)
        void swap (array<T,0>& y)
        {
            // error check
            std::swap_ranges(begin(),end(),y.begin());
        }

        // assignment with type conversion
        template <typename T2>
        array<T,0>& operator= (const array<T2,0>& rhs)
        {
            BOOST_ASSERT(rhs.size() == this->size());
            std::copy(rhs.begin(),rhs.end(), begin());
            return *this;
        }

    };

    // comparisons
    template<typename T, std::size_t N>
    bool operator== (const array<T,N>& x, const array<T,N>& y) {
        return std::equal(x.begin(), x.end(), y.begin());
    }
    template<typename T, std::size_t N>
    bool operator< (const array<T,N>& x, const array<T,N>& y) {
        return std::lexicographical_compare(x.begin(),x.end(),y.begin(),y.end());
    }
    template<typename T, std::size_t N>
    bool operator!= (const array<T,N>& x, const array<T,N>& y) {
        return !(x==y);
    }
    template<typename T, std::size_t N>
    bool operator> (const array<T,N>& x, const array<T,N>& y) {
        return y<x;
    }
    template<typename T, std::size_t N>
    bool operator<= (const array<T,N>& x, const array<T,N>& y) {
        return !(y<x);
    }
    template<typename T, std::size_t N>
    bool operator>= (const array<T,N>& x, const array<T,N>& y) {
        return !(x<y);
    }

    // global swap()
    template<typename T, std::size_t N>
    inline void swap (array<T,N>& x, array<T,N>& y) {
        x.swap(y);
    }

} /* namespace kbf */

This is where I am currently. 
This seems incredibly useful, and still very performant.
What I'd really like is to be able to eliminate vectors from most of my code, and use these things unless I really do need growable vectors...

I'm working on writing something up about how to use chained unwind info next, hopefully that won't be quite so long in coming as this post...

-Kev

Leave a Comment
  • Please add 5 and 4 and type the answer here:
  • Post
  • PingBack from http://mark.santaniello.net/archives/247
  • I suggest:

    a. Using a different class instead of partial specialization.
    b. Destructing the allocated array in the reverse order.
    c. Rewrite your ctors to handle exceptions if T's ctor fails.
  • I'll make all 3 modifications.  The partial specialization was really about code reuse, and I've decided that you want them to be explicitly different, due to weird copying semantics.  Thanks for the comment, asdf!

  • malloc could fail.

    s * sizeof(T) could overflow.

    I didn't check but array< array <T,N > M > seems to have some issues with new?

Page 1 of 1 (4 items)