The Filterator

The Filterator

  • Comments 21

My name is Ahmed Charles and I currently work on Windows Error Reporting.  I believe that this is the first time that someone not on the VC team has written a blog, but I hope you will find it useful anyways.  I’d like to thank Stephan T. Lavavej for the idea and valuable feedback while writing it.  And we both owe the idea for the iterator to Boost.

 

A common question from programmers who have an intermediate amount of experience with using the STL is, "How do I write an STL iterator?".  Writing an STL iterator is not especially difficult – it just requires some care.  STL iterators must satisfy a number of requirements (given by section 24.2 of the (soon to be) International Standard for C++, draft at http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2010/n3092.pdf), and the code to do so takes roughly 150 editor lines.  Figuring out the code from the requirements can be overwhelming, but once you see the code, it's easy.

 

Therefore, I've written an example STL iterator, whose purpose in life is to wrap an existing iterator and filter the elements based on a supplied predicate, which I've imaginatively called Filterator (in homage to the Mallocator).  I've carefully implemented all of the checks that would be required in real production code.  And I've exhaustively commented the various parts of the iterator to aid in determining which portions should be changed when implementing other iterators.  Hopefully, this should demystify the implementation of STL iterators:

C:\Temp>type filterator.cpp

// The following headers are required for all iterators.

#include <iterator>    // Required for iterator_traits

 

// The following headers contain stuff that filterator uses.

#include <type_traits> // Required for std::conditional/

                       // is_same/is_convertible

#include <memory>      // For std::addressof

 

// The following headers contain stuff that main() uses.

#include <algorithm>   // For std::fill

#include <iostream>    // For std::cout

#include <ostream>     // For std::endl

#include <string>      // For std::string

#include <vector>      // For std::vector

 

template <class Iterator, class Predicate>

class filterator {

public:

    // Required to access private members in the converting

    // constructor because different instantiations of a

    // template are different classes without special access

    // rights to each other.

    template <class OtherIterator, class OtherPredicate>

        friend class filterator;

 

    // Define iterator_category to be the iterator category

    // of the Iterator used, unless it is a random access

    // iterator, which cannot provide constant time

    // functions for random access operators because of

    // needing to test the predicate.

    typedef typename std::iterator_traits<

        Iterator>::iterator_category underlying_category;

 

    typedef typename std::conditional<std::is_same<

            underlying_category,

            std::random_access_iterator_tag

        >::value,

        std::bidirectional_iterator_tag,

        underlying_category>::type iterator_category;

 

    // Define the other typedefs required by iterators.

    typedef typename std::iterator_traits<

        Iterator>::value_type value_type;

    typedef typename std::iterator_traits<

        Iterator>::reference reference;

    typedef typename std::iterator_traits<

        Iterator>::pointer pointer;

    typedef typename std::iterator_traits<

        Iterator>::difference_type difference_type;

 

    // Default constructor which value-initializes all of

    // the members.

    filterator() : m_iter(), m_end(), m_pred() {}

 

    // Constructs a filterator from all possible parameters.

    // pred may be default constructed, as well.

    filterator(Iterator begin, Iterator end,

        Predicate pred = Predicate())

        : m_iter(begin), m_end(end), m_pred(pred) {

        validate_predicate();

    }

 

    // Converting constructor which can be used to convert

    // iterators into const_iterators.

 

 

    template <class OtherIterator> filterator(

        const filterator<OtherIterator, Predicate>& f,

        typename std::enable_if<

            std::is_convertible<

                OtherIterator, Iterator

            >::value>::type * = nullptr)

        : m_iter(f.m_iter),

        m_end(f.m_end),

        m_pred(f.m_pred) {}

 

    // Returns a copy of the predicate for this filterator.

    Predicate predicate() const {

        return m_pred;

    }

 

    // Returns a copy of a filterator representing the end

    // of this sequence.

    filterator end() const {

        return filterator(m_end, m_end, m_pred);

    }

 

    // Returns a copy of the end iterator for this

    // filterator.

    Iterator base_end() const {

        return m_end;

    }

 

    // Returns a copy of the iterator which represents the

    // current position.

    Iterator base() const {

        return m_iter;

    }

 

    // Dereferences this iterator

    reference operator*() const {

        return *m_iter;

    }

 

    // Dereferences this iterator

    pointer operator->() const {

        return std::addressof(*m_iter);

    }

 

    // Pre-increment operator which checks the predicate

    // before stopping.

    filterator& operator++() {

        ++m_iter;

        validate_predicate();

        return *this;

    }

 

    // Post-increment operator which checks the predicate

    // before stopping.

    filterator operator++(int) {

        filterator temp(*this);

        ++*this;

        return temp;

   }

 

    // Pre-decrement operator which checks the predicate

    // before stopping.

    filterator& operator--() {

        --m_iter;

        // note that we don't store a begin iterator to

        // check for passing the beginning of the sequence,

        // unlike the end. This is because decrementing the

        // first iterator is undefined and applying a filter

        // on top of the underlying iterator doesn't change

        // this, even if the resulting sequence is empty.

        while (!m_pred(*m_iter)) --m_iter;

        return *this;

    }

 

    // Post-decrement operator which checks the predicate

    // before stopping.

    filterator operator--(int) {

        filterator temp(*this);

        --*this;

        return temp;

    }

 

    // Allows two filterators to be compared for equality.

    template <class OtherIterator> bool operator==(

        const filterator<OtherIterator,

            Predicate>& f) const {

 

        return m_iter == f.m_iter;

    }

 

private:

    // Member variable to hold the current iterator

    // position.

    Iterator m_iter;

 

    // Member variable to hold the end of the sequence,

    // so we don't go past it when checking the predicate.

    Iterator m_end;

 

    // Member variable to hold the predicate.

    Predicate m_pred;

 

    // helper function to advance m_iter until it matches

    // the predicate.

    void validate_predicate() {

        while (m_iter != m_end && !m_pred(*m_iter)) {

            ++m_iter;

        }

    }

};

 

// Allows two filterators to be compared for inequality.

template <class Iterator1, class Iterator2, class Predicate>

    bool operator!=(

        const filterator<Iterator1, Predicate>& a,

        const filterator<Iterator2, Predicate>& b) {

    return !(a == b);

}

 

// Convenient function to create filterators with deduced

// Predicate and Iterator parameters. May also be used with

// an explicit Predicate which is default contructed.

template <class Predicate, class Iterator>

    filterator<Iterator, Predicate> make_filterator(

        Iterator begin, Iterator end,

        Predicate pred = Predicate()) {

 

    return filterator<Iterator, Predicate>(

        begin, end, pred);

}

 

// Functor used to determine if the length of the string

// is less than four.

struct shorter_than_four {

    bool operator()(const std::string& s) const {

        return s.length() < 4;

    }

};

 

int main() {

    using namespace std;

 

    vector<string> v;

 

    v.push_back("Meow");

    v.push_back("Kitty");

    v.push_back("Dog");

    v.push_back("Purr");

    v.push_back("Bo");

 

    cout << "strings with length()"

        " greater than or equal to 4:" << endl;

 

    auto four_or_more = [](const string& s) {

        return s.length() >= 4;

    };

 

    auto print = [](const string& s) {

        // std:: is used because the using statement

        // doesn't affect the lambda

        std::cout << s << " " << s.length() << std::endl;

    };

 

    for (auto i = make_filterator(v.cbegin(), v.cend(),

        four_or_more); i != i.end(); ++i) {

        print(*i);

    }

 

    cout << "strings with length() less than 4:" << endl;

 

    auto dogs = make_filterator(v.begin(), v.end(),

        shorter_than_four());

 

    filterator<vector<string>::const_iterator,

        shorter_than_four> cdogs(dogs);

 

    for_each(cdogs, cdogs.end(), print);

 

    cout << "Replace dogs with cats:" << endl;

 

    // Uses the copy assignment operator to assign "Cat"

    // to the elements pointed to by the iterator sequence.

    fill(dogs, dogs.end(), "Cat");

 

    for_each(v.cbegin(), v.cend(), print);

}

 

C:\Temp>cl /EHsc /nologo /W4 filterator.cpp

filterator.cpp

 

C:\Temp>filterator

strings with length() greater than or equal to 4:

Meow 4

Kitty 5

Purr 4

strings with length() less than 4:

Dog 3

Bo 2

Replace dogs with cats:

Meow 4

Kitty 5

Cat 3

Purr 4

Cat 3

 

 

To explain this output:

 

The first three strings that are printed are based on a lambda, which tests for a length greater than or equal to 4.

 

The next two strings that are printed are based on a functor, which tests for a length less than 4.

 

The next set of strings are all of the elements after we have used std::fill and the filterator to replace all strings with length less than 4 (Dog and Bo, in this case) with Cat.

 

This shows that the iterator works with standard algorithms, much like the standard iterators.

 

For more information on using auto and lambdas, see http://blogs.msdn.com/b/vcblog/archive/2008/10/28/lambdas-auto-and-static-assert-c-0x-features-in-vc10-part-1.aspx

  • Please reformat your blog entry to make it more readable, it's a mess.

    Thanks.

    Max.

  • Thanks for the post.

    I had to re-read it a few times.

    This line:

    auto dogs = make_filterator(v.begin(), v.end(), shorter_than_four());

    Here, you are associating the functor, shorter_than_four(), with the iterator, dogs.

    Then, only when for_each on the next line increments the iterator does the predicate get evaluated.  So no additional array gets stored in creating dogs.

    I like.

    Does this functionality in any of the implemented iterators exist in the STL already or not?

  • First bug report!  The default constructor will store uninitialized values if the 'base' iterator is a pointer.  You should explicitly value-initialize each member in the constructor's initializer list to avoid undefined behaviour.  Otherwise, this is a pretty neat article, showing off not just iterators but also how some of the other 0x language and library facilities come together - nice!

  • Please consider using one of the many available code formatting widgets available so we can read your code sample. Perhaps maybe this one?

    code.google.com/.../codeformatterpluginforwindowslivewriter

  • Max, Mark R. - I rewrapped Ahmed's code to 60 columns. It should be readable now. I also restored the example's output, which I had mistakenly dropped on the floor during my initial attempt at rewrapping.

  • @Ben L: The STL doesn't currently have this functionality. I believe that this iterator meets all of the standard's (C++0x) requirements other than complexity. That one is a bit more ambiguous since the standard doesn't specify which sequence the constant amortized complexity refers to. If it's the underlying sequence, this iterator is fine, but if it's the sequence represented by reduced sequence, then we don't meet the requirement.

    @AlisdairM: Thanks for the bug. I was debating with STL (Stephan T. Lavavej) about whether having value-initialization would matter and we figured it was best to make the extra guarantee. And one of my goals was to showoff the new C++0x features.

  • @Ben L: "only when for_each on the next line increments the iterator does the predicate get evaluated"

    A minor correction - the constructor will call the predicate (possibly repeatedly) to find the first valid iterator int he range.

  • @mikeb Good point, that would not be ideal behavior for many situations.  Since it's possible that no items might be valid, I would rather the constructor just hold a flag indicating whether it has ever been validated against the predicate. This might add overhead the other methods.  Any takers on how to keep this lazy until usage?

  • The goal wasn't really to write the ideal filter iterator, but to write an iterator example that people could learn from to produce their own iterators for use with the STL.

  • Why would you construct a filterator if you aren't going to use it? And if you are going to use it, you're going to have to start with the first good element, and that requires sliding the underlying iterator forward until you find it. If all elements are bad, it's going to take you linear time to discover that fact.

    Note that after the filterator's constructor establishes its invariant, copying that filterator around won't invoke the predicate.

    I see no advantage to implementing laziness here, and two disadvantages (decreased performance and increased code complexity). The only possible advantage of laziness that I can imagine is if the elements of the sequence are changing between filterator construction and initial use. (Not in concurrent terms, which would be horrible crashtrocity, but in sequential terms.) In that case, an eager implementation and a lazy implementation could observe different elements as being the first good elements.

    As a library developer, my professional term for that is "squirrely". I would have no problem explaining to users that with an eager implementation, construction and incrementing/decrementing are what scan for good elements. (With a lazy implementation, initial dereferencing and incrementing/decrementing would scan for good elements, and that's actually more confusing.)

  • It should be noted that something like this is _much_ easier to write using boost::iterator_adapter/iterator_facade.

    I almost think that any c++ iterator article should mention the above utilities, as they take care of 85% of the scaffolding involved in writing an iterator class.

  • Think of initially constructing a filter-iterator pair.  How do I know if the range is empty?  Normally I would compare the two iterators, but if the first iterator has not yet tried filtering, this will (typically) return false.  OK, now that I know the range is (apparently) not empty, can I safely dereference the first iterator?  In order for this iterator to work from a client perspective, it must either seek the first non-filtered value on construction, or support an initial 'not yet filtered' state and check that as a precondition for many members.  I think the find-first on construction in the simpler, safer behaviour.

  • @Marcus Lindblom: It is much easier to write with Boost.Iterators. But not everyone can use the Boost C++ Libraries for various reasons, including legal ones. It is also a matter of learning. If you want to understand how to write an iterator using Boost.Iterators, their documentation is plenty sufficient and this blog post wouldn't be useful at all.

  • What legal reason could prevent someone from using Boost?

  • I liked this blog entry. I think that the examples were very helpful. I just want to point out that CEND is not portable for all compilers. Some compilers (such as gcc) will not compiler this code. However, great post in any case.

Page 1 of 2 (21 items) 12