C++ provides us with a number of ways we can shoot ourselves in the foot. If you work as a developer on any sizeable software project, there is quite likely a set of rules limiting which C++ language features you’re allowed to use: no memory allocation in constructors, no operator overloading, exceptions used only for error handling, etc. Repentance for violating these rules often resembles that for breaking the master build: some obligation to perform a menial yet necessary task for the group and/or possession of some object of indistinction (“The Hat” or “The Hose”).

There are, however, reasons for flexibility where at least some of these rules are concerned, and I’ll offer as an example some practical considerations in favor of allowing the use of operator overloading for operator() (that’s read as “function-call operator”). If you happen to be one of the lucky ones who work on projects with only one or two other programmers, or even just by yourself, stick around. ‘Cause function objects are cool, and I’m about to tell you why.

Those of us who write software for the Macintosh know that the world is really divided into three: Windows, Macintosh and Unix. How do we know this? Because we’re constantly having to manipulate the line endings of various text files. Curiously enough, the standard Mac OS X install doesn’t have a neat little command-line tool for converting line endings (or at least “apropos” with various forms of the phrase “convert line endings” yields “nothing appropriate”). There isn’t even one among the little more than two dozen command line tools in Apple’s Developer Tools.

So, if you often find yourself having to flip the line endings of various text files, you’ll either open them all up in, say, Xcode or BBEdit and change the line endings by hand, or you’ll write a little command-line tool that does what you need. The benefit of the latter is that it can handle large numbers of files all at once.

Should you write one such tool, you’re quite likely to have three different functions in your code that look something like this:

/*----------------------------------------------------------------------------
    %%Function: ClnConvertToWin
    %%Contact: richs
----------------------------------------------------------------------------*/
int ClnConvertToWin(InputFile &in, OutputFile &out)
{
    int chPrev = chNul;     // previous character, so a CR LF pair isn't doubled
    int cln = 0;            // number of line endings converted

    for (;;)
        {
        int ch = in.ChGet();
        
        if (in.FEof())
            break;
        
        switch(ch)
            {
        default:
            out.PutChar(ch);
            break;

        case chCR:
            out.PutChar(ch);

            if (chLF != in.ChPeek())
                {
                out.PutChar(chLF);
                cln++;
                }
            break;

        case chLF:
            if (chPrev != chCR)
                {
                cln++;
                out.PutChar(chCR);
                }
            out.PutChar(ch);
            break;
            }
        chPrev = ch;
        }

    return cln;
}

This code, which converts an arbitrary input file to Windows line endings, looks simple enough. It reads the input file one character at a time and takes a specific action based on the character it just read. It keeps track of the number of lines it has converted and returns that count when it’s done.

The two other versions of this function likely have exactly the same loop control and differ only in the structure of the switch statement that does the actual conversion of line endings.
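To make the duplication concrete, here is a sketch of what one of those other versions might look like: a conversion to Unix line endings. The post never shows InputFile, OutputFile, or the chCR/chLF constants, so the string-backed classes below are hypothetical stand-ins that implement only the calls the loop uses; ClnConvertToUnix is likewise a hypothetical name. Everything except the switch is a copy of the loop in ClnConvertToWin.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-ins: the post uses these names but never defines them.
constexpr int chCR = '\r';
constexpr int chLF = '\n';

// String-backed InputFile: ChGet() returns the next character, and FEof()
// becomes true once a ChGet() has tried to read past the end of the text.
class InputFile {
public:
    explicit InputFile(const std::string &text) : text_(text) {}
    int ChGet()
    {
        if (pos_ >= text_.size()) { eof_ = true; return -1; }
        return static_cast<unsigned char>(text_[pos_++]);
    }
    bool FEof() const { return eof_; }
    int ChPeek() const  // next character without consuming it, or -1 at the end
    {
        return pos_ < text_.size() ? static_cast<unsigned char>(text_[pos_]) : -1;
    }
private:
    std::string text_;
    std::size_t pos_ = 0;
    bool eof_ = false;
};

// String-backed OutputFile that accumulates everything written to it.
class OutputFile {
public:
    void PutChar(int ch) { text_ += static_cast<char>(ch); }
    const std::string &Text() const { return text_; }
private:
    std::string text_;
};

// Hypothetical "other version": converts any line endings to Unix (LF).
int ClnConvertToUnix(InputFile &in, OutputFile &out)
{
    int cln = 0;
    for (;;) {                      // same loop control as ClnConvertToWin...
        int ch = in.ChGet();
        if (in.FEof())
            break;

        switch (ch) {               // ...only this switch differs
        default:
            out.PutChar(ch);
            break;
        case chCR:
            // A lone CR becomes LF; in a CR LF pair the CR is dropped and
            // the LF is copied unchanged on the next pass through the loop.
            if (chLF != in.ChPeek())
                out.PutChar(chLF);
            cln++;
            break;
        }
    }
    return cln;
}
```

With these stand-ins, converting "a\rb\r\nc\n" yields "a\nb\nc\n" and a count of 2: one lone CR rewritten and one CR dropped from a CR LF pair.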

This is bad, because you now have three separate loops in your code that are almost identical. Suppose, for example, that you move this code to a system where the “read a character, then test for end-of-file” construct isn’t the most efficient or robust way to read characters from a file. You now have three separate loops of code to change, and three separate opportunities to create bugs in the code.

In the old days, we might have resolved this problem by using function pointers, but they’re clumsy. Also, function pointers provide no opportunity for the compiler to optimize out the function-call semantics. You’re going to be stuck with full procedure prologue and epilogue with every iteration through that loop. For performance reasons, as well as maintenance reasons, we don’t want to use function pointers in this particular application.
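For comparison, here is roughly what a function-pointer version might look like. The callback signature, the void* state parameter, and the names ClnConvertLinesPfn and ConvertMacToWinCh are all hypothetical, and InputFile/OutputFile are assumed string-backed stand-ins. The clumsiness shows in the untyped pvState pointer: any converter that tracks state, as ClnConvertToWin does with chPrev, has to smuggle it through void* and cast it back. The loop also makes an indirect call for every character it reads.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-ins for names the post uses but never defines.
constexpr int chCR = '\r';
constexpr int chLF = '\n';

// String-backed InputFile; FEof() turns true after a read past the end.
class InputFile {
public:
    explicit InputFile(const std::string &text) : text_(text) {}
    int ChGet()
    {
        if (pos_ >= text_.size()) { eof_ = true; return -1; }
        return static_cast<unsigned char>(text_[pos_++]);
    }
    bool FEof() const { return eof_; }
private:
    std::string text_;
    std::size_t pos_ = 0;
    bool eof_ = false;
};

// String-backed OutputFile that accumulates everything written to it.
class OutputFile {
public:
    void PutChar(int ch) { text_ += static_cast<char>(ch); }
    const std::string &Text() const { return text_; }
private:
    std::string text_;
};

// Hypothetical callback type: converter state has to travel through void*.
using PfnConvertCh = void (*)(int ch, int &cln, OutputFile &out, void *pvState);

// The shared loop; pfn is called indirectly for every character read.
int ClnConvertLinesPfn(InputFile &in, OutputFile &out, PfnConvertCh pfn, void *pvState)
{
    int cln = 0;
    for (;;) {
        int ch = in.ChGet();
        if (in.FEof())
            break;
        pfn(ch, cln, out, pvState);
    }
    return cln;
}

// Mac (CR) to Windows (CR LF). It needs no state, so pvState is unused.
void ConvertMacToWinCh(int ch, int &cln, OutputFile &out, void * /*pvState*/)
{
    out.PutChar(ch);
    if (ch == chCR) {
        out.PutChar(chLF);
        cln++;
    }
}
```

Calling ClnConvertLinesPfn(in, out, ConvertMacToWinCh, nullptr) on "a\rb\r" produces "a\r\nb\r\n" and returns 2.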

With C++, however, we can encapsulate the switch statement in a function object and put the control loop in a template function that takes, as a parameter, a reference to any object that overloads operator(). The template that encapsulates the loop might look like:

/*----------------------------------------------------------------------------
    %%Function: ClnConvertLines
    %%Contact: richs
    
----------------------------------------------------------------------------*/
template <class CharConverter>
int ClnConvertLines(InputFile &in, CharConverter &cnv)
{
    int cln = 0;
    for (;;)
        {
        int ch = in.ChGet();
        
        if (in.FEof())
            break;

        cnv(ch, cln);
        }
    
    return cln;
}

And the function object that converts arbitrary line endings to Windows might look like:

/*----------------------------------------------------------------------------
    %%Class: ToWin
    %%Contact: richs
----------------------------------------------------------------------------*/
class ToWin
{
public:
    ToWin(InputFile &anIn, OutputFile &anOut) :
            in(anIn),
            out(anOut),
            chPrev(chNul) {};
    ~ToWin() {};
    void operator()(int ch, int &cln)
        {
        switch(ch)
            {
        default:
            out.PutChar(ch);
            break;

        case chCR:
            out.PutChar(ch);
            if (chLF != in.ChPeek())
                {
                out.PutChar(chLF);
                cln++;
                }
            break;

        case chLF:
            if (chPrev != chCR)
                {
                cln++;
                out.PutChar(chCR);
                }
            out.PutChar(ch);
            break;
            }
        chPrev = ch;
        };
private:
    InputFile &in;      // declared in the same order as the initializer list
    OutputFile &out;
    int chPrev;
};

With that, our original conversion function becomes:

inline int ClnConvertToWin(InputFile &in, OutputFile &out)
{
    ToWin cnv(in, out);
    return ClnConvertLines(in, cnv);
}
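Assembled into a single translation unit with hypothetical string-backed stand-ins for InputFile and OutputFile (and assumed values for chNul, chCR, and chLF), the refactored pieces can be checked against the original function's behavior. Apart from the stand-ins, this sketch is the code above, with ToWin's members declared in the same order as its initializer list.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-ins for names the post uses but never defines.
constexpr int chNul = '\0';
constexpr int chCR = '\r';
constexpr int chLF = '\n';

// String-backed InputFile; FEof() turns true after a read past the end.
class InputFile {
public:
    explicit InputFile(const std::string &text) : text_(text) {}
    int ChGet()
    {
        if (pos_ >= text_.size()) { eof_ = true; return -1; }
        return static_cast<unsigned char>(text_[pos_++]);
    }
    bool FEof() const { return eof_; }
    int ChPeek() const  // next character without consuming it, or -1
    {
        return pos_ < text_.size() ? static_cast<unsigned char>(text_[pos_]) : -1;
    }
private:
    std::string text_;
    std::size_t pos_ = 0;
    bool eof_ = false;
};

// String-backed OutputFile that accumulates everything written to it.
class OutputFile {
public:
    void PutChar(int ch) { text_ += static_cast<char>(ch); }
    const std::string &Text() const { return text_; }
private:
    std::string text_;
};

// The shared loop: one copy in the source, reused by every converter type.
template <class CharConverter>
int ClnConvertLines(InputFile &in, CharConverter &cnv)
{
    int cln = 0;
    for (;;) {
        int ch = in.ChGet();
        if (in.FEof())
            break;
        cnv(ch, cln);  // statically bound call to CharConverter::operator()
    }
    return cln;
}

// Any line endings to Windows (CR LF).
class ToWin {
public:
    ToWin(InputFile &anIn, OutputFile &anOut) : in(anIn), out(anOut), chPrev(chNul) {}
    void operator()(int ch, int &cln)
    {
        switch (ch) {
        default:
            out.PutChar(ch);
            break;
        case chCR:
            out.PutChar(ch);
            if (chLF != in.ChPeek()) {  // lone CR: add the missing LF
                out.PutChar(chLF);
                cln++;
            }
            break;
        case chLF:
            if (chPrev != chCR) {       // lone LF: add the missing CR
                cln++;
                out.PutChar(chCR);
            }
            out.PutChar(ch);
            break;
        }
        chPrev = ch;
    }
private:
    InputFile &in;
    OutputFile &out;
    int chPrev;
};

inline int ClnConvertToWin(InputFile &in, OutputFile &out)
{
    ToWin cnv(in, out);
    return ClnConvertLines(in, cnv);
}
```

On the input "a\rb\nc\r\n" this produces "a\r\nb\r\nc\r\n" and returns 2, the same result as the hand-written loop: the existing CR LF pair passes through untouched and is not counted.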

I should point out that there is no a priori reason for ClnConvertLines to be a template. We could have defined a base class, CharConverter, that declared a virtual operator(), and made ToWin a subclass of CharConverter. In this particular case, however, the virtual base class approach isn’t any better than the old-style function-pointer approach. In fact, on some systems it’s worse, because each call goes through a double dereference (object to v-table, v-table to function) instead of the single dereference of a function pointer.
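For contrast, here is a sketch of the virtual base class alternative. CharConverter is the base class name used above; ClnConvertLinesVirtual and MacToWinV are hypothetical names (a Mac-to-Windows converter is used instead of ToWin for brevity), and InputFile/OutputFile are assumed string-backed stand-ins. Because ClnConvertLinesVirtual only sees a CharConverter&, each cnv(ch, cln) call is, in general, dispatched through the v-table at run time rather than inlined.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-ins for names the post uses but never defines.
constexpr int chCR = '\r';
constexpr int chLF = '\n';

// String-backed InputFile; FEof() turns true after a read past the end.
class InputFile {
public:
    explicit InputFile(const std::string &text) : text_(text) {}
    int ChGet()
    {
        if (pos_ >= text_.size()) { eof_ = true; return -1; }
        return static_cast<unsigned char>(text_[pos_++]);
    }
    bool FEof() const { return eof_; }
private:
    std::string text_;
    std::size_t pos_ = 0;
    bool eof_ = false;
};

// String-backed OutputFile that accumulates everything written to it.
class OutputFile {
public:
    void PutChar(int ch) { text_ += static_cast<char>(ch); }
    const std::string &Text() const { return text_; }
private:
    std::string text_;
};

// Base class with a virtual function-call operator.
class CharConverter {
public:
    virtual ~CharConverter() {}
    virtual void operator()(int ch, int &cln) = 0;
};

// Non-template loop: compiled once, calls through the v-table.
int ClnConvertLinesVirtual(InputFile &in, CharConverter &cnv)
{
    int cln = 0;
    for (;;) {
        int ch = in.ChGet();
        if (in.FEof())
            break;
        cnv(ch, cln);  // virtual dispatch on every character
    }
    return cln;
}

// Mac (CR) to Windows (CR LF), written as a subclass of CharConverter.
class MacToWinV : public CharConverter {
public:
    explicit MacToWinV(OutputFile &anOut) : out(anOut) {}
    void operator()(int ch, int &cln) override
    {
        out.PutChar(ch);
        if (ch == chCR) {
            out.PutChar(chLF);
            cln++;
        }
    }
private:
    OutputFile &out;
};
```

The trade is one copy of the loop in the object code in exchange for a run-time dispatch per character, which is the wrong trade for an inner loop like this one.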

The template-based solution yields more object code, since ClnConvertLines gets instantiated for every different flavor of cnv object we give it, but it is much faster for our application. Because each instantiation knows the exact type of cnv at compile time, the call to the overloaded operator() is bound statically, which gives the compiler the opportunity to inline it and optimize out the function-call semantics. This is one of those rare instances where we get to have our cake and eat it too.

Now, if that weren’t cool enough, the fact that we’ve abstracted out the actual conversion of line endings into a separate piece of source code leads to a flexibility one wouldn’t want to entertain in the purely functional approach. For example, suppose we know that a particular input file has Macintosh line endings. Scanning the beginning of an input file to figure out the existing line endings isn’t all that hard, and is well worth the time if it greatly simplifies our inner loop. The implementation of the line conversion from Macintosh to Windows line endings is almost trivial:

/*----------------------------------------------------------------------------
    %%Class: MacToWin
    %%Contact: richs
----------------------------------------------------------------------------*/
class MacToWin
{
public:
    MacToWin(OutputFile &anOut) :
            out(anOut) {};
    ~MacToWin() {};
    void operator()(int ch, int &cln)
        {
        out.PutChar(ch);

        if (ch == chCR)
            {
            out.PutChar(chLF);
            cln++;
            }
        };
private:
    OutputFile &out;
};
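The scan of the beginning of the file mentioned above might look like the following sketch. The LineEnding enum, DetectLineEnding, and the 4096-character scan limit are all assumptions, and the function works on an in-memory std::string rather than an InputFile, since we don't know whether InputFile can rewind after looking ahead. It reports only the first line ending it finds, so a file with mixed endings would be misclassified.

```cpp
#include <cstddef>
#include <string>

// Hypothetical classification of a file's existing line endings.
enum class LineEnding { Unknown, Mac, Unix, Win };

// Examines at most cchMax characters and reports the first line ending found.
LineEnding DetectLineEnding(const std::string &text, std::size_t cchMax = 4096)
{
    std::size_t cch = text.size() < cchMax ? text.size() : cchMax;
    for (std::size_t i = 0; i < cch; ++i) {
        if (text[i] == '\n')
            return LineEnding::Unix;          // LF with no preceding CR
        if (text[i] == '\r') {
            if (i + 1 < text.size() && text[i + 1] == '\n')
                return LineEnding::Win;       // CR LF pair
            return LineEnding::Mac;           // lone CR
        }
    }
    return LineEnding::Unknown;               // no line ending in the scanned prefix
}
```

A caller would then pick MacToWin for LineEnding::Mac and fall back to the general ToWin converter for anything else.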

You wouldn’t entertain something like this in the purely functional approach, because the proliferation of code with the same loop semantics is something you want to avoid. If having just three duplicates of that outer loop is bad, having one for every possible known combination of input and output line endings is that much more of a maintenance headache. With function objects, we can proliferate converters to our heart’s content without increasing the level of maintenance required should we decide to change the semantics of the loop control.
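As an illustration of that proliferation, here are two more converters of the same shape, for input known to have Unix or Windows endings, plugged into the unchanged ClnConvertLines template. UnixToWin and WinToUnix are hypothetical names, and InputFile/OutputFile are assumed string-backed stand-ins. Neither converter touches the loop control.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-ins for names the post uses but never defines.
constexpr int chCR = '\r';
constexpr int chLF = '\n';

// String-backed InputFile; FEof() turns true after a read past the end.
class InputFile {
public:
    explicit InputFile(const std::string &text) : text_(text) {}
    int ChGet()
    {
        if (pos_ >= text_.size()) { eof_ = true; return -1; }
        return static_cast<unsigned char>(text_[pos_++]);
    }
    bool FEof() const { return eof_; }
private:
    std::string text_;
    std::size_t pos_ = 0;
    bool eof_ = false;
};

// String-backed OutputFile that accumulates everything written to it.
class OutputFile {
public:
    void PutChar(int ch) { text_ += static_cast<char>(ch); }
    const std::string &Text() const { return text_; }
private:
    std::string text_;
};

// The same shared loop as in the text.
template <class CharConverter>
int ClnConvertLines(InputFile &in, CharConverter &cnv)
{
    int cln = 0;
    for (;;) {
        int ch = in.ChGet();
        if (in.FEof())
            break;
        cnv(ch, cln);
    }
    return cln;
}

// Input known to use Unix (LF) endings: emit CR before every LF.
class UnixToWin {
public:
    explicit UnixToWin(OutputFile &anOut) : out(anOut) {}
    void operator()(int ch, int &cln)
    {
        if (ch == chLF) {
            out.PutChar(chCR);
            cln++;
        }
        out.PutChar(ch);
    }
private:
    OutputFile &out;
};

// Input known to use Windows (CR LF) endings: drop every CR.
// Assumes there are no lone CRs; any that exist would be lost.
class WinToUnix {
public:
    explicit WinToUnix(OutputFile &anOut) : out(anOut) {}
    void operator()(int ch, int &cln)
    {
        if (ch == chCR) {
            cln++;
            return;
        }
        out.PutChar(ch);
    }
private:
    OutputFile &out;
};
```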

By now, there’s at least one astute reader who’s thinking, “Gosh, Schaut, flipping line endings isn’t all that different from iterating through one of the Standard Template Library’s collection classes. Using function objects should be obvious. What’s all the fuss about?”

Such an astute reader would be absolutely correct: the way I’ve used function objects here is almost exactly the way function objects are used in the STL. In fact, we can take that line of thought and extend it to the concept of an input iterator.
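The parallel is easy to see with std::for_each, which applies a function object to every element of a range and returns the function object when it's done, so state such as the line count comes back with it. MacToWinStl is a hypothetical name, and this sketch converts an in-memory std::string rather than a file.

```cpp
#include <algorithm>
#include <string>

// Hypothetical STL-style function object: Mac (CR) to Windows (CR LF)
// on an in-memory string. The line count lives in the object itself.
class MacToWinStl {
public:
    explicit MacToWinStl(std::string &anOut) : out(anOut), cln(0) {}
    void operator()(char ch)
    {
        out.push_back(ch);
        if (ch == '\r') {
            out.push_back('\n');
            cln++;
        }
    }
    int Cln() const { return cln; }
private:
    std::string &out;
    int cln;
};
```

Usage: `MacToWinStl result = std::for_each(in.begin(), in.end(), MacToWinStl(out));` leaves the converted text in out and the count in result.Cln(). std::for_each plays the role of ClnConvertLines, with the iterator pair standing in for the "read a character, test for end" loop.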

Think about how one might use a command-line tool to convert line endings. Sometimes, you’ll want to just invoke the tool on a single file. Other times, you’ll want to invoke the tool on a whole bunch of files in a single directory. On still other occasions, you’ll want to use some complex find command to generate a list of files in an entire directory tree, and pipe the output of that command into the line converter’s standard input.

So, you’ll have two distinct ways of getting a list of files to convert: as an array of C-style strings provided on the command line or as a list of file names coming in via your standard input file. The structure of the loop to convert files and report the progress of that conversion to the user ought not change simply because we’re getting a list of files in two distinctly separate ways. This problem screams for a solution where input iterators are implemented as function objects.

I’ll leave the actual implementation of this as an exercise for the reader, but there is one thought to consider. The input iterator is in an outer loop, not an inner loop, and the function that figures out which particular conversion loop to invoke is likely to be complex enough that we wouldn’t want multiple copies of it in our object code. In this case, I would avoid a template-based approach in favor of defining a base class for our input iterators where the operator() is virtualized.
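One way the exercise might come out, following that advice, is sketched below. FileNameSource, ArgvSource, StreamSource, and CFilesConvert are hypothetical names, and the ConvertFile callback stands in for the unshown logic that detects a file's line endings and runs the right conversion loop. Since the outer loop runs once per file rather than once per character, the cost of the virtual call should be lost in the noise next to the file I/O.

```cpp
#include <istream>
#include <sstream>   // std::istringstream, convenient for feeding StreamSource in tests
#include <string>

// Hypothetical base class answering "where do the file names come from?".
// operator() stores the next name and returns false once there are no more.
class FileNameSource {
public:
    virtual ~FileNameSource() {}
    virtual bool operator()(std::string &name) = 0;
};

// Names supplied as C-style strings on the command line.
class ArgvSource : public FileNameSource {
public:
    ArgvSource(int c, char *const rgsz[]) : cArg(c), rgszArg(rgsz), iArg(0) {}
    bool operator()(std::string &name) override
    {
        if (iArg >= cArg)
            return false;
        name = rgszArg[iArg++];
        return true;
    }
private:
    int cArg;
    char *const *rgszArg;
    int iArg;
};

// Names arriving one per line on a stream, e.g. std::cin fed by find.
class StreamSource : public FileNameSource {
public:
    explicit StreamSource(std::istream &anIn) : in(anIn) {}
    bool operator()(std::string &name) override
    {
        return static_cast<bool>(std::getline(in, name));
    }
private:
    std::istream &in;
};

// The single outer loop, identical no matter where the names come from.
int CFilesConvert(FileNameSource &src, bool (*ConvertFile)(const std::string &name))
{
    int cfile = 0;   // number of files converted successfully
    std::string name;
    while (src(name)) {
        if (ConvertFile(name))
            cfile++;
    }
    return cfile;
}
```

In main, ArgvSource(argc - 1, argv + 1) would skip the program name, and StreamSource(std::cin) would read names piped in from find; both feed the same CFilesConvert loop.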

Hopefully, this will lead some of you to think more about using function objects in your daily work—in particular, I’d want you to think that function objects are useful outside something as complex as the Standard Template Library. If function objects can improve our implementation of something as mundanely simple as flipping line endings in text files, they just have to be cool enough to use in a wide variety of contexts.

 

Rick Schaut

Currently playing in iTunes: Sierra Leone by Derek Trucks Band

Update: Fixed the template definition for ClnConvertLines (angle brackets converted to HTML entities).