Bad Recursion Revisited

Bad Recursion Revisited

  • Comments 20

We have internal email lists for questions about programming languages. Here's one that came across recently that I thought illustrated a good point about language design.

An interview candidate gave the following awful implementation of the factorial function. (Recall that factorial is notated "n!", and is defined as the product of all the integers from 1 to n. 0! is defined as 1. So 4! = 4 x 3 x 2 x 1 = 24.)

If you note that n! = n x ((n-1)!) then a recursive solution comes to mind. When asked to implement the factorial function in C, an interview candidate came up with this:

int F(int x){
  return (x > 1) ? (x * F(--x)) : x;
}

Now, leaving aside for the moment the fact that this badly-named function does no bounds checking on the inputs, potentially consumes the entire stack, returns a wrong or nonsensical answer for inputs less than one, and is a recursive solution to a problem that can easily be solved with a simple lookup table, it has another big problem -- it doesn't even return the correct answer for any input greater than one either! F(4) will return 6, not 24, in Microsoft C.

And yet the seemingly equivalent C#, JScript and VBScript programs return 24.

The question was "What the heck is up with that?"

This looks perfectly straightforward. If x is 4, then we evaluate the consequence of the conditional operator...

4 * F(3)
= 4 * 3 * F(2)
= 4 * 3 * 2 * F(1)
= 4 * 3 * 2 * 1
= 24

right? What is broken with C?

Page 52 of K&R has the answer.

"C, like most languages, does not specify the order in which the operands of an operator are evaluated. (The exceptions are &&, ||, ?: and ','.) For example, in a statement like x = f()+g(); f may be evaluated before g or vice versa; thus if either f or g alters a variable on which the other depends, x can depend on the order of evaluation."

The Microsoft C compiler actually evaluates the function call first, which causes x to be decremented before the multiplication. So this is actually the same as

3 * F(3)
= 3 * 2 * F(1)
= 3 * 2 * 1
= 6.

Why does the specification call out that the compiler can choose any order for subexpression evaluation? Because then the compiler can choose to optimize the order so that it consumes a small number of stack or register slots for the results.

Of course, C was written back in the dark ages -- the 1970's -- when indeed most languages were pretty ill-specified and full of terrible "gotchas" like this. JScript, VBScript, C#, and most modern languages do guarantee that functions will be evaluated in left-to-right order. In all these languages,

x = f() + g() * h();

will evaluate f, then g, then h.

As K&R notes "The moral is that writing code that depends on order of evaluations is a bad programming practice in any language." Amen to that!

  • Just FYI:

    I tried it using gcc and I got the right answers. I used cygwin's gcc version 3.3.3 on Win XP SP2:

    ------------------------SNIP--------------------------------
    #include "stdio.h"

    int F(int x)
    {
    return (x > 1) ? (x * F(--x)) : x;
    }

    int main()
    {
    printf("F(1): %d\n",F(1));
    printf("F(2): %d\n",F(2));
    printf("F(3): %d\n",F(3));
    printf("F(4): %d\n",F(4));

    return 0;
    }
    ------------------------SNIP--------------------------------
  • This also illustrates the tendency for many C/C++ programmers to try and cram as much 'functionality' into one expression.

    This is usually a pointless exercise in efficiency, at the cost of readability (and often, as in this case, correctness).

    For some reason, it's often perceived as bad form in C/C++ to write such a function like so:

    int factoral( int x) {
    if (x < 0) {
    return -1; /* error result */
    }

    if (x <= 2) {
    return( x);
    }

    return( x * factoral( x-1));
    }


    Granted - this example still doesn't deal with stack or arithmetic overflow issues and it still uses a recursive algorithm when an iterative one would be better in most respects.

    However my point is that I contend that this example is every bit as efficient as an implementation that tries to cram the whole logic into a single expression (the magic of compilers takes care of the efficiency), while being much easier to read, comprehend, and get correct.

    Why do many (most?) programmers still believe that the original example's style is still more 'elegant'?
  • Not sure if this would work, but couldn't you force the decrement by using parenthesis around --x?

    return (x > 1) ? (x * F( (--x) )) : x

    Regardless, I think it is a bad idea to put this in all one line anyhow. Let alone a recursive function.

    I would have thought of this since you don't incurr the overhead of function calls or consume the stack. Obviously, this is obvious.

    int factorial = 1;
    for (int i = 1; i <= 4; i++) {
    factorial *= i;
    }
  • This also highlights the fact that so many colleges/universities teach recursion in their programming courses, the classic example being Factorials and the Fibonacci number sequence.

    What they do not tell you is that recursion is something that should only be used in very rare cases.
  • "How would parens _help_? The PROBLEM is that the --x is evaluated first!"

    Oh well, uh... I don't know how parens would help. I think ths real solution is that I probably should read your articles more closey. :)
  • How would parens _help_? The PROBLEM is that the --x is evaluated first!
  • The problem with code like x*F(--x) isn't simply order-of-evaluation -- it goes much deeper. The C standard says that, between any pair of adjacent sequence points, you can modify a given object no more than once, and that if you modify it, you may read it only to determine the new value to be stored. So x=x+1 is ok, as is x++; but in x*F(--x) you modify x once (with the predecrement), read it once to determine the value to be stored, and read it a second time for the multiplication. The standard says that this is undefined behaviour; a conforming compiler is permitted to do anything at all when compiling or running such code, including format your hard disk.

    Order of evaluation isn't as bad as this -- the order is merely unspecified, rather than actually undefined.

    'Sequence points' occur at: the end of a statement; after an expression evaluated for controlling an if, while, or for; a sequencing comma (though not a function-argument-separating comma); after the antecedent of a logical || or &&; after the antecedent of a ?: conditional operator; and immediately before a function call (though you don't know at what point the function is called in the evaluation of the surrounding expression).
  • I have heard the tendancy for programmers to love these tiny, jewel-like, over-clever expressions called "APL Syndrome", as APL particularly encourages that style of programming. However, googling that for the origin of the phrase results in an avalanche of pages about antiphospholipid syndrome and acute promyelocytic leukemia.

    It's only natural to try and find a clever and concise way to represent an algorithm. What some programmers fail to realize is that "clever" and "extremely dense" are not necessarily good qualities for code to have! They work against goals like correctness, debuggability and understandability.
  • Billy Bob:

    The problem is that expression evaluation in C can only be ordered by 'sequence points'. Between sequence points, the compiler is permitted to order the evaluation of sub-expressions however it likes, and having side-effects of sub-expressions modify the values of elements used elsewhere in the expression results in undefined behavior. (Wow, that's a mouthful).

    Precedence does *not* factor into sequence points, so in general, parens do not affect the order of evaluation of sub-expressions (except to the extent that the compiler must ensure that the final result of the expression is correct according to precedence rules).

    See the comp.lang.c FAQ, Section 3, for the gory details on this sometimes confusing subject:

    http://www.faqs.org/faqs/C-faq/faq/
  • The best part of the proposed solution is that the decrement on the x is totally unnecessary. Replacing F(--x) with F(x-1) produces the correct solution (still inefficiently, but possibly slightly less so).
  • Yikes! You have totally spooked me now, since I have always assumed that the order of evaluation was defined by the associativity of an operator [I was at least aware that there is no defined eval order for function parameters]
  • As a current undergraduate at an American University, my first CS class used Scheme in a "functional" style to get us away from any preconceived notions about computing. And we were shown a bunch of "elegant"/"clever" solutions and required to solve our problems recursively.

    I suppose this style is not good in C because of its lack of tail recursion and poor in the real world due to the messiness of real problems.
  • I agree with mikeb.

    Its bad form to think that a one liner function is more 'elegant' than a verbose one. Programmers always have to keep in mind that performance is not their one and only goal, maintainability is an equally important goal.

    Mikeb's function is so much easier to 'read' compared to the original one, where I had to mentally run through a few 'what-if-x-is-1' type of scenarios to determine which item would be evaluated.

    Recursion vs. iteration is a good debate. Programmers (especially beginners) like the elegance provided by recursive functions, since its something like a self-solving system - the solution refers to a subset of the problem (turtles all the way down?).

    A couple of crash-and-burn episodes with recursion will teach them to better analyze the more suitable approach.
  • Oh yeah, forgot to add this.

    I recall reading somewhere that not knowing the intricate details of a language standard is a good thing. The argument is that people knowing certain intricate details (expression evaluation order, etc.) tend to abuse or take it for granted (True And False Or True: which evaluates first?).

    I've found that this tip has stood me in good stead. I try not to use operator precedences, I write my code as though I don't know the order of evaluation.

    The increase in the number of parentheses can be tackled with indents and spacings.
  • Brian: replacing F(--x) with F(x-1) helps, but does not make the function entirely correct. It makes the expression well-defined as far as C goes, but the function still returns an in correct result when the input is 0.

    Phylyp: as far as precedence rules go, I've always liked the one put forth by Steve Oualline in "Practical C":

    --------------------
    There are fifteen precedence rules in C (&& comes before || comes before ?:). The practical programmer reduces these to two:

    1) Multiplication and division come before addition and subtraction.
    2) Put parentheses around everything else.
    --------------------
Page 1 of 2 (20 items) 12