# Every Program There Is, Part Four

### Every Program There Is, Part Four

[This is part of a series on generating every string in a language. The previous part is here. The next part is here.]

Remember, the point of this exercise was to come up with a device to help me test parts of my compiler. To test binary trees as we saw we can generate all trees of size one, then all trees of size two, then all trees of size three, and so on. This technique has some nice properties. First off, we know that we are exploring the space of binary trees without skipping any; for any particular tree, we will eventually get to it. Second, we are guaranteed to get the small ones first; it seems likely that if an algorithm works for all trees of size five or less, then it works for all trees period. And third, at no point does the algorithm run off into infinity; there are no infinite loops to worry about. There is a combinatorial explosion to worry about, since there are lots of possible trees, but each one can be generated in finite time. It would be nice to have similar properties for code which produces every sentence given some grammar.

The obvious brute-force way to do this involves building a parser for the grammar. That is, generate all length-one strings, parse them, and discard the ones that don’t parse legally. Then generate all length-two strings, and so on. The problem with this approach is that in order to generate, say, class c : b { }, even if you restrict yourself to an alphabet containing only lowercase letters, spaces, and punctuation, you have to generate and parse about a trillion trillion strings, the vast majority of which will be obvious non-programs like aaaaaaaaaaaa},c. This method is far too slow.

A smarter way to go would be to generate all strings that consist of one terminal, parse them, discard the illegal ones, then generate all strings that consist of two terminals, and so on. That is, start by parsing a, class, {, and so on. That’s way better; it only takes about a million steps to get to class c : b { }. But that’s still pretty slow, and we have to write a parser. What would be nicer is to simply not have to ever generate illegal programs in the first place.

What if instead we had some way to say “generate me all legal programs with one terminal; ok, now generate me all programs with two terminals”, and so on? That is, don’t create a candidate and test it, just create all the legal candidates. That would work. But how to do it?

Let’s go back to our first problem: generating strings for all the binary trees. A grammar for that language is:

T: x | ( T T )

We can use the same technique we did before! Remember, before we generated all the binary trees of size four by saying that they were:

{all binary trees of size 0 } followed by {all binary trees of size 3}
{all binary trees of size 1 } followed by {all binary trees of size 2}

and so on. Well, all binary trees strings with, say, eight terminals are:

( {all strings with 0 terminals} {all strings with 6 terminals} )
( {all strings with 1 terminal} {all strings with 5 terminals} )

and so on. That is, we take out the two terminals for the surrounding parens, and then find all the combinations of substitutions that eventually lead to exactly six more on the inside. Clearly we have reduced a problem of size k to multiple problems of size less than k-1, and so this is amenable to a recursive solution. Right?

Next time: Am I right, or have I made a logical error somewhere above?

[This is part of a series on generating every string in a language. The previous part is here. The next part is here.]

• When verifying a parser, isn't it just as useful to identify that it rejects invalid programs as it is to accept valid ones? While aaaaaaa},c is a pretty easy case to verify as wrong, most code run against a parser has at least a few subtle errors in it (especially the compiler that runs as you code). Generating a valid AST from invalid code seems like a bad thing.

Indeed. The point of this exercise is to generate test cases for parts of the compiler, and that includes testing error cases. However, since there are trillions upon trillions of short erroneous programs most of which are uninteresting, I would want to concentrate on testing the parser against common situations. The most common error scenario is that the IntelliSense parser is running on code as it is being edited; I would therefore generate test cases by generating a bunch of legal programs and then deleting or modifying interesting tokens; in particular, deleting commas, close curlies, and so on, would probably be quite useful. The logic for recovering from missing end tokens is one place where the parser author is likely to have made incorrect assumptions.

Alternatively, there's no reason why I couldn't come up with a grammar for generating a subset of *incorrect* programs to see how the parser handles all members of that language. A device which generates all members of a language need not be restricted to languages that are subsets of legal C#.

Incidentally, the most likely scenario for a program containing random garbage is someone accidentally loading an executable into the editor thinking it is source code. The C# compiler already has heuristics to detect that situation and reject it before the parser runs. - Eric

• "Incidentally, the most likely scenario for a program containing random garbage is someone accidentally loading an executable into the editor thinking it is source code. The C# compiler already has heuristics to detect that situation and reject it before the parser runs. - Eric "

That reminds me of the COBOL compiler I used in University. I believe it was one of the IBM 360 COBOL compilers. If you made the mistake of either incorrectly punching "identification division." or leaving the card out of your source deck completely, your output rivaled that of some core dumps. The compiler would read the first card of your source deck and if it didn't read "identification division.", beginning in the proper card column, it generated not one, but multiple error messages for each and every card in your source deck. You would have thought that if the first card wasn't what it expected, it would have simply generated a single error and flushed the rest of your source deck, but it didn't.

• Eric wrote:

"Incidentally, the most likely scenario for a program containing random garbage is someone accidentally loading an executable into the editor thinking it is source code. The C# compiler already has heuristics to detect that situation and reject it before the parser runs."

Interesting. I hope the heuristic is not just confined to the just the C# compiler. I would think that would be an obvious thing to block all across the IDE, regardless of the language. It should also be blocked across all of the command-line compilers, or at least the command line compiler should only allow certain file extensions through. Although nothing stops someone from taking a random file and changing the extension and trying to ram it through.

Eric wrote:

"However, since there are trillions upon trillions of short erroneous programs most of which are uninteresting, I would want to concentrate on testing the parser against common situations."

And I'd imagine there are a large number of erroneous linking scenarios! But that's another discussion. :-)

Eric wrote:

"Alternatively, there's no reason why I couldn't come up with a grammar for generating a subset of *incorrect* programs to see how the parser handles all members of that language."

And then create a repository (or a database) of the erroneous test programs it generates!

Or, is there a way to take existing erroneous programs that are out there, try to reverse engineer bad grammars out of those, and then look for patterns among all the bad grammars in order to come up with a good, universal "testing grammar"?

• "and so on. Well, all binary trees strings with, say, eight terminals are:

( {all strings with 0 terminals} {all strings with 6 terminals} )

( {all strings with 1 terminal} {all strings with 5 terminals} )"

Did you mean to say *seven* terminals?

Great mini-series.  I'm enjoying the refresher on grammars.

• @Marty: No, he meant eight terminals. 2 parentheses + 0 terminals + 6 terminals = 2 parentheses + 1 terminal + 5 terminals = 8 terminals.

• Eric: "it seems likely that if an algorithm works for all trees of size five or less, then it works for all trees period."

Hume wouldn't appreciate that.

When applying this reasoning to trees, you're generalizing from the infinite. However, with your grammar, there really is only a finite number of programs that can be generated since we have a finite number of identifiers. So maybe your reasoning holds for all programs (with your particular grammar). Perhaps I'm missing something though...

I'm really enjoying this series, thanks for taking the time to do this.

• "it seems likely that if an algorithm works for all trees of size five or less, then it works for all trees period."

If it were me, I wouldn't stop there.  While certainly having tests for all trees five or less would be nice, a set of deep probabilistic tests would also be useful, for combinatorial type problems.

In addition to the "Every tree there is", enumerating them in sequence, an algorithm, "An arbitrary tree of size N, generated at random" can also useful.  Then you test your code against a few hundred, thousand, or million random samples on trees of various size.  Adapting the technique you've outlined already, it should be simple:

Tree CreateRandomTreeOfSize( int n, Random r )

{

if( n < 1 ) // A tree of size 0 represented by a null tree.

{

return null;

}

leftSize = r.Next( n );

rightSize = n - leftSize - 1;

return new Tree

{

Left = CreateRandomTreeOfSize( leftSize ),

Right = CreateRandomTreeOfSize( rightSize ),

}

}

The further techniques for generating code sequences also apply here.

• Ever since you started this series I've been wondering how you would use a test program generator to verify a compiler. Besides making sure the compiler doesn't crash (of course this includes assertion failures), how would you verify that it produces the correct output?

In the case of testing the parser you could generate the parse trees at the same time you generate the textual input and just verify that they match. If you're testing the AST builder you could generate all possible ASTs, then generate C# code from each one and make sure the AST from each snippet of C# code matches the expected AST.

Once you start trying to verify that the compiler turns the correct C# code into the correct assembly, the only way to do that is to compare its output to another compiler that you hope doesn't have the same bugs.

Indeed, there are interesting challenges here even once we have a program generator. There are several ways we could use a program generator to help with testing.

My primary aims that actually motivated building this device were (1) looking for crashing bugs in the parser, and (2) looking for infinite-loop bugs in the base class analyzer. For both we can simply throw a lot of cases at the subsystem in question and see whether it crashes or hangs.

We could also generate a large corpus of test programs and hand-pick some interesting ones to analyze "by hand", and then add checkin tests that make sure that the compiler's output matches our human-analyzed beliefs about the output.

And we could generate a corpus of test programs, filter out the ones that are not semantically legal, and then ensure that the new compiler and the old compiler produce "the same" IL. (Where "the same" is in scare quotes because we do not ensure bit-for-bit identity even across two compilations with the same compiler; we have a "canonicalizer" which turns IL into a canonical form that can be compared across compilations to ensure we haven't accidentally changed codegen.)

-- Eric

• Eric: "it seems likely that if an algorithm works for all trees of size five or less, then it works for all trees period."

Patrick Dewane: "Hume wouldn't appreciate that."

He didn't deny the importance of induction. In fact he admitted we would be permanently ignorant, trapped in the present moment, without it. He merely pointed out that induction itself could not be justified by deduction.

So, based on several examples from Hume's writings, I *induce* that he would have appreciated Eric's induction, although I can't *deduce* that he would. :)

Page 1 of 1 (9 items)