Every Program There Is, Part Three

Every Program There Is, Part Three

Rate This
  • Comments 4

[This is part of a series on generating every string in a language. The previous part is here. The next part is here.]

Suppose we want to write a grammar for a simplified C# class declaration. Let’s say that there are one-character identifiers, a class can be public or internal, and that there are no nested or generic types or other members. How about this?

DECL:     MODIFIER class ID : ID { }
MODIFIER: public | internal
ID:       a | b | c | d

so public class d : b { } is legal. But we forgot that the modifier should be optional:

DECL:           WITHMODIFIER | WITHOUTMODIFIER
WITHMODIFIER:   MODIFIER WITHOUTMODIFIER
WITHOUTMODIFER: class ID : ID { }
MODIFIER:       public | internal
ID:             a | b | c | d

And we forgot that the base class is optional…

DECL:            WITHMODIFIER | WITHOUTMODIFIER
WITHMODIFIER:    MODIFIER WITHOUTMODIFIER
WITHOUTMODIFIER: MAYBEBASE { }
MAYBEBASE :      WITHOUTBASE : ID | WITHOUTBASE
WITHOUTBASE:     class ID
MODIFIER:        public | internal
ID:              a | b | c | d

It’s doable, but as you can see, this is getting to be a bit of a mess. It is a lot easier if we can say that “nothing at all” is a legal substitution. Since that’s hard to write, I’m going to make a new rule that NIL is a special terminal that is a nice way of writing the empty string. Now our grammar can be much easier to read:

DECL:     MODIFIER class ID BASE { }
MODIFIER: NIL | public | internal
BASE:     NIL | : ID
ID:       a | b | c | d

This addition has interesting consequences. Previously we were in the situation where every application of a rule made the "current string" no shorter in terms of the number of terminals plus the number of nonterminals. Now we have a situation where a substitution can make the resulting string have fewer nonterminals without adding any terminals. But I'm getting ahead of myself somewhat.

Here I've characterized "NIL" as a terminal which is a special way of writing the empty string. Another way of characterizing NIL is as a nonterminal; NIL as a nonterminal is a special nonterminal such that the only substitution is the empty string. For the rest of this series I'll treat it as a convenient way of writing the empty string terminal.

Next time: techniques for generating all members of a language

[This is part of a series on generating every string in a language. The previous part is here. The next part is here.]

  • Ya BNF! any chance of getting the C# AST available for developers to use? it would make the plugins for VS2010 so much more powerful!

    We get that feedback a lot. We hope to "open up" the compiler and expose it as a "code analysis service" someday. However, this is still in the design stages and you should not expect anything anytime soon. -- Eric

  • This has been a pretty fascinating series. I have to admit this has piqued my interest about compiler theory in general. I had an idea recently for doing some JavaScript based intellisense for an open source project, but I got bogged down in the details.

    Do you have any good resources you could point me to just to get my feet wet? I realize that intellisense and compiler theory are different animals, but it looks as though they have a lot in common.

    Glad you're enjoying it. Yes, they have lots in common; IntelliSense requires us to build a compiler that can correctly analyze incorrect source code faster than you can type; that's a much more difficult problem than writing a "batch" compiler that just reads files off disks.

    JScript is a particularly tricky language to write an IntelliSense engine for because of its highly dynamic nature and lack of compile-time hints. When I last wrote a simple JScript IntelliSense engine it was pretty weak. I looked at assignments to local variables and worked out the types of the expressions, and I identified usages of the HTML DOM objects like "window" that I had type information for. There are more sophisticated things you can do with JScript, like building yourself a no-external-side-effects version of the JScript interpreter and then using it to actually *run* the program you are editing as you are editing it; that way you can do live inspection of the state of the dynamic objects.

    I don't know of any resources specifically about building IntelliSense-style analyzers.

    -- Eric

  • Interesting...I'm learning more in this series of posts than I've learned from some compiler books.

  • Eric,

    Thanks for the reply. I guess I hadn't really made myself that clear. I wasn't planning to build intellisense for JavaScript, rather I was looking at writing an intellisense engine *in* JavaScript. Sounds crazy I know, but the Grammars I would be using are pretty fixed. Not sure if you are familiar with FitNesse or not, but I wanted to create a plugin that would make creating tests a bit more user friendly.

    I figured out quickly this is a non-trivial problem to solve, and decided I needed to understand the problem domain a little better. All I really found was very heavy academic papers on compiler theory, and not enough time to try and digest them. I am a hands on person, and was looking for some code to play with.

    I guess I was just wondering if you knew of a really good resource for programmers wanting to learn more about this stuff? Actual code or tutorials would be awesome!

    I don't know of any offhand.

    What I would suggest doing is dividing the problem into smaller, more basic tasks that you can make progress on. Before you try writing a full-on IntelliSense-style engine, try writing a colourizer. Start by writing an error-recovering lexer and then you can use its output to classify each character in the file as being a keyword, an identifier, a number, a string, whatever. Once you have lexical analysis solid, then move on to syntactic analysis, for features like "hide this method". Once you have syntactic analysis solid, then move on to semantic analysis for IntelliSense. -- Eric

Page 1 of 1 (4 items)