In this post, I further enhance the recursive descent parser that will parse the simple grammar that I presented previously in this series. I’ll examine some more classes that represent symbols in that grammar. The classes I present in this post further show how it’s possible to write LINQ code that closely parallels the grammar.

Blog TOCThis post is one in a series on using LINQ to write a recursive-descent parser for SpreadsheetML formulas. You can find the complete list of posts here.

In this version of the parser, I’ll enhance the parser to a certain extent, but it still will not be a full implementation of the simple grammar that I defined. In particular, I’m not going to implement infix operators until the next post. Infix operators are a bit more involved, particularly because we need to implement operator precedence. It is not a lot of code, but I use a few useful little LINQ tricks that deserve explanations.

The NospaceExpression Class

In the last version of the parser, I implemented a ‘dummy’ NospaceExpression class. For development and testing purposes, any sequence of symbols passed to that implementation of the NospaceExpression.Produce method were considered to make up a valid nospace-expression symbol.

In this post, I’ll simplify the grammar for the NospaceExpression symbol to the following, eliminating infix operators:

// nospace-expression = open-parenthesis expression close-parenthesis if (symbols.First() isOpenParenthesis && symbols.Last() isCloseParenthesis) { Expression e = Expression.Produce(symbols.Skip(1).SkipLast(1)); if (e != null) returnnewNospaceExpression(newOpenParenthesis(), e, newCloseParenthesis()); }

// numerical-constant NumericalConstant n = NumericalConstant.Produce(symbols); if (n != null) returnnewNospaceExpression(n);

// prefix-operator expression PrefixOperator p = PrefixOperator.Produce(symbols.FirstOrDefault()); if (p != null) { Expression e = Expression.Produce(symbols.Skip(1)); if (e != null) returnnewNospaceExpression(p, e); }

returnnull; }

public NospaceExpression(paramsObject[] symbols) : base(symbols) { } }

Yellow code: The nospace-expression symbol is made up of three possible sets of constituent tokens. It could be made up of an open parenthesis, an expression, and a close parenthesis.

Green code: It could be made up of a numerical constant.

Cyan code: It could be made up of a prefix operator followed by a NospaceExpression.

If it is none of those three, then the Produce method returns null.

You can see that the C# / LINQ code to process the constituent symbols and produce a symbol has a direct correlation to the grammar.

The NumericalConstant Class

The next class I’ll examine in this post is the NumericalConstant class. One interesting characteristic of the grammar associated with this class is that it contains an optional symbol in the grammar, indicated by the square brackets:

This is described in the post on the Augmented Backus-Naur Form Grammar, which discusses the various notations that we need to pay attention to.

The NumericalConstant.Produce method first attempts to produce a SignificandPart symbol using the complete list of constituent symbols that was passed to it. If successful, the method can produce a NumericalConstant object with one constituent symbol, the SignificandPart object.

If SignificandPart fails to produce a symbol, then this method asks the NegSign.Produce method to produce a NegSign object, passing the first symbol in the collection to it. If that was successful, then the method asks the SignificandPart class to produce a symbol using a collection of all symbols except the first one (using the Skip extension method).

SignificandPart s = SignificandPart.Produce(symbols); if (s != null) returnnewNumericalConstant(s); NegSign n = NegSign.Produce(symbols.First()); if (n != null) { SignificandPart s2 = SignificandPart.Produce(symbols.Skip(1)); if (s2 != null) returnnewNumericalConstant(n, s2); } returnnull; } public NumericalConstant(paramsObject[] symbols) : base(symbols) { } }

One key point about the NegSign.Produce method: it will be asked to produce a symbol only for a single symbol, which will be a Minus symbol, so for simplicity, its Produce method takes a singleton Symbol instead of a collection. Here is the NegSign class:

publicclassNegSign : Symbol { publicstaticNegSign Produce(Symbol symbol) { // neg-sign = minus

if (symbol isMinus) returnnewNegSign(symbol); returnnull; } public NegSign(paramsObject[] symbols) : base(symbols) { } }

The SignificandPart Class

As a last example in this post, let’s take a look at the significand-part symbol, which is comprised of one of:

A fractional part

A whole-number-part followed by an optional fractional-part

To process this grammar, the first thing to do is to ask the FractionalPart class to produce a symbol, passing all of the symbols that were passed to SignificandPart.Produce. If the FractionalPart class can produce that symbol, then the method can return a SignificandPart object.

If the FractionalPart class can’t produce a symbol, then SignificandPart.Produce needs to ask the WholeNumberPart class to produce a symbol. However, if there is a fractional part following the WholeNumberPart, WholeNumberPart.Produce will not consume all of the constituent symbols. Therefore, WholeNumberPart.Produce has an additional out argument, which will contain any unprocessed symbols. If the unprocessed symbols collection is empty, then there is no fractional part, and SignificandPart can produce its symbol (which has no fractional part).

However, if the unprocessed symbols collection is not empty, then SignificandPart.Produce will ask FractionalPart.Produce to produce a FractionalPart with the unprocessed symbols, and if it does, then SignificandPart.Produce can produce its symbol.

This sounds more complicated than it is. Here is the implementation of the SignificandPart class:

FractionalPart f; f = FractionalPart.Produce(symbols); if (f != null) returnnewSignificandPart(f); IEnumerable<Symbol> s = null; WholeNumberPart w = WholeNumberPart.Produce(symbols, out s); if (w != null) { if (!s.Any()) returnnewSignificandPart(w); f = FractionalPart.Produce(s); if (f != null) returnnewSignificandPart(w, f); } returnnull; } public SignificandPart(paramsObject[] symbols) : base(symbols) { } }

And here is the WholeNumberPart class, which potentially will not consume all of the symbols that are passed to its Produce method.

IEnumerable<Symbol> unprocessedSymbols = null; DigitSequence d = DigitSequence.Produce(symbols, out unprocessedSymbols); if (d != null) { symbolsToProcess = unprocessedSymbols; returnnewWholeNumberPart(d); } symbolsToProcess = null; returnnull; } public WholeNumberPart(paramsObject[] symbols) : base(symbols) { } }

It, in turn, is comprised of a DigitSequence symbol, which is interesting in that it provides another example of how the LINQ code parallels the grammar. A DigitSequence is comprised of one or more DecimalDigit symbols (which are terminals). The LINQ code essentially says: query for the first n DecimalDigit symbols in the sequence. If there are any, then return a DigitSequence symbol, and set the out parameter to the unprocessed symbols. If there aren’t any, then return null.