Blog - Title

Building a Simple Recursive Descent Parser (Completed Simple Parser)

Building a Simple Recursive Descent Parser (Completed Simple Parser)

  • Comments 7

In this post, I complete the recursive descent parser that will parse the simple grammar that I presented previously in this series.  In this post, I’ll examine in detail the NospaceExpression class, which needs to implement operator precedence.

This blog is inactive.
New blog: EricWhite.com/blog

Blog TOC
Note:  I have to apologize for the delay in writing this post.  First summer vacation took some time, and then I became involved in a project of writing two papers – one that summarizes SharePoint application development, and another that summarizes Office Client application development.  I’ve been literally obsessing about every possible way that you as a developer can extend SharePoint Server 2010, and every way that you can extend the Office 2010 client applications.  The SharePoint paper is nearing publication.  The Office Client paper will be along in a bit.  While obsessing, it has been difficult for me to mentally switch tracks back to this series.  But here we go again…

This post is one in a series on using LINQ to write a recursive-descent parser for SpreadsheetML formulas.  You can find the complete list of posts here.

The NospaceExpression Class

Here is the entire implementation of the NospaceExpression class.  I explained all of the code that has a gray background in the previous post, Building a Simple Recursive Descent Parser (continued).

public static NospaceExpression Produce(IEnumerable<Symbol> symbols)
{
    // nospace-expression = open-parenthesis expression close-parenthesis
    //         / numerical-constant
    //         / prefix-operator expression
    //         / expression infix-operator expression
 
    if (!symbols.Any())
        return null;
 
    if (symbols.First() is OpenParenthesis && symbols.Last() is CloseParenthesis)
    {
        Expression e = Expression.Produce(symbols.Skip(1).SkipLast(1));
        if (e != null)
            return new NospaceExpression(new OpenParenthesis(), e, new CloseParenthesis());
    }
 
    // expression, infix-operator, expression
    var z = symbols.Rollup(0, (t, d) =>
    {
        if (t is OpenParenthesis)
            return d + 1;
        if (t is CloseParenthesis)
            return d - 1;
        return d;
    });
    var symbolsWithIndex = symbols.Select((s, i) => new
    {
        Symbol = s,
        Index = i,
    });
    var z2 = symbolsWithIndex.Zip(z, (v1, v2) => new
    {
        SymbolWithIndex = v1,
        Depth = v2,
    });
    var operatorList = z2
        .Where(x => x.Depth == 0 &&
            x.SymbolWithIndex.Index != 0 &&
            InfixOperator.Produce(x.SymbolWithIndex.Symbol) != null)
        .ToList();
    if (operatorList.Any())
    {
        int minPrecedence = operatorList
            .Select(o2 => OperatorPrecedence[o2.SymbolWithIndex.Symbol.ToString()]).Min();
        var op = operatorList
            .Last(o2 => OperatorPrecedence[o2.SymbolWithIndex.Symbol.ToString()] == minPrecedence);
        if (op != null)
        {
            var expressionTokenList1 = symbols.TakeWhile(t => t != op.SymbolWithIndex.Symbol);
            Expression e1 = Expression.Produce(expressionTokenList1);
            if (e1 == null)
                throw new ParserException("Invalid expression");
            var expressionTokenList2 = symbols
                .SkipWhile(t => t != op.SymbolWithIndex.Symbol).Skip(1);
            Expression e2 = Expression.Produce(expressionTokenList2);
            if (e2 == null)
                throw new ParserException("Invalid expression");
            InfixOperator io = new InfixOperator(op.SymbolWithIndex.Symbol);
            return new NospaceExpression(e1, io, e2);
        }
    }
 
    NumericalConstant n = NumericalConstant.Produce(symbols);
    if (n != null)
        return new NospaceExpression(n);
 
    PrefixOperator p = PrefixOperator.Produce(symbols.FirstOrDefault());
    if (p != null)
    {
        Expression e = Expression.Produce(symbols.Skip(1));
        if (e != null)
            return new NospaceExpression(p, e);
    }
 
    return null;
}
 

Let’s work through the new code.  Let’s consider how the new code behaves when parsing the following formula:

"1+((2+3)*4)^5"
 

When determining if an expression has an infix operator, we specifically need to ignore operators that are in parentheses.  Those operators will have their chance to be recognized when NospaceExpression.Produce sees that an expression starts with an open parenthesis and ends with a close parenthesis, as processed by this code:

if (!symbols.Any())
    return null;
 
if (symbols.First() is OpenParenthesis && symbols.Last() is CloseParenthesis)
{
    Expression e = Expression.Produce(symbols.Skip(1).SkipLast(1));
    if (e != null)
        return new NospaceExpression(new OpenParenthesis(), e, new CloseParenthesis());
}
 

So we need to ignore operators that are inside parentheses.  The easiest way to determine if an operator is inside of parentheses is to use the Rollup extension method, as follows:

// expression, infix-operator, expression
var z = symbols.Rollup(0, (t, d) =>
{
    if (t is OpenParenthesis)
        return d + 1;
    if (t is CloseParenthesis)
        return d - 1;
    return d;
});
 

This use of the Rollup extension method returns a new collection of integers that has the same number of elements in it as the source collection.  The integer is the depth of parentheses at that point.  In the following snippet, I add a bit of code to print the items in the collection, and then exit:

// expression, infix-operator, expression
var z = symbols.Rollup(0, (t, d) =>
{
    if (t is OpenParenthesis)
        return d + 1;
    if (t is CloseParenthesis)
        return d - 1;
    return d;
});
foreach (var item in z)
    Console.WriteLine(item);
Environment.Exit(0);
 
When I run this example for the test formula "1+((2+3)*4)^5", I see:
0
0
1
2
2
2
2
1
1
1
0
0
0
 

When looking for an infix operator, I can ignore all operators where the depth is not zero.

Later on, I’m going to need the index of the symbol of the infix operator that we find.  The next projection uses symbols as its source, and projects an anonymous type that includes the symbol and its index:

var symbolsWithIndex = symbols.Select((s, i) => new
{
    Symbol = s,
    Index = i,
});
foreach (var item in symbolsWithIndex)
    Console.WriteLine(item);
Environment.Exit(0);
 

When I run this for the test formula, I see:

{ Symbol = 1, Index = 0 }
{ Symbol = +, Index = 1 }
{ Symbol = (, Index = 2 }
{ Symbol = (, Index = 3 }
{ Symbol = 2, Index = 4 }
{ Symbol = +, Index = 5 }
{ Symbol = 3, Index = 6 }
{ Symbol = ), Index = 7 }
{ Symbol = *, Index = 8 }
{ Symbol = 4, Index = 9 }
{ Symbol = ), Index = 10 }
{ Symbol = ^, Index = 11 }
{ Symbol = 5, Index = 12 }
 

And next, use the Zip extension method to zip together the last two queries to create a list of items that contains:

·         The symbol

·         Its index

·         Its parenthesis depth

var z = symbols.Rollup(0, (t, d) =>
{
    if (t is OpenParenthesis)
        return d + 1;
    if (t is CloseParenthesis)
        return d - 1;
    return d;
});
var symbolsWithIndex = symbols.Select((s, i) => new
{
    Symbol = s,
    Index = i,
});
var z2 = symbolsWithIndex.Zip(z, (v1, v2) => new
{
    SymbolWithIndex = v1,
    Depth = v2,
});
foreach (var item in z2)
    Console.WriteLine(item);
Environment.Exit(0);
 

This produces:

{ SymbolWithIndex = { Symbol = 1, Index = 0 }, Depth = 0 }
{ SymbolWithIndex = { Symbol = +, Index = 1 }, Depth = 0 }
{ SymbolWithIndex = { Symbol = (, Index = 2 }, Depth = 1 }
{ SymbolWithIndex = { Symbol = (, Index = 3 }, Depth = 2 }
{ SymbolWithIndex = { Symbol = 2, Index = 4 }, Depth = 2 }
{ SymbolWithIndex = { Symbol = +, Index = 5 }, Depth = 2 }
{ SymbolWithIndex = { Symbol = 3, Index = 6 }, Depth = 2 }
{ SymbolWithIndex = { Symbol = ), Index = 7 }, Depth = 1 }
{ SymbolWithIndex = { Symbol = *, Index = 8 }, Depth = 1 }
{ SymbolWithIndex = { Symbol = 4, Index = 9 }, Depth = 1 }
{ SymbolWithIndex = { Symbol = ), Index = 10 }, Depth = 0 }
{ SymbolWithIndex = { Symbol = ^, Index = 11 }, Depth = 0 }
{ SymbolWithIndex = { Symbol = 5, Index = 12 }, Depth = 0 }
 

Next, we can look for any infix operators where the Depth == 0.  In addition, we filter out operators that are found in the very first position, because that will not be an infix operator, but instead a prefix operator.

// expression, infix-operator, expression
var z = symbols.Rollup(0, (t, d) =>
{
    if (t is OpenParenthesis)
        return d + 1;
    if (t is CloseParenthesis)
        return d - 1;
    return d;
});
var symbolsWithIndex = symbols.Select((s, i) => new
{
    Symbol = s,
    Index = i,
});
var z2 = symbolsWithIndex.Zip(z, (v1, v2) => new
{
    SymbolWithIndex = v1,
    Depth = v2,
});
var operatorList = z2
    .Where(x => x.Depth == 0 &&
        x.SymbolWithIndex.Index != 0 &&
        InfixOperator.Produce(x.SymbolWithIndex.Symbol) != null)
    .ToList();
foreach (var item in operatorList)
    Console.WriteLine(item);
Environment.Exit(0);
 

This produces a list of the two possible operators that are part of an expression that uses an infix operator:

{ SymbolWithIndex = { Symbol = +, Index = 1 }, Depth = 0 }
{ SymbolWithIndex = { Symbol = ^, Index = 11 }, Depth = 0 }
 

Next, of the operators that could possibly be the infix operator for this expression, we find the minimum precedence of any operators.  When implementing a top-down parser, we process operators with the least precedence before processing operators of higher precedence.  When looking at the expression tree, operators with the least precedence are nearer to the root of the tree.  Operators with the highest precedence will be closer to the leaves, so that they will be evaluated before the operators with lower precedence.  If this isn’t clear, I think that it will be when you see the entire parse tree.  In this particular case, we need to find the + operator, not the ^ operator, as + has lower precedence.

To enable processing of precedence of operators, I defined a small dictionary of operators and their precedence:

public static Dictionary<string, int> OperatorPrecedence = new Dictionary<string, int>()
{
    { "^", 3 },
    { "*", 2 },
    { "/", 2 },
    { "+", 1 },
    { "-", 1 },
};
 

In the following listing, the highlighted line finds the minimum precedence of the operators that are in play.

// expression, infix-operator, expression
var z = symbols.Rollup(0, (t, d) =>
{
    if (t is OpenParenthesis)
        return d + 1;
    if (t is CloseParenthesis)
        return d - 1;
    return d;
});
var symbolsWithIndex = symbols.Select((s, i) => new
{
    Symbol = s,
    Index = i,
});
var z2 = symbolsWithIndex.Zip(z, (v1, v2) => new
{
    SymbolWithIndex = v1,
    Depth = v2,
});
var operatorList = z2
    .Where(x => x.Depth == 0 &&
        x.SymbolWithIndex.Index != 0 &&
        InfixOperator.Produce(x.SymbolWithIndex.Symbol) != null)
    .ToList();
if (operatorList.Any())
{
    int minPrecedence = operatorList
        .Select(o2 => OperatorPrecedence[o2.SymbolWithIndex.Symbol.ToString()]).Min();
    var op = operatorList
        .Last(o2 => OperatorPrecedence[o2.SymbolWithIndex.Symbol.ToString()] == minPrecedence);
    if (op != null)
    {
        var expressionTokenList1 = symbols.TakeWhile(t => t != op.SymbolWithIndex.Symbol);
        Expression e1 = Expression.Produce(expressionTokenList1);
        if (e1 == null)
            throw new ParserException("Invalid expression");
        var expressionTokenList2 = symbols
            .SkipWhile(t => t != op.SymbolWithIndex.Symbol).Skip(1);
        Expression e2 = Expression.Produce(expressionTokenList2);
        if (e2 == null)
            throw new ParserException("Invalid expression");
        InfixOperator io = new InfixOperator(op.SymbolWithIndex.Symbol);
        return new NospaceExpression(e1, io, e2);
    }
}
 

And finally, we find the infix operator:

// expression, infix-operator, expression
var z = symbols.Rollup(0, (t, d) =>
{
    if (t is OpenParenthesis)
        return d + 1;
    if (t is CloseParenthesis)
        return d - 1;
    return d;
});
var symbolsWithIndex = symbols.Select((s, i) => new
{
    Symbol = s,
    Index = i,
});
var z2 = symbolsWithIndex.Zip(z, (v1, v2) => new
{
    SymbolWithIndex = v1,
    Depth = v2,
});
var operatorList = z2
    .Where(x => x.Depth == 0 &&
        x.SymbolWithIndex.Index != 0 &&
        InfixOperator.Produce(x.SymbolWithIndex.Symbol) != null)
    .ToList();
if (operatorList.Any())
{
    int minPrecedence = operatorList
        .Select(o2 =>OperatorPrecedence[o2.SymbolWithIndex.Symbol.ToString()]).Min();
    var op = operatorList
        .Last(o2 => OperatorPrecedence[o2.SymbolWithIndex.Symbol.ToString()] == minPrecedence);
    if (op != null)
    {
        var expressionTokenList1 = symbols.TakeWhile(t => t != op.SymbolWithIndex.Symbol);
        Expression e1 = Expression.Produce(expressionTokenList1);
        if (e1 == null)
            throw new ParserException("Invalid expression");
        var expressionTokenList2 = symbols
            .SkipWhile(t => t != op.SymbolWithIndex.Symbol).Skip(1);
        Expression e2 = Expression.Produce(expressionTokenList2);
        if (e2 == null)
            throw new ParserException("Invalid expression");
        InfixOperator io = new InfixOperator(op.SymbolWithIndex.Symbol);
        return new NospaceExpression(e1, io, e2);
    }
}
 

If we found an operator, then we can new-up (highlighted in green) an instance of NoSpaceExpression, passing the symbols before and after the infix operator to Expression.Produce (highlighted in cyan).

After parsing the sample expression, “1+((2+3)*4)^5”, the parser produces the following parse tree:

Formula >1+((2+3)*4)^5<
  Expression >1+((2+3)*4)^5<
    NospaceExpression >1+((2+3)*4)^5<
      Expression >1<
        NospaceExpression >1<
          NumericalConstant >1<
            SignificandPart >1<
              WholeNumberPart >1<
                DigitSequence >1<
                  DecimalDigit >1<
      InfixOperator >+<      <= the + operator has lower precedence, so is nearer to the root of the tree, and will be evaluated last
        Plus >+<
      Expression >((2+3)*4)^5<
        NospaceExpression >((2+3)*4)^5<
          Expression >((2+3)*4)<
            NospaceExpression >((2+3)*4)<
              OpenParenthesis >(<
              Expression >(2+3)*4<
                NospaceExpression >(2+3)*4<
                  Expression >(2+3)<
                    NospaceExpression >(2+3)<
                      OpenParenthesis >(<
                      Expression >2+3<
                        NospaceExpression >2+3<
                          Expression >2<
                            NospaceExpression >2<
                              NumericalConstant >2<
                                SignificandPart >2<
                                  WholeNumberPart >2<
                                    DigitSequence >2<
                                      DecimalDigit >2<
                          InfixOperator >+<
                            Plus >+<
                          Expression >3<
                            NospaceExpression >3<
                              NumericalConstant >3<
                                SignificandPart >3<
                                  WholeNumberPart >3<
                                    DigitSequence >3<
                                      DecimalDigit >3<
                      CloseParenthesis >)<
                  InfixOperator >*<
                    Asterisk >*<
                  Expression >4<
                    NospaceExpression >4<
                      NumericalConstant >4<
                        SignificandPart >4<
                          WholeNumberPart >4<
                            DigitSequence >4<
                              DecimalDigit >4<
              CloseParenthesis >)<
          InfixOperator >^<   <= the ^ operator has higher precedence, so is closer to leaves of the tree, so will be evaluated first
            Caret >^<
          Expression >5<
            NospaceExpression >5<
              NumericalConstant >5<
                SignificandPart >5<
                  WholeNumberPart >5<
                    DigitSequence >5<
                      DecimalDigit >5<
 

Following is the complete listing of the simple parser.  While I haven’t discussed every class in the parser, I’ve introduced every technique that I’ve used in writing the Produce methods for each class.  To run this example, you can simply copy/paste this example into Visual Studio and run it.  There are no dependencies.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
 
namespace SimpleParser
{
    public abstract class Symbol
    {
        public List<Symbol> ConstituentSymbols { get; set; }
        public override string ToString()
        {
            string s = ConstituentSymbols.Select(ct => ct.ToString()).StringConcatenate();
            return s;
        }
        public Symbol(params Object[] symbols)
        {
            List<Symbol> ls = new List<Symbol>();
            foreach (var item in symbols)
            {
                if (item is Symbol)
                    ls.Add((Symbol)item);
                else if (item is IEnumerable<Symbol>)
                    foreach (var item2 in (IEnumerable<Symbol>)item)
                        ls.Add(item2);
                else
                    // If this error is thrown, the parser is coded incorrectly.
                    throw new ParserException("Internal error");
            }
            ConstituentSymbols = ls;
        }
        public Symbol() { }
    }
 
    public class Formula : Symbol
    {
        public static Formula Produce(IEnumerable<Symbol> symbols)
        {
            // formula = expression
 
            Expression e = Expression.Produce(symbols);
            return e == null ? null : new Formula(e);
        }
        public Formula(params Object[] symbols) : base(symbols) { }
    }
 
    public class Expression : Symbol
    {
        public static Expression Produce(IEnumerable<Symbol> symbols)
        {
            // expression = *whitespace nospace-expression *whitespace
 
            int whiteSpaceBefore = symbols.TakeWhile(s => s is WhiteSpace).Count();
            int whiteSpaceAfter = symbols.Reverse().TakeWhile(s => s is WhiteSpace).Count();
            IEnumerable<Symbol> noSpaceSymbolList = symbols
                .Skip(whiteSpaceBefore)
                .SkipLast(whiteSpaceAfter)
                .ToList();
            NospaceExpression n = NospaceExpression.Produce(noSpaceSymbolList);
            if (n != null)
                return new Expression(
                    Enumerable.Repeat(new WhiteSpace(), whiteSpaceBefore),
                    n,
                    Enumerable.Repeat(new WhiteSpace(), whiteSpaceAfter));
            return null;
        }
 
        public Expression (params Object[] symbols) : base(symbols) { }
    }
 
    public class NospaceExpression : Symbol
    {
        public static Dictionary<string, int> OperatorPrecedence = new Dictionary<string, int>()
        {
            { "^", 3 },
            { "*", 2 },
            { "/", 2 },
            { "+", 1 },
            { "-", 1 },
        };
 
        public static NospaceExpression Produce(IEnumerable<Symbol> symbols)
        {
            // nospace-expression = open-parenthesis expression close-parenthesis
            //         / numerical-constant
            //         / prefix-operator expression
            //         / expression infix-operator expression
 
            if (!symbols.Any())
                return null;
 
            if (symbols.First() is OpenParenthesis && symbols.Last() is CloseParenthesis)
            {
                Expression e = Expression.Produce(symbols.Skip(1).SkipLast(1));
                if (e != null)
                    return new NospaceExpression(new OpenParenthesis(), e, new CloseParenthesis());
            }
 
            // expression, infix-operator, expression
            var z = symbols.Rollup(0, (t, d) =>
            {
                if (t is OpenParenthesis)
                    return d + 1;
                if (t is CloseParenthesis)
                    return d - 1;
                return d;
            });
            var symbolsWithIndex = symbols.Select((s, i) => new
            {
                Symbol = s,
                Index = i,
            });
            var z2 = symbolsWithIndex.Zip(z, (v1, v2) => new
            {
                SymbolWithIndex = v1,
                Depth = v2,
            });
            var operatorList = z2
                .Where(x => x.Depth == 0 &&
                    x.SymbolWithIndex.Index != 0 &&
                    InfixOperator.Produce(x.SymbolWithIndex.Symbol) != null)
                .ToList();
            if (operatorList.Any())
            {
                int minPrecedence = operatorList
                    .Select(o2 => OperatorPrecedence[o2.SymbolWithIndex.Symbol.ToString()]).Min();
                var op = operatorList
                    .Last(o2 => OperatorPrecedence[o2.SymbolWithIndex.Symbol.ToString()] == minPrecedence);
                if (op != null)
                {
                    var expressionTokenList1 = symbols.TakeWhile(t => t != op.SymbolWithIndex.Symbol);
                    Expression e1 = Expression.Produce(expressionTokenList1);
                    if (e1 == null)
                        throw new ParserException("Invalid expression");
                    var expressionTokenList2 = symbols
                        .SkipWhile(t => t != op.SymbolWithIndex.Symbol).Skip(1);
                    Expression e2 = Expression.Produce(expressionTokenList2);
                    if (e2 == null)
                        throw new ParserException("Invalid expression");
                    InfixOperator io = new InfixOperator(op.SymbolWithIndex.Symbol);
                    return new NospaceExpression(e1, io, e2);
                }
            }
 
            NumericalConstant n = NumericalConstant.Produce(symbols);
            if (n != null)
                return new NospaceExpression(n);
 
            PrefixOperator p = PrefixOperator.Produce(symbols.FirstOrDefault());
            if (p != null)
            {
                Expression e = Expression.Produce(symbols.Skip(1));
                if (e != null)
                    return new NospaceExpression(p, e);
            }
 
            return null;
        }
 
        public NospaceExpression(params Object[] symbols) : base(symbols) { }
    }
 
    public class NumericalConstant : Symbol
    {
        public static NumericalConstant Produce(IEnumerable<Symbol> symbols)
        {
            // numerical-constant = [neg-sign] significand-part
 
            SignificandPart s = SignificandPart.Produce(symbols);
            if (s != null)
                return new NumericalConstant(s);
            NegSign n = NegSign.Produce(symbols.First());
            if (n != null)
            {
                SignificandPart s2 = SignificandPart.Produce(symbols.Skip(1));
                if (s2 != null)
                    return new NumericalConstant(n, s2);
            }
            return null;
        }
        public NumericalConstant(params Object[] symbols) : base(symbols) { }
    }
 
    public class SignificandPart : Symbol
    {
        public static SignificandPart Produce(IEnumerable<Symbol> symbols)
        {
            // significand-part = whole-number-part [fractional-part] / fractional-part
 
            FractionalPart f;
            f = FractionalPart.Produce(symbols);
            if (f != null)
                return new SignificandPart(f);
            IEnumerable<Symbol> s = null;
            WholeNumberPart w = WholeNumberPart.Produce(symbols, out s);
            if (w != null)
            {
                if (!s.Any())
                    return new SignificandPart(w);
                f = FractionalPart.Produce(s);
                if (f != null)
                    return new SignificandPart(w, f);
            }
            return null;
        }
        public SignificandPart(params Object[] symbols) : base(symbols) { }
    }
 
    public class WholeNumberPart : Symbol
    {
        public static WholeNumberPart Produce(IEnumerable<Symbol> symbols,
            out IEnumerable<Symbol> symbolsToProcess)
        {
            // whole-number-part = digit-sequence
 
            IEnumerable<Symbol> s = null;
            DigitSequence d = DigitSequence.Produce(symbols, out s);
            if (d != null)
            {
                symbolsToProcess = s;
                return new WholeNumberPart(d);
            }
            symbolsToProcess = null;
            return null;
        }
        public WholeNumberPart(params Object[] symbols) : base(symbols) { }
    }
 
    public class FractionalPart : Symbol
    {
        public static FractionalPart Produce(IEnumerable<Symbol> symbols)
        {
            // fractional-part = full-stop digit-sequence
 
            if (!symbols.Any())
                return null;
            if (symbols.First() is FullStop)
            {
                IEnumerable<Symbol> s = null;
                DigitSequence d = DigitSequence.Produce(symbols.Skip(1), out s);
                if (d == null || s.Any())
                    return null;
                return new FractionalPart(new FullStop(), d);
            }
            return null;
        }
        public FractionalPart(params Object[] symbols) : base(symbols) { }
    }
 
    public class DigitSequence : Symbol
    {
        public static DigitSequence Produce(IEnumerable<Symbol> symbols,
            out IEnumerable<Symbol> symbolsToProcess)
        {
            // digit-sequence = 1*decimal-digit
 
            IEnumerable<Symbol> digits = symbols.TakeWhile(s => s is DecimalDigit);
            if (digits.Any())
            {
                symbolsToProcess = symbols.Skip(digits.Count());
                return new DigitSequence(digits);
            }
            symbolsToProcess = null;
            return null;
        }
        public DigitSequence(params Object[] symbols) : base(symbols) { }
    }
 
    public class NegSign : Symbol
    {
        public static NegSign Produce(Symbol symbol)
        {
            // neg-sign = minus
 
            if (symbol is Minus)
                return new NegSign(symbol);
            return null;
        }
        public NegSign(params Object[] symbols) : base(symbols) { }
    }
 
    public class PrefixOperator : Symbol
    {
        public static PrefixOperator Produce(Symbol symbol)
        {
            // prefix-operator = plus / minus
 
            if (symbol is Plus || symbol is Minus)
                return new PrefixOperator(symbol);
            return null;
        }
        public PrefixOperator(params Object[] symbols) : base(symbols) { }
    }
 
    public class InfixOperator : Symbol
    {
        public static InfixOperator Produce(Symbol symbol)
        {
            // infix-operator = caret / asterisk / forward-slash / plus / minus
 
            if (symbol is Plus || symbol is Minus || symbol is Asterisk || symbol is ForwardSlash
                || symbol is Caret)
                return new InfixOperator(symbol);
            return null;
        }
        public InfixOperator(params Object[] symbols) : base(symbols) { }
    }
 
    public class DecimalDigit : Symbol
    {
        private string CharacterValue;
        public override string ToString() { return CharacterValue; }
        public DecimalDigit(char c) { CharacterValue = c.ToString(); }
    }
 
    public class WhiteSpace : Symbol
    {
        public override string ToString() { return " "; }
        public WhiteSpace() { }
    }
 
    public class Plus : Symbol
    {
        public override string ToString() { return "+"; }
        public Plus() { }
    }
 
    public class Minus : Symbol
    {
        public override string ToString() { return "-"; }
        public Minus() { }
    }
 
    public class Asterisk : Symbol
    {
        public override string ToString() { return "*"; }
        public Asterisk() { }
    }
 
    public class ForwardSlash : Symbol
    {
        public override string ToString() { return "/"; }
        public ForwardSlash() { }
    }
 
    public class Caret : Symbol
    {
        public override string ToString() { return "^"; }
        public Caret() { }
    }
 
    public class FullStop : Symbol
    {
        public override string ToString() { return "."; }
        public FullStop() { }
    }
 
    public class OpenParenthesis : Symbol
    {
        public override string ToString() { return "("; }
        public OpenParenthesis() { }
    }
 
    public class CloseParenthesis : Symbol
    {
        public override string ToString() { return ")"; }
        public CloseParenthesis() { }
    }
 
    public class SimpleFormulaParser
    {
        public static Symbol ParseFormula(string s)
        {
            IEnumerable<Symbol> symbols = s.Select(c =>
            {
                switch (c)
                {
                    case '0':
                    case '1':
                    case '2':
                    case '3':
                    case '4':
                    case '5':
                    case '6':
                    case '7':
                    case '8':
                    case '9':
                        return new DecimalDigit(c);
                    case ' ':
                        return new WhiteSpace();
                    case '+':
                        return new Plus();
                    case '-':
                        return new Minus();
                    case '*':
                        return new Asterisk();
                    case '/':
                        return new ForwardSlash();
                    case '^':
                        return new Caret();
                    case '.':
                        return new FullStop();
                    case '(':
                        return new OpenParenthesis();
                    case ')':
                        return new CloseParenthesis();
                    default:
                        return (Symbol)null;
                }
            });
#if false
            if (symbols.Any())
            {
                Console.WriteLine("Terminal Symbols");
                Console.WriteLine("================");
                foreach (var terminal in symbols)
                    Console.WriteLine("{0} >{1}<", terminal.GetType().Name.ToString(),
                        terminal.ToString());
                Console.WriteLine();
            }
#endif
            Formula formula = Formula.Produce(symbols);
            if (formula == null)
                throw new ParserException("Invalid formula");
            return formula;
        }
 
        public static void DumpSymbolRecursive(StringBuilder sb, Symbol symbol, int depth)
        {
            sb.Append(string.Format("{0}{1} >{2}<",
                "".PadRight(depth * 2),
                symbol.GetType().Name.ToString(),
                symbol.ToString())).Append(Environment.NewLine);
            if (symbol.ConstituentSymbols != null)
                foreach (var childSymbol in symbol.ConstituentSymbols)
                    DumpSymbolRecursive(sb, childSymbol, depth + 1);
        }
    }
 
    class Program
    {
        static void Main(string[] args)
        {
            string[] sampleValidFormulas = new[] {
                "1+((2+3)*4)^5",
                "1+2-3*4/5^6",
                "(1+2)/3",
                "  (1+3)  ",
                "-123",
                "1+2*(-3)",
                "1+2*( - 3)",
                "12.34",
                ".34",
                "-123+456",
                "  (  123 + 456 )  ",
                "-.34",
                "-12.34",
                "-(123+456)",
            };
 
            string[] sampleInvalidFormulas = new[] {
                "-(123+)",
                "-(*123)",
                "*123",
                "*123a",
                "1.",
                "--1",
            };
 
            StringBuilder sb = new StringBuilder();
            foreach (var formula in sampleValidFormulas)
            {
                Symbol f = SimpleFormulaParser.ParseFormula(formula);
                SimpleFormulaParser.DumpSymbolRecursive(sb, f, 0);
                sb.Append("==================================" + Environment.NewLine);
            }
            foreach (var formula in sampleInvalidFormulas)
            {
                bool exceptionThrown = false;
                try
                {
                    Symbol f = SimpleFormulaParser.ParseFormula(formula);
                }
                catch (ParserException e)
                {
                    exceptionThrown = true;
                    sb.Append(String.Format("Parsing >{0}< Exception: {1}", formula, e.Message) +
                        Environment.NewLine);
                }
                if (!exceptionThrown)
                    sb.Append(String.Format("Parsing >{0}< Should have thrown exception, but did not",
                        formula) + Environment.NewLine);
            }
            Console.WriteLine(sb.ToString());
        }
    }
 
    public class ParserException : Exception
    {
        public ParserException(string message) : base(message) { }
    }
 
    public static class MyExtensions
    {
        public static IEnumerable<T> SkipLast<T>(this IEnumerable<T> source,
            int count)
        {
            Queue<T> saveList = new Queue<T>();
            int saved = 0;
            foreach (T item in source)
            {
                if (saved < count)
                {
                    saveList.Enqueue(item);
                    ++saved;
                    continue;
                }
                saveList.Enqueue(item);
                yield return saveList.Dequeue();
            }
            yield break;
        }
 
        public static string StringConcatenate(this IEnumerable<string> source)
        {
            StringBuilder sb = new StringBuilder();
            foreach (string s in source)
                sb.Append(s);
            return sb.ToString();
        }
 
        public static string StringConcatenate<T>(
            this IEnumerable<T> source,
            Func<T, string> projectionFunc)
        {
            return source.Aggregate(new StringBuilder(),
                (s, i) => s.Append(projectionFunc(i)),
                s => s.ToString());
        }
 
        public static IEnumerable<TResult> Rollup<TSource, TResult>(
            this IEnumerable<TSource> source,
            TResult seed,
            Func<TSource, TResult, TResult> projection)
        {
            TResult nextSeed = seed;
            foreach (TSource src in source)
            {
                TResult projectedValue = projection(src, nextSeed);
                nextSeed = projectedValue;
                yield return projectedValue;
            }
        }
    }
}
 

Leave a Comment
  • Please add 5 and 4 and type the answer here:
  • Post
  • A good point was made by a reader - this example uses the Zip extension method, which is included in .NET 4.0.  It builds and runs as-is using Visual Studio 2010.  However, if you are using Visual Studio 2008, you need to include the Zip extension method.  A simple implementation can be found at:

    blogs.msdn.com/.../comparing-two-open-xml-documents-using-the-zip-extension-method.aspx

    The implementation of the extension method is about half way down the post.

    -Eric

  • This code assumes IEnumerable<Expression> is IEnumerable<Symbol> (true in C#4, not in C#3)

    to run this code in VS2008, you need to change the constructor for Symbol:

           public Symbol(params Object[] symbols) {

               List<Symbol> ls = new List<Symbol>();

               foreach (var item in symbols) {

                   if (item is Symbol)

                       ls.Add((Symbol)item);

                   else if (item is IEnumerable<Symbol>)

                       foreach (var item2 in (IEnumerable<Symbol>)item)

                           ls.Add(item2);

                   else if (item is IEnumerable)

                       foreach (var item2 in (IEnumerable)item)

                           ls.Add(item2 as Symbol);

                   else

                       // If this error is thrown, the parser is coded incorrectly.

                       throw new ParserException("Internal error");

               }

               ConstituentSymbols = ls;

           }

  • Thanks for the post Eric. The above source deals with all the simple calculations (shown above in valid formulas). But what if they have cell references.

    Like (=SUM(A1:A5)) OR =A5+B5.

    if i put any of them in the valid formula list, it is throwing exceptions. It is not treating as valid formula. Can you suggest a way to overcome this as well ?

    Thanks and Regards,

    Kishor

  • Hi Kishore,

    To extend as you suggest it is necessary to add the appropriate productions to the grammar and then write theropriate code.

    I still have designs on writing a cool parser for excel formulas.  All in good time, I hope.

    -Eric

  • Not sure if this blog is still being monitored... I'm keen to take this parser one step further by adding support for mathematical functions, e.g. "1 + Sqr(4) * Abs(-3)"? Can you provide any pointers as to where to start? Thanks in advance.

  • Silly question - I've just realised that this code only parses expressions, and doesn't actually evaluate them. What would be involved in implementing this functionality?

  • This code is tripped up by expressions containing multiple terms.

    E.g. "(1+2)+(3+4)" becomes "1+2)+(3+4" under the influence of:

    if (symbols.First() is OpenParenthesis && symbols.Last() is CloseParenthesis)

    {

     Expression e = Expression.Produce(symbols.Skip(1).SkipLast(1));

     if (e != null)

       return new NospaceExpression(new OpenParenthesis(), e, new CloseParenthesis());

    }

    This problem requires an additional test to recognize this circumstance then skip over this code. I created a method HasMultipleTerms() that does this by counting parentheses.

Page 1 of 1 (7 items)