September, 2010

  • Eric White's Blog

    Building a Simple Recursive Descent Parser (Completed Simple Parser)

    • 6 Comments

    In this post, I complete the recursive descent parser that will parse the simple grammar that I presented previously in this series.  In this post, I’ll examine in detail the NospaceExpression class, which needs to implement operator precedence.

    This blog is inactive.
    New blog: EricWhite.com/blog

    Blog TOC
    Note:  I have to apologize for the delay in writing this post.  First summer vacation took some time, and then I became involved in a project of writing two papers – one that summarizes SharePoint application development, and another that summarizes Office Client application development.  I’ve been literally obsessing about every possible way that you as a developer can extend SharePoint Server 2010, and every way that you can extend the Office 2010 client applications.  The SharePoint paper is nearing publication.  The Office Client paper will be along in a bit.  While obsessing, it has been difficult for me to mentally switch tracks back to this series.  But here we go again…

    This post is one in a series on using LINQ to write a recursive-descent parser for SpreadsheetML formulas.  You can find the complete list of posts here.

    The NospaceExpression Class

    Here is the entire implementation of the NospaceExpression class.  I explained all of the code that has a gray background in the previous post, Building a Simple Recursive Descent Parser (continued).

    public static NospaceExpression Produce(IEnumerable<Symbol> symbols)
    {
        // nospace-expression = open-parenthesis expression close-parenthesis
        //         / numerical-constant
        //         / prefix-operator expression
        //         / expression infix-operator expression
     
        if (!symbols.Any())
            return null;
     
        if (symbols.First() is OpenParenthesis && symbols.Last() is CloseParenthesis)
        {
            Expression e = Expression.Produce(symbols.Skip(1).SkipLast(1));
            if (e != null)
                return new NospaceExpression(new OpenParenthesis(), e, new CloseParenthesis());
        }
     
        // expression, infix-operator, expression
        var z = symbols.Rollup(0, (t, d) =>
        {
            if (t is OpenParenthesis)
                return d + 1;
            if (t is CloseParenthesis)
                return d - 1;
            return d;
        });
        var symbolsWithIndex = symbols.Select((s, i) => new
        {
            Symbol = s,
            Index = i,
        });
        var z2 = symbolsWithIndex.Zip(z, (v1, v2) => new
        {
            SymbolWithIndex = v1,
            Depth = v2,
        });
        var operatorList = z2
            .Where(x => x.Depth == 0 &&
                x.SymbolWithIndex.Index != 0 &&
                InfixOperator.Produce(x.SymbolWithIndex.Symbol) != null)
            .ToList();
        if (operatorList.Any())
        {
            int minPrecedence = operatorList
                .Select(o2 => OperatorPrecedence[o2.SymbolWithIndex.Symbol.ToString()]).Min();
            var op = operatorList
                .Last(o2 => OperatorPrecedence[o2.SymbolWithIndex.Symbol.ToString()] == minPrecedence);
            if (op != null)
            {
                var expressionTokenList1 = symbols.TakeWhile(t => t != op.SymbolWithIndex.Symbol);
                Expression e1 = Expression.Produce(expressionTokenList1);
                if (e1 == null)
                    throw new ParserException("Invalid expression");
                var expressionTokenList2 = symbols
                    .SkipWhile(t => t != op.SymbolWithIndex.Symbol).Skip(1);
                Expression e2 = Expression.Produce(expressionTokenList2);
                if (e2 == null)
                    throw new ParserException("Invalid expression");
                InfixOperator io = new InfixOperator(op.SymbolWithIndex.Symbol);
                return new NospaceExpression(e1, io, e2);
            }
        }
     
        NumericalConstant n = NumericalConstant.Produce(symbols);
        if (n != null)
            return new NospaceExpression(n);
     
        PrefixOperator p = PrefixOperator.Produce(symbols.FirstOrDefault());
        if (p != null)
        {
            Expression e = Expression.Produce(symbols.Skip(1));
            if (e != null)
                return new NospaceExpression(p, e);
        }
     
        return null;
    }
     

    Let’s work through the new code.  Let’s consider how the new code behaves when parsing the following formula:

    "1+((2+3)*4)^5"
     

    When determining if an expression has an infix operator, we specifically need to ignore operators that are in parentheses.  Those operators will have their chance to be recognized when NospaceExpression.Produce sees that an expression starts with an open parenthesis and ends with a close parenthesis, as processed by this code:

    if (!symbols.Any())
        return null;
     
    if (symbols.First() is OpenParenthesis && symbols.Last() is CloseParenthesis)
    {
        Expression e = Expression.Produce(symbols.Skip(1).SkipLast(1));
        if (e != null)
            return new NospaceExpression(new OpenParenthesis(), e, new CloseParenthesis());
    }
     

    So we need to ignore operators that are inside parentheses.  The easiest way to determine if an operator is inside of parentheses is to use the Rollup extension method, as follows:

    // expression, infix-operator, expression
    var z = symbols.Rollup(0, (t, d) =>
    {
        if (t is OpenParenthesis)
            return d + 1;
        if (t is CloseParenthesis)
            return d - 1;
        return d;
    });
     

    This use of the Rollup extension method returns a new collection of integers that has the same number of elements in it as the source collection.  The integer is the depth of parentheses at that point.  In the following snippet, I add a bit of code to print the items in the collection, and then exit:

    // expression, infix-operator, expression
    var z = symbols.Rollup(0, (t, d) =>
    {
        if (t is OpenParenthesis)
            return d + 1;
        if (t is CloseParenthesis)
            return d - 1;
        return d;
    });
    foreach (var item in z)
        Console.WriteLine(item);
    Environment.Exit(0);
     
    When I run this example for the test formula "1+((2+3)*4)^5", I see:
    0
    0
    1
    2
    2
    2
    2
    1
    1
    1
    0
    0
    0
     

    When looking for an infix operator, I can ignore all operators where the depth is not zero.

    Later on, I’m going to need the index of the symbol of the infix operator that we find.  The next projection uses symbols as its source, and projects an anonymous type that includes the symbol and its index:

    var symbolsWithIndex = symbols.Select((s, i) => new
    {
        Symbol = s,
        Index = i,
    });
    foreach (var item in symbolsWithIndex)
        Console.WriteLine(item);
    Environment.Exit(0);
     

    When I run this for the test formula, I see:

    { Symbol = 1, Index = 0 }
    { Symbol = +, Index = 1 }
    { Symbol = (, Index = 2 }
    { Symbol = (, Index = 3 }
    { Symbol = 2, Index = 4 }
    { Symbol = +, Index = 5 }
    { Symbol = 3, Index = 6 }
    { Symbol = ), Index = 7 }
    { Symbol = *, Index = 8 }
    { Symbol = 4, Index = 9 }
    { Symbol = ), Index = 10 }
    { Symbol = ^, Index = 11 }
    { Symbol = 5, Index = 12 }
     

    And next, use the Zip extension method to zip together the last two queries to create a list of items that contains:

    ·         The symbol

    ·         Its index

    ·         Its parenthesis depth

    var z = symbols.Rollup(0, (t, d) =>
    {
        if (t is OpenParenthesis)
            return d + 1;
        if (t is CloseParenthesis)
            return d - 1;
        return d;
    });
    var symbolsWithIndex = symbols.Select((s, i) => new
    {
        Symbol = s,
        Index = i,
    });
    var z2 = symbolsWithIndex.Zip(z, (v1, v2) => new
    {
        SymbolWithIndex = v1,
        Depth = v2,
    });
    foreach (var item in z2)
        Console.WriteLine(item);
    Environment.Exit(0);
     

    This produces:

    { SymbolWithIndex = { Symbol = 1, Index = 0 }, Depth = 0 }
    { SymbolWithIndex = { Symbol = +, Index = 1 }, Depth = 0 }
    { SymbolWithIndex = { Symbol = (, Index = 2 }, Depth = 1 }
    { SymbolWithIndex = { Symbol = (, Index = 3 }, Depth = 2 }
    { SymbolWithIndex = { Symbol = 2, Index = 4 }, Depth = 2 }
    { SymbolWithIndex = { Symbol = +, Index = 5 }, Depth = 2 }
    { SymbolWithIndex = { Symbol = 3, Index = 6 }, Depth = 2 }
    { SymbolWithIndex = { Symbol = ), Index = 7 }, Depth = 1 }
    { SymbolWithIndex = { Symbol = *, Index = 8 }, Depth = 1 }
    { SymbolWithIndex = { Symbol = 4, Index = 9 }, Depth = 1 }
    { SymbolWithIndex = { Symbol = ), Index = 10 }, Depth = 0 }
    { SymbolWithIndex = { Symbol = ^, Index = 11 }, Depth = 0 }
    { SymbolWithIndex = { Symbol = 5, Index = 12 }, Depth = 0 }
     

    Next, we can look for any infix operators where the Depth == 0.  In addition, we filter out operators that are found in the very first position, because that will not be an infix operator, but instead a prefix operator.

    // expression, infix-operator, expression
    var z = symbols.Rollup(0, (t, d) =>
    {
        if (t is OpenParenthesis)
            return d + 1;
        if (t is CloseParenthesis)
            return d - 1;
        return d;
    });
    var symbolsWithIndex = symbols.Select((s, i) => new
    {
        Symbol = s,
        Index = i,
    });
    var z2 = symbolsWithIndex.Zip(z, (v1, v2) => new
    {
        SymbolWithIndex = v1,
        Depth = v2,
    });
    var operatorList = z2
        .Where(x => x.Depth == 0 &&
            x.SymbolWithIndex.Index != 0 &&
            InfixOperator.Produce(x.SymbolWithIndex.Symbol) != null)
        .ToList();
    foreach (var item in operatorList)
        Console.WriteLine(item);
    Environment.Exit(0);
     

    This produces a list of the two possible operators that are part of an expression that uses an infix operator:

    { SymbolWithIndex = { Symbol = +, Index = 1 }, Depth = 0 }
    { SymbolWithIndex = { Symbol = ^, Index = 11 }, Depth = 0 }
     

    Next, of the operators that could possibly be the infix operator for this expression, we find the minimum precedence of any operators.  When implementing a top-down parser, we process operators with the least precedence before processing operators of higher precedence.  When looking at the expression tree, operators with the least precedence are nearer to the root of the tree.  Operators with the highest precedence will be closer to the leaves, so that they will be evaluated before the operators with lower precedence.  If this isn’t clear, I think that it will be when you see the entire parse tree.  In this particular case, we need to find the + operator, not the ^ operator, as + has lower precedence.

    To enable processing of precedence of operators, I defined a small dictionary of operators and their precedence:

    public static Dictionary<string, int> OperatorPrecedence = new Dictionary<string, int>()
    {
        { "^", 3 },
        { "*", 2 },
        { "/", 2 },
        { "+", 1 },
        { "-", 1 },
    };
     

    In the following listing, the highlighted line finds the minimum precedence of the operators that are in play.

    // expression, infix-operator, expression
    var z = symbols.Rollup(0, (t, d) =>
    {
        if (t is OpenParenthesis)
            return d + 1;
        if (t is CloseParenthesis)
            return d - 1;
        return d;
    });
    var symbolsWithIndex = symbols.Select((s, i) => new
    {
        Symbol = s,
        Index = i,
    });
    var z2 = symbolsWithIndex.Zip(z, (v1, v2) => new
    {
        SymbolWithIndex = v1,
        Depth = v2,
    });
    var operatorList = z2
        .Where(x => x.Depth == 0 &&
            x.SymbolWithIndex.Index != 0 &&
            InfixOperator.Produce(x.SymbolWithIndex.Symbol) != null)
        .ToList();
    if (operatorList.Any())
    {
        int minPrecedence = operatorList
            .Select(o2 => OperatorPrecedence[o2.SymbolWithIndex.Symbol.ToString()]).Min();
        var op = operatorList
            .Last(o2 => OperatorPrecedence[o2.SymbolWithIndex.Symbol.ToString()] == minPrecedence);
        if (op != null)
        {
            var expressionTokenList1 = symbols.TakeWhile(t => t != op.SymbolWithIndex.Symbol);
            Expression e1 = Expression.Produce(expressionTokenList1);
            if (e1 == null)
                throw new ParserException("Invalid expression");
            var expressionTokenList2 = symbols
                .SkipWhile(t => t != op.SymbolWithIndex.Symbol).Skip(1);
            Expression e2 = Expression.Produce(expressionTokenList2);
            if (e2 == null)
                throw new ParserException("Invalid expression");
            InfixOperator io = new InfixOperator(op.SymbolWithIndex.Symbol);
            return new NospaceExpression(e1, io, e2);
        }
    }
     

    And finally, we find the infix operator:

    // expression, infix-operator, expression
    var z = symbols.Rollup(0, (t, d) =>
    {
        if (t is OpenParenthesis)
            return d + 1;
        if (t is CloseParenthesis)
            return d - 1;
        return d;
    });
    var symbolsWithIndex = symbols.Select((s, i) => new
    {
        Symbol = s,
        Index = i,
    });
    var z2 = symbolsWithIndex.Zip(z, (v1, v2) => new
    {
        SymbolWithIndex = v1,
        Depth = v2,
    });
    var operatorList = z2
        .Where(x => x.Depth == 0 &&
            x.SymbolWithIndex.Index != 0 &&
            InfixOperator.Produce(x.SymbolWithIndex.Symbol) != null)
        .ToList();
    if (operatorList.Any())
    {
        int minPrecedence = operatorList
            .Select(o2 =>OperatorPrecedence[o2.SymbolWithIndex.Symbol.ToString()]).Min();
        var op = operatorList
            .Last(o2 => OperatorPrecedence[o2.SymbolWithIndex.Symbol.ToString()] == minPrecedence);
        if (op != null)
        {
            var expressionTokenList1 = symbols.TakeWhile(t => t != op.SymbolWithIndex.Symbol);
            Expression e1 = Expression.Produce(expressionTokenList1);
            if (e1 == null)
                throw new ParserException("Invalid expression");
            var expressionTokenList2 = symbols
                .SkipWhile(t => t != op.SymbolWithIndex.Symbol).Skip(1);
            Expression e2 = Expression.Produce(expressionTokenList2);
            if (e2 == null)
                throw new ParserException("Invalid expression");
            InfixOperator io = new InfixOperator(op.SymbolWithIndex.Symbol);
            return new NospaceExpression(e1, io, e2);
        }
    }
     

    If we found an operator, then we can new-up (highlighted in green) an instance of NoSpaceExpression, passing the symbols before and after the infix operator to Expression.Produce (highlighted in cyan).

    After parsing the sample expression, “1+((2+3)*4)^5”, the parser produces the following parse tree:

    Formula >1+((2+3)*4)^5<
      Expression >1+((2+3)*4)^5<
        NospaceExpression >1+((2+3)*4)^5<
          Expression >1<
            NospaceExpression >1<
              NumericalConstant >1<
                SignificandPart >1<
                  WholeNumberPart >1<
                    DigitSequence >1<
                      DecimalDigit >1<
          InfixOperator >+<      <= the + operator has lower precedence, so is nearer to the root of the tree, and will be evaluated last
            Plus >+<
          Expression >((2+3)*4)^5<
            NospaceExpression >((2+3)*4)^5<
              Expression >((2+3)*4)<
                NospaceExpression >((2+3)*4)<
                  OpenParenthesis >(<
                  Expression >(2+3)*4<
                    NospaceExpression >(2+3)*4<
                      Expression >(2+3)<
                        NospaceExpression >(2+3)<
                          OpenParenthesis >(<
                          Expression >2+3<
                            NospaceExpression >2+3<
                              Expression >2<
                                NospaceExpression >2<
                                  NumericalConstant >2<
                                    SignificandPart >2<
                                      WholeNumberPart >2<
                                        DigitSequence >2<
                                          DecimalDigit >2<
                              InfixOperator >+<
                                Plus >+<
                              Expression >3<
                                NospaceExpression >3<
                                  NumericalConstant >3<
                                    SignificandPart >3<
                                      WholeNumberPart >3<
                                        DigitSequence >3<
                                          DecimalDigit >3<
                          CloseParenthesis >)<
                      InfixOperator >*<
                        Asterisk >*<
                      Expression >4<
                        NospaceExpression >4<
                          NumericalConstant >4<
                            SignificandPart >4<
                              WholeNumberPart >4<
                                DigitSequence >4<
                                  DecimalDigit >4<
                  CloseParenthesis >)<
              InfixOperator >^<   <= the ^ operator has higher precedence, so is closer to leaves of the tree, so will be evaluated first
                Caret >^<
              Expression >5<
                NospaceExpression >5<
                  NumericalConstant >5<
                    SignificandPart >5<
                      WholeNumberPart >5<
                        DigitSequence >5<
                          DecimalDigit >5<
     

    Following is the complete listing of the simple parser.  While I haven’t discussed every class in the parser, I’ve introduced every technique that I’ve used in writing the Produce methods for each class.  To run this example, you can simply copy/paste this example into Visual Studio and run it.  There are no dependencies.

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
     
    namespace SimpleParser
    {
        public abstract class Symbol
        {
            public List<Symbol> ConstituentSymbols { get; set; }
            public override string ToString()
            {
                string s = ConstituentSymbols.Select(ct => ct.ToString()).StringConcatenate();
                return s;
            }
            public Symbol(params Object[] symbols)
            {
                List<Symbol> ls = new List<Symbol>();
                foreach (var item in symbols)
                {
                    if (item is Symbol)
                        ls.Add((Symbol)item);
                    else if (item is IEnumerable<Symbol>)
                        foreach (var item2 in (IEnumerable<Symbol>)item)
                            ls.Add(item2);
                    else
                        // If this error is thrown, the parser is coded incorrectly.
                        throw new ParserException("Internal error");
                }
                ConstituentSymbols = ls;
            }
            public Symbol() { }
        }
     
        public class Formula : Symbol
        {
            public static Formula Produce(IEnumerable<Symbol> symbols)
            {
                // formula = expression
     
                Expression e = Expression.Produce(symbols);
                return e == null ? null : new Formula(e);
            }
            public Formula(params Object[] symbols) : base(symbols) { }
        }
     
        public class Expression : Symbol
        {
            public static Expression Produce(IEnumerable<Symbol> symbols)
            {
                // expression = *whitespace nospace-expression *whitespace
     
                int whiteSpaceBefore = symbols.TakeWhile(s => s is WhiteSpace).Count();
                int whiteSpaceAfter = symbols.Reverse().TakeWhile(s => s is WhiteSpace).Count();
                IEnumerable<Symbol> noSpaceSymbolList = symbols
                    .Skip(whiteSpaceBefore)
                    .SkipLast(whiteSpaceAfter)
                    .ToList();
                NospaceExpression n = NospaceExpression.Produce(noSpaceSymbolList);
                if (n != null)
                    return new Expression(
                        Enumerable.Repeat(new WhiteSpace(), whiteSpaceBefore),
                        n,
                        Enumerable.Repeat(new WhiteSpace(), whiteSpaceAfter));
                return null;
            }
     
            public Expression (params Object[] symbols) : base(symbols) { }
        }
     
        public class NospaceExpression : Symbol
        {
            public static Dictionary<string, int> OperatorPrecedence = new Dictionary<string, int>()
            {
                { "^", 3 },
                { "*", 2 },
                { "/", 2 },
                { "+", 1 },
                { "-", 1 },
            };
     
            public static NospaceExpression Produce(IEnumerable<Symbol> symbols)
            {
                // nospace-expression = open-parenthesis expression close-parenthesis
                //         / numerical-constant
                //         / prefix-operator expression
                //         / expression infix-operator expression
     
                if (!symbols.Any())
                    return null;
     
                if (symbols.First() is OpenParenthesis && symbols.Last() is CloseParenthesis)
                {
                    Expression e = Expression.Produce(symbols.Skip(1).SkipLast(1));
                    if (e != null)
                        return new NospaceExpression(new OpenParenthesis(), e, new CloseParenthesis());
                }
     
                // expression, infix-operator, expression
                var z = symbols.Rollup(0, (t, d) =>
                {
                    if (t is OpenParenthesis)
                        return d + 1;
                    if (t is CloseParenthesis)
                        return d - 1;
                    return d;
                });
                var symbolsWithIndex = symbols.Select((s, i) => new
                {
                    Symbol = s,
                    Index = i,
                });
                var z2 = symbolsWithIndex.Zip(z, (v1, v2) => new
                {
                    SymbolWithIndex = v1,
                    Depth = v2,
                });
                var operatorList = z2
                    .Where(x => x.Depth == 0 &&
                        x.SymbolWithIndex.Index != 0 &&
                        InfixOperator.Produce(x.SymbolWithIndex.Symbol) != null)
                    .ToList();
                if (operatorList.Any())
                {
                    int minPrecedence = operatorList
                        .Select(o2 => OperatorPrecedence[o2.SymbolWithIndex.Symbol.ToString()]).Min();
                    var op = operatorList
                        .Last(o2 => OperatorPrecedence[o2.SymbolWithIndex.Symbol.ToString()] == minPrecedence);
                    if (op != null)
                    {
                        var expressionTokenList1 = symbols.TakeWhile(t => t != op.SymbolWithIndex.Symbol);
                        Expression e1 = Expression.Produce(expressionTokenList1);
                        if (e1 == null)
                            throw new ParserException("Invalid expression");
                        var expressionTokenList2 = symbols
                            .SkipWhile(t => t != op.SymbolWithIndex.Symbol).Skip(1);
                        Expression e2 = Expression.Produce(expressionTokenList2);
                        if (e2 == null)
                            throw new ParserException("Invalid expression");
                        InfixOperator io = new InfixOperator(op.SymbolWithIndex.Symbol);
                        return new NospaceExpression(e1, io, e2);
                    }
                }
     
                NumericalConstant n = NumericalConstant.Produce(symbols);
                if (n != null)
                    return new NospaceExpression(n);
     
                PrefixOperator p = PrefixOperator.Produce(symbols.FirstOrDefault());
                if (p != null)
                {
                    Expression e = Expression.Produce(symbols.Skip(1));
                    if (e != null)
                        return new NospaceExpression(p, e);
                }
     
                return null;
            }
     
            public NospaceExpression(params Object[] symbols) : base(symbols) { }
        }
     
        public class NumericalConstant : Symbol
        {
            public static NumericalConstant Produce(IEnumerable<Symbol> symbols)
            {
                // numerical-constant = [neg-sign] significand-part
     
                SignificandPart s = SignificandPart.Produce(symbols);
                if (s != null)
                    return new NumericalConstant(s);
                NegSign n = NegSign.Produce(symbols.First());
                if (n != null)
                {
                    SignificandPart s2 = SignificandPart.Produce(symbols.Skip(1));
                    if (s2 != null)
                        return new NumericalConstant(n, s2);
                }
                return null;
            }
            public NumericalConstant(params Object[] symbols) : base(symbols) { }
        }
     
        public class SignificandPart : Symbol
        {
            public static SignificandPart Produce(IEnumerable<Symbol> symbols)
            {
                // significand-part = whole-number-part [fractional-part] / fractional-part
     
                FractionalPart f;
                f = FractionalPart.Produce(symbols);
                if (f != null)
                    return new SignificandPart(f);
                IEnumerable<Symbol> s = null;
                WholeNumberPart w = WholeNumberPart.Produce(symbols, out s);
                if (w != null)
                {
                    if (!s.Any())
                        return new SignificandPart(w);
                    f = FractionalPart.Produce(s);
                    if (f != null)
                        return new SignificandPart(w, f);
                }
                return null;
            }
            public SignificandPart(params Object[] symbols) : base(symbols) { }
        }
     
        public class WholeNumberPart : Symbol
        {
            public static WholeNumberPart Produce(IEnumerable<Symbol> symbols,
                out IEnumerable<Symbol> symbolsToProcess)
            {
                // whole-number-part = digit-sequence
     
                IEnumerable<Symbol> s = null;
                DigitSequence d = DigitSequence.Produce(symbols, out s);
                if (d != null)
                {
                    symbolsToProcess = s;
                    return new WholeNumberPart(d);
                }
                symbolsToProcess = null;
                return null;
            }
            public WholeNumberPart(params Object[] symbols) : base(symbols) { }
        }
     
        public class FractionalPart : Symbol
        {
            public static FractionalPart Produce(IEnumerable<Symbol> symbols)
            {
                // fractional-part = full-stop digit-sequence
     
                if (!symbols.Any())
                    return null;
                if (symbols.First() is FullStop)
                {
                    IEnumerable<Symbol> s = null;
                    DigitSequence d = DigitSequence.Produce(symbols.Skip(1), out s);
                    if (d == null || s.Any())
                        return null;
                    return new FractionalPart(new FullStop(), d);
                }
                return null;
            }
            public FractionalPart(params Object[] symbols) : base(symbols) { }
        }
     
        public class DigitSequence : Symbol
        {
            public static DigitSequence Produce(IEnumerable<Symbol> symbols,
                out IEnumerable<Symbol> symbolsToProcess)
            {
                // digit-sequence = 1*decimal-digit
     
                IEnumerable<Symbol> digits = symbols.TakeWhile(s => s is DecimalDigit);
                if (digits.Any())
                {
                    symbolsToProcess = symbols.Skip(digits.Count());
                    return new DigitSequence(digits);
                }
                symbolsToProcess = null;
                return null;
            }
            public DigitSequence(params Object[] symbols) : base(symbols) { }
        }
     
        public class NegSign : Symbol
        {
            public static NegSign Produce(Symbol symbol)
            {
                // neg-sign = minus
     
                if (symbol is Minus)
                    return new NegSign(symbol);
                return null;
            }
            public NegSign(params Object[] symbols) : base(symbols) { }
        }
     
        public class PrefixOperator : Symbol
        {
            public static PrefixOperator Produce(Symbol symbol)
            {
                // prefix-operator = plus / minus
     
                if (symbol is Plus || symbol is Minus)
                    return new PrefixOperator(symbol);
                return null;
            }
            public PrefixOperator(params Object[] symbols) : base(symbols) { }
        }
     
        public class InfixOperator : Symbol
        {
            public static InfixOperator Produce(Symbol symbol)
            {
                // infix-operator = caret / asterisk / forward-slash / plus / minus
     
                if (symbol is Plus || symbol is Minus || symbol is Asterisk || symbol is ForwardSlash
                    || symbol is Caret)
                    return new InfixOperator(symbol);
                return null;
            }
            public InfixOperator(params Object[] symbols) : base(symbols) { }
        }
     
        public class DecimalDigit : Symbol
        {
            private string CharacterValue;
            public override string ToString() { return CharacterValue; }
            public DecimalDigit(char c) { CharacterValue = c.ToString(); }
        }
     
        public class WhiteSpace : Symbol
        {
            public override string ToString() { return " "; }
            public WhiteSpace() { }
        }
     
        public class Plus : Symbol
        {
            public override string ToString() { return "+"; }
            public Plus() { }
        }
     
        public class Minus : Symbol
        {
            public override string ToString() { return "-"; }
            public Minus() { }
        }
     
        public class Asterisk : Symbol
        {
            public override string ToString() { return "*"; }
            public Asterisk() { }
        }
     
        public class ForwardSlash : Symbol
        {
            public override string ToString() { return "/"; }
            public ForwardSlash() { }
        }
     
        public class Caret : Symbol
        {
            public override string ToString() { return "^"; }
            public Caret() { }
        }
     
        public class FullStop : Symbol
        {
            public override string ToString() { return "."; }
            public FullStop() { }
        }
     
        public class OpenParenthesis : Symbol
        {
            public override string ToString() { return "("; }
            public OpenParenthesis() { }
        }
     
        public class CloseParenthesis : Symbol
        {
            public override string ToString() { return ")"; }
            public CloseParenthesis() { }
        }
     
        public class SimpleFormulaParser
        {
            public static Symbol ParseFormula(string s)
            {
                IEnumerable<Symbol> symbols = s.Select(c =>
                {
                    switch (c)
                    {
                        case '0':
                        case '1':
                        case '2':
                        case '3':
                        case '4':
                        case '5':
                        case '6':
                        case '7':
                        case '8':
                        case '9':
                            return new DecimalDigit(c);
                        case ' ':
                            return new WhiteSpace();
                        case '+':
                            return new Plus();
                        case '-':
                            return new Minus();
                        case '*':
                            return new Asterisk();
                        case '/':
                            return new ForwardSlash();
                        case '^':
                            return new Caret();
                        case '.':
                            return new FullStop();
                        case '(':
                            return new OpenParenthesis();
                        case ')':
                            return new CloseParenthesis();
                        default:
                            return (Symbol)null;
                    }
                });
    #if false
                if (symbols.Any())
                {
                    Console.WriteLine("Terminal Symbols");
                    Console.WriteLine("================");
                    foreach (var terminal in symbols)
                        Console.WriteLine("{0} >{1}<", terminal.GetType().Name.ToString(),
                            terminal.ToString());
                    Console.WriteLine();
                }
    #endif
                Formula formula = Formula.Produce(symbols);
                if (formula == null)
                    throw new ParserException("Invalid formula");
                return formula;
            }
     
            public static void DumpSymbolRecursive(StringBuilder sb, Symbol symbol, int depth)
            {
                sb.Append(string.Format("{0}{1} >{2}<",
                    "".PadRight(depth * 2),
                    symbol.GetType().Name.ToString(),
                    symbol.ToString())).Append(Environment.NewLine);
                if (symbol.ConstituentSymbols != null)
                    foreach (var childSymbol in symbol.ConstituentSymbols)
                        DumpSymbolRecursive(sb, childSymbol, depth + 1);
            }
        }
     
        class Program
        {
            static void Main(string[] args)
            {
                string[] sampleValidFormulas = new[] {
                    "1+((2+3)*4)^5",
                    "1+2-3*4/5^6",
                    "(1+2)/3",
                    "  (1+3)  ",
                    "-123",
                    "1+2*(-3)",
                    "1+2*( - 3)",
                    "12.34",
                    ".34",
                    "-123+456",
                    "  (  123 + 456 )  ",
                    "-.34",
                    "-12.34",
                    "-(123+456)",
                };
     
                string[] sampleInvalidFormulas = new[] {
                    "-(123+)",
                    "-(*123)",
                    "*123",
                    "*123a",
                    "1.",
                    "--1",
                };
     
                StringBuilder sb = new StringBuilder();
                foreach (var formula in sampleValidFormulas)
                {
                    Symbol f = SimpleFormulaParser.ParseFormula(formula);
                    SimpleFormulaParser.DumpSymbolRecursive(sb, f, 0);
                    sb.Append("==================================" + Environment.NewLine);
                }
                foreach (var formula in sampleInvalidFormulas)
                {
                    bool exceptionThrown = false;
                    try
                    {
                        Symbol f = SimpleFormulaParser.ParseFormula(formula);
                    }
                    catch (ParserException e)
                    {
                        exceptionThrown = true;
                        sb.Append(String.Format("Parsing >{0}< Exception: {1}", formula, e.Message) +
                            Environment.NewLine);
                    }
                    if (!exceptionThrown)
                        sb.Append(String.Format("Parsing >{0}< Should have thrown exception, but did not",
                            formula) + Environment.NewLine);
                }
                Console.WriteLine(sb.ToString());
            }
        }
     
        public class ParserException : Exception
        {
            public ParserException(string message) : base(message) { }
        }
     
        public static class MyExtensions
        {
            public static IEnumerable<T> SkipLast<T>(this IEnumerable<T> source,
                int count)
            {
                Queue<T> saveList = new Queue<T>();
                int saved = 0;
                foreach (T item in source)
                {
                    if (saved < count)
                    {
                        saveList.Enqueue(item);
                        ++saved;
                        continue;
                    }
                    saveList.Enqueue(item);
                    yield return saveList.Dequeue();
                }
                yield break;
            }
     
            public static string StringConcatenate(this IEnumerable<string> source)
            {
                StringBuilder sb = new StringBuilder();
                foreach (string s in source)
                    sb.Append(s);
                return sb.ToString();
            }
     
            public static string StringConcatenate<T>(
                this IEnumerable<T> source,
                Func<T, string> projectionFunc)
            {
                return source.Aggregate(new StringBuilder(),
                    (s, i) => s.Append(projectionFunc(i)),
                    s => s.ToString());
            }
     
            public static IEnumerable<TResult> Rollup<TSource, TResult>(
                this IEnumerable<TSource> source,
                TResult seed,
                Func<TSource, TResult, TResult> projection)
            {
                TResult nextSeed = seed;
                foreach (TSource src in source)
                {
                    TResult projectedValue = projection(src, nextSeed);
                    nextSeed = projectedValue;
                    yield return projectedValue;
                }
            }
        }
    }
     

Page 1 of 1 (1 items)
Page 1 of 1 (1 items)