To learn how recursive descent parsers work, it is helpful to implement a very simple grammar, so for pedagogical purposes, I’ve defined a grammar for simple arithmetic expressions. The parser will construct a syntax tree from expressions that we can then examine as necessary. Just for fun, after implementing the parser, we will write a small method that will evaluate the formulas.
This blog is inactive.New blog: EricWhite.com/blogBlog TOCThis post is one in a series on using LINQ to write a recursive-descent parser for SpreadsheetML formulas. You can find the complete list of posts here.
In these expressions, operands are floating point numbers, but for simplicity, I’ve eliminated the ability to have exponents. Floating point numbers are the only variety of operand; to demonstrate how to write the parser, it’s not necessary to include the idea of variables or other types of operands. When writing the parser for Excel formulas, we’ll need to deal with both exponents and other types of operands, as well as many more varieties of issues.
There are five binary (sometimes called infix) operators. Operator precedence needs to be honored when parsing formulas:
There is one prefix operator, ‘-‘, which has higher precedence than the infix operators.
Operands can consist of a significand (sometimes called mantissa) part, followed by a fractional part.
The grammar allows for white space at appropriate places in the expression. The allowance of white space in this simple grammar parallels the allowance of white space in the Excel Extensions to the Office Open XML SpreadsheetML File Format (.xlsx) Specification.
The grammar allows for use of parentheses.
Here are some examples of formulas that should parse properly:
"(1+3)/3" " (1+3) " "-123" "1+2*(-3)" "1+2*( - 3)" "12.34" ".34" "-123+456" "-(123+456)" " ( 123 + 456 ) " "1+2-3*4/5^6" "-.34" "-12.34"
One of the important characteristics of a parser is to report when an expression is invalid per the grammar. Here are some expressions that should throw exceptions:
"-(123+)" "-(*123)" "*123" "123a" "1." "--1"
Here is the grammar, as I’ve defined it. There are only eleven rules (other than the rules that are comprised only of terminals):
formula = expression expression = *whitespace nospace-expression *whitespace nospace-expression = open-parenthesis expression close-parenthesis / expression infix-operator expression / numerical-constant / prefix-operator expression numerical-constant = [neg-sign] significand-part significand-part = whole-number-part [fractional-part] / fractional-part whole-number-part = digit-sequence fractional-part = full-stop digit-sequence neg-sign = minus digit-sequence = 1*decimal-digit prefix-operator = plus / minus infix-operator = caret / asterisk / forward-slash / plus / minus // The following symbols are comprised only of terminals. decimal-digit = %x30-39 whitespace = %x20 plus = "+" minus = "-" asterisk = "*" forward-slash = "/" caret = "^" full-stop = "." open-parenthesis = "(" close-parenthesis = ")"
In the next post in this series, I’ll start discussing some of the C# / LINQ techniques that we can use to make coding this parser super-easy.
It was very useful. I would like to know the continuation of this, if its possible. If so, where can I find them(about compilers).