We went through the process of building a parser from scratch in part 1, starting by a parser that can read a single char from an input string, and then sequencing it to build more capable parsers. I ended it up by saying we have the building blocks to build general parsers and that F# can provide us with some magic if we wrap up our parsers into computation expressions.
We will stick to the one line programming language once more for part 2 (last time, promise), and will slowly get to the magical Parser Monads. By the end of this post, we should have all parsers from the Monadic Parsing in Haskell paper up and running.
You probably noticed we kept using expressions like “a thing Parser”, “an ‘a Parser”, “satisfy this predicate Parser”, etc. everywhere in part 1 of the series. It seems logical, at this point, to embrace this concept and code it accordingly in F#, doesn’t it? Remembering again Dr. Seuss’ parser definition using F# notation:
‘a Parser : char list -> (‘a * char list) list
Let’s create a “container” for our parsers using an algebraic type – we will simply take the parser definition as-is, and enclose it into a Parser container:
type 'a Parser = Parser of (char list -> ('a * char list) list) F# Web Snippets
Got it? We didn’t really do much, literally just copied and pasted Dr. Seuss definition… Any of the parsers we built in part 1 can be encapsulated into this Parser container, for example, remember our char Parser called cParser?
let cParser = function [] -> [] | c::cs -> [c, cs]
Let’s wrap it into the Parser type and call it item (adopting the name convention from Erik's paper):
let item = Parser (function [] -> [] | c::cs -> [c, cs])
Hover your mouse over item and let me know what you see… Correct! As expected item is now officially a char Parser! Let’s test it and see if it works (s2cs was defined in the last post):
> "Test" |> s2cs |> item;;
"Test" |> s2cs |> item;; ------------------^^^^
C:\Users\fzandona\AppData\Local\Temp\stdin(3,19): error FS0001: This expression was expected to have type char list -> 'a but here has type char Parser
Oops! Clearly we cannot use the same approach we've been using here; we need to first extract the function form inside the Parser type before applying it. Thanks to F#’s pattern matching, we can write:
let parse (Parser p) = p
And use it like this:
> "Test" |> s2cs |> parse item;;val it : (char * char list) list = [('T', ['e'; 's'; 't'])]
Good, we are back in business!
Digressing a little bit here, you’ve probably already noticed that I really like to chain functions together by “pipelining” them using the pipeline operator ( |> ); it looks very expressive to me, for example, the F# line above can be read as: take “Text” and send it to s2cs, take the result (a char list) and send it to (parse item). Compare it to, where you need to read the code backwards:
> parse item (s2cs "Text");;val it : (char * char list) list = [('T', ['e'; 'x'; 't'])]
Back on track, we will continue wrapping all parsers we’ve built so far with our Parser container in a minute, but let’s focus initially at these 3 functions we worked on:
let returnParser thing = fun cs -> [thing, cs] let zeroParser () = fun _ -> [] let bindParser (p:(char list -> ('a * char list) list), (f : 'a -> (char list -> ('b * char list) list))) = fun cs -> match p cs with | (c', cs')::_ -> (f c') cs' | [] -> []
We use these guys a lot so let’s give them some TLC. Not sure if you realized, but they all comply with Dr. Seuss definition, so it is going to be a good thing to encapsulate them into the new Parser type, giving us:
let returnParser' thing = Parser (fun cs -> [thing, cs]) let zeroParser' () = Parser (fun _ -> []) let bindParser' (p : ('a Parser), f : ('a -> 'b Parser)) = Parser( fun cs -> match parse p cs with | (c', cs')::_ -> parse (f c') cs' | [] -> [] )
Note that the only difference from this snippet and the previous one is the existence of the parser container, and the use of our parse function to extract the parser from the container before applying it. By the way, hover your mouse over these functions and note how cleaner the type definitions are, we got rid of all that “char list” noise thanks to the Parser type!
Second, because these function/parsers are so important, and now that we have them beautifully wrapped, how about putting them into a type class for “organizational reasons”?
type ParserBuilder () = member x.Return a = Parser (fun cs -> [a, cs]) member x.Bind (p, f) = Parser (fun cs -> match parse p cs with | (c', cs')::_ -> parse (f c') cs' | [] -> [] ) member x.Zero () = Parser (fun _ -> []) member x.ReturnFrom a = a let parser = ParserBuilder()
No changes here, just a class with our previous functions as members, simple and clean (well, I actually sneaked in another member called ReturnFrom, which is just an id function, meaning it returns whatever it receives as input; you will see why we need it later). We also defined a parser value, which is just an instance of our ParserBuilder class.
The next two parsers we built last time were satisfyParser and the tcParser (“this specific char Parser”):
let satisfyParser pred = bindParser(cParser, fun c -> if pred c then returnParser c else zeroParser() ) let tcParser c = satisfyParser ((=) c)
We should build them in the new neater way by using the parser instance (let parser = ParserBuilder()) and the item parser:
let satisfyParser' pred = parser.Bind(item, fun c -> if pred c then parser.Return c else parser.Zero() ) let tcParser' c = satisfyParser' ((=) c)
No news here, just a plain straight rewrite using the new function notations.
So far so good, but it can get tiring very quickly if we need to keep calling all those binds, returns and zeros, to build more capable parsers - so let’s let the F# compiler do it for us!
I actually tricked you into creating the ParserBuilder class for “organization purposes” only, I did have a hidden goal… It turns out that, whenever the compiler sees a class type with some especial methods like the ones in ParserBuilder, it does some under-the-covers magic and provides us with some well-deserved syntactic sugar (interestingly enough, you will see that this new syntax allows us to write code that looks very similar to the imperative style, go figure...). For example, satisfyParser’ and tcParser’ can now be written as (renaming them to sat and tChar respectively):
let sat pred = parser { let! c = item if pred c then return c } let tChar c = sat ((=) c)
This is very neat, isn’t it? They call it computation expressions, but I prefer to call it "the magical syntactic sugar that allows me to easily sequence and compose computations". It is exactly the same behavior as before, but with some magical sugar from the compiler: let! c = item is replaced by parser.Bind(item, fun c -> ...); everything after the let! line is added to the continuation function of the Bind call, and finally return c is replaced by the call to parser.Return c. Note how “sweet” this notation is: we don’t even need an else branch in the conditional test above, the compiler is attaching the call to the Zero() for us! You can find all special methods for the "builder types" (like our ParserBuilder class) on this MSDN entry.
If you reached this far, I have a suprise for you! You may not have noticed, but you’ve just built your very first Parser Monad! Congrats! It didn’t even hurt, did it?
No kidding! F#’s computation expression is F#’s syntax for monads, so applying some logic here: a parser using computation expression is a parser using monad, which is a Parser Monad! Cool, now go and tell everyone you’ve conquered the “Monadic Parser Badge” :-).
Using this new notation, let’s port to F# the remaining parsers from Erik’s paper: a couple of “choice combinators”, the "recursion combinators" and finally the "lexical combinators".
Plus and Or Parsers
/// Concatenates the results of applying parser p and parser q let (<+>) p q = Parser (fun cs -> (parse p cs) @ (parse q cs)) /// Applies parser p or parser q and returns at most one result let (<|>) p q = Parser (fun cs -> match (parse (p <+> q) cs) with | [] -> [] | x::xs -> [x] )
The “plus” parser (<+>) sequences two parsers (p and q) on the same input string and concatenates the resulting lists. We use this “plus” parser to build an “or” parser which returns at most one result: the result of applying p if it succeeds, or the result of applying q if p fails (or empty list if both fail).
The somewhat weird <+> and <|> function names allow us to use them in an infix manner, so we can write things like: aParser <+> bParser or cParser <|> dParser.
Let’s run them on F# interactive to better understand their behavior:
> "Test" |> s2cs |> parse (tChar 'T' <+> tChar 'T');;val it : (char * char list) list = [('T', ['e'; 's'; 't']); ('T', ['e'; 's'; 't'])]
> "Test" |> s2cs |> parse (tChar 'T' <|> tChar 'T');;val it : (char * char list) list = [('T', ['e'; 's'; 't'])]
> "Test" |> s2cs |> parse (tChar 'T' <+> tChar 'Z');;val it : (char * char list) list = [('T', ['e'; 's'; 't'])]
> "Test" |> s2cs |> parse (tChar 'T' <|> tChar 'Z');;val it : (char * char list) list = [('T', ['e'; 's'; 't'])]
> "Test" |> s2cs |> parse (tChar 'Z' <|> tChar 'T');;val it : (char * char list) list = [('T', ['e'; 's'; 't'])]
> "Test" |> s2cs |> parse (tChar 'Z' <+> tChar 'T');;val it : (char * char list) list = [('T', ['e'; 's'; 't'])]
> "Test" |> s2cs |> parse (tChar 'Z' <+> tChar 'Z');;val it : (char * char list) list = []
Text Parser
/// Given a char list, returns a parser that parsers it let rec text = function | [] -> parser { return [] } | c::cs -> parser { let! _ = tChar c let! _ = text cs return c::cs }
The text parser is very similar to the stringParser we built in the last post: it takes a char list as parameter and builds a parser that is capable of parsing that specific char list:
> let thisParser = text (s2cs "This");;
val thisParser : char list Parser = Parser <fun:Bind@40>
> "This is a test" |> s2cs |> parse thisParser;;val it : (char list * char list) list = [(['T'; 'h'; 'i'; 's'], [' '; 'i'; 's'; ' '; 'a'; ' '; 't'; 'e'; 's'; 't'])]
Many and Many1 Parsers
/// Combines many (0 or more) applications of parser p let rec many p = (many1 p) <|> parser { return [] } /// Combines at least one (1 or more) applications of parser p and many1 p = parser { let! r = p let! rs = many p return r::rs }
Many and its brother many1 are really useful parsers when you need to sequence zero to many, or one to many, applications of a parser. For example, remember the homework from the last post, how to parse “fun z = 7”? We can now use many/many1 to easily consume all those white spaces from the input string:
type FunAST = Fun of (char * int) let funParser = parser { let! _ = text (s2cs "fun") let! _ = many1 (tChar ' ') return Fun } let identParser = parser { let! ident = item let! _ = many (tChar ' ') return ident } let digitParser = parser { let! d = sat Char.IsDigit let! _ = many (tChar ' ') return (d |> string |> Int32.Parse) } let equalParser = parser { let! c = tChar '=' let! _ = many (tChar ' ') return c } let funLangParser = parser { let! _ = many (tChar ' ') let! funFunc = funParser let! ident = identParser let! _ = equalParser let! digit = digitParser return funFunc(ident, digit) }
> "fun z = 7" |> s2cs |> parse funLangParser;;val it : (FunAST * char list) list = [(Fun ('z', 7), [])]> "fun z = 7 " |> s2cs |> parse funLangParser;;val it : (FunAST * char list) list = [(Fun ('z', 7), [])]> "fail z = 7 " |> s2cs |> parse funLangParser;;val it : (FunAST * char list) list = []
Sepby and Sepby1 Parsers
/// Combines 0 or more applications of parser p separated by parser sep let rec sepby p sep = (sepby1 p sep) <|> parser { return [] } /// Combines 1 or more applications of parser p separated by parser sep and sepby1 p sep = parser { let! r = p let! rs = many (parser { let! _ = sep return! p }) return r::rs }
The next pair of recursion combinators is sepby and sepby1 – they are also very useful when you have a parsing pattern, e.g. things “separated” by other things. They both take two parsers as parameters, a parser to be repeatedly sequenced, and a “separation” parser, that is thrown away. Note how they rely on many and <|> in their implementation. To test them, let’s pretend we can now have entries like this in our fun language:
fun x = 1;fun z = 9;
We could easily parse them by using the sepby/sepby1 parsers:
let sepParser = parser { let! _ = many (tChar ' ') let! c = tChar ';' return c } let improvedLangParser = sepby1 funLangParser sepParser
> "fun z = 1;" |> s2cs |> parse improvedLangParser;;val it : (FunAST list * char list) list = [([Fun ('z', 1)], [';'])]> "fun z = 1; fun y = 2; fun w = 3" |> s2cs |> parse improvedLangParser;;val it : (FunAST list * char list) list = [([Fun ('z', 1); Fun ('y', 2); Fun ('w', 3)], [])]
Chainl and Chainl1 Parsers
/// Chain 0 or more applications of parser p separated by applications of parser op let rec chainl p op a = (chainl1 p op) <|> parser { return a } /// Chain 1 or more applications of parser p separated by applications of parser op and chainl1 p op = let rec rest r = parser { let! f = op let! r' = p return! rest (f r r') } <|> parser {return r} parser { let! a = p in return! rest a }
Finally the last pair of recursion combinators is chainl and chainl1 – they are a little bit trickier though, they parse repeated applications of parser p separated by applications of parser op, similar to sepby; however, op is a left associative operator parser (ohh… I’ve got to use that phrase tonight at the dinner table!). If p is an ‘a Parser, then op will be an (‘a -> ‘a -> ‘a) Parser…
Improving even more our fun language, let’s now support code like “fun x = 1 + 1” by using the chainl and <|> parsers:
let sumParser = parser { let! _ = tChar '+' let! _ = many (tChar ' ') return (+) } let digitParser' = chainl1 digitParser sumParser <|> digitParser let funLangParser' = parser { let! _ = many (tChar ' ') let! funFunc = funParser let! ident = identParser let! _ = equalParser let! digit = digitParser' return funFunc(ident, digit) } let improvedLangParser' = sepby1 funLangParser' sepParser
> "fun x=1+ 1; fun z = 9 + 1 + 2 + 3; fun w = 5" |> s2cs |> parse improvedLangParser';;val it : (FunAST list * char list) list = [([Fun ('x', 2); Fun ('z', 15); Fun ('w', 5)], [])]
Hover over the new sumParser parser and you will notice it is a (int -> int -> int) Parser and it returns a left associative operator (+). We use it in the new digitParser’ by chaining digitParser with it. We then apply <|> to try both options, e.g., a digit in our language can be a sum of digits (chainl1 digitParser sumParser), or (<|>) a single digit (digitParser): at most one of these two parsers must return a value. It is also important to note the order used to in the <|> call, had digitParser been used first (e.g. “digitParser <|> chainl1 digitParser sumParser”), chainl1 would never have a chance to be applied, because digitParser would always return the first digit of the sum.
These are very simple parsers but yet very handy. They will implement some of the functionality we’ve been hardcoding so far.
Space Parser
let isSpace = // list of "space" chars based on // http://hackage.haskell.org/packages/archive/base/latest/doc/html/Data-Char.html#v:isSpace let cs = [' '; '\t'; '\n'; '\r'; '\f'; '\v'] |> Set.ofList cs.Contains let space = many (sat isSpace)
space comes to get us out of the business of parsing whitespaces. We build a predicate function (isSpace) using a set of chars we want to consider as whitespaces (I’m basing mine on the same ones used by Haskell’s Data.Char.isSpace), and then we create a parser that is capable of “parsing many characters that satisfy the isSpace predicate” (I like how the function implementation easily translates to English: many (sat isSpace)).
Token Parser
let token p = parser { let! r = p let! _ = space return r }
token takes a parser, sequences it and the space parser, returning the result of the application of the input parser; in other words, it will apply a parser you give it and remove all white space afterwards, returning the parsed thing.
Symbol Parser
let symb = text >> token
symb takes it further by composing a text parser and a token parser (for the less “functional composition” savvy, it could be written as let symb cs = token (text cs) :-). This little guy will allow us to parse a specific string (a literal, for example) with whitespace clean up afterwards… and it will be used a lot.
Apply Parser
let apply p = parse (parser { let! _ = space let! r = p return r })
Last but not least, apply is used to initiate the parsing computation: it calls parse (the extraction function we defined earlier) on a parser that cleans up the initial whitespace (if any) of the string to be parsed and applies the parser you give it to the rest of the string.
Let’s take all these parser combinators and build yet another, but this time really final, version of our fun language parser. Just because we now have more building blocks in our bag, let’s add the capability of accepting more than one digit at time (“fun a = 123 + 321”), by using many1 in the digit parser (see below):
let funP = parser { let! _ = symb (s2cs "fun") in return Fun } let identP = parser { let! c = token item let! _ = symb ['='] return c } let sumP = parser { let! _ = symb ['+'] in return (+) } let digitP = let digitP' = parser { let! ds = token (many1 (sat Char.IsDigit)) return (ds |> cs2s |> Int32.Parse) } chainl1 digitP' sumP <|> digitP' let funLangP = parser { let! f = funP let! ident = identP let! d = digitP return f(ident, d) } let finalLangP = sepby1 funLangP (symb [';'])
Do you remember the runParser function? We can update it to use the apply parser (so white space in the beginning of the string is taken care of), and test our final parser:
let runParser p = s2cs >> apply p >> function | [] -> failwith "Error parsing string" | (result,_)::_ -> result
let doesItReallyWork () = " fun x=99+ 1; fun z = 9 + 1 + 2 + 3; fun w = 57" |> runParser finalLangP
> doesItReallyWork ();;
val it : FunAST list = [Fun ('x', 100); Fun ('z', 15); Fun ('w', 57)]
Success!!!
We learned that, by moving the handcrafted parsers we built in part 1 to using Computation Expressions, we were able to build parsers in a more succinct and expressive way, thanks to the syntax sugar provided by F#. Using this new trick, we ported all parser combinators in Erik/Graham's paper from Haskell to F#, creating a basic set of general purpose parsers.
We are now ready to move to our next challenge: build the JSON parser (finally!). Enough of the good-for-nothing fun language – I hope it did serve its purpose though.
See you next and let me know your comments!
• Part 1: Building a parser from scratch with Dr. Seuss help (part 1) • Part 2: Tricking you into Parser Monads (it's not going to hurt, I promise)• Part 3: The JSON Parser Monad
These postings are provided "AS IS" with no warranties, and confer no rights.