A few weeks ago I read this article about writing a simple regular expression parser. That article does a really good job of explaining the theory behind regular expression. It then goes step by step into how to write a program (he uses C++) to parse a regular expression, convert it into a NFA, convert that into a DFA and then use that DFA to match strings.
After reading that I decided to write my own simple regular expression parser using Haskell. I saw it as a challenge to try to see how you deal with a more complex program in a pure functional language. After a couple weeks ( Grand Theft Auto 4 kind of ruined my progress for a while ) I have some results.
I split the project into 3 modules.
This is a very simple and limited regular expression parser. It supports only union(|), concatenation, closure(*) and parenthesis. In addition, I don't preserve information after the NFA is created about the location of the parenthesis. This means you can't pull out sub-matches when a entire expression matches.
In my next three I will talk about each module and point out interesting parts of them. There is nothing too complex but shows how to approach it in Haskell (making heavy use of the State monad).
If you want to see a much more complex and full featured regular expression parser written in Haskell take a look at this.
Click here to continue to Part 2.