The first module in my simple regular expression parse is called RegexToNFA. This module exposes the types that make up a finite state machine and also the functions to convert a regular expression string into a finite state machine.

My structure for a FSM follows closely from the mathematical definition:

` 1: data FiniteMachine = FiniteMachine{ table :: [Transition],`

` 2: alphabet :: Set Char,`

` 3: start :: Node,`

` 4: final :: Set Node`

` 5: `

` 6: `

` 7: -- NFA node`

` 8: type Node = Integer`

` 9: `

` 10: -- The value for an edge in a NFA`

` 11: type TransitionValue = Maybe Char`

` 12: `

` 13: -- A transition in a NFA is a tuple of`

` 14: -- StartNode , DestinationNode, Value to transition on`

` 15: type Transition = (Node,Node,TransitionValue)`

` 16: `

` 17: -- The value of the edge in the NFA is a Maybe Char `

` 18: -- Where Nothing is the epsilon transition`

` 19: -- therefore lets just rename Nothing to epsilon`

I have the value which you transition on as a Maybe Char (which I alias as TransitionValue). This allowed me to define epsilon as Nothing data constructor.

With this structure defined my goal now is to convert a regular expression pattern such as: (a|b)* into a FiniteMachine. In order to do this there is a lot of state that I need to keep track of which naturally leads to the use of the State monad. To do this I set up a structure for what data I want to be kept track of and then create a state monad using that structure:

` 1: -- The state that gets passed around which we used to build up the NFA`

` 2: data ParseContext = Context `

` 3: {`

` 4: nodeList :: [Node],`

` 5: transitions :: [Transition],`

` 6: operators :: OperatorList,`

` 7: nextNode :: Node,`

` 8: values :: Set Char`

` 9: } deriving (Show, Eq)`

` 10: `

` 11: -- Alias the State data constructor with a more friendly name`

` 12: type RegexParseState a = State ParseContext a`

This structure is passed between functions to allow them to see the current state of the parsing and create a new state. I define many functions, each which deal with a piece of the puzzle of converting the input string into a FSM. I am not going to address them all but I will point out some which is note worth:

**convertToNFA** - This is the top level function, it is exposed externally and lets you convert a regex to a NFA.

**processOperator** - This function determines when we should execute an operator given its precedence. We assign each operator a precedence which lets us determine when we should execute an operator. For example in the expression a|b*, we want to execute star before we execute union.

Last but not least are the methods which execute the operators. For example, there is one called **doConcat**, which performs the concatenation of two values in the regular expression. **doConcat** isn't pretty, since its doing the dirty work of examining the state and create a new state to reflect a partially completed FSM.

` 1: -- Execute the concat operator`

` 2: doConcat :: RegexParseState ()`

` 3: doConcat = do`

4: st <- get

` 5: let nodes = nodeList st`

` 6: newNodes = (nodes !! 0) : (nodes !! 3) : (drop 4 nodes)`

` 7: newTransitions = transitions st ++ [(nodes !! 2, nodes !! 1, epsilon)]`

` 8: newOperators = tail $ operators st`

` 9: put $ st { nodeList = newNodes,`

` 10: transitions = newTransitions ,`

` 11: operators = newOperators}`

With all this in place, lets finally see what this module actually outputs.

> convertToNFA "a|(bc)*"

FiniteMachine {table =

[(4,5,Just 'c'),(2,3,Just 'b'),(0,1,Just'a'),

(3,4,Nothing),(6,2,Nothing),(6,0,Nothing),

(1,7,Nothing),(5,7,Nothing),(8,6,Nothing),

(8,7,Nothing),(7,9,Nothing),(9,8,Nothing)],

alphabet = fromList "abc",

start = 8,

final = fromList [9]}

If you examine the table list in the output, you will see all the transitions for the NFA that accepts "a|(bc)*" and that the start state is node 8 and the accept state is node 9.

I uploaded the RegexToNFA.hs file for your examination. I tried to comment it a good amount and I feel it should be pretty easy to read and understand.

In the next part I will talk about the next modules: NFAtoDFA