In my previous post, I described how to parse a regular expression pattern into an abstract syntax representation. The next phase is to transform the parsed syntax into a finite state machine. I have chosen to model the FSM as a mathematical directional graph (digraph). This post describes the implementation of this module in preparation for the actual compilation step.

You can read the previous article here: F#: Building a Regular Expression Pattern Parser.

**Overview**

There are a number of ways in which a digraph can be modelled:

- As a set of vertices and a set of edges
- As a set of vertices and a table with positions for every possible edge
- As a set of tuples comprising a vertex and a set of its edges

Each of these options has both advantages and disadvantages. For example, it is very easy to find paths using the second approach but it requires space that grows at a rate of n^2 where n is the number of vertices.

I have chosen to use the third of these, often called an adjacency list approach. I have also augmented the data structure with some other information:

- integer identifiers to allow the addressing of a vertex or edge by a number
- integer priorities on edges to allow edges to be sorted according to priority (this will be useful for the FSM)
- arbitrary data to be associated with both edges and vertices

I have omitted several functions that a typical graph representation may support, such as union and intersection. This is because I do not think they will be needed for the regular expression compilation. Instead, I have added methods that support atomic changes to the digraph:

- obtaining an empty graph
- adding and removing edges
- adding and removing vertices

Edges are only valid between existing vertices but the algorithm does not perform any checking.

The code follows. Again, I have split the code into a module file with a separate signature:

**Graph.fsi**

#light module Graph (* In this module, a graph is represented by an adjacency list. This module is not intended to be a high performance implementation. This module can represent a directed or undirected graph depending on the method of handling edges. The default behaviour results in a directed graph. *) type VertexData<'V> = int (* identifier *) * 'V (* vertex data *) type EdgeData<'E> = int (* identifier *) * int (* priority *) * int (* vertex target *) * 'E (* edge data *) (* The graph uses adjacency list notation *) type Adjacency<'E> = EdgeData<'E> list (* The Vertex type represents the internal structure of the graph *) type Vertex<'V, 'E> = VertexData<'V> * Adjacency<'E> (* A Graph is a Vertex list. The nextNode allows for consistent addressing of nodes *) type Graph<'V, 'E> = int (* nextNode identifier *) * Vertex<'V, 'E> list (* Empty graph construction *) val empty: Graph<_,_> (* Helper methods for getting the data from a Vertex *) val vertexId : Vertex<_,_> -> int val vertexData : Vertex<'V,_> -> 'V (* Helper methods for getting the data from an Edge *) val edgeId : EdgeData<_> -> int val edgePriority : EdgeData<_> -> int val edgeTarget : EdgeData<_> -> int val edgeData : EdgeData<'E> -> 'E (* Getting a vertex from a graph by id *) val getVertex : int -> Graph<'V,'E> -> Vertex<'V,'E> (* Getting all edges from a graph by a vertex id *) val getEdges : int -> Graph<'V,'E> -> Adjacency<'E> (* Add a new vertex *) val addVertex : 'V -> Graph<'V,'E> -> int (*new id*) * Graph<'V,'E> (* Add a new edge. Edges include a priority value *) val addEdge : int (*priority*) -> int (*source vertex*) -> int (*target vertex*) -> 'E (*edge data*) -> Graph<'V,'E> -> int (*new id*) * Graph<'V,'E> (* The edges aren't sorted by default so this function sorts them by priority *) val sortEdges : Adjacency<'E> -> Adjacency<'E> (* Removes an edge from a graph by id *) val removeEdge : int -> Graph<'V,'E> -> Graph<'V,'E> (* Removes a vertex from a graph by id and removes any related edges *) val removeVertex : int -> Graph<'V,'E> -> Graph<'V,'E>

**Graph.fs**

#light module Graph (* -- Same description as above -- truncated! *) type VertexData<'V> = int (* identifier *) * 'V (* vertex data *) type EdgeData<'E> = int (* identifier *) * int (* priority *) * int (* vertex target *) * 'E (* edge data *) (* The graph uses adjacency list notation *) type Adjacency<'E> = EdgeData<'E> list (* The Vertex type represents the internal structure of the graph *) type Vertex<'V, 'E> = VertexData<'V> * Adjacency<'E> (* A Graph is a Vertex list. The nextNode allows for consistent addressing of nodes *) type Graph<'V, 'E> = int (* nextNode identifier *) * Vertex<'V, 'E> list (* Empty graph construction *) let empty: Graph<_,_> = (0, []) (* Helper methods for getting the data from a Vertex *) let vertexId (v:Vertex<_,_>) = v |> fst |> fst let vertexData (v:Vertex<_,_>) = v |> fst |> snd (* Helper methods for getting the data from an Edge *) let edgeId ((x,_,_,_):EdgeData<_>) = x let edgePriority ((_,x,_,_):EdgeData<_>) = x let edgeTarget ((_,_,x,_):EdgeData<_>) = x let edgeData ((_,_,_,x):EdgeData<_>) = x (* Getting a vertex from a graph by id *) let getVertex v (g:Graph<_, _>) : Vertex<_,_> = snd g |> List.find (fun V -> vertexId V = v) (* Getting all edges from a graph by a vertex id *) let getEdges v (g:Graph<_, _>) = g |> getVertex v |> snd (* Add a new vertex *) let addVertex (v:'V) (g:Graph<'V, _>) : (int*Graph<'V,_>) = let id = fst g let s = snd g let newVD : VertexData<_> = (id, v) let newA : Adjacency<_> = [] let newV = (newVD, newA) (id, (id + 1, newV::s)) (* Add a new edge. Edges include a priority value *) let addEdge priority (v:int) (v':int) (e:'E) (g:Graph<'V, 'E>) : (int*Graph<'V,'E>) = let id = fst g let s = snd g let newE : EdgeData<_> = (id, priority, v', e) (id, (id + 1, s |> List.map (fun V -> if (vertexId V) = v then (fst V, newE::(snd V)) else V))) (* The edges aren't sorted by default so this function sorts them by priority *) let sortEdges (a:Adjacency<_>) = a |> List.sortBy edgePriority (* Removes an edge from a graph by id *) let removeEdge (id:int) (g:Graph<_,_>) : Graph<_,_> = let next = fst g let s = snd g (next, s |> List.map ( fun (v, a) -> (v, a |> List.filter (fun x -> (edgeId x) <> id)))) (* Removes a vertex from a graph by id and removes any related edges *) let removeVertex (id:int) (g:Graph<_,_>) : Graph<_,_> = let next = fst g let s = snd g (next, s |> ([] |> List.fold (fun s' (v, a) -> if (fst v) = id then s' else let f = fun x -> ((edgeTarget x) <> id) let newA = a |> List.filter f let newV = (v, newA) newV::s')))

**An example**

The module is simple to use. The main trick is to identify the type of the empty graph. This is not necessary if it can be inferred but it is helpful when beginning to create the graph. Here is an example:

#light open Graph Graph.empty |> Graph.addVertex "vertex1" |> (fun (v1,g) -> (Graph.addVertex "vertex2" g) |> (fun (v2,g) -> (Graph.addEdge 0 v1 v2 "edge1" g))) |> snd |> printf "Parsed graph: %A\n\n"

I have used nested lambda functions to bind the returned variables. You might notice that this is very similar to some of the computational expressions that F# can be used for, such as with Async. I might have a look into doing something like the following to hide the intermediate graph values:

let (g,v1,v2,e1) = graph { let! v1 = Graph.addVertex "vertex1" let! v2 = Graph.addVertex "vertex2" let! e1 = Graph.addEdge 0 v1 v2 "edge1" return! (v1,v2,e1) }

Pingback: F#: Graphing with GLEE « Steve Horsfield

Pingback: Rick Minerich's Development Wonderland : Discoveries This Week 08/02/2009