# F#: A Data Structure For Modelling Directional Graphs

In my previous post, I described how to parse a regular expression pattern into an abstract syntax representation.  The next phase is to transform the parsed syntax into a finite state machine.  I have chosen to model the FSM as a mathematical directional graph (digraph).  This post describes the implementation of this module in preparation for the actual compilation step.

You can read the previous article here: F#: Building a Regular Expression Pattern Parser.

Overview

There are a number of ways in which a digraph can be modelled:

• As a set of vertices and a set of edges
• As a set of vertices and a table with positions for every possible edge
• As a set of tuples comprising a vertex and a set of its edges

Each of these options has both advantages and disadvantages.  For example, it is very easy to find paths using the second approach but it requires space that grows at a rate of n^2 where n is the number of vertices.

I have chosen to use the third of these, often called an adjacency list approach.  I have also augmented the data structure with some other information:

• integer identifiers to allow the addressing of a vertex or edge by a number
• integer priorities on edges to allow edges to be sorted according to priority (this will be useful for the FSM)
• arbitrary data to be associated with both edges and vertices

I have omitted several functions that a typical graph representation may support, such as union and intersection.  This is because I do not think they will be needed for the regular expression compilation.  Instead, I have added methods that support atomic changes to the digraph:

• obtaining an empty graph

Edges are only valid between existing vertices but the algorithm does not perform any checking.

The code follows.  Again, I have split the code into a module file with a separate signature:

Graph.fsi

```#light

module Graph

(* In this module, a graph is represented by an adjacency
list.

This module is not intended to be a high performance
implementation.

This module can represent a directed or undirected
graph depending on the method of handling edges.
The default behaviour results in a directed graph.
*)

type VertexData<'V> =
int (* identifier *) *
'V (* vertex data *)

type EdgeData<'E> =
int (* identifier *) *
int (* priority *) *
int (* vertex target *) *
'E (* edge data *)

(* The graph uses adjacency list notation *)

(* The Vertex type represents the internal structure
of the graph *)
type Vertex<'V, 'E> = VertexData<'V> * Adjacency<'E>

(* A Graph is a Vertex list.  The nextNode allows for
type Graph<'V, 'E> =
int (* nextNode identifier *) *
Vertex<'V, 'E> list

(* Empty graph construction *)
val empty: Graph<_,_>

(* Helper methods for getting the data from a Vertex *)
val vertexId : Vertex<_,_> -> int
val vertexData : Vertex<'V,_> -> 'V
(* Helper methods for getting the data from an Edge *)
val edgeId : EdgeData<_> -> int
val edgePriority : EdgeData<_> -> int
val edgeTarget : EdgeData<_> -> int
val edgeData : EdgeData<'E> -> 'E

(* Getting a vertex from a graph by id *)
val getVertex : int -> Graph<'V,'E> -> Vertex<'V,'E>
(* Getting all edges from a graph by a vertex id *)
val getEdges : int -> Graph<'V,'E> -> Adjacency<'E>

(* Add a new vertex *)
val addVertex : 'V -> Graph<'V,'E> ->
int (*new id*) * Graph<'V,'E>

(* Add a new edge.  Edges include a priority value *)
int (*priority*) ->
int (*source vertex*) ->
int (*target vertex*) ->
'E (*edge data*) ->
Graph<'V,'E> ->
int (*new id*) * Graph<'V,'E>

(* The edges aren't sorted by default so this function
sorts them by priority *)

(* Removes an edge from a graph by id *)
val removeEdge : int -> Graph<'V,'E> -> Graph<'V,'E>

(* Removes a vertex from a graph by id and removes
any related edges *)
val removeVertex : int -> Graph<'V,'E> -> Graph<'V,'E>```

Graph.fs

```#light

module Graph
(* -- Same description as above -- truncated! *)

type VertexData<'V> =
int (* identifier *) *
'V (* vertex data *)

type EdgeData<'E> =
int (* identifier *) *
int (* priority *) *
int (* vertex target *) *
'E (* edge data *)

(* The graph uses adjacency list notation *)

(* The Vertex type represents the internal structure
of the graph *)
type Vertex<'V, 'E> = VertexData<'V> * Adjacency<'E>

(* A Graph is a Vertex list.  The nextNode allows for
type Graph<'V, 'E> =
int (* nextNode identifier *) *
Vertex<'V, 'E> list

(* Empty graph construction *)
let empty: Graph<_,_> = (0, [])

(* Helper methods for getting the data from a Vertex *)
let vertexId (v:Vertex<_,_>) = v |> fst |> fst
let vertexData (v:Vertex<_,_>) = v |> fst |> snd
(* Helper methods for getting the data from an Edge *)
let edgeId ((x,_,_,_):EdgeData<_>) = x
let edgePriority ((_,x,_,_):EdgeData<_>) = x
let edgeTarget ((_,_,x,_):EdgeData<_>) = x
let edgeData ((_,_,_,x):EdgeData<_>) = x

(* Getting a vertex from a graph by id *)
let getVertex v (g:Graph<_, _>) : Vertex<_,_> =
snd g |> List.find (fun V -> vertexId V = v)
(* Getting all edges from a graph by a vertex id *)
let getEdges v (g:Graph<_, _>) =
g |> getVertex v |> snd

(* Add a new vertex *)
: (int*Graph<'V,_>) =
let id = fst g
let s = snd g
let newVD : VertexData<_> = (id, v)
let newA : Adjacency<_> = []
let newV = (newVD, newA)
(id, (id + 1, newV::s))

(* Add a new edge.  Edges include a priority value *)
(v:int) (v':int) (e:'E) (g:Graph<'V, 'E>)
: (int*Graph<'V,'E>) =
let id = fst g
let s = snd g
let newE : EdgeData<_> = (id, priority, v', e)
(id,
(id + 1,
s |> List.map (fun V ->
if (vertexId V) = v then
(fst V, newE::(snd V))
else V)))

(* The edges aren't sorted by default so this function
sorts them by priority *)
a |> List.sortBy edgePriority

(* Removes an edge from a graph by id *)
let removeEdge (id:int) (g:Graph<_,_>)
: Graph<_,_> =
let next = fst g
let s = snd g
(next, s |> List.map ( fun (v, a) ->
(v, a |>
List.filter (fun x -> (edgeId x) <> id))))

(* Removes a vertex from a graph by id and removes
any related edges *)
let removeVertex (id:int) (g:Graph<_,_>)
: Graph<_,_> =
let next = fst g
let s = snd g
(next, s |> ([] |> List.fold (fun s' (v, a) ->
if (fst v) = id then s'
else
let f = fun x -> ((edgeTarget x) <> id)
let newA = a |> List.filter f
let newV = (v, newA)
newV::s')))```

An example

The module is simple to use. The main trick is to identify the type of the empty graph. This is not necessary if it can be inferred but it is helpful when beginning to create the graph. Here is an example:

```#light
open Graph

Graph.empty
|> (fun (v1,g) ->
|> (fun (v2,g) ->
(Graph.addEdge 0 v1 v2 "edge1" g)))
|> snd
|> printf "Parsed graph: %A\n\n"```

I have used nested lambda functions to bind the returned variables. You might notice that this is very similar to some of the computational expressions that F# can be used for, such as with Async. I might have a look into doing something like the following to hide the intermediate graph values:

```let (g,v1,v2,e1) = graph {