F#: A Complete Regular Expression Processor

This post completes this mini-series on regular expression processing in F#.  You can find the complete source code here as well.  It has been a good practice for me and I hope you have found the series useful.

Previous posts are here:

The process so far

In the previous posts, I demonstrated how to convert a regular expression pattern string into an abstract syntax representation and then in the previous post I showed how to compile that representation into a non-deterministic finite automata (or state machine) represented by a directional graph.  I finished the last post with a working expression compiler and a visualisation of the graph using Microsoft GLEE.

Now, in this post, I will connect the real world to the state machine and demonstrate a working regular expression engine (but please let me know if you find bugs!).

Processing in outline

The inputs to the processor are:

  • the previously compiled NDFA graph
  • a sequence of characters (char seq)

There are two forms of output:

  • an indication of whether a match was found
  • the actual match details including the text

These two outputs correspond to the test and firstMatch functions in the code.  The process is almost entirely identical for these two cases and so these functions use the execute function to actually perform the processing.

The process is as follows:

  1. Get the list of starting states
  2. Follow any free transitions to find additional starting states
  3. Check whether any of these states are possible matches
  4. Process an input token
  5. Follow any free transitions in the resulting states
  6. Check whether any of these states are possible matches
  7. Decide whether further input needs to be processed, and repeat steps 4 to 7 if so
  8. Return the best match

Steps 4 to 7 are wrapped by the function processInput and the contained tail-recursive function processInputInner.  (I know it is tail recursive in part because I have tested it with over a million characters in the input).  You can use the Visual Studio debugger to step through the code or add printf statements if you want to see what goes on in each step.  That’s much easier than me attempting to explain all the details!  Feel free to ask questions though.

One important point, however, is the use of the FlowState<char> type.  This is used to ensure that the input is processed exactly once.  I have posted a blog entry related to this issue here: F#: Sequence Comprehensions and Iteration.

The code

You can download the entire source code here: Regular Expressions in FSharp 1.0.

I have annotated the code so it should be fairly easy to follow.  It is even easier if you can put in in Visual Studio and get syntax highlighting.  (I have not yet found a suitable way of doing that on WordPress.com).  There is one new module, RegExProcessor, and changes to Program.fs:

RegExProcessor.fsi

#light

open Graph
open RegExParsing
open RegExCompiling

module RegExProcessor =

  (* used to store captured text *)
  type CharList = System.Collections.Generic.LinkedList<char>

  (* used to represent the result of findFirst *)
  type RegExMatchResult =
    { matchState: int;
      matchPositions: (int*int) list
      buffer: CharList;
      bufferStart: int }

  (* tests whether the input sequence can match the NDFA *)
  val test :
    RegExCompiling.NdfaGraph ->
    char seq ->
    bool
  (* finds the first match and submatch data *)
  val firstMatch :
    RegExCompiling.NdfaGraph ->
    char seq ->
    RegExMatchResult option

  (* helper for building the ndfa directly *)
  val parseAndCompile :
    (RegExCompiling.RegExMatchMode ->
      string ->
      RegExCompiling.NdfaGraph)

RegExProcessor.fs

#light

open Graph
open RegExParsing
open RegExCompiling

module RegExProcessor =

  (* -------------------
     TYPES
     ------------------- *)

  (* used to store captured text *)
  type CharList = System.Collections.Generic.LinkedList<char>

  (* used to represent the result of findFirst *)
  type RegExMatchResult =
    { matchState: int;
      matchPositions: (int*int) list
      buffer: CharList;
      bufferStart: int }

  (* represents a token in the input stream *)
  type RegExInput =
    | StartOfInput
    | EndOfInput of int (* index of last character *)
    | Char of char*int (* index of character *)

  (* represents a single possible state of the NDFA *)
  type RegExStateData =
    int (* state *) *
    int option (* index of start *) *
    int option (* index of end *) *
    int list (* submatch starts *) *
    int list (* submatch ends *)

  (* -------------------
     GENERAL HELPERS
     ------------------- *)

  (* syntactic sugar for option types
     equivalent to the C# ?? coalescing
     operator *)
  let (=??) (x:'a option) (y:'a) =
    match x with
    | Some x -> x
    | _ -> y

  (* position identification from a RegExInput *)
  let inline currentIndex c =
      match c with
      | StartOfInput -> 0
      | EndOfInput i -> i
      | Char (_, i) -> i

  (* successor position identification from a RegExInput *)
  let inline nextIndex c =
      match c with
      | StartOfInput -> 0
      | EndOfInput i -> i
      | Char (_, i) -> i + 1

  (* prior position identification from a RegExInput *)
  let inline previousIndex c =
      match c with
      | Char (_, i) when i > 0 -> i - 1
      | EndOfInput i -> i - 1
      | _ -> 0

  (* start of a state *)
  let inline startOfState currentPos s =
    match s with
    | (_,Some x,_,_,_) -> x
    | _ -> currentPos

  (* start of a state option *)
  let inline startOfStateWrapped currentPos s =
    match s with
    | Some (_,Some x,_,_,_) -> x
    | _ -> currentPos

 
  (* -------------------
     SEQUENCE FLOWS
     ------------------- *)

  (* convert the input sequence into a
     RegExToken sequence *)
  let sequenceInput (i:char seq) =
    seq { yield StartOfInput
          let n = 0
          let rn = ref n
          for x in i do
            yield (Char (x, !rn))
            rn := !rn + 1
          yield EndOfInput (!rn - 1)
        }

  (* FlowState<'T> wraps IEnumerator<'T>
     Note that this implementation does not dispose
     the IEnumerator and should not be used for
     production code *)
  type FlowState<'T> =
    { iter: System.Collections.Generic.IEnumerator<'T> ;
      isStart: bool ;
      isEnd: bool;
      count: int }

  (* get a new FlowState<'T> *)
  let seqToFlow (s:'T seq) =
    let enumerator = s.GetEnumerator()
    { iter = enumerator; isStart = true; isEnd = false; count = 0 }

  (* get the next input token from the flow *)
  let inline flowChar (s:FlowState<char>)
      : RegExInput * FlowState<char> =
    if (s.isStart) then
      (StartOfInput,
       {iter = s.iter;
        isStart = false;
        isEnd = false;
        count = 0 })
    elif (s.isEnd) then
      (EndOfInput s.count,
       {iter = null;
        isStart = false;
        isEnd = true;
        count = s.count })
    else
      let hasNext = s.iter.MoveNext()
      if hasNext <> true then
        (EndOfInput s.count,
         {iter = s.iter;
          isStart = false;
          isEnd = true;
          count = s.count })
      else
        (Char (s.iter.Current, s.count),
         {iter = s.iter;
         isStart = false;
         isEnd = false;
         count = s.count + 1 })

  (* test whether the flow is empty *)
  let inline flowEmpty (s:FlowState<'T>) =
    s.isEnd

 
  (* -------------------
     STATE TRANSITIONS
     ------------------- *)

  (* after a step, there may be multiple ways to
     be in a particular state.  This function
     decides which of those is preferred,
     considering where the match began and
     the length of match *)
  let preferredStates ss : RegExStateData list =
    let initialstate = ([], None, None, None, None)
    let statefunc
        (result:RegExStateData list,
         id,
         start,
         endpos,
         y:RegExStateData option)
        (x:RegExStateData) =
      let (xid, xstart, xendpos, _, _) = x
      match id with
      | None ->
        (result, Some xid, xstart, xendpos, Some x)
      | Some idval when idval <> xid ->
        ((Option.get y)::result,
         Some xid, xstart, xendpos, Some x)
      | _ ->
        let s = start =?? 0
        let s' = xstart =?? 0
        let e = endpos =?? System.Int32.MaxValue
        let e' = xendpos =?? System.Int32.MaxValue
        if s < s' then
          (result, id, start, endpos, y)
        elif s > s' then
          (result, Some xid, xstart, xendpos, Some x)
        elif e <= e' then
          (result, id, start, endpos, y)
        else
          (result, Some xid, xstart, xendpos, Some x)
    let statecompletion (result, _, _, _, y)
        : RegExStateData list =
      match y with
      | Some yval -> yval::result
      | _ -> result
    ss
      |> List.sortBy
          (fun ((id,_,_,_,_):RegExStateData) -> id)
      |> List.fold statefunc initialstate
      |> statecompletion

  (* this function takes a pair of start and end lists
     for submatches and appends new entries if the
     vertex identified by id is a subpattern marker.
     Using previous and next allows proper handling of
     the start and end of input respectively *)
  let mapSubMatches
        (substarts, subends)
        (id:int)
        (previous:int)
        (next:int)
        (ndfa:NdfaGraph) =
    let targetNode = ndfa |> Graph.getVertex id |> Graph.vertexData
    let substarts' =
      match targetNode |> snd with
      | Some StartSubPattern -> next::substarts
      | _ -> substarts
    let subends' =
      match targetNode |> snd with
      | Some EndSubPattern -> previous::subends
      | _ -> subends
    (substarts',subends')

 
  (* free transitions are state transitions that may
     be optional or are automatic.  They do not depend
     on input tokens *)
  let freeTransitions
        (ndfa:RegExCompiling.NdfaGraph)
        (previous:int)
        (next:int)
        (x:RegExStateData) =
    let rec transition
          (newState:bool)
          (targetStates:RegExStateData list)
          ((id, start, stop, substarts, subends):RegExStateData)
          (es:Adjacency<NdfaEdge>) =
      match es with
      | [] when newState ->
        Some targetStates
      | [] ->
        None
      | e::es ->
        let target = edgeTarget e
        let (substarts',subends') =
          ndfa
            |> mapSubMatches
                (substarts,subends)
                target
                previous
                next
        match edgeData e with
        | Auto ->
          transition
            true
            ((target,start,stop,substarts',subends')::targetStates)
            (id,start,stop,substarts,subends)
            es
        | NonGreedyAnyCharacter ->
          transition
            true
            ((target,start,stop,substarts',subends')::targetStates)
            (id,Some next,stop,substarts,subends)
            es
        | _ ->
          transition
            newState
            targetStates
            (id,start,stop,substarts,subends)
            es
    let rec transitionAll
          accummulator
          ((id,start,stop,substarts,subends):RegExStateData) =
      let x = (id,start,stop,substarts,subends)
      let es = ndfa |> Graph.getEdges id
      let newStates = transition false [] x es
      match newStates with
      | None ->
          x::accummulator
      | Some ns ->
          ns |> List.collect
              (fun (x:RegExStateData) ->
                transitionAll (x::accummulator) x)
    (transitionAll [x] x) |> preferredStates

  (* test an individual character against a test pattern *)
  let testCharacter (criteria:RegExpCharacterRangeUnit) c =
    let test (criteria:RegExpCharacterRange) : bool =
      match criteria with
        | Character v -> v = c
        | Range (v1,v2) ->
          (v1 <= c) && (v2 >= c)
    match criteria with
    | Matching criteria -> criteria |> List.exists test
    | Inversion criteria -> (criteria |> List.exists test) <> true

 
  (* attempt to transition from one state to another using the
     specified edge *)
  let step1edge
        ndfa
        bestMatch
        ((id, start, stop, substarts, subends):RegExStateData)
        (current:int)
        (previous:int)
        (next:int)
        c
        e =
    let et:NdfaEdge = Graph.edgeData e
    let tgt:int = Graph.edgeTarget e
    let (substarts',subends') =
      ndfa |> mapSubMatches (substarts,subends) tgt previous next
    let nt:NdfaNode =
      ndfa |> Graph.getVertex tgt |> Graph.vertexData
    let normalResult =
      [(tgt, start, Some current, substarts', subends')]
    let sourceResult =
      [(id, start, stop, substarts, subends)]
    let nonGreedyResult =
      [(tgt, Some (current+1), None, substarts', subends');
       (id, Some (current+1), None, substarts, subends)]

    match c with
    | StartOfInput ->
      match et with
      | StartAssertion -> normalResult
      | _ -> sourceResult
    | EndOfInput _ ->
      match et with
      | EndAssertion -> normalResult
      | _ -> sourceResult
    | Char (c,_) ->
      match et with
      | NonGreedyAnyCharacter
        when bestMatch = None ->
            nonGreedyResult
      | Simple c'
        when c = c' ->
            normalResult
      | AnyChar -> normalResult
      | CharacterTest criteria
        when (testCharacter criteria c) ->
            normalResult
      | _ -> []    

  (* transition one state into a list of next states *)
  let step1state
      ndfa
      bestMatch
      current
      previous
      next
      c
      (x:RegExStateData) =
    let (id,_,_,_,_) = x
    let es:Adjacency<NdfaEdge> =
      ndfa |> Graph.getEdges id
    let mappedStates =
      es
        |> List.collect
            (step1edge
              ndfa
              bestMatch
              x
              current
              previous
              next
              c)
    match c with
    | EndOfInput _ ->
      (* can keep original state *)
      x::mappedStates
    | _ -> mappedStates

 
  (* -------------------
     STATE INITIALISATION
     ------------------- *)

  (* get an array with an entry for each state and
     a value indicating whether the state is a closure
     for the NDFA *)
  let getClosureMap (ndfa:RegExCompiling.NdfaGraph) =
    let maxVertex = (ndfa |> fst |> fst) - 1
    let testClosure ndfa id =
      match Graph.tryGetVertex id ndfa with
      | Some v -> (v |> Graph.vertexData |> fst) = Closure
      | None -> false
    let isClosureMap =
      [| for x in 0..maxVertex -> testClosure ndfa x |]
    isClosureMap

  (* create a RegExStateData for the provided initial state *)
  let inline initialiseState v : RegExStateData =
    (v |> Graph.vertexId, Some 0, None, [], [])

  (* build the set of possible starting states *)
  let getStartStates (ndfa:RegExCompiling.NdfaGraph) =
    ndfa
      |> snd
      |> List.choose
        (fun v ->
          let nodeType = v |> Graph.vertexData |> fst
          if nodeType = RegExCompiling.Start then
            Some (v |> initialiseState)
          else
            None)
       |> List.collect
        (fun x -> freeTransitions ndfa 0 0 x)

  (* -------------------
     MATCH SELECTION
     ------------------- *)

  (* given a previous best match and a set of states
     find the best match.  If the previous match
     wins then return None, if there is no closure
     in the new states then return None, otherwise
     return Some newBestMatch *)
  let findBestMatch
      ndfa
      (closureMap:bool array)
      (bestMatch:RegExStateData option)
      (states:RegExStateData list)
        : RegExStateData option =
    let closureTest tgt = Array.get closureMap tgt
    let bestMatchTest
        (x:RegExStateData option)
        (y:RegExStateData) =
      let (id,start,stop,_,_) = y
      let yValid = closureTest id
      if yValid then
        match x with
        | None -> Some y
        | Some (_,start',stop',_,_) ->
          (* Check starts *)
          if (start' < start) then
            x
          elif (start < start') then
            Some y
          (* Also check length *)
          elif (Option.get stop) > (Option.get stop') then
            Some y
          (* Otherwise, prefer earliest match *)
          else
            x
      else
        x
    let bestMatch' =
      states |> List.fold bestMatchTest None
    match (bestMatch, bestMatch') with
    | None, _ ->
        bestMatch'
    | _, None ->
        None (* previous match wins *)
    | Some (id, start,stop,_,_),
      Some (id', start',stop', _, _) ->
        if (start < start') then
          None (* finish with previous match *)
        elif stop > stop' then
          bestMatch
        elif stop' > stop then
          bestMatch'
        else
          bestMatch'

 
  (* -------------------
     CHARACTER CAPTURE
     ------------------- *)

  (* Character capture methods:
      captureNull and captureNullInit allow for efficient
      testing of matches without returning the matched string
      captureList and captureListInit use a doubly-linked list of
      characters to capture the text *)

  let inline captureNullInit () = ()

  let inline captureNull
      (bestMatch:RegExStateData option)
      (bestMatch':RegExStateData option)
      states
      c
      capture
      captureOffset =
    (capture, 0)

  let inline captureListInit () = new CharList()

  let inline captureList
      (bestMatch:RegExStateData option)
      (bestMatch':RegExStateData option)
      states
      c
      (capture:CharList)
      captureOffset =
    let currentPos = currentIndex c
    let start1 = startOfStateWrapped currentPos bestMatch
    let start2 = startOfStateWrapped currentPos bestMatch'
    let starts =
      List.append
        [start1;start2]
        (states |> List.map (startOfState currentPos))
    let minOffset = starts |> List.min
    let trimLength = minOffset - captureOffset
    match c with
    | Char (x,_) ->
      capture.AddLast x |> ignore
    | _ -> ()
    let rec reduce n =
      if n > 0 then
        capture.RemoveFirst()
        reduce (n-1)
      else
        ()
    reduce trimLength
    (capture, minOffset)    

 
  (* -------------------
     PROCESS
     ------------------- *)

  (* process a single input token *)
  let step1
      ndfa
      closureMap
      bestMatch
      (states:RegExStateData list)
      (c:RegExInput) =
    let current = currentIndex c
    let next = nextIndex c
    let previous = previousIndex c
    let states' =
      states
        |> List.collect
          (step1state
            ndfa
            bestMatch
            current
            previous
            next
            c)
    let states'' =
      states'
        |> List.collect
          (fun (x:RegExStateData) ->
            freeTransitions
              ndfa
              previous
              next
              x)
        |> preferredStates
    let bestMatch' =
      states''
        |> findBestMatch
          ndfa
          closureMap
          bestMatch
    (bestMatch', states'')

 
  (* find a match *)
  let processInput
        captureMethod
        ndfa
        closureMap
        bestMatch
        capture
        captureOffset
        states
        (input:FlowState<char>) =
    (* processInner wraps the logic without repeated passing of
       captureMethod, ndfa and closureMap *)
    let rec processInner
          bestMatch
          capture
          captureOffset
          states
          (input:FlowState<char>) =
      match states with
      | [] ->
        (bestMatch, capture, captureOffset)
      | _ when (flowEmpty input) ->
        (bestMatch, capture, captureOffset)
      | _ ->
        let (c, input') = flowChar input
        (* remove states that start after an existing match *)
        let states' =
          if (Option.isSome bestMatch) then
            let currentPos = currentIndex c
            let minPos =
              startOfStateWrapped currentPos bestMatch
            states
              |> List.choose
                (fun (x:RegExStateData) ->
                  if (startOfState currentPos x) > minPos then
                    None
                  else
                    Some x)
          else
            states
        (* move forward a step *)
        let (bestMatch2, states2) =
          c |> step1
                ndfa
                closureMap
                bestMatch
                states'
        (* store the characters if necessary *)
        let (capture2, offset2) =
          captureMethod
            bestMatch
            bestMatch2
            states2
            c
            capture
            captureOffset
        match (bestMatch, bestMatch2) with
        | None, _ ->
          (* might get a better match so carry on *)
          processInner
            bestMatch2
            capture2
            offset2
            states2
            input'
        | Some _, None when (List.isEmpty states2) ->
           (* no better match available *)
           (bestMatch, capture2, offset2)
        | Some _, None ->
          (* still might get a better match with more
             characters *)
          processInner
            bestMatch
            capture2
            offset2
            states2
            input'
        | _ ->
          (* need to carry on even if already got a match
             because a longer match may follow, but
             prefer the most recent match, bestMatch2 *)
          processInner
            bestMatch2
            capture2
            offset2
            states2
            input'
    processInner bestMatch capture captureOffset states input

 
  (* Starts the matching process *)
  let execute captureMethod initCaptureMethod ndfa cs =
    let input =
      seqToFlow cs
    let closureMap =
      getClosureMap ndfa
    let initialStates =
      getStartStates ndfa
    let bestMatch =
      findBestMatch ndfa closureMap None initialStates
    let initialCapture =
      initCaptureMethod ()
    processInput
      captureMethod
      ndfa
      closureMap
      bestMatch
      initialCapture
      0
      initialStates
      input

  (* -------------------
     ENTRY POINTS
     ------------------- *)

  (* tests whether the input sequence can match the NDFA *)
  let test (ndfa:RegExCompiling.NdfaGraph) (cs:char seq) : bool =
    let (bestMatch, chars, offset) =
      execute captureNull captureNullInit ndfa cs
    match bestMatch with
    | None -> false
    | _ -> true

  (* finds the first match and submatch data *)
  let firstMatch (ndfa:RegExCompiling.NdfaGraph) (cs:char seq) =
    let (bestMatch, chars, offset) =
      execute captureList captureListInit ndfa cs
    match bestMatch with
    | None -> None
    | Some (state, startpos, endpos, substarts, subends) ->
      let startpos' = startpos =?? 0
      let endpos' = endpos =?? System.Int32.MaxValue
      let matches =
        (* starts are added in the order encountered
           ends are added in reverse order *)
        (startpos', endpos')::
          (List.zip
              (substarts |> List.rev)
              subends)
      Some
        { matchState = state;
          matchPositions = matches;
          buffer = chars;
          bufferStart = offset }

  (* helper for building the ndfa directly *)
  let parseAndCompile = RegExCompiling.parseAndCompile

 

Program.fs

#light

open Graph
open RegExParsing
open RegExCompiling
open RegExProcessor
open GleeGraph
open Microsoft.Glee.Drawing

(* time a function *)
let time (f:unit->'T) =
  let timeZero = System.DateTime.Now
  let result = f () ;
  (System.DateTime.Now.Subtract(timeZero).TotalMilliseconds, result)

let nodeAdapter (v:Vertex<NdfaNode, NdfaEdge>) (n:Microsoft.Glee.Drawing.Node) =
  let (nt:NdfaNodeType, sp:NdfaNodeSubPatternData option) = vertexData v
  n.Attr.Shape <-
    match nt with
    | Start -> Shape.Point
    | Closure -> Shape.DoubleCircle
    | Normal -> Shape.Circle
  n.Attr.Color <-
    match sp with
    | None -> Color.Black
    | Some StartSubPattern -> Color.Blue
    | Some EndSubPattern -> Color.Red

let edgeAdapter
    ((id,priority,target,data):EdgeData<NdfaEdge>)
    (e:Microsoft.Glee.Drawing.Edge) =
  match data with
  | Auto ->
    e.Attr.Color <- Color.Blue
  | Simple c ->
    e.Attr.Label <- c.ToString()
    e.Attr.Fontcolor <- Color.Red
    e.Attr.Color <- Color.Black
  | CharacterTest t ->
    e.Attr.Label <- (sprintf "$test %20A" t)
    e.Attr.Fontcolor <- Color.Blue
    e.Attr.Color <- Color.Black
  | AnyChar ->
    e.Attr.Label <- "$any"
    e.Attr.Fontcolor <- Color.Blue
    e.Attr.Color <- Color.Black
  | StartAssertion ->
    e.Attr.Label <- "$start"
    e.Attr.Fontcolor <- Color.Red
    e.Attr.Color <- Color.Red
  | EndAssertion ->
    e.Attr.Label <- "$end"
    e.Attr.Fontcolor <- Color.Red
    e.Attr.Color <- Color.Red
  | NonGreedyAnyCharacter ->
    e.Attr.Label <- "$any*[non-greedy]"
    e.Attr.Fontcolor <- Color.Blue
    e.Attr.Color <- Color.Blue

 
(* displays the main application form *)
let run () =
  let frm = new System.Windows.Forms.Form()
  frm.Text <- "RegEx Compiler"
  let panel = new System.Windows.Forms.FlowLayoutPanel()
  panel.Size <- new System.Drawing.Size(100, 60)
  panel.Dock <- System.Windows.Forms.DockStyle.Top
  let txtBoxRegex = new System.Windows.Forms.TextBox()
  txtBoxRegex.Text <- ""
  let chkboxFullMatch = new System.Windows.Forms.CheckBox()
  chkboxFullMatch.Text <- "Initial matches"
  chkboxFullMatch.Checked <- true
  let btnGo = new System.Windows.Forms.Button()
  btnGo.Text <- "Generate"
  let txtBoxText = new System.Windows.Forms.TextBox()
  txtBoxText.Text <- ""
  let btnTest = new System.Windows.Forms.Button()
  btnTest.Text <- "Test"
  let label l =
    let c = new System.Windows.Forms.Label()
    c.Text <- l
    c :> System.Windows.Forms.Control
  panel.Controls.AddRange
    [| (label "regular expression") ;
       (txtBoxRegex :> System.Windows.Forms.Control) ;
       (chkboxFullMatch:>System.Windows.Forms.Control) ;
       (btnGo :>System.Windows.Forms.Control) ;
       (label "test text") ;
       (txtBoxText :>System.Windows.Forms.Control) ;
       (btnTest :>System.Windows.Forms.Control) |]
  panel.SetFlowBreak (btnGo, true)
  let gc = new Microsoft.Glee.GraphViewerGdi.GViewer()
  gc.Graph <- new Microsoft.Glee.Drawing.Graph("empty")
  gc.Graph.GraphAttr.Orientation <- Microsoft.Glee.Drawing.Orientation.Landscape
  gc.Dock <- System.Windows.Forms.DockStyle.Fill;
  frm.Controls.Add gc
  frm.Controls.Add panel
  frm.Size <- new System.Drawing.Size(600, 600)
  let getSyntax () =
    let syntax = txtBoxRegex.Text |> RegExParsing.parseRegExp
    let mode =
      if (chkboxFullMatch.Checked) then FullMatch else FirstMatch
    let g = syntax |> RegExCompiling.compile mode
    printf "Pattern: %A\n\n" txtBoxRegex.Text
    printf "Syntax: %A\n\n" syntax
    printf "Graph: %A\n\n" g
    g
  let applyGleeGraph () =
    let g = getSyntax()
    gc.Graph <- (Graph.toGlee txtBoxRegex.Text nodeAdapter edgeAdapter g)
    g
  let applyGraph () =
    let g = applyGleeGraph ()
    printf "-------------------------------------------------------------\n\n"
  let testMatch () =
    let g = applyGleeGraph ()
    let txt = txtBoxText.Text
    printf "Test text: %A\n\n" txt
    let fn () = RegExProcessor.firstMatch g txt
    let timeResult = time fn
    printf "Time: %Ams\n" (fst timeResult)
    printf "Test match: %A\n\n" (snd timeResult)
    printf "-------------------------------------------------------------\n\n"
  btnGo.Click.Add (fun _ -> applyGraph () )
  btnTest.Click.Add (fun _ -> testMatch () )
  applyGraph ()
  System.Windows.Forms.Application.Run frm

run ()

(* Test the processing of a million characters *)
let runTest () =
  let pattern = "a+b|a"
  let chars = Seq.append (Seq.init 1000000 (fun a -> 'a')) (Seq.singleton 'b')
  let result () =
      chars |> (
          pattern
            |> RegExCompiling.parseAndCompile RegExMatchMode.FirstMatch
            |> RegExProcessor.test )
  let resultVal = time result
  printf "time: %Ams ; result: %A\n\n" (resultVal |> fst) (resultVal |> snd)

(* uncomment the following line to run the test.
   usual runtime is around 20 seconds *)
(* runTest () *)
Advertisements

2 thoughts on “F#: A Complete Regular Expression Processor

  1. Pingback: Download the Regular Expression Processor « Steve Horsfield

  2. Pingback: Rick Minerich's Development Wonderland : F# Discoveries This Week 08/09/2009

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s