F#: Nested Data Structures, Enumeration and Sequence Comprehensions

I have been working on a business data model for my WPF demo application (which I hope to begin describing soon). One of my requirements for this was a nested data structure to model a namespace hierarchy. Doing this in an imperative way was easy but my goal was to use immutable data structures to support a functional-programming approach to the data. Eventually, I settled on a nested Map structure with an F# record data type to represent the underlying data. Structurally, this was similar to how I would have approached it from an imperative perspective, but this was after several other attempts using custom data structures (abstract syntax trees and other approaches) had proven too inflexible to the demands of the application.

A second requirement for this data structure was abstraction. I did not want the internal data structure to be exposed to client code. Therefore, I needed to provide custom enumeration support. This post describes two approaches to this.

Understanding enumeration in F# (and .NET)

Enumeration in F# is modeled by the seq generic type. This is actually a type alias for IEnumerable<‘T> in the System.Collections.Generic namespace. I will assume the namespaces for these interfaces and classes are already known by the reader in the following text, although they are referred to explicitly in the code later in this post. This interface provides a single method

GetEnumerator: unit -> IEnumerator<‘T>

Through inheritance, IEnumerable<‘T> also requires implementations to support the non-generic equivalent IEnumerable that provides an alternative:

GetEnumerator: unit -> IEnumerator

There are two important differences between IEnumerator<‘T> and IEnumerator. The former provides typed data access and also inherits the IDisposable interface. The latter does not.

Enumeration is achieved by using the following method and property:

MoveNext: unit -> bool
Current: ‘T

An example nested data structure

The following data structure will be used to demonstrate the principles described in this post:

#light

type TKey1 = int list
type TKey2 = int
type TValue = string

// Do not want to open namespaces to avoid introducing unwanted CLR types into scope
type KVP<'K,'V> = System.Collections.Generic.KeyValuePair<'K,'V>
type IEnumerable = System.Collections.IEnumerable
type IEnumerable<'T> = System.Collections.Generic.IEnumerable<'T>
type IEnumerator = System.Collections.IEnumerator
type IEnumerator<'T> = System.Collections.Generic.IEnumerator<'T>

type CustomDataStructure =
  { model: Map<TKey1, Map<TKey2, TValue>> }

  static member Empty with get() = { model = Map.Empty }

Next, an instance of this data structure can be constructed using the following code:

let m =
  let data =
       Map.Empty
    |> Map.add [1;2] (   Map.Empty
                      |> Map.add 1 "value ([1;2]->1)"
                      |> Map.add 2 "value ([1;2]->2)"
                     )
    |> Map.add [1;3] (   Map.Empty
                      |> Map.add 4 "value ([1;3]->4)"
                      |> Map.add 5 "value ([1;3]->5)"
                     )
    |> Map.add [2]   (   Map.Empty
                      |> Map.add 7 "value ([2]->7)"
                      |> Map.add 8 "value ([2]->8)"
                     )
  {model = data}

Challenge: Enumerate over the contained elements only

The requirement is to provide an IEnumerable<TValue> implementation.

Approach 1: Implementing the interfaces

In F#, it is possibe to augment types (excluding type abbreviations) with interface declarations.  The challenge with this first approach is how to create an appropriate IEnumerator<TValue> object.  The state of the object must be mutable because it must selectively iterate from each of the contained Map instances.  (An alternative would be to pre-create a new sequence containing all the elements of the desired sequence but the second approach below gives a better but similar solution).

The enumerator object can be constructed using an object expression, similar to an anonymous type expression in C#, as follows:

{ new IEnumerator<TValue> with
    member e.Current with get() = failwith "invalid operation"
  interface System.IDisposable with
    member e.Dispose () = failwith "invalid operation"
  interface IEnumerator with
    member e.MoveNext () = failwith "invalid operation"
    member e.Current with get() = failwith "invalid operation"
    member e.Reset () = failwith "invalid operation"
}

Then, the method code just needs to be written :)

Wrapping mutable state is not easy in an anonymous type.  One option is to create a named type and provide mutable state as part of that concrete enumerator type.  The alternative is to use a mutable declaration.  F# provides two such declaration mechanisms: let mutable and ref.  If you try to use let mutable to solve this problem you will get the compiler error:

error FS0191: The mutable variable ‘x’ is used in an invalid way. Mutable variables may not be captured by closures. Consider eliminating this use of mutation or using a heap-allocated mutable reference cell via ‘ref’ and ‘!’

Clearly, using ref is necessary.

Here is an implementation using ref:

type CustomDataStructure =
  { model: Map<TKey1, Map<TKey2, TValue>> }

  static member Empty with get() = { model = Map.Empty }

  interface IEnumerable<TValue> with
    member s.GetEnumerator () =
      // outerEnum tracks the outer enumeration
      let outerEnum =
        (s.model
         :>IEnumerable<KVP<TKey1, Map<TKey2,TValue>>>).GetEnumerator()
      // innerEnum tracks the inner enumeration and is mutable
      let innerEnum : ref<IEnumerator<KVP<TKey2,TValue>>> = ref null

      { new IEnumerator<TValue> with
          member e.Current
            with get() =
              (!innerEnum).Current.Value

        interface System.IDisposable with
          member e.Dispose () =
            if (innerEnum.Value <> null) then
              (!innerEnum).Dispose()
              innerEnum := null
            outerEnum.Dispose()

        interface IEnumerator with
          member e.Reset () =
            if (innerEnum.Value <> null) then
              (!innerEnum).Dispose()
            outerEnum.Reset()

          member e.Current
            with get() =
              (((e:?>IEnumerator<TValue>).Current):>obj)

          member e.MoveNext () =
            let rec move() =
              if (innerEnum.Value <> null) then
                if ((!innerEnum).MoveNext() = true) then
                  true
                else
                  (!innerEnum).Dispose()
                  innerEnum := null
                  findNext()
              else
                findNext()
            and findNext() =
              let r = outerEnum.MoveNext()
              if r = true then
                innerEnum := ((outerEnum.Current.Value):>IEnumerable<_>).GetEnumerator()
                move ()
              else
                false
            move()
      }

  interface IEnumerable with
    member s.GetEnumerator () =
      (s:>IEnumerable<TValue>).GetEnumerator()
      :> IEnumerator

That is a lot of code! Worse, it has a very imperative style. However, it does work and demonstrates how to provide a custom implementation when the second approach (below) is not easily applied. You can use the following code to test the enumeration of the data structure:

m |> Seq.iter (fun x -> x |> System.Console.WriteLine)

Compare that to:

m.model |> Seq.iter (fun x -> x |> System.Console.WriteLine)

Approach 2: sequence comprehensions

This second approach uses an F# construct called a sequence comprehension. It is similar to implementing a custom enumerator in C# with the yield keyword.

First, a method that can convert the custom data type into a sequence (seq) is required.  Then, interface implementations that provide this generated sequence allow the custom data type to be directly enumerated.

Here is the code for the sequence comprehension:

member s.to_seq() =
  seq {
        for map1pair in s.model do
          for map2pair in map1pair.Value do
            yield map2pair.Value
      }

That is pretty straightforward and F# type-inference yields an expression of type seq<TValue> automatically.

Then, the implementation of the interface methods is:

interface IEnumerable<TValue> with
  member s.GetEnumerator () =
    s.to_seq().GetEnumerator()

interface IEnumerable with
  member s.GetEnumerator () =
    (s.to_seq().GetEnumerator())
    :> IEnumerator

Note that in this approach there is no need to implement IEnumerator<TValue> and there is no need to manage resource disposal (using IDisposable) or mutable state. The full implementation using this approach is below:

type CustomDataStructure =
  { model: Map<TKey1, Map<TKey2, TValue>> }

  static member Empty with get() = { model = Map.Empty }  

  member s.to_seq() =
    seq {
          for map1pair in s.model do
            for map2pair in map1pair.Value do
              yield map2pair.Value
        }

  interface IEnumerable<TValue> with
    member s.GetEnumerator () =
      s.to_seq().GetEnumerator()

  interface IEnumerable with
    member s.GetEnumerator () =
      (s.to_seq().GetEnumerator())
      :> IEnumerator

Conclusions

So, sequence comprehensions are the easiest way to produce custom enumeration over a data structure. If you find yourself needing to wrap mutable state and you do not want to use another type definition then ref can help you. Sometimes, though, you may find there is a better approach available.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s