I have been working on a business data model for my WPF demo application (which I hope to begin describing soon). One of my requirements for this was a nested data structure to model a namespace hierarchy. Doing this in an imperative way was easy but my goal was to use immutable data structures to support a functional-programming approach to the data. Eventually, I settled on a nested Map structure with an F# record data type to represent the underlying data. Structurally, this was similar to how I would have approached it from an imperative perspective, but this was after several other attempts using custom data structures (abstract syntax trees and other approaches) had proven too inflexible to the demands of the application.
A second requirement for this data structure was abstraction. I did not want the internal data structure to be exposed to client code. Therefore, I needed to provide custom enumeration support. This post describes two approaches to this.
Understanding enumeration in F# (and .NET)
Enumeration in F# is modeled by the seq generic type. This is actually a type alias for IEnumerable<’T> in the System.Collections.Generic namespace. I will assume the namespaces for these interfaces and classes are already known by the reader in the following text, although they are referred to explicitly in the code later in this post. This interface provides a single method
GetEnumerator: unit -> IEnumerator<’T>
Through inheritance, IEnumerable<’T> also requires implementations to support the non-generic equivalent IEnumerable that provides an alternative:
GetEnumerator: unit -> IEnumerator
There are two important differences between IEnumerator<’T> and IEnumerator. The former provides typed data access and also inherits the IDisposable interface. The latter does not.
Enumeration is achieved by using the following method and property:
MoveNext: unit -> bool
Current: ‘T
An example nested data structure
The following data structure will be used to demonstrate the principles described in this post:
#light
type TKey1 = int list
type TKey2 = int
type TValue = string
// Do not want to open namespaces to avoid introducing unwanted CLR types into scope
type KVP<'K,'V> = System.Collections.Generic.KeyValuePair<'K,'V>
type IEnumerable = System.Collections.IEnumerable
type IEnumerable<'T> = System.Collections.Generic.IEnumerable<'T>
type IEnumerator = System.Collections.IEnumerator
type IEnumerator<'T> = System.Collections.Generic.IEnumerator<'T>
type CustomDataStructure =
{ model: Map<TKey1, Map<TKey2, TValue>> }
static member Empty with get() = { model = Map.Empty }
Next, an instance of this data structure can be constructed using the following code:
let m =
let data =
Map.Empty
|> Map.add [1;2] ( Map.Empty
|> Map.add 1 "value ([1;2]->1)"
|> Map.add 2 "value ([1;2]->2)"
)
|> Map.add [1;3] ( Map.Empty
|> Map.add 4 "value ([1;3]->4)"
|> Map.add 5 "value ([1;3]->5)"
)
|> Map.add [2] ( Map.Empty
|> Map.add 7 "value ([2]->7)"
|> Map.add 8 "value ([2]->8)"
)
{model = data}
Challenge: Enumerate over the contained elements only
The requirement is to provide an IEnumerable<TValue> implementation.
Approach 1: Implementing the interfaces
In F#, it is possibe to augment types (excluding type abbreviations) with interface declarations. The challenge with this first approach is how to create an appropriate IEnumerator<TValue> object. The state of the object must be mutable because it must selectively iterate from each of the contained Map instances. (An alternative would be to pre-create a new sequence containing all the elements of the desired sequence but the second approach below gives a better but similar solution).
The enumerator object can be constructed using an object expression, similar to an anonymous type expression in C#, as follows:
{ new IEnumerator<TValue> with
member e.Current with get() = failwith "invalid operation"
interface System.IDisposable with
member e.Dispose () = failwith "invalid operation"
interface IEnumerator with
member e.MoveNext () = failwith "invalid operation"
member e.Current with get() = failwith "invalid operation"
member e.Reset () = failwith "invalid operation"
}
Then, the method code just needs to be written :)
Wrapping mutable state is not easy in an anonymous type. One option is to create a named type and provide mutable state as part of that concrete enumerator type. The alternative is to use a mutable declaration. F# provides two such declaration mechanisms: let mutable and ref. If you try to use let mutable to solve this problem you will get the compiler error:
error FS0191: The mutable variable ‘x’ is used in an invalid way. Mutable variables may not be captured by closures. Consider eliminating this use of mutation or using a heap-allocated mutable reference cell via ‘ref’ and ‘!’
Clearly, using ref is necessary.
Here is an implementation using ref:
type CustomDataStructure =
{ model: Map<TKey1, Map<TKey2, TValue>> }
static member Empty with get() = { model = Map.Empty }
interface IEnumerable<TValue> with
member s.GetEnumerator () =
// outerEnum tracks the outer enumeration
let outerEnum =
(s.model
:>IEnumerable<KVP<TKey1, Map<TKey2,TValue>>>).GetEnumerator()
// innerEnum tracks the inner enumeration and is mutable
let innerEnum : ref<IEnumerator<KVP<TKey2,TValue>>> = ref null
{ new IEnumerator<TValue> with
member e.Current
with get() =
(!innerEnum).Current.Value
interface System.IDisposable with
member e.Dispose () =
if (innerEnum.Value <> null) then
(!innerEnum).Dispose()
innerEnum := null
outerEnum.Dispose()
interface IEnumerator with
member e.Reset () =
if (innerEnum.Value <> null) then
(!innerEnum).Dispose()
outerEnum.Reset()
member e.Current
with get() =
(((e:?>IEnumerator<TValue>).Current):>obj)
member e.MoveNext () =
let rec move() =
if (innerEnum.Value <> null) then
if ((!innerEnum).MoveNext() = true) then
true
else
(!innerEnum).Dispose()
innerEnum := null
findNext()
else
findNext()
and findNext() =
let r = outerEnum.MoveNext()
if r = true then
innerEnum := ((outerEnum.Current.Value):>IEnumerable<_>).GetEnumerator()
move ()
else
false
move()
}
interface IEnumerable with
member s.GetEnumerator () =
(s:>IEnumerable<TValue>).GetEnumerator()
:> IEnumerator
That is a lot of code! Worse, it has a very imperative style. However, it does work and demonstrates how to provide a custom implementation when the second approach (below) is not easily applied. You can use the following code to test the enumeration of the data structure:
m |> Seq.iter (fun x -> x |> System.Console.WriteLine)
Compare that to:
m.model |> Seq.iter (fun x -> x |> System.Console.WriteLine)
Approach 2: sequence comprehensions
This second approach uses an F# construct called a sequence comprehension. It is similar to implementing a custom enumerator in C# with the yield keyword.
First, a method that can convert the custom data type into a sequence (seq) is required. Then, interface implementations that provide this generated sequence allow the custom data type to be directly enumerated.
Here is the code for the sequence comprehension:
member s.to_seq() =
seq {
for map1pair in s.model do
for map2pair in map1pair.Value do
yield map2pair.Value
}
That is pretty straightforward and F# type-inference yields an expression of type seq<TValue> automatically.
Then, the implementation of the interface methods is:
interface IEnumerable<TValue> with
member s.GetEnumerator () =
s.to_seq().GetEnumerator()
interface IEnumerable with
member s.GetEnumerator () =
(s.to_seq().GetEnumerator())
:> IEnumerator
Note that in this approach there is no need to implement IEnumerator<TValue> and there is no need to manage resource disposal (using IDisposable) or mutable state. The full implementation using this approach is below:
type CustomDataStructure =
{ model: Map<TKey1, Map<TKey2, TValue>> }
static member Empty with get() = { model = Map.Empty }
member s.to_seq() =
seq {
for map1pair in s.model do
for map2pair in map1pair.Value do
yield map2pair.Value
}
interface IEnumerable<TValue> with
member s.GetEnumerator () =
s.to_seq().GetEnumerator()
interface IEnumerable with
member s.GetEnumerator () =
(s.to_seq().GetEnumerator())
:> IEnumerator
Conclusions
So, sequence comprehensions are the easiest way to produce custom enumeration over a data structure. If you find yourself needing to wrap mutable state and you do not want to use another type definition then ref can help you. Sometimes, though, you may find there is a better approach available.