F#: Generic Fold

The fold functions in functional languages are very powerful constructs, but how about when you want a different function applied to the first element, or to the last? What about when the sequence is empty? Rather than solving this problem repeatedly, I have written a few functions to cover the range of cases. As I wrote these, I found it interesting that the solutions for Array, Seq and List all look very different.

Here is a simple scenario. I have a list of items that I want to convert to strings, and I also want to put a separator between them. Importantly, a trailing separator must be avoided. (Note: this is a very simplistic case of a wider range of problems). To do this with fold means writing extra functionality to track whether a separator is needed. In general, you need something like a continuation function. To provide generic functionality, I believe you would end up writing something like I have yourself.

Some library functions

To support the implementation of the generic fold (as I have called it), gfold, I have the following library functions:

let abort<'T,'U> (_:'T) : 'U =
    failwith "ABORT!"

let combineLeft2 fx fy a x y = 
    fy (fx a x) y

let combineRight2 fx fy a x y = 
    fy y (fx x a)

Note: I do not use combineRight2 but I have included it for completeness.

abort is a function that can be used in the place of any function where the execution is always a logic error. I could just use failwith but the advantages of abort are that it is a function that can be used directly and that it makes it clear that this is not a runtime error caused by an invalid operation but that it is a logic error in the program itself. I use abort to handle pattern matches that are not relevant and cannot occur.

combineLeft2 combines two functions that each take two parameters and where the first input to the second function is the result of applying the first function. This may seem obscure, but the standard fold functions take a function that takes a state value as the initial parameter and then an enumerated item as a second parameter. Applying this type of function to two items in a list, [x1;x2], is as follows:

f (f s x1) x2

However, if the two functions are different then:

f2 (f1 s x1) x2

Which is equivalent to:

let f f1 f2 s x1 x2 = f2 (f1 s x1) x2) in f s x1 x2

Or more simply:

(combineLeft2 f1 f2) s x1 x2

And so combineLeft2 f1 f2 results in a single function value that takes three curried arguments.

Generic fold

The basic goal behind gfold is to allow a different function to be applied to the first, middle and last elements. However, this raises the question of what to do with only one or two items in the sequence. Also, it is usual for fold to return its initial state on an empty sequence, but unless you track the count of items there is no way of detecting whether the sequence is empty using fold itself (although each type of sequence has its own functions for doing this). So, my generic fold has functions to support each of these boundary cases:

let gfold fZero fOne fTwo fFirst fMiddle fLast s sequence

The parameters have the following types:

  • fZero : ‘a -> ‘b
  • fOne : ‘a -> ‘c -> ‘b
  • fTwo : ‘a -> ‘c -> ‘c -> ‘b
  • fFirst : ‘a -> ‘c -> ‘d
  • fMiddle : ‘d -> ‘c -> ‘d
  • fLast : ‘d -> ‘c -> ‘b
  • s : ‘a
  • sequence : ‘c seq or ‘c list or ‘c array
  • result : ‘b

List implementation

The implementation of generic fold on a list is the most intuitive:

let gfold fZero fOne fTwo fFirst fMiddle fLast s list = 
    match list with
    | [] -> fZero s
    | [x] -> fOne s x
    | [x;y] -> fTwo s x y
    | x::xs ->
        let rec inner s list =
            match list with
            | [x] -> fLast s x
            | x::xs -> inner (fMiddle s x) xs
            | [] -> abort()
        inner (fFirst s x) xs

I hope that is fairly easy to read.

Using this implementation of gfold, it is easy to write functions that act differently on the first or last item of a list:

let gfoldFirst fZero fFirst fAny s list =
    gfold fZero fFirst (combineLeft2 fFirst fAny) 
          fFirst fAny fAny s list

let gfoldLast fZero fAny fLast s list =
    gfold fZero fLast (combineLeft2 fAny fLast)
          fAny fAny fLast s list

If you know that the list has at least one item then you can replace fZero with abort.

Array implementation

let gfold fZero fOne fTwo 
        fFirst fMiddle fLast s (arr:array<_>) = 
    let arrn = arr.Length
    if arrn = 0 then fZero s
    elif arrn = 1 then fOne s arr.[0]
    elif arrn = 2 then fTwo s arr.[0] arr.[1]
    else
        let mutable res = fFirst s arr.[0]
        for i = 1 to (arrn-2) do
            res <- fMiddle res arr.&#91;i&#93;
        fLast res arr.&#91;arrn-1&#93;&#91;/sourcecode&#93;

The code for arrays is less functional in style (it does not use pattern matching and it uses mutable storage), but that is because indexed access on an array is more efficient.  The implementation for <em>gfoldFirst</em> and <em>gfoldLast</em> is identical to the implementation for lists.

<strong>Sequence implementation</strong>

let gfold fZero fOne fTwo
        fFirst fMiddle fLast s (sequence:_ seq) = 
    use e = sequence.GetEnumerator() 
    if e.MoveNext() then
        let x1 = e.Current
        if e.MoveNext() then
            let x2 = e.Current
            if e.MoveNext() then
                let mutable res = 
                    fMiddle (fFirst s x1) x2
                let mutable x = e.Current
                while e.MoveNext() do
                    res <- fMiddle res x
                    x <- e.Current
                fLast res x
            else
                fTwo s x1 x2
        else
            fOne s x1
    else
        fZero s&#91;/sourcecode&#93;

This code for sequences is not particularly nice.  However, the alternative of using <em>fold</em> with a continuation function and saved values that I attempted convinced me that this was a better approach.  If someone has a more elegant solution I would be interested in seeing it.  Nevertheless, this implementation is efficient.  As with arrays, the implementation of <em>gfoldFirst</em> and <em>gfoldLast</em> is identical to the implementation for lists.

<strong>An example</strong>

Using the scenario of inserting separators between strings, here is a simple example representing the summation of numbers:

printf "%s\n\n"
    (gfoldLast 
        (fun (t,s) -> 
              t + "  No calculation: " + (string s)) 
        (fun (t,s) -> fun x -> 
              t 
            + "  " + (string x)
            + " + [running total: " 
            + (string (s+x)) + "]\n" , 
              s + x)
        (fun (t,s) -> fun x -> 
              t 
            + "  " + (string x) 
            + " = " + (string (s+x)))
        ("", 0)
        [1 ; 2 ; 3 ; 4])

The output from this is as follows:

> 
  1 + [running total: 1]
  2 + [running total: 3]
  3 + [running total: 6]
  4 = 10

val it : unit = ()

Note: this example is only intended to illustrate a use of gfoldLast

Whether to use gfold or another construct needs a certain amount of judgement as the resulting code may be more complicated. In this case, there are two intermediary state values (the text and the running total) but only a single final result (the text). Using gfold directly would allow the initial running total to be hidden but would have required more code because of the extra function arguments. This code would also be difficult using the standard fold function because of the need to handle the last case differently. This could be managed using a complex state variable but I think the solution using gFoldLast is the easiest to understand.

Final comments

To wrap the code into a standard library in my code, I have extended the standard modules in Microsoft.FSharp.Collections. This makes the generic fold implementations available in the same way as the standard fold implementations.

As always, comments are welcome.

Advertisements

3 thoughts on “F#: Generic Fold

  1. James Moore

    What about an initial transformation into a sequence of Positions (Beginning,Middle,End,Only) like this:

    type Position =
    | Beginning of ‘t
    | Middle of ‘t
    | End of ‘t
    | Only of ‘t

    let rec sequenceOfPositionsWithLazyList(existing: LazyList) =
    seq {
    let rec elementsAfterBeginning(s) =
    seq {
    match s with
    | LazyList.Cons(h, LazyList.Nil) -> yield End(h); ()
    | LazyList.Cons(h, t) -> yield Middle(h); yield! elementsAfterBeginning(t)
    | _ -> ()
    }
    match existing with
    | LazyList.Cons(h, LazyList.Nil) -> yield Only(h); ()
    | LazyList.Cons(h, t) -> yield Beginning(h); yield! elementsAfterBeginning(t)
    | LazyList.Nil -> ()
    }

    let sequenceOfPositions(existing: seq) = sequenceOfPositionsWithLazyList(LazyList.of_seq existing)

    printfn “%A” (sequenceOfPositions [1;2;3])
    printfn “%A” (sequenceOfPositions [1;3])
    printfn “%A” (sequenceOfPositions [1])
    printfn “%A” (sequenceOfPositions [])

    That gets you:

    seq [Beginning 1; Middle 2; End 3]
    seq [Beginning 1; End 3]
    seq [Only 1]
    seq []

    And I think that makes all of your fold functions much easier to write.

    Reply
    1. Steve Horsfield Post author

      Thanks James.

      I do like your approach. I wasn’t trying to make a single generic approach to the implementation as I was hoping to demonstrate the differences between the different containers (seq, list, array).

      So long as you then write “gfold” using the transformed sequence involving Position then all is good. If you want to use fold directly on a Position sequence then you will need to use a match function which means that you are rewriting the same code for each case, each time you use it.

      Finally, if you are using Position directly then both Only and End are redundant cases. “Only” is “Start Cons Null” and End is “Middle Cons Null”, similarly (but missing from your example) “Two” is “Start Cons Middle Cons Null”.

      Thanks for your comment, I do like your approach of using sequence comprehensions.

      Reply

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s