The fold functions in functional languages are very powerful constructs, but how about when you want a different function applied to the first element, or to the last? What about when the sequence is empty? Rather than solving this problem repeatedly, I have written a few functions to cover the range of cases. As I wrote these, I found it interesting that the solutions for Array, Seq and List all look very different.
Here is a simple scenario. I have a list of items that I want to convert to strings, and I also want to put a separator between them. Importantly, a trailing separator must be avoided. (Note: this is a very simplistic case of a wider range of problems). To do this with fold means writing extra functionality to track whether a separator is needed. In general, you need something like a continuation function. To provide generic functionality, I believe you would end up writing something like I have yourself.
Some library functions
To support the implementation of the generic fold (as I have called it), gfold, I have the following library functions:
let abort<'T,'U> (_:'T) : 'U = failwith "ABORT!" let combineLeft2 fx fy a x y = fy (fx a x) y let combineRight2 fx fy a x y = fy y (fx x a)
Note: I do not use combineRight2 but I have included it for completeness.
abort is a function that can be used in the place of any function where the execution is always a logic error. I could just use failwith but the advantages of abort are that it is a function that can be used directly and that it makes it clear that this is not a runtime error caused by an invalid operation but that it is a logic error in the program itself. I use abort to handle pattern matches that are not relevant and cannot occur.
combineLeft2 combines two functions that each take two parameters and where the first input to the second function is the result of applying the first function. This may seem obscure, but the standard fold functions take a function that takes a state value as the initial parameter and then an enumerated item as a second parameter. Applying this type of function to two items in a list, [x1;x2], is as follows:
f (f s x1) x2
However, if the two functions are different then:
f2 (f1 s x1) x2
Which is equivalent to:
let f f1 f2 s x1 x2 = f2 (f1 s x1) x2) in f s x1 x2
Or more simply:
(combineLeft2 f1 f2) s x1 x2
And so combineLeft2 f1 f2 results in a single function value that takes three curried arguments.
The basic goal behind gfold is to allow a different function to be applied to the first, middle and last elements. However, this raises the question of what to do with only one or two items in the sequence. Also, it is usual for fold to return its initial state on an empty sequence, but unless you track the count of items there is no way of detecting whether the sequence is empty using fold itself (although each type of sequence has its own functions for doing this). So, my generic fold has functions to support each of these boundary cases:
let gfold fZero fOne fTwo fFirst fMiddle fLast s sequence
The parameters have the following types:
- fZero : ‘a -> ‘b
- fOne : ‘a -> ‘c -> ‘b
- fTwo : ‘a -> ‘c -> ‘c -> ‘b
- fFirst : ‘a -> ‘c -> ‘d
- fMiddle : ‘d -> ‘c -> ‘d
- fLast : ‘d -> ‘c -> ‘b
- s : ‘a
- sequence : ‘c seq or ‘c list or ‘c array
- result : ‘b
The implementation of generic fold on a list is the most intuitive:
let gfold fZero fOne fTwo fFirst fMiddle fLast s list = match list with |  -> fZero s | [x] -> fOne s x | [x;y] -> fTwo s x y | x::xs -> let rec inner s list = match list with | [x] -> fLast s x | x::xs -> inner (fMiddle s x) xs |  -> abort() inner (fFirst s x) xs
I hope that is fairly easy to read.
Using this implementation of gfold, it is easy to write functions that act differently on the first or last item of a list:
let gfoldFirst fZero fFirst fAny s list = gfold fZero fFirst (combineLeft2 fFirst fAny) fFirst fAny fAny s list let gfoldLast fZero fAny fLast s list = gfold fZero fLast (combineLeft2 fAny fLast) fAny fAny fLast s list
If you know that the list has at least one item then you can replace fZero with abort.
let gfold fZero fOne fTwo fFirst fMiddle fLast s (arr:array<_>) = let arrn = arr.Length if arrn = 0 then fZero s elif arrn = 1 then fOne s arr. elif arrn = 2 then fTwo s arr. arr. else let mutable res = fFirst s arr. for i = 1 to (arrn-2) do res <- fMiddle res arr.[i] fLast res arr.[arrn-1]
The code for arrays is less functional in style (it does not use pattern matching and it uses mutable storage), but that is because indexed access on an array is more efficient. The implementation for gfoldFirst and gfoldLast is identical to the implementation for lists.
let gfold fZero fOne fTwo fFirst fMiddle fLast s (sequence:_ seq) = use e = sequence.GetEnumerator() if e.MoveNext() then let x1 = e.Current if e.MoveNext() then let x2 = e.Current if e.MoveNext() then let mutable res = fMiddle (fFirst s x1) x2 let mutable x = e.Current while e.MoveNext() do res <- fMiddle res x x <- e.Current fLast res x else fTwo s x1 x2 else fOne s x1 else fZero s
This code for sequences is not particularly nice. However, the alternative of using fold with a continuation function and saved values that I attempted convinced me that this was a better approach. If someone has a more elegant solution I would be interested in seeing it. Nevertheless, this implementation is efficient. As with arrays, the implementation of gfoldFirst and gfoldLast is identical to the implementation for lists.
Using the scenario of inserting separators between strings, here is a simple example representing the summation of numbers:
printf "%s\n\n" (gfoldLast (fun (t,s) -> t + " No calculation: " + (string s)) (fun (t,s) -> fun x -> t + " " + (string x) + " + [running total: " + (string (s+x)) + "]\n" , s + x) (fun (t,s) -> fun x -> t + " " + (string x) + " = " + (string (s+x))) ("", 0) [1 ; 2 ; 3 ; 4])
The output from this is as follows:
> 1 + [running total: 1] 2 + [running total: 3] 3 + [running total: 6] 4 = 10 val it : unit = ()
Note: this example is only intended to illustrate a use of gfoldLast
Whether to use gfold or another construct needs a certain amount of judgement as the resulting code may be more complicated. In this case, there are two intermediary state values (the text and the running total) but only a single final result (the text). Using gfold directly would allow the initial running total to be hidden but would have required more code because of the extra function arguments. This code would also be difficult using the standard fold function because of the need to handle the last case differently. This could be managed using a complex state variable but I think the solution using gFoldLast is the easiest to understand.
To wrap the code into a standard library in my code, I have extended the standard modules in Microsoft.FSharp.Collections. This makes the generic fold implementations available in the same way as the standard fold implementations.
As always, comments are welcome.