## Wednesday, July 31, 2013

I have released Commutative, a monadic combinator library for writing commutative functions. This post explains the theory behind how it works.

#### Combinator-based, not Proof-based

Haskell's type system is quite precise and sophisticated, yet dependently-typed languages such as Agda and Idris manage to go even further. For example, using Agda, it is trivial to define the type of commutative functions: they are simply binary functions together with a proof that the function is commutative.

```    record Commutative (a b : Set) : Set where
field
f : a → a → b
prf : ∀ x y → f x y ≡ f y x
```

Haskell's type system is not powerful enough to represent proof objects, but even if it did, would you want to use them? Instead of writing proofs and programs in two separate steps, I much prefer Haskell's approach of using combinator libraries.

A simple example of a combinator library which avoids the need for proofs is bijections. The main idea is that since composing two bijections yields another bijection, programmers should be allowed to compose existing bijections without proof.

```    data Bijection a b = MkBij (a -> b) (b -> a)

instance Category Bijection where
id = Bijection id id
(MkBij g g') . (MkBij f f') = MkBij (g . f) (f' . g')
```

If the module exports a few bijective primitives but keeps the MkBij constructor private, users of the library will only be able to create Bijection instances through composition. Assuming the primitives were indeed proper bijections, this ensures that values of type Bijection are always guaranteed to be bijective. Without having to write any proof objects, even in the library!

#### Aspect-oriented Interlude

My passion for commutative (and associative) functions began while I was working on my master's thesis. I was working on aspect-oriented conflicts, and I soon discovered that the issue might not lie in the conflicting aspects themselves, but in the way in which those aspects were combined. The aspect-weaving process was not commutative!

Instead of keeping all the aspects in one global namespace and weaving them all together into one big program, I think aspects should be separated into independent "aquariums". One key aspect of my proposal is that there would be different types of aquariums, and only aspects of the same type should be combined with each other. Programmers may define custom aquariums, with one caveat: the aquarium must combine its aspects in a commutative and associative manner.

I knew that I couldn't just ask programmers to prove that all of their combination functions were commutative. Instead, I dedicated chapter 5 to a simple system which let programmers write "intrinsically" commutative functions, for which no further proof of commutativity would need to be written. That system eventually grew into today's combinator library.

#### Unordered pairs

My goal was to restrict the language in a way which ensured that only commutative functions could be written. The idea I came up with was to prevent functions from observing the order in which their arguments were given, by rewriting those two arguments into a unordered form. For example, if the arguments are both booleans, there are four possible (ordered) pairs of booleans, but only three unordered pairs.

```    data Unordered_Bool = TT | TF | FF

unorder_bool :: (Bool, Bool) -> Unordered_Bool
unorder_bool (True,  True ) = TT
unorder_bool (True,  False) = TF
unorder_bool (False, True ) = TF
unorder_bool (False, False) = FF
```

If a function is written in terms of Unordered_Bool instead of in terms of (Bool, Bool), then this function is commutative by construction, because the function has no way to distinguish the two orderings.

```    xor :: Unordered_Bool -> Bool
xor TF = True
xor _  = False
```

The main limitation of this strategy is that not all types can be unordered in this fashion. Algebraic types are okay, but what about function types? If you need to write a higher-order commutative function, that is, a function which takes a pair of functions as arguments, then you would need a way to represent an unordered pair of functions. How do we represent that?

At the time when I wrote my thesis, I could not answer this question. But now I can!

#### Unordered observations

A function is defined by its observations, so it would make sense for an unordered pair of functions to also be defined by its available observations.

With an ordinary (ordered) pair of functions, that is, an observation-based value approximating the type (a -> b, a -> b), it would make sense of have an observation of type a -> (b, b). That is, in order to observe a pair of functions, you need to provide an argument at which each of the two functions will be observed, and you receive the output of each function on that argument.

The way to represent an unordered pair of functions is now blindingly obvious: instead of returning an ordered pair revealing which output came from which function, we should return an unordered pair. That is, our unordered pair of functions should have an observation of type a -> Unordered b.

If a function is written in terms of (a -> Unordered b) instead of (a -> b, a -> b), then this function is commutative by construction, because the function has no way to distinguish the two orderings. For example, if we want to implement a commutative higher-order function which returns true if at least one of its two function arguments returns true on either 0 or 1, we can implement it as follows.

```    or01 :: (Int -> Unordered_Bool) -> Bool
or01 fg = fg 0 /= FF || fg 1 /= FF
```

I really like this solution. It is clean, simple, elegant... and wrong.

#### Distinguishable observations

Okay, maybe not wrong, but at least incomplete. The strategy fails if we try to implement, for example, a commutative higher-order function which returns true if at least one of its two function arguments returns true on both 0 and 1.

```    and01 :: (Int -> Unordered_Bool) -> Bool
and01 fg | fg 0 == FF ||
fg 1 == FF = False
and01 fg | fg 0 == TT ||
fg 1 == TT = True  -- Since the other is not FF.
and01 fg | fg 0 == TF ||      -- Are the two T from the
fg 1 == TF = ?     --  same function or not?
```

The problem with (a -> Unordered b) is that this representation is not just hiding the order of the original two arguments; it's hiding the order of every single observation. As the above example demonstrates, this is too strong.

The result TF indicates that a boolean observation has returned a different value for each argument. Once we have found such an observation, we ought to be able to use T and F as labels for the two arguments. We still won't know which was the first or second argument, but for each subsequent observation, we will know whether that observation came from the argument which resulted in T or from the one which resulted in F.

My commutativity monad is based on a single primitive, "distinguishBy", which does precisely what the previous paragraph describes. You ask it to perform a boolean observation (r -> Bool), which it performs on both arguments. If the answers are identical, you have failed to distinguish the arguments, but you still get to observe the Bool result. If the answers are different, hurray! You have successfully distinguished the two arguments, so you get the pair (r, r) corresponding to the original arguments.

The first r is always the argument for which the observation was False, while the second r is the one for which the observation was True. This way, you never learn the order in which the two r arguments were originally given, so the result is a Commutative operation from a pair of r to a value of type (Either Bool (r, r)).

```    distinguishBy :: (r -> Bool)
-> Commutative r (Either Bool (r, r))
```

There is also "distinguish", a slightly more general version which uses Either instead of Bool. From those two operations, plus trivial implementations for bind and return, all unordered pairs and all commutative operations can be reimplemented in a way which guarantees commutativity.

#### Completeness

Since the problem with (a -> Unordered b) was that the representation could not be used to implement all commutative functions, it would be wise to verify that Commutative doesn't suffer from the same flaw. Here is an informal argument proving completeness. I don't think it's completely airtight, but it's good enough for me.

##### proof:

Suppose f is a pure, computable, and commutative function of type r -> r -> a. We want to show that there exists a corresponding implementation of type Commutative r a. We modify the existing implementation as follows.

First, inline all the helper functions used by f, including builtins, and reimplement everything in monadic style. Then, rewrite everything again in CPS style so that we can abort the computation at any time. Also rewrite all pattern-matching to nested if statements, so that all decisions are performed on booleans, and use thunks to delay any observation of the two arguments until one of those boolean decisions. This ensures that all argument observations are boolean observations.

Since f is computable, f obtains its result after observing each of its arguments a finite number of times. Uniformly replace all those observations, regardless of the argument observed, by a call to distinguishBy. If the observations agree, continue with the observed value; it doesn't matter which argument was observed by f, since both arguments have the same observation. If the observations disagree, stop; we have managed to distinguish the two arguments x and y, so we can simply abort the computation and return f x y instead.

If it is the observation on x which returned True, the call to distinguishBy will give us (y, x) instead of (x, y), but by hypothesis, f y x returns the same result as f x y. If we never abort, we also return the same result as f, since all observations were the same as what f would have observed. ∎

#### Associativity

I am quite proud of my commutativity monad. I wish I could work on an associativity monad next, but none of the insights listed in this post seem to carry over. I am thus in need of new insights. Dear readers, any suggestions?

## Monday, July 01, 2013

### Comonads are neighbourhoods, not objects

Gabriel Gonzalez, the author of the well-known pipes package, has written a blog post in which he claims that comonads are objects. I think he is mistaken, but I can't blame him; I used to be quite confused about comonads myself.

To motivate the purported equivalence between objects and comonads, Gabriel gives detailed implementations of three classic object-oriented patterns (Builder, Iterator, and Command). Then, he reveals that all three implementations are comonadic. His examples are contrived, but more importantly, they are also misleading! Although not a direct quote from his post, he clearly intends the Haskell session

```>>> t1 = thermostat 3
>>> t2 = t1 # up'
>>> t3 = t2 # up'
>>> toString t3
"5 Kelvin"
```

to be equivalent to the following Java session.

```>>> t = new Thermostat(3);
>>> t.up();
>>> t.up();
>>> t.toString();
"5 Kelvin"
```

With this particular sequence of methods, the two sessions indeed compute the same result, but this is only because the up and down methods commute with each other. By adding the method square to the list, the behaviours of the two sessions quickly diverge. Strangely enough, in the Haskell version, each new method call gets applied before all the previous calls!

##### Haskell version (inverted method call order)
```>>> t1 = thermostat 3
>>> t2 = t1 # up'      -- 3+1
>>> t3 = t2 # up'      -- 3+1+1
>>> toString t3
"5 Kelvin"
>>> t4 = t3 # square'  -- 3*3+1+1
>>> toString t4
"11 Kelvin"
>>> t5 = t4 # up'      -- (3+1)*(3+1)+1+1
>>> toString t5
"18 Kelvin"
```
##### Java version (normal method call order)
```>>> t = new Thermostat(3);
>>> t.up();            // 3+1
>>> t.up();            // 3+1+1
>>> t.toString();
"5 Kelvin"
>>> t.square();        // (3+1+1)*(3+1+1)
>>> t.toString();
"25 Kelvin"
>>> t.up();            // (3+1+1)*(3+1+1)+1
>>> t.toString();
"26 Kelvin"
```

I hope this counter-example convinces you that comonads are not objects. But if that is true, what are comonads, then? What use do we have for methods which get applied in reverse chronological order?

The answer, I believe, is that a comonadic computation is similar to a cellular automaton. At each step, the computation for a cell computes its next value based on the value of its neighbours at the previous step. My interpretation of

```extract :: w a → a
```

is that w a is a neighbourhood containing a number of values of type a, at various locations around the current cell. A given comonad may provide functions to inspect, say, the neighbour immediately to the left of the current cell, or the neighbour from one timestep ago, but the bare minimum is that it must be possible to extract the value of the current cell itself.

Similarly, my interpretation of

```extend :: (w a → b) → w a → w b
```

is that w a → b computes one step of the cellular automaton by examining a local neighbourhood and producing the next value for the current cell. This single-cell computation is then extended to the entire neighbourhood, updating each cell as if it was the current one.

In our thermostat example, there is one cell for each possible temperature, and the value of each cell is also a temperature.

So far, I have stuck to Kelvin degrees in order to minimize the number of moving parts, but now that we have two completely different uses for temperatures, let's follow Gabriel by using Celsius for the cell contents, and Kelvin for the cell labels. Since temperatures are floating point values, our thermostat operates on a neighbourhood which is continuous instead of discrete. Continuousness is also the premise of the above Game of Life variant. Unlike comonadic computations, which apply their operations one at a time, the above cellular automaton also operates on a continuous time domain. I encourage you to watch the video, it looks awesome.

If our thermostat had an object-oriented implementation, each operation would simply modify the current cell by applying the operation to the Celsius temperature it contains. In fact, if this was an object-oriented implementation, we would only need one cell! Since this is, instead, a comonadic implementation, the operation is instead applied to the Kelvin temperature which labels the current cell. The resulting Kelvin temperature is interpreted as a pointer to another cell, and the final result is the Celsius contents of that cell.

When the available operations are restricted to just up and down, this scheme works just fine. Originally, each cell contains the Celsius equivalent of their Kelvin label. Then, each time the up method is applied, each cell takes on the Celsius value of the cell one unit above it, which effectively increases the temperature by one everywhere. But this only works because up and down preserve the invariant that cells which are one Kelvin unit apart contain values which are one Celsius unit apart!

One way to visualize this is to imagine a sheet of graph paper annotated with Celsius temperatures, over which we overlay a transparent sheet of graph paper annotated with Kelvin temperatures. One of the transparent cells is our current cell, and we can read its Celsius contents through the transparent paper. When we call up and down, we offset the two sheets of paper from each other, causing the apparent contents of all the cells to increase or decrease simultaneously.

#### Reverse chronological order

Similarly, if each cell contains its own temperature and we apply the square method, each cell squares its Kelvin temperature, looks up the corresponding cell, and ends up with the (Celsius representation of the) square of its original Kelvin value.

But if we apply up after the temperatures have already been squared, the values will change as if the values had been incremented before they were squared! How is that even possible? What is the secret of this reverse-chronological method application?

If there was only one cell, anti-chronological application would not be possible because the original, unsquared value x would been lost by the time the up operation was applied. Our comonadic implementation, however, has a lot of cells: in particular, one unit above the current cell, there is a cell whose unsquared value was x+1. Since the square operation affects all cells uniformly, that cell now contains the squared value (x+1)2. Therefore, it is no surprise that when the up operation offsets the two sheets of paper by one unit, the old x2 value scrolls out of view and gets replaced by (x+1)2 instead.

Ironically, Gabriel's original implementation of up' was applying its operation in the correct chronological order.

```>>> up' (t, f) = (t + 1, f)
```

But later on, while trying to shoehorn his object into the comonadic mold, he proved that the correct definition should have been as follows.

```>>> up' (t, f) = (t, \t' → f (t' + 1))
```

While that is indeed a correct comonadic implementation, this is not a correct object-oriented implementation because it applies its operation in reverse chronological order. In particular, notice that it modifies the input of f; since f applies all the other operations which have been accumulated so far, applying the new operation to its input has the effect of inserting that operation before all the others.

#### Visiting the neighbours

Why do comonads apply their operations in reverse chronological order? Is that ever useful?

In fact, this reversed order business is not the full story. In a typical comonad, the cell labels and their contents would not both be temperatures. Instead, if we were trying to implement a one-dimentional cellular automaton such as rule 30, we would label the cells with integers and their contents with colors. Wolfram's rule 30 automaton, from his controversial book A New Kind of Science.

Now that the input and output types are distinct, we can see that \t' → f (t' + 1) is not even attempting to modify the contents of the current cell, neither chrono- nor anti-chronologically. Instead, it is modifying the index of the cell on which f applies its computations, thereby allowing us to observe which color has been computed for a neighbouring cell. This is, of course, the first step to compute the color which the current cell will take on the next step.

Once we have wrapped the steps necessary to compute our next color into a function of type w Color → Color, we can extend this function to all the cells in our simulation, modifying the entire row at once by modifying f.

We have seen that comonads are not objects, but are instead neighbourhoods in which cellular automata can evolve. But that is only half of the question. If comonads are not objects, then what are objects? Is there another way to represent objects in Haskell?

There are many facets to modern object-oriented programming, but the aspect which I think Gabriel was aiming for in his post is objects as encapsulated data plus a bunch of methods to modify this data. To me, those features don't sound comonadic at all; rather, I would implement them using a State monad!

```>>> type Thermostat a = State Double a
>>>
>>> getKelvin  = get
>>> getCelsius = fmap (- 273.15) getKelvin
>>>
>>> toString = do c ← getCelsius
>>>               return (show c ++ " Celsius")
>>>
>>> up   = modify (+1)
>>> down = modify (-1)
```

Since we're using monads, the syntax for calling a few of those methods one after the other is just do-notation, which is even shorter than the this # notation advocated by Gabriel.

```>>> up3 :: Thermostat String
>>> up3 = do up
>>>          up
>>>          up
>>>          toString
```

Notice the type of the above code: Thermostat String doesn't mean that there are many different kinds of thermostats, and that this particular code is using or producing a "string thermostat". Rather, the above code is an object-oriented computation having access to a single Thermostat object and producing a single String value.

Okay, so using one monad, we managed to represent a computation manipulating one object. How would we manipulate two thermostats? Using two monads? Yes! Using more than one monad at once is precisely what monad transformers are about.

```>>> type ThermostatT m a = StateT Double m a
>>> obj1 = id
>>> obj2 = lift
>>>
>>> up_both :: ThermostatT Thermostat String
>>> up_both = do obj1 up
>>>              obj2 up
>>>              s1 ← obj1 toString
>>>              s2 ← obj2 toString
>>>              return (s1 ++ " and " ++ s1)
```

This time we have an object-oriented computation having access to two Thermostat objects, again producing a String value.

#### Duality

Since monads and comonads are duals, we would expect comonads to also have transformers. To figure out what comonad transformers are, let's spell out the similarities and differences of this duality.

First, monads are computations. Comonads are also computations, but of a different kind. The goal of both kinds of computations is to produce a value, but their means to obtain it are different. The main difference is that monads can cause side-effects, which are visible at the end of the computation, while comonads can observe neighbours, which need to be given before the computation can begin.

One of the characteristics of monadic-style computation is that there is a distinguished statement, return, which produces a given value without causing any side-effects. The equivalent characteristic for comonadic-style computation is that there is a distinguished observer, extract, which returns the current value without observing any neighbour.

In addition to this distinguished statement, which all monads share, each monad instance typically offers a number of monad-specific statements, each producing their own special side-effects. Correspondingly, each comonad instance typically provides a number of observers, each observing their own special kinds of neighbours. You can also observe the shape of the neighbourhood; for example, if your automaton runs on a finite grid, you can check whether the current cell lies on the boundary of the grid.

With monad transformers, it is possible to construct a combined monad in which the special statements of both component monads may be used. And now we have it: with comonad transformers, it must be possible to construct a combined comonad in which the special observers of both component comonads may be used.

For example, if we have a 1D row of cells, that is, a comonad in which left and right neighbours are available, and another 1D comonad in which up and down neighbours are available, then the combined comonad is the cartesian product of its components: a 2D grid in which all four direct neighbours are available.

If you are still curious about comonads, comonad transformers, and the above 2D grid trick, more details are available in my Haskell implementation of Conway's Game of Life.