Friday, January 09, 2009

well-foundedness, part 2 of 2: code is data. or is it codata?

Whether it is represented as finite strings or as finite trees, code is finite, and thus code is data, not codata. That should settle the question.

haskell code = 2 + 3 * 4 haskell string_data = "2 + 3 * 4" haskell data Exp = Lit Int | Add Exp Exp | Mul Exp Exp haskell tree_data = Add (Lit 2) (Mul (Lit 3) (Lit 4))

Except, of course, that's not really the question I wanted to ask. In my very first post, I warned that manipulating code as raw strings or trees was a bit primitive, and I've since been taught an alternative: higher-order abstract syntax. For the purpose of this post, that basically means I'm representing the code using a function instead of a data-structure.

With data-structures, proofs are usually done using structural induction (see part 1). But when using a function to represent code, what could be its internal structure? What are its substructures?

The clue lies in the exponential notation for function types. But first, remember the familiar multiplicative notation for, well, product types. If (a, b, c) is a value of type A × B × C, then b would be a substructure of type B. That seems reasonable. Now, consider a function λx. f x of type A → B. The exponential notation for this type is BA = B × B × ... × B, with one occurrence of B per inhabitant of A. If these inhabitants are (a0, a1, ...), then an alternate representation of our function could be (f a0, f a1, ...). This expanded form contains exactly the same information as the lambda notation, it's just a lot more verbose, especially when A has infinitely many inhabitants.

Seeing this explicit notation, an immediate conclusion is that for any value a of the appropriate type, the (value corresponding to the) application f a is a substructure of f. Wow! It was totally unexpected for me. That's why I'm writing about it. To share my excitement. But don't get too excited yet: there's a warning coming up.

I have mildly motivated our search for substructures in functions with the relatively exciting perspective of creating a structural induction proof whose decreasing structures include functions. Well, I hope you're not that motivated, because you can't do that. Function application isn't well-founded at all, at least in some languages. Take the identity function, for example. In Haskell, it is possible to apply the identity function to itself. The resulting substructure, obviously, will be the identity function. But that substructure also has the identity function as a substructure, and by continuing to recur in this fashion, we will never reach a base case. The proof will instead go on, forever, in an infinite regress.

Uh? I hope you're surprised! How could structural induction work with all of those other kinds of substructure relations, yet fail with this one? What's so special about functions?

Let's pause to ponder the presumed absurdity of an infinitely-nested structure. Take lists, for example. They are finite because after a finite number of elements, we reach the end of the list. An infinitely-nested list would go on forever, cons after cons, never reaching an end. While this would be absurd in most programming languages, infinite lists are actually commonplace in Haskell... So maybe functions aren't so unique after all!

Functions and infinite lists are examples of codata, and induction is indeed known not to work with those. Expressions of infinite depth do not have a terminating evaluation; the evaluator can't even read the entire input! I hear we're supposed to use "coinduction" instead, but I can't comment on that technique yet. What I can say, however, is that even though Haskell allows data and codata to mix, it's apparently a bad idea, and there are languages with type systems strong enough to prevent it. In these languages, there are two distinct types for lists, but I haven't seen a distinct type for "cofunctions" yet. I suspect that a function must be codata whenever some of its possible return values are.

Well, I bet you've heard enough about well-founded and non-well-founded structures (data and codata) for your entire life. But personally, I think I'm just getting started.

1 comment:

Martijn said...

Any recursive data type has infinite inhabitants—not just lists. You probably already know that, but your post seems to be biased towards lists.

Here's another data type which has an obvious infinite inhabitant:

data PascalSubtriangle = PascalSubtriangle
{ value :: Integer
, downLeft :: PascalSubtriangle
, downRight :: PascalSubtriangle