Sunday, November 30, 2008

Non-standard inhabitants

The goal of this post is to demonstrate that I'm not crazy. One of my latest discoveries involves a function called "inhabitants", but that's impossible, so says the textbooks, and I can't explain my discovery any further because I crumble under the mathematically rigorous objections. You know, the way researchers whose latest discovery involves a perpetual motion engine tend not to be taken very seriously.

So I took the time to read about those mathematically rigorous objections, and I'm happy to report that my conclusion is that I am not a crackpot. It's just a misunderstanding. A bit as if my perpetual motion engine was supposed to cycle more slowly at every turn, without ever stopping (hence the name) and without producing useful work. It may or may not be an interesting discovery, but it has nothing to do with what people usually call perpetual motion engines.

Now that I'm convinced of my own sanity, the hard part is to convince others. I'll practice with you, foreign and unknown readers, who are probably too far away to cause me trouble if I fail.


First of, I'd like to explain what people usually mean by "inhabitants". The type inhabitation problem is the task of determining, for any given type in a particular programming language, whether there exists an expression in that language which has that type. Such an expression is said to "inhabit" its type. It would be even more useful if an actual inhabitant could be returned as a witness, and that's precisely what Djinn does for Haskell's type system.

Some people have wondered whether Djinn could be extended to enumerate all the inhabitants of that type, instead of just one. Well, no. And here is a diagonalization argument to prove it.


First, it makes things much simpler if we cast everything in terms of integers. Types and expressions can both easily be encoded as integers, for example by interpreting their ascii representations as base 256 numbers. I'll use the notation ⟨x⟩ to denote the integer representing the type or expression x.

Even though our expressions now masquerade as integers, we should still be able to evaluate them.

haskell eval :: Int -> Int -> Int spec ⟨f⟩ `eval` ⟨x⟩ == ⟨f x⟩

The above code is the specification for eval, not its implementation. It is an incomplete specification, because we don't care about the behaviour of eval when it is given integers which don't represent expressions (rendering the left-hand side meaningless) or when these expressions are of incompatible types (rendering the right-hand side meaningless). In what follows, eval should always be used implicitly whenever the code appears to be applying an integer to another.

Next, by contradiction, suppose we did have a version of Djinn which enumerated all of the inhabitants of a type.

haskell inhabitants :: Int -> Int -> Int spec if (p :: T) then ∃i:Int. inhabitants ⟨T⟩ i == ⟨p⟩

Instead of returning a list of all the results, inhabitants takes an extra integer i as a second argument and returns the ith member of its enumeration. If there are less than i members in the enumeration, then we don't care what inhabitants returns.

Again, that's because the specification is incomplete. It is very incomplete. It doesn't even forbid inhabitants from returning expressions with the wrong type in addition to those who do. It only requires inhabitants not to miss any of the expressions which do have the requested type, because that's all I need in order to derive a contradiction. In fact, I only need to find a single type and a single expression of that type which inhabitants misses. The diagonal, for example.

haskell diag :: Int -> Int haskell diag i = 1 + inhabitants ⟨Int -> Int⟩ i i

This time I did not merely give a specification, but an explicit definition.

The function diag has type Int -> Int, so the specification of inhabitants requires that inhabitants Int -> Int i == diag⟩ for some i. Both sides of this equation should evaluate to the same integer, ⟨diag⟩. And thus, the expressions represented by those identical integers should also be the same. In particular, the expressions should behave the same on all inputs, including input i. But on that input, diag made sure to behave differently, by returning a value that is off by one from the left-hand side's value! Thus, inhabitants Int -> Int⟩ must have missed diag, as needed.


Now that we know that implementing inhabitants correctly is impossible, what's wrong with the following attempt?

haskell inhabitants :: Int -> Int -> Int haskell inhabitants _ p = p

Okay, so it's not a very serious attempt, but it does satisfy the specification. Remember how I said that I didn't need to forbid inhabitants from returning too many expressions? Well here I am, returning all expressions, all the time, regardless of the type requested. How could I miss any?

Let's walk through the argument again. There is some diag function which I'm supposed to miss. The specification requires inhabitants Int -> Int i == diag⟩ for some i. And with this implementation, this happens when i is equal to ⟨diag⟩. At this point, diag is supposed to stab us in the back by adding one!

haskell diag i = 1 + inhabitants ⟨Int -> Int⟩ i i haskell diag ⟨diag⟩ = 1 + inhabitants ⟨Int -> Int⟩ ⟨diag⟩ ⟨diag⟩ haskell diag ⟨diag⟩ = 1 + ⟨diag⟩ `eval` ⟨diag⟩ haskell diag i = 1 + diag i haskell diag i = loop

The foolish diag, however, was stabbed by our infiltrated agents first. It turns out that calling inhabitants Int -> Int i caused diag i not to terminate! Adding one to ⊥ has no effect, and diag failed to distinguish its behaviour from inhabitants Int -> Int i, avoiding a contradiction.

What this tells us, is that the original argument implicitly assumed that inhabitants only enumerated terminating expressions. In turn, this tells us that the type inhabitation problem implicitly requires inhabitants to terminate. And with hindsight, it's rather obvious than they should. Otherwise, the type inhabitation problem would be trivial: every single type would be inhabited!

haskell fake_inhabitant :: a haskell fake_inhabitant = haskell let witness = fake_inhabitant haskell in witness

And now, I can finally reveal what it is that makes my perpetual motion engine possible: the inhabitants function involved in my discovery enumerates expressions which, syntactically, have the correct type, but whether the enumerated expressions terminate or not is irrelevant to my discovery. The fact that I won't be able to produce useful work out of it doesn't mean it can't also be a nice theoretical device. But maybe I really should have called it a "perpetually slowing-down engine", to avoid confusion.


update: In other news, it is also impossible for Djinn to exist. But it's only impossible for versions of Djinn which cover strong enough type systems, like Haskell's. From this, I can conclude that Djinn covers a useful fragment of Haskell, and you can conclude that I don't use Djinn very often.

4 comments:

Anonymous said...

What's really going on is Haskell's type system is inconsistent because Bottom inhabits all the types. See David Turner's paper "total functional programming" which you can find on google easily. It is pretty readable.

gelisam said...

If I may summarize my understanding of the paper for the other visitors, Turner is basically saying that like most code-related proofs, my diagonalization argument only works for total languages.

In order to cover a partial language like Haskell, I would need to add extra cases in order to cover ⊥. And the way to cover those cases is by requiring the purported "inhabitants" implementation to enumerate terminating expressions only.

What's really, really going on is that "inhabitants" is only hard if all inhabitants are required to terminate, like I said. Obviously, in total languages, they are.

gelisam said...

That being said, thanks for the reference! The first part is indeed quite readable, but the part on codata wasn't. That's unfortunate, because I really like dual things.

For the interested reader, sigfpe wrote a nice simplification of that part of the paper. Go read it now!

gelisam said...

The most impressive implementation of "inhabitants" I have seen is miniKanren, a term-searching library which does a lot more than listing all terms of a given type. Their demo lists lambda-calculus terms, but can be easily adapted to search terms in whichever language you implement in it.