Sunday, December 21, 2014

The "99 Bottles of Beers" of Type Systems

"Hello World" is a good first example program because it is small, but also because it encourages the reader to get into the driver's seat and take control of the program. Copy-paste the "Hello World" listing from a website, and you're just blindly following instructions. Change it to print "Hello Mom", and you're boldly taking your first step towards the unknown, into a world where it is now you who is giving the instructions.

New programmers need to take that step, because programming anything non-trivial requires taking your own decisions about how things are implemented. If your boss was taking all the decisions for you, you wouldn't be a programmer, you'd be a typist.

The "Hello World" of Type Systems

Once you become an experienced programmer, "Hello World" examples are still useful as an introduction to new languages and new systems. Once you have a working base, you can experiment by making small changes and verifying whether they work as you expect, or if you need to read more tutorials.

For type systems, I guess a "Hello World" program would be a small datatype/structure/class containing a few simple fields. The standard here isn't as well-established as with "Hello World", but describing a person is quite common:

data Person = Person
  { name :: String
  , age :: Int
  }

I've used Haskell here, but regardless of the language in which I would have written this, you could still easily infer how to add a field for the person's height.

99 Bottles of Beer

A little down the road, another introductory program is "99 bottles of beer on a wall". This one teaches budding programmers another important lesson: it's possible to write a program which prints out more text than what you've written in its source code. More specifically, the program shows how to use a variable to abstract over the part of the text which varies from one iteration to the other, and how to use a loop to determine how many iterations to make and which value the variable should take in each one.

For type systems, a "99 bottles of beer" program would teach the same lesson: it's possible to write a program which uses larger types than those you've written in the source code. This is rarely needed, but it's possible! Even in a large, complicated application, you might have a manager of pools of worker threads processing lists of person values, but Manager (Pool (WorkerThread (List Person))) is still a fixed type which you write down explicitly in your program. It's as if you had abstracted out the number of beers to print, but then wrote explicit calls with n = 99, n = 98 and so on, instead of using a loop to generate the calls at runtime. Our "99 bottles of beer" example should generate types at runtime.

The "99 Bottles of Beer" of Type Systems

The simplest such example I could think of is as follows:

  1. Parse a non-negative integer n from standard input or from a command-line argument.
  2. If n is 0, print 42.
  3. Otherwise, print the pair (x,x), where x is the text which would have been printed if n was one unit smaller. For example, the output for n = 3 should be "(((42,42),(42,42)),((42,42),(42,42)))".

With the important restriction that the pair (x, x) must first be constructed before being printed, and its representation must not have the same type as x.

An incorrect solution

The reason the restriction is important is that otherwise, it would be possible to implement the program using a single type, that of integer trees:

-- *not* a valid solution
data Tree a = Leaf a | Branch (Tree a) (Tree a)

showTree :: Show a => Tree a -> String
showTree (Leaf x)       = show x
showTree (Branch t1 t2) = printf "(%s,%s)" (showTree t1)
                                           (showTree t2)

printTree :: Tree Int -> Int -> IO ()
printTree v 0 = putStrLn (showTree v)
printTree v n = printTree (Branch v v) (n-1)

main :: IO ()
main = readLn >>= printTree (Leaf 42)

That program does not demonstrate that it's possible to write a program which uses larger types than those you've written in the source code.

Haskell solution

Instead of using the same type Tree Int at every iteration, we want to construct a sequence of larger and larger types:

  1. Int
  2. (Int,Int)
  3. ((Int,Int),(Int,Int))
  4. (((Int,Int),(Int,Int)),((Int,Int),(Int,Int)))
  5. ...

In Haskell, this can be achieved via polymorphic recursion, meaning that we recur at a different type than the one which the current call is being instantiated at. For example, the call printTree 42 1 instantiates the type variable a = Int, while the recursive call printTree (42,42) 0 instantiates the type variable a = (Int,Int).

printTree :: Show a => a -> Int -> IO ()
printTree v 0 = print v
printTree v n = printTree (v,v) (n-1)

main :: IO ()
main = readLn >>= printTree 42

Polymorphic recursion is often used to recur on a smaller type, but since in this function it is the Int argument which is getting smaller, we can recur on a larger type without risking an infinite loop.

C++ solution

Speaking of infinite loops, C++ uses compile-time templates to handle polymorphic recursion, and this implementation strategy causes the compiler to instantiate more and more templates when we recur on a larger type. Eventually, gcc gives up with "template instantiation depth exceeds maximum of 900".

We can work around the problem by specializing the template at one of the types encountered before that limit, and printing an error message instead of recurring further.

#include <stdio.h>

template<typename A>
struct Pair {
  A fst;
  A snd;
  
  Pair(A fst, A snd)
  : fst(fst), snd(snd)
  {}
};

void print(int n) {
  printf("%d", n);
}

template<typename A>
void print(Pair<A> pair) {
  printf("(");
  print(pair.fst);
  printf(",");
  print(pair.snd);
  printf(")");
}

template<typename A>
void println(A value) {
  print(value);
  printf("\n");
}


template<typename A>
struct PrintTree {
  static void call(int depth, A value) {
    if (depth == 0) {
      println(value);
    } else {
      PrintTree<Pair<A> >::call(depth - 1, Pair<A>(value, value));
    }
  }
};

template<>
struct PrintTree<
  Pair<Pair<Pair<Pair<Pair<Pair<Pair<Pair<int>
> > > > > > > >
{
  static void call(int, Pair<
                          Pair<Pair<Pair<Pair<Pair<Pair<Pair<int>
                        > > > > > > >
  ) {
    fprintf(stderr, "maximum depth exceeded.\n");
  }
};

int main() {
  int depth;
  scanf("%d", &depth);
  
  PrintTree<int>::call(depth, 42);
  
  return 0;
}
Java solution

Other implementation strategies, such as Java's type erasure, need no such artificial bounds.

class Pair<A> {
  private A fst;
  private A snd;
  
  public Pair(A fst, A snd) {
    this.fst = fst;
    this.snd = snd;
  }
  
  public String toString() {
    return "(" + fst.toString() + "," + snd.toString() + ")";
  }
}

public class Main {
  public static <A> void printTree(int depth, A value) {
    if (depth == 0) {
      System.out.println(value.toString());
    } else {
      printTree(depth - 1, new Pair<A>(value, value));
    }
  }
  
  public static void main(String[] args) {
    Integer n = Integer.valueOf(args[0]);
    Integer m = 42;
    printTree(n, m);
  }
}
Conclusion

Many programming languages have the ability to work with larger types than those which are known at compile time, but for some reason, the feature is rarely used.

Perhaps one of the reasons is that the feature is rarely covered in tutorials. I have presented a small example demonstrating the feature, and I have demonstrated that the example isn't specific to one particular type system by implementing it in a few different languages. If you're writing a tutorial for a language and you have already covered "Hello World", "99 bottles of beer" and the "Hello World" of type systems, please consider also covering the "99 bottles of beer" of type systems.

Although, if I want this example to catch on, I should probably give it a better name. Maybe "Complete trees whose leaves are 42", or simply "Complete 42" for short?

Monday, December 08, 2014

How to package up binaries for distribution

This weekend, I wrote a game (in Haskell of course!) for Ludum Dare, an event in which you have 48h or 72h to create a game matching an imposed theme. It was really challenging!

Once the event was over, it was time to package my game in a form which others could play. Since the packaging procedure wasn't obvious, I'm documenting it here for future reference. The procedure isn't specific to Haskell, but I'll mention that linking Haskell programs statically, as advised around the web, didn't work for me on any platform.

Windows


While your program is running, use Process Explorer to list the .dll files your program is currently using (There is also Dependency Walker, but on my program it missed glut32.dll). Copy those DLLs to the same folder as your executable, zip the folder, and ship it.

OS X


Use otool -L to list the .dylib files on which your executable depends, and copy them to the same folder as your executable (or a libs subfolder). Use install_name_tool to change all the dylib paths embedded in your executable to @executable_path/foo.dylib (or @executable_path/libs/foo.dylib). Zip the folder, and ship it.

Linux


Use ldd to list the .so files on which your executable depends, and copy all of them except libc.so.X to the same folder as your executable (or a libs subfolder). Add ld-options: -Wl,-rpath -Wl,$ORIGIN (or ld-options: -Wl,-rpath -Wl,$ORIGIN/libs) to your cabal file, pass those flags directly to gcc, or use chrpath to change the existing RPATH if there is one. Zip the folder, and ship it.