Saturday, December 19, 2015

A whirlwind tour of Haskell, day 4

Day 4: process and conduit

For the next puzzle, we need to compute a sequence of md5 hashes and to return the first one which satisfies a given property. Haskell does have a few libraries implementing md5 hashing, but using such a function wouldn't teach us anything new about Haskell. Instead, let's delegate the task to an external program.

$ echo -n "abcdef609043" | md5sum

The md5sum program, which has nothing to do with Haskell, computes the md5 hash of its input. Let's run it from Haskell, using a command from the process library.

import System.Process

-- |
-- >>> md5 "abcdef609043"
-- ("abcdef609043","000001dbbfa3a5c83a2d506429c7b00e")
md5 :: String -> IO (String, String)
md5 input = do
    [output] <- lines <$> readProcess "md5sum" [] input
    return (input, output)

We will need to use this to compute the md5 hash of "abcdef1", "abcdef2", "abcdef3", and so on until we find an input string whose hash starts with "00000". Let's define the infinite list of all such input strings:

import Text.Printf

-- |
-- >>> take 3 ints
-- [1,2,3]
ints :: [Int]
ints = [1..]

-- |
-- >>> take 3 (inputs "abcdef")
-- ["abcdef1","abcdef2","abcdef3"]
inputs :: String -> [String]
inputs prefix = map (printf "%s%d" prefix) ints

I'd like to map our md5 function over this infinite list in order to obtain an infinite list of hashes, but Haskell thinks it's a mistake to do so:

-- Expected type: String -> (String,String)
--   Actual type: String -> IO (String,String)
-- In the first argument of ‘map’, namely ‘md5’
outputs :: [(String,String)]
outputs = map md5 inputs

The reason this is a mistake is because md5 performs I/O operations, as indicated by its output type IO (String,String). Exploiting Haskell's laziness to skip over some pure mathematical transformations is fine, but skipping over IO operations it a completely different thing. IO operations are usually executed for their side-effects, and so we probably don't want Haskell to optimize them away.

The fact that laziness doesn't work on IO computations is thus a good thing, but then how are we going to describe the rest of the algorithm? We could use an imperative approach, like we did in Day 1.5, by looping over the possible inputs and breaking out of the loop upon some condition. Or, we could learn something new by using a streaming library, in which we compose side-effecting computations along a stream in much the same way we've been composing mathematical functions with (>>>). Each piece decides when to perform side-effects, when to consume values from upstream, when to send some values downstream, and when to abort the entire streaming computation.

I've already explored the pipes streaming library in the past, so this time I will use the conduit library instead.

import Data.Maybe
import Data.Conduit
import qualified Data.Conduit.List as Conduit

-- |
-- >>> runConduit (day4 "abcdef")
-- ("abcdef3337","000a63ec2eecacd28b2a6592906fea34")
day4 :: String -> ConduitM input output IO (String, String)
day4 prefix = Conduit.sourceList (inputs prefix)
          =$= Conduit.mapM md5
          =$= Conduit.filter (snd >>> take 3 >>> (== "000"))
          =$= (fromJust <$> Conduit.head)

At the top of my stream, I use a source which will trickle each of the candidate input strings down the stream. Downstream, I execute our md5 computation on each input string as it flows down. Downstream from that, I filter the stream of results to only keep the ones which satisfy the criterion. The actual criterion is that the hash should begin with 5 zeroes, but with the overhead of spawning an external program for hundreds of thousands of candidate inputs, on my computer it takes almost 15 minutes to find a hash with 5 zeroes, so if you only want to check that the code runs and behaves as desired, save yourself some time and stop after 3 zeroes as in the code above. Despite the less restrictive predicate, very few values will pass this filter.

Once a value eventually makes it trough, Conduit.head aborts the computation using that value as the result. I added the fromJust bit because Conduit.head can either return Just x if x is the first value which makes it through the filter, or it can return Nothing if some other part of the stream aborts the computation before a value can reach Conduit.head. Since Conduit.head is the only piece in my stream which can abort the computation, I know that this will not happen, so I tell the compiler to assume that the output will have the form Just x for some x, and that the final result should be x.

Day 4.5 is omitted, because it is too similar to the version we just solved.


No comments: