Saturday, August 16, 2014

Homemade FRP: mystery leak

The mystery leak is back, and this time it's personal: the bug is clearly in my code somewhere.

My first hypothesis is that since I am emulating reactive-banana's API, and I now have the same kind of leak as I had with the real reactive-banana, maybe it's the same bug with the same solution? Nope, adding a <* stepper undefined events for each kind of event does not plug the leak.

My next attempt is to simplify: what is the simplest program I can write which reproduces the leak?

main :: IO ()
main = play (InWindow "Nice Window" (200, 200) (800, 200))
            white
            30
            (pure (Circle 10))
            currentValue
            (\_ -> handleEventB ())
            (\_ -> handleEventB ())
Obtained using the "-hy" flag; see Real World Haskell's chapter on profiling.
The rest of the profile information is useless: it says all the space is used by "MAIN".

That's as simple as main could get while still using gloss and my homemade FRP implementation. I already know that a pure (Circle 10) behaviour doesn't leak with gloss and the real reactive-banana, so clearly, the bug must be in my homemade version.

In order to simplify further, let's bring in the FRP primitives.

data Behavior t a = Behavior
  { currentValue :: a
  , runBehavior :: t -> Behavior t a
  }

pure :: a -> Behavior t a
pure x = Behavior x (\_ -> pure x)

handleEventB :: t -> Behavior t a -> Behavior t a
handleEventB = flip runBehavior

main :: IO ()
main = play (InWindow "Nice Window" (200, 200) (800, 200))
            white
            30
            (pure (Circle 10))
            currentValue
            (\_ -> handleEventB ())
            (\_ -> handleEventB ())
Still leaks; as expected, since I didn't really change anything yet.

I'm not using Behavior's ability to accept input events, so I can simplify further:

data Behavior t a = Behavior
  { currentValue :: a
  , runBehavior :: Behavior t a
  }

pure :: a -> Behavior t a
pure x = Behavior x (pure x)

main :: IO ()
main = play (InWindow "Nice Window" (200, 200) (800, 200))
            white
            30
            (pure (Circle 10))
            currentValue
            (\_ -> runBehavior)
            (\_ -> runBehavior)
I don't know why the colours have changed, but clearly, it's still leaking.
(spoiler alert: the colour change will turn out to be important)

Usually, space leaks are due to insufficient strictness. I waste a bit of time adding bangs and seqs here and there, but of course, the application is already strict: in order to display each Picture, gloss is forced to fully-evaluate each step.

Still, just to validate that gloss really is doing the simple iterative loop I think it is doing, I replace play with my own loop.

main :: IO ()
main = loop (pure (Circle 10))
  where
    loop b = do
        print (currentValue b)
        loop (runBehavior b)

Compared with the previous graph, it looks like the memory is now high all the time,
but look at the Y axis: the memory usage is now constant, and much smaller than before.

Interesting, the leak is gone! Could the leak be in gloss after all? To make absolutely sure, I go back to my full, unsimplified program, and I replace its top-level by a similar loop.

main :: IO ()
main = loop (reactiveMain floats inputs)
  where
    floats = inputEvents
    inputs = never
    
    loop b = do
        print (currentValue b)
        loop (handleEventB 1.0 b)

This time, the colour change is expected, because the program is completely different.
But did you expect the leak to be back?

Nope, it's not gloss. Or at least, it's not just gloss: I observe a leak when I use a gloss loop with a simple update function (G and ¬U), I also observe a leak when use a simple loop with a complex update function (¬G and U), but I don't see a leak when I use the simple loop with the simple update function (¬G and ¬U). Therefore, there is probably a leak in gloss (G), and a second leak in the complex update function (U). That's why the graph colours changed: the different programs are illustrating different leaks.

I begin with the complex update function; since I wrote all the code, I should be able to understand the problem better.

The update function leak
The profile data is very different this time, and also very useful. It's no longer Behavior which uses up all the space: it's lists, and the profiling report (obtained with "-p") even tells us where to look:

COST CENTRE     MODULE    %time %alloc

main.loop       Main       78.0   75.8
accumE.go.(...) PureFrp     5.9    8.2
filterE.go      PureFrp     2.1    4.3
fmap            PureFrp     2.0    2.6
accumE.go       PureFrp     1.5    2.7
mappend         PureFrp     1.3    1.3
accumE.go.xs    PureFrp     1.1    1.5

accumE is indeed manipulating lists: it uses scanl to produce a list of intermediate values, obtained by accumulating all the events occurring this frame. Let's add some strictness to that list:

accumE :: a -> Event t (a -> a) -> Event t a
accumE x e = Event go
  where
    go t = x' `seq` (xs', accumE x' e')
      where
        (fs, e') = runEvent e t
        xs = scanl (flip ($)) x fs
        x' = last xs
        xs' = tail xs  -- skip the initial unmodified x

Leak fixed. Next!

One leak down, one to go. Before we move on, though, I'd like to understand why strictness is needed in this particular function.

Let's see, each time a Float event comes in, reactiveMain is called and its current value is fully evaluated. Somewhere inside reactiveMain, there is a call to accumE, and without the extra strictness annotation, we know that accumeE's current value (the xs list and the x' it contains) doesn't get evaluated. Why not?

There is more than one use of accumE in reactiveMain, but the one in firstEvent looks particularly fishy:

firstEvent :: Event t a -> Event t a
firstEvent = fmap snd
           . filterE ((== 1) . fst)
           . numberEvents

numberEvents uses accumE to pair each event with an incrementing number. But for all events except the first one, we only evaluate the numeric half of the pair; could it be that the right-hand side is accumulating thunks?

Given the fact that a single seq was sufficient to solve the problem, that seems unlikely. After all, seq only forces x' to WHNF; depending on the implementation of numberEvents, that might either mean evaluating the outer pair or the incrementing number, but definitely not the inner members of the pair.

And indeed, if we revert the seq fix and remove firstEvent from the equation, the leak returns:
firstEvent is not the problem here.

There is another occurrence of numberEvents inside clickEvents. This location is interesting because I have noticed that clicking on a button causes a lot of extraneous memory to be released.

clickEvents :: Event t (Char, Int)
clickEvents = fmap swap
            $ numberEvents
            $ clickLabels

To explain the memory release, Float and click events must somehow be evaluated differently: one accumulates thunks, while the other evaluates them. And there is indeed a big difference: clickEvents ignores all the Float events, and only reacts to clicks. So how is ignoring events leading to thunk accumulation?

Well, ignoring events simply means filtering them out of the list of input events, which means that most of the time, clickEvents is going to be receiving an empty list as input and producing an empty list as output. This empty list gets fully-evaluated on every frame, in order to confirm that there is nothing to update on the screen, but that doesn't mean that all the thunks get forced: x', for example, is not a member of this empty list.

Let's look at the code for accumE again:
accumE :: a -> Event t (a -> a) -> Event t a
accumE x e = Event go
  where
    go t = (xs', accumE x' e')
      where
        (fs, e') = runEvent e t
        xs = scanl (flip ($)) x fs
        x' = last xs 
        xs' = tail xs  -- skip the initial unmodified x

scanl always returns a non-empty list starting with x, which is why we can call partial functions like last and tail without fear. Also, since x is passed through without being examined, any thunks accumulated to compute x will be passed through without being evaluated. In this case, it's the calls to last which will accumulate.

The reason the memory gets released when a button is clicked is that x is also involved in the computation of the elements of the output list, so as soon as this output list becomes non-empty and its elements get fully-evaluated, so does x. With the new strictness annotation, just evaluating the output list to WHNF is now sufficient to evaluate the next x to WHNF, thereby fixing the leak.

The gloss leak
Let's look into the second leak now, the one which occurs when I use gloss with a simple update function.

data Behavior t a = Behavior
  { currentValue :: a
  , runBehavior :: Behavior t a
  }

pure :: a -> Behavior t a
pure x = Behavior x (pure x)

main :: IO ()
main = play (InWindow "Nice Window" (200, 200) (800, 200))
            white
            30
            (pure (Circle 10))
            currentValue
            (\_ -> runBehavior)
            (\_ -> runBehavior)

If it's really a bug in gloss, then I should be able to reproduce it without any FRP-related stuff at all. Simplified as it is, a Behaviour t a is isomorphic to [a], and pure x isomorphic to repeat x. Can gloss handle infinite lists?

main :: IO ()
main = simulate (InWindow "Nice Window" (200, 200) (800, 200))
                white
                30
                [0..]
                (renderInt . head)
                (\_ _ -> tail)
Nope, it can't.

Okay, so this is an extremely simple gloss program exhibiting a clear memory leak, so even if I can't find the leak myself, I should be able to send this example program to the gloss developers as a bug report. Still, where does the profiler say that the issue is?

COST CENTRE   MODULE                                 %time %alloc

MAIN          MAIN                                    89.2  71.8
main          Main                                    6.7    0.0
CAF           Graphics.Rendering.(...).Functions      1.4    0.0
CAF           GHC.IO.Encoding                         1.2    0.0
callbackIdle  Graphics.Gloss.Internals.(...).Idle     0.6   13.7
drawComponent Graphics.Gloss.Internals.(...).Picture  0.3    9.5
fromList      Main                                    0.2    4.8

It looks like callbackIdle would be a good place to start. Looking at the code, this cost center is actually a function named callback_simulate_idle. Here is a slightly simplified version:

-- | Called when we're finished drawing
-- and it's time to do some computation.
callback_simulate_idle
 :: IORef SM.State  -- ^ the simulation state
 -> IORef world   -- ^ the current world
 -> world   -- ^ the initial world
 -> (Float -> world -> IO world) -- ^ advance the world
 -> IO ()
 
callback_simulate_idle simSR worldSR worldStart worldAdvance
 = {-# SCC "callbackIdle" #-}
   do simS <- readIORef simSR
 let result | SM.stateReset simS
            = simulate_reset simSR worldSR worldStart

            | SM.stateRun   simS
            = simulate_run   simSR worldSR worldAdvance
 
            | SM.stateStep  simS
            = simulate_step  simSR worldSR worldAdvance
 
            | otherwise
            = \_ -> return ()
 
 result

The memory is probably being allocated every time worldAdvance is called, but it's another value which catches my eye: worldStart? Why would the idle function need to remember the starting state? I see that it's being used when we want to "reset". I didn't know gloss allowed you to reset to the start state.

This reset feature explains the leak: in order to reset to the start state, gloss need to keep a pointer to the head of the infinite list, which prevents elements from being garbage-collected after being displayed.

Well, this is problematic. How to fix the leak while retaining the ability to reset the state? If the start state was obtained by calling a function, then the infinite lists returned by different invocations of this function would not be shared, thereby allowing the elements to be garbage-collected. But that's not the API gloss uses: play receive the start state directly, not as a wrapping function returning the start state. And creating our own function returning this constant state wouldn't help, because the function would hold a pointer to the head of the list, which would again prevent the garbage collection.

Anyway, I'm quite curious about this reset feature by now, so I look in the code for the key combination which triggers it, to no avail. In fact... it doesn't seem like the reset feature can be triggered at all? It's dead code! Along with plenty of other code which also looks dead. If fact, of the above four cases, only the simulate_run branch is ever taken.

This is weird, but it's actually great news for the memory leak. After removing all the dead code, the reference to worldStart is no longer kept alive after the first step, and the leak goes away.

The sweet plateau of fixed space leaks.
Conclusion
It can be hard to reason about a bug, be it a space leak or otherwise, when your assumptions are flawed. In this case, I had assumed that there was only one leak, when in fact there were two.

The key to uncovering incorrect assumptions is to run lots of sanity checks. In this case, after I had concluded that the leak had to be in gloss, I tested this hypothesis by removing gloss from the unsimplified version of my program, expecting the leak to disappear. When it didn't, I knew that one of my assumptions had to be flawed.

When that happens, one tedious but very effective weapon is to simplify the program: in a smaller program, there are fewer places for incorrect assumptions to hide. It also simplifies chains of reasoning, by not having lots of elements through which to chain.

No comments: