Zippers

This note about zippers follows Backtracking Iterators (Jean-Christophe Filliâtre). The paper has examples in OCaml but they translate to Haskell fairly directly. Literate Haskell source for this post is here: https://github.com/carlohamalainen/playground/tree/master/haskell/zipper. To run this file, first install QuickCheck:

cabal update
cabal install QuickCheck
> module Zipper where
>
> import Debug.Trace
> import Test.QuickCheck
> import Test.QuickCheck.Gen

A tree datatype

For our examples, we use a simple algebraic datatype, a balanced binary tree with integer labels for the nodes:

> data Tree = Empty | Node Tree Int Tree
>           deriving (Eq, Show)

Here is an example tree:

> tree :: Tree
> tree = Node
>          (Node Empty 1 Empty)
>          2
>          (Node Empty 3 (Node Empty 4 Empty))

We would normally draw this tree like this, with E for Empty:

      2
     / 
    /   
   1     3
  /    / 
 E   E E   4
          / 
         E   E

Think about traversing the tree. At the beginning there is no path – we are at the top of the tree. Otherwise we have gone down the left subtree or the right subtree.

If we went down the left branch at a node, we would have at hand the path that we followed to get to this node, the value at the node (an integer), and the tree on the right subtree that we did not visit.

Start at the top of the tree:

path: Top (haven't gone anywhere)

tree:
      2
     / 
    /   
   1     3
  /    / 
 E   E E   4
          / 
         E   E

Now walk down the left branch.

path: went left, have a 2, and the subtree
      to the right of us is
                                 3
                                / 
                               E   4
                                  / 
                                 E   E

we are focused on this subtree:

   1
  / 
 E   E

Encode this information in a type:

> data Path = Top                      -- ^ No path.
>           | WentLeft  Path Int Tree  -- ^ Followed the left subtree
>           | WentRight Tree Int Path  -- ^ Followed the right subtree
>           deriving (Eq, Show)

A zipper is a tree with a path.

> data Zipper = Zipper Tree Path
>             deriving (Eq, Show)

Working with zippers

The initial zipper is just the tree with no path.

> createZipper :: Tree -> Zipper
> createZipper t = Zipper t Top

Conversely, if we have a zipper and we are at the top, we can get the tree out of it.

> unZipper :: Zipper -> Tree
> unZipper (Zipper t Top) = t
> unZipper (Zipper t p)   = error $ "Can't unZipper here, path is " ++ show p ++ " with tree " ++ show t

Intuitively, we would expect that unZipper . createZipper = id, and we can check this using QuickCheck. First, provide an instance of Arbitrary for our binary trees:

> instance Arbitrary Tree where
>   arbitrary = frequency [ (1, return Empty) -- Empty
>                         , (1, arbNode)      -- Node   
>                         ]
>       where arbNode = do l <- arbitrary   -- 
>                          n <- arbitrary   -- 
>                          r <- arbitrary   -- 
>                          return $ Node l n r

Now the property unZipper . createZipper = id can be written as:

> prop_finish_createZipper t = unZipper (createZipper t) == t

Check it:

*Zipper> quickCheck prop_finish_create
+++ OK, passed 100 tests.

Looks good. Use verboseCheck prop_finish_create to see the values being generated.

Back to the zipper. Walking into the left subtree, as in the example above, involves moving the focus to the left subtree, and noting the node and the right subtree in the path component.

> goDownLeft :: Zipper -> Zipper
> goDownLeft (Zipper Empty        _) = error "Can't go down-left on an empty tree."
> goDownLeft (Zipper (Node l x r) p) = Zipper l (WentLeft p x r)

Going down the right subtree is similar:

> goDownRight :: Zipper -> Zipper
> goDownRight (Zipper Empty        _) = error "Can't go down-right on an empty tree."
> goDownRight (Zipper (Node l x r) p) = Zipper r (WentRight l x p)

Going up is the inverse of goDownLeft and goDownRight.

> goUp :: Zipper -> Zipper
> goUp (Zipper Empty Top)           = Zipper Empty Top
> goUp (Zipper l (WentLeft  p x r)) = Zipper (Node l x r) p
> goUp (Zipper r (WentRight l x p)) = Zipper (Node l x r) p

And we might want to go all the way up:

> unzipZipper :: Zipper -> Tree
> unzipZipper (Zipper t Top) = t
> unzipZipper z              = unzipZipper $ goUp z

Now we’d like to check with QuickCheck that going down an arbitrary path through a tree, then going all the way back up should bring us back to the same tree. So we will have to create random trees, paired with random paths through those trees. A tuple of type (Tree, Zipper) could work, but runs into dramas with overlapping instances since QuickCheck provides an instance for types, namely Arbitrary (a, b).

As a work-around, make a data type that holds a tree and a zipper:

> data TreeAndZipper = TreeAndZipper Tree Zipper
>   deriving (Eq, Show)

Here is the instance of Arbitrary:

> instance Arbitrary TreeAndZipper where
>   arbitrary = do t                   p                   return $ TreeAndZipper t p
>
>     where
>         arbPath z@(Zipper t p) = frequency [ (1, return z)    -- stop here
>                                            , (1, arbPath' z)  -- continue downwards
>                                            ]
>
>         arbPath' z@(Zipper Empty _) = return z
>         arbPath' z                  = frequency [ (1, arbPath $ goDownLeft  z)    -- go down left
>                                                 , (1, arbPath $ goDownRight z)    -- go down right
>                                                 , (1, return z)                   -- stop
>                                                 ]

Now with this instance we can encode the test that going down in a tree and then back up brings us back to the same tree.

> prop_zip_unzip :: TreeAndZipper -> Bool
> prop_zip_unzip (TreeAndZipper t z) = t == unzipZipper z

Check it:

*Zipper> quickCheck prop_zip_unzip
+++ OK, passed 100 tests.

Using verboseCheck we can see some of the values. Here is a sample:

(lots of output...)

TreeAndZipper (Node (Node (Node (Node (Node (Node (Node Empty (-7) (Node (Node (Node (Node Empty 88 (Node Empty (-79) Empty)) 82 (Node (Node Empty (-20) Empty) (-15) (Node Empty (-94) Empty))) (-60) Empty) 55 (Node Empty 0 Empty))) 6 (Node Empty (-7) Empty)) (-18) (Node Empty (-80) (Node Empty 60 Empty))) (-35) (Node Empty (-73) Empty)) (-32) (Node (Node (Node (Node (Node Empty (-71) Empty) 30 (Node (Node Empty 0 Empty) (-68) (Node Empty 91 Empty))) 1 (Node Empty (-46) (Node Empty (-41) (Node (Node Empty 93 Empty) 79 (Node (Node Empty 48 (Node (Node Empty 46 Empty) 76 (Node (Node Empty (-57) (Node Empty 90 Empty)) 34 (Node Empty (-11) (Node Empty (-10) Empty))))) 55 (Node Empty 65 (Node (Node (Node (Node Empty 2 (Node Empty 11 (Node Empty 34 Empty))) (-69) Empty) 68 Empty) 49 (Node Empty (-67) (Node (Node Empty 73 (Node Empty 59 (Node (Node Empty (-28) Empty) (-22) Empty))) (-15) Empty))))))))) 39 (Node Empty 40 (Node (Node (Node (Node Empty 88 Empty) 60 Empty) (-87) Empty) 53 Empty))) (-43) (Node Empty (-16) Empty))) 54 (Node Empty 73 Empty)) (-31) Empty) (Zipper (Node (Node (Node (Node (Node (Node Empty (-7) (Node (Node (Node (Node Empty 88 (Node Empty (-79) Empty)) 82 (Node (Node Empty (-20) Empty) (-15) (Node Empty (-94) Empty))) (-60) Empty) 55 (Node Empty 0 Empty))) 6 (Node Empty (-7) Empty)) (-18) (Node Empty (-80) (Node Empty 60 Empty))) (-35) (Node Empty (-73) Empty)) (-32) (Node (Node (Node (Node (Node Empty (-71) Empty) 30 (Node (Node Empty 0 Empty) (-68) (Node Empty 91 Empty))) 1 (Node Empty (-46) (Node Empty (-41) (Node (Node Empty 93 Empty) 79 (Node (Node Empty 48 (Node (Node Empty 46 Empty) 76 (Node (Node Empty (-57) (Node Empty 90 Empty)) 34 (Node Empty (-11) (Node Empty (-10) Empty))))) 55 (Node Empty 65 (Node (Node (Node (Node Empty 2 (Node Empty 11 (Node Empty 34 Empty))) (-69) Empty) 68 Empty) 49 (Node Empty (-67) (Node (Node Empty 73 (Node Empty 59 (Node (Node Empty (-28) Empty) (-22) Empty))) (-15) Empty))))))))) 39 (Node Empty 40 (Node (Node (Node (Node Empty 88 Empty) 60 Empty) (-87) Empty) 53 Empty))) (-43) (Node Empty (-16) Empty))) 54 (Node Empty 73 Empty)) (WentLeft Top (-31) Empty))
Passed:
TreeAndZipper (Node Empty (-33) Empty) (Zipper (Node Empty (-33) Empty) Top)
Passed:
TreeAndZipper Empty (Zipper Empty Top)
Passed:
TreeAndZipper (Node Empty (-95) Empty) (Zipper (Node Empty (-95) Empty) Top)
+++ OK, passed 100 tests.

Traversals with a zipper

A nifty thing about zippers is that we can use them to step through a traversal, controlling the process programatically. If we are walking through a tree, we might be finished, or we have produced a value (an Int) but need to keep going through the zipper:

> data Step = Finished
>           | KeepGoing Int Zipper
>           deriving Show

The step function converts a zipper into this state (step) type:

> step :: Zipper -> Step

If we have an empty tree and no path, we are done.

> step (Zipper Empty Top) = Finished

If we have gone down-left, make note of the node’s value x and the rest of the zipper:

> step (Zipper Empty (WentLeft  p x r)) = KeepGoing x (Zipper r p)

Otherwise, we have a tree and a path, so try to continue by going down-left:

> step (Zipper t p) = step $ goDownLeft (Zipper t p)

In summary:

> step :: Zipper -> Step
> step (Zipper Empty Top)               = Finished
> step (Zipper Empty (WentLeft  p x r)) = KeepGoing x (Zipper r p)
> step (Zipper t p)                     = step $ goDownLeft (Zipper t p)

By repeatedly applying step we get an inorder traversal of the tree:

> inorder :: Tree -> [Int]
> inorder t = runStep (step (Zipper t Top)) []
>   where
>     runStep :: Step -> [Int] -> [Int]
>     runStep Finished                    acc = acc
>     runStep (KeepGoing x (Zipper t' p)) acc = runStep (step (Zipper t' p)) (acc ++ [x])

(As an aside, runStep is tail recursive.)

Using inorder on our example tree:

*Zipper> inorder tree
[1,2,3,4]

Here is a plain recursive definition of an inorder traversal:

> inorder' :: Tree -> [Int]
> inorder' Empty = []
> inorder' (Node l x r) = inorder' l ++ [x] ++ inorder' r

We can use this to verify that our fancy zipper inorder traversal is correct:

> prop_inorder :: Tree -> Bool
> prop_inorder t = inorder t == inorder' t

Testing it:

*Zipper> quickCheck prop_inorder
+++ OK, passed 100 tests.

If we want to do something different in the traversal, for example running a monadic action, we can use the same Step datatype and change the definition of runStep:

> inorderM :: Monad m => (Int -> m a) -> Tree -> m ()
> inorderM a t = runStepM a $ step (Zipper t Top)
>   where
>     runStepM :: Monad m => (Int -> m a) -> Step -> m ()
>     runStepM _ Finished                    = return ()
>     runStepM a (KeepGoing x (Zipper t' p)) = (a x) >> (runStepM a $ step (Zipper t' p))

Example usage:

*Zipper> inorderM (x -> putStrLn $ "Node value: " ++ show x) tree
Node value: 1
Node value: 2
Node value: 3
Node value: 4

Mapping over a tree

If we want to apply a function to each value in a tree, a recursive definition might be:

> mapTree :: (Int -> Int) -> Tree -> Tree
> mapTree _ Empty = Empty
> mapTree f (Node l x r) = Node (mapTree f l) (f x) (mapTree f r)
*Zipper> tree
Node (Node Empty 1 Empty) 2 (Node Empty 3 (Node Empty 4 Empty))

*Zipper> mapTree (+1) tree
Node (Node Empty 2 Empty) 3 (Node Empty 4 (Node Empty 5 Empty))

We can check that mapTree id == mapTree:

> prop_maptree :: Tree -> Bool
> prop_maptree t = t == (mapTree id t)
*Zipper> quickCheck prop_maptree
+++ OK, passed 100 tests.

We can also use a zipper to map over the tree by using a different data type to represent the stepping:

> data MapStep = MapFinished
>              | MoreL Int Zipper
>              | More2 Zipper Int Zipper
>              deriving Show
> stepMap :: (Int -> Int) -> Zipper -> MapStep
> stepMap _ (Zipper Empty Top              ) = MapFinished
> stepMap f (Zipper Empty (WentLeft  p x r)) = MoreL (f x) (Zipper r p)
> stepMap f (Zipper (Node l x r) p)          = More2 (Zipper l p) (f x) (Zipper r p)
> mapTree' :: (Int -> Int) -> Tree -> Tree
> mapTree' f t = runStep (stepMap f $ Zipper t Top)
>   where
>     runStep :: MapStep -> Tree
>     runStep MapFinished     = Empty
>     runStep (MoreL x z)     = Node Empty x (runStep $ stepMap f z)
>     runStep (More2 zl x zr) = Node (runStep $ stepMap f zl) x (runStep $ stepMap f zr)

Testing it:

*Zipper> tree
Node (Node Empty 1 Empty) 2 (Node Empty 3 (Node Empty 4 Empty))

*Zipper> mapTree' (+1) tree
Node (Node Empty 2 Empty) 3 (Node Empty 4 (Node Empty 5 Empty))

And testing it using QuickCheck:

> prop_maptree' :: Tree -> Bool
> prop_maptree' t = (mapTree (+1) t) == (mapTree' (+1) t)
*Zipper> quickCheck prop_maptree'
+++ OK, passed 100 tests.

The Main.hs file runs all the tests:

$ ghc --make Main.hs
[1 of 2] Compiling Zipper           ( Zipper.lhs, Zipper.o )
[2 of 2] Compiling Main             ( Main.hs, Main.o )
Linking Main ...

$ ./Main
prop_finish_createZipper
+++ OK, passed 100 tests.
prop_inorder
+++ OK, passed 100 tests.
prop_maptree
+++ OK, passed 100 tests.
prop_maptree'
+++ OK, passed 100 tests.
prop_zip_unzip
+++ OK, passed 100 tests.