Zippers

This note about zippers follows Backtracking Iterators (Jean-Christophe Filliâtre). The paper has examples in OCaml but they translate to Haskell fairly directly. Literate Haskell source for this post is here: playground/tree/master/haskell/zipper. To run this file, first install QuickCheck:

cabal update
cabal install QuickCheck

module Zipper where

import Debug.Trace
import Test.QuickCheck
import Test.QuickCheck.Gen

A tree datatype Link to heading

For our examples, we use a simple algebraic datatype, a balanced binary tree with integer labels for the nodes:

data Tree = Empty | Node Tree Int Tree
          deriving (Eq, Show)

Here is an example tree:

tree :: Tree
tree = Node
         (Node Empty 1 Empty)
         2
         (Node Empty 3 (Node Empty 4 Empty))

We would normally draw this tree like this, with E for Empty:

      2
     / \
    /   \
   1     3
  / \   / \
 E   E E   4
          / \
         E   E

Think about traversing the tree. At the beginning there is no path - we are at the top of the tree. Otherwise we have gone down the left subtree or the right subtree.

If we went down the left branch at a node, we would have at hand the path that we followed to get to this node, the value at the node (an integer), and the tree on the right subtree that we did not visit.

Start at the top of the tree:

path: Top (haven't gone anywhere)

tree:
      2
     / \
    /   \
   1     3
  / \   / \
 E   E E   4
          / \
         E   E

Now walk down the left branch.

path: went left, have a 2, and the subtree
      to the right of us is
                                 3
                                / \
                               E   4
                                  / \
                                 E   E

we are focused on this subtree:

   1
  / \
 E   E

Encode this information in a type:

data Path = Top                      -- ^ No path.
          | WentLeft  Path Int Tree  -- ^ Followed the left subtree
          | WentRight Tree Int Path  -- ^ Followed the right subtree
          deriving (Eq, Show)

A zipper is a tree with a path.

data Zipper = Zipper Tree Path
            deriving (Eq, Show)

Working with zippers Link to heading

The initial zipper is just the tree with no path.

createZipper :: Tree -> Zipper
createZipper t = Zipper t Top

Conversely, if we have a zipper and we are at the top, we can get the tree out of it.

unZipper :: Zipper -> Tree
unZipper (Zipper t Top) = t
unZipper (Zipper t p)   = error $ "Can't unZipper here, path is " ++ show p ++ " with tree " ++ show t

Intuitively, we would expect that unZipper . createZipper = id, and we can check this using QuickCheck. First, provide an instance of Arbitrary for our binary trees:

instance Arbitrary Tree where
  arbitrary = frequency [ (1, return Empty) -- Empty
                        , (1, arbNode)      -- Node <left> <n> <right>
                        ]
      where arbNode = do l <- arbitrary   -- <left>
                         n <- arbitrary   -- <n>
                         r <- arbitrary   -- <right>
                         return $ Node l n r

Now the property unZipper . createZipper = id can be written as:

prop_finish_createZipper t = unZipper (createZipper t) == t

Check it:

*Zipper> quickCheck prop_finish_create
+++ OK, passed 100 tests.

Looks good. Use verboseCheck prop_finish_create to see the values being generated.

Back to the zipper. Walking into the left subtree, as in the example above, involves moving the focus to the left subtree, and noting the node and the right subtree in the path component.

goDownLeft :: Zipper -> Zipper
goDownLeft (Zipper Empty        _) = error "Can't go down-left on an empty tree."
goDownLeft (Zipper (Node l x r) p) = Zipper l (WentLeft p x r)

Going down the right subtree is similar:

goDownRight :: Zipper -> Zipper
goDownRight (Zipper Empty        _) = error "Can't go down-right on an empty tree."
goDownRight (Zipper (Node l x r) p) = Zipper r (WentRight l x p)

Going up is the inverse of goDownLeft and goDownRight.

goUp :: Zipper -> Zipper
goUp (Zipper Empty Top)           = Zipper Empty Top
goUp (Zipper l (WentLeft  p x r)) = Zipper (Node l x r) p
goUp (Zipper r (WentRight l x p)) = Zipper (Node l x r) p

And we might want to go all the way up:

unzipZipper :: Zipper -> Tree
unzipZipper (Zipper t Top) = t
unzipZipper z              = unzipZipper $ goUp z

Now we’d like to check with QuickCheck that going down an arbitrary path through a tree, then going all the way back up should bring us back to the same tree. So we will have to create random trees, paired with random paths through those trees. A tuple of type (Tree, Zipper) could work, but runs into dramas with overlapping instances since QuickCheck provides an instance for types, namely Arbitrary (a, b).

As a work-around, make a data type that holds a tree and a zipper:

data TreeAndZipper = TreeAndZipper Tree Zipper
  deriving (Eq, Show)

Here is the instance of Arbitrary:

instance Arbitrary TreeAndZipper where
  arbitrary = do t <- arbitrary                 -- an arbitrary tree t
                 p <- arbPath $ createZipper t  -- an arbitrary path in t
                 return $ TreeAndZipper t p

    where
        arbPath z@(Zipper t p) = frequency [ (1, return z)    -- stop here
                                           , (1, arbPath' z)  -- continue downwards
                                           ]

        arbPath' z@(Zipper Empty _) = return z
        arbPath' z                  = frequency [ (1, arbPath $ goDownLeft  z)    -- go down left
                                                , (1, arbPath $ goDownRight z)    -- go down right
                                                , (1, return z)                   -- stop
                                                ]

Now with this instance we can encode the test that going down in a tree and then back up brings us back to the same tree.

prop_zip_unzip :: TreeAndZipper -> Bool
prop_zip_unzip (TreeAndZipper t z) = t == unzipZipper z

Check it:

*Zipper> quickCheck prop_zip_unzip
+++ OK, passed 100 tests.

Using verboseCheck we can see some of the values. Here is a sample:

(lots of output...)

TreeAndZipper (Node (Node (Node (Node (Node (Node (Node Empty (-7) (Node (Node (Node (Node Empty 88 (Node Empty (-79) Empty)) 82 (Node (Node Empty (-20) Empty) (-15) (Node Empty (-94) Empty))) (-60) Empty) 55 (Node Empty 0 Empty))) 6 (Node Empty (-7) Empty)) (-18) (Node Empty (-80) (Node Empty 60 Empty))) (-35) (Node Empty (-73) Empty)) (-32) (Node (Node (Node (Node (Node Empty (-71) Empty) 30 (Node (Node Empty 0 Empty) (-68) (Node Empty 91 Empty))) 1 (Node Empty (-46) (Node Empty (-41) (Node (Node Empty 93 Empty) 79 (Node (Node Empty 48 (Node (Node Empty 46 Empty) 76 (Node (Node Empty (-57) (Node Empty 90 Empty)) 34 (Node Empty (-11) (Node Empty (-10) Empty))))) 55 (Node Empty 65 (Node (Node (Node (Node Empty 2 (Node Empty 11 (Node Empty 34 Empty))) (-69) Empty) 68 Empty) 49 (Node Empty (-67) (Node (Node Empty 73 (Node Empty 59 (Node (Node Empty (-28) Empty) (-22) Empty))) (-15) Empty))))))))) 39 (Node Empty 40 (Node (Node (Node (Node Empty 88 Empty) 60 Empty) (-87) Empty) 53 Empty))) (-43) (Node Empty (-16) Empty))) 54 (Node Empty 73 Empty)) (-31) Empty) (Zipper (Node (Node (Node (Node (Node (Node Empty (-7) (Node (Node (Node (Node Empty 88 (Node Empty (-79) Empty)) 82 (Node (Node Empty (-20) Empty) (-15) (Node Empty (-94) Empty))) (-60) Empty) 55 (Node Empty 0 Empty))) 6 (Node Empty (-7) Empty)) (-18) (Node Empty (-80) (Node Empty 60 Empty))) (-35) (Node Empty (-73) Empty)) (-32) (Node (Node (Node (Node (Node Empty (-71) Empty) 30 (Node (Node Empty 0 Empty) (-68) (Node Empty 91 Empty))) 1 (Node Empty (-46) (Node Empty (-41) (Node (Node Empty 93 Empty) 79 (Node (Node Empty 48 (Node (Node Empty 46 Empty) 76 (Node (Node Empty (-57) (Node Empty 90 Empty)) 34 (Node Empty (-11) (Node Empty (-10) Empty))))) 55 (Node Empty 65 (Node (Node (Node (Node Empty 2 (Node Empty 11 (Node Empty 34 Empty))) (-69) Empty) 68 Empty) 49 (Node Empty (-67) (Node (Node Empty 73 (Node Empty 59 (Node (Node Empty (-28) Empty) (-22) Empty))) (-15) Empty))))))))) 39 (Node Empty 40 (Node (Node (Node (Node Empty 88 Empty) 60 Empty) (-87) Empty) 53 Empty))) (-43) (Node Empty (-16) Empty))) 54 (Node Empty 73 Empty)) (WentLeft Top (-31) Empty))
Passed:
TreeAndZipper (Node Empty (-33) Empty) (Zipper (Node Empty (-33) Empty) Top)
Passed:
TreeAndZipper Empty (Zipper Empty Top)
Passed:
TreeAndZipper (Node Empty (-95) Empty) (Zipper (Node Empty (-95) Empty) Top)
+++ OK, passed 100 tests.

Traversals with a zipper Link to heading

A nifty thing about zippers is that we can use them to step through a traversal, controlling the process programatically. If we are walking through a tree, we might be finished, or we have produced a value (an Int) but need to keep going through the zipper:

data Step = Finished
          | KeepGoing Int Zipper
          deriving Show

The step function converts a zipper into this state (step) type:

step :: Zipper -> Step

If we have an empty tree and no path, we are done.

step (Zipper Empty Top) = Finished

If we have gone down-left, make note of the node’s value x and the rest of the zipper:

step (Zipper Empty (WentLeft  p x r)) = KeepGoing x (Zipper r p)

Otherwise, we have a tree and a path, so try to continue by going down-left:

step (Zipper t p) = step $ goDownLeft (Zipper t p)

In summary:

step :: Zipper -> Step
step (Zipper Empty Top)               = Finished
step (Zipper Empty (WentLeft  p x r)) = KeepGoing x (Zipper r p)
step (Zipper t p)                     = step $ goDownLeft (Zipper t p)

By repeatedly applying step we get an inorder traversal of the tree:

inorder :: Tree -> [Int]
inorder t = runStep (step (Zipper t Top)) []
  where
    runStep :: Step -> [Int] -> [Int]
    runStep Finished                    acc = acc
    runStep (KeepGoing x (Zipper t' p)) acc = runStep (step (Zipper t' p)) (acc ++ [x])

(As an aside, runStep is tail recursive.)

Using inorder on our example tree:

*Zipper> inorder tree
[1,2,3,4]

Here is a plain recursive definition of an inorder traversal:

inorder' :: Tree -> [Int]
inorder' Empty = []
inorder' (Node l x r) = inorder' l ++ [x] ++ inorder' r

We can use this to verify that our fancy zipper inorder traversal is correct:

prop_inorder :: Tree -> Bool
prop_inorder t = inorder t == inorder' t

Testing it:

*Zipper> quickCheck prop_inorder
+++ OK, passed 100 tests.

If we want to do something different in the traversal, for example running a monadic action, we can use the same Step datatype and change the definition of runStep:

inorderM :: Monad m => (Int -> m a) -> Tree -> m ()
inorderM a t = runStepM a $ step (Zipper t Top)
  where
    runStepM :: Monad m => (Int -> m a) -> Step -> m ()
    runStepM _ Finished                    = return ()
    runStepM a (KeepGoing x (Zipper t' p)) = (a x) >> (runStepM a $ step (Zipper t' p))

Example usage:

*Zipper> inorderM (\x -> putStrLn $ "Node value: " ++ show x) tree
Node value: 1
Node value: 2
Node value: 3
Node value: 4

Mapping over a tree Link to heading

If we want to apply a function to each value in a tree, a recursive definition might be:

mapTree :: (Int -> Int) -> Tree -> Tree
mapTree _ Empty = Empty
mapTree f (Node l x r) = Node (mapTree f l) (f x) (mapTree f r)

*Zipper> tree
Node (Node Empty 1 Empty) 2 (Node Empty 3 (Node Empty 4 Empty))

*Zipper> mapTree (+1) tree
Node (Node Empty 2 Empty) 3 (Node Empty 4 (Node Empty 5 Empty))

We can check that mapTree id == mapTree:

prop_maptree :: Tree -> Bool
prop_maptree t = t == (mapTree id t)

*Zipper> quickCheck prop_maptree
+++ OK, passed 100 tests.

We can also use a zipper to map over the tree by using a different data type to represent the stepping:

data MapStep = MapFinished
             | MoreL Int Zipper
             | More2 Zipper Int Zipper
             deriving Show

stepMap :: (Int -> Int) -> Zipper -> MapStep
stepMap _ (Zipper Empty Top              ) = MapFinished
stepMap f (Zipper Empty (WentLeft  p x r)) = MoreL (f x) (Zipper r p)
stepMap f (Zipper (Node l x r) p)          = More2 (Zipper l p) (f x) (Zipper r p)

mapTree' :: (Int -> Int) -> Tree -> Tree
mapTree' f t = runStep (stepMap f $ Zipper t Top)
  where
    runStep :: MapStep -> Tree
    runStep MapFinished     = Empty
    runStep (MoreL x z)     = Node Empty x (runStep $ stepMap f z)
    runStep (More2 zl x zr) = Node (runStep $ stepMap f zl) x (runStep $ stepMap f zr)

Testing it:

*Zipper> tree
Node (Node Empty 1 Empty) 2 (Node Empty 3 (Node Empty 4 Empty))

*Zipper> mapTree' (+1) tree
Node (Node Empty 2 Empty) 3 (Node Empty 4 (Node Empty 5 Empty))

And testing it using QuickCheck:

prop_maptree' :: Tree -> Bool
prop_maptree' t = (mapTree (+1) t) == (mapTree' (+1) t)

*Zipper> quickCheck prop_maptree'
+++ OK, passed 100 tests.

The Main.hs file runs all the tests:

$ ghc --make Main.hs
[1 of 2] Compiling Zipper           ( Zipper.lhs, Zipper.o )
[2 of 2] Compiling Main             ( Main.hs, Main.o )
Linking Main ...

$ ./Main
prop_finish_createZipper
+++ OK, passed 100 tests.
prop_inorder
+++ OK, passed 100 tests.
prop_maptree
+++ OK, passed 100 tests.
prop_maptree'
+++ OK, passed 100 tests.
prop_zip_unzip
+++ OK, passed 100 tests.