Note to self about mapM
. Is it lazy? Sort of.
Literate source is here: https://github.com/carlohamalainen/playground/tree/master/haskell/mapm.
First, some imports:
{-# LANGUAGE OverloadedStrings, InstanceSigs #-}
import Control.Applicative
import Control.Monad
import qualified Data.ByteString as B
import Data.ByteString.Internal (w2c)
import Data.Either
I recently wrote some code using wreq
that seemed to use much more memory than I thought it should. The problem turned out not to be
with wreq but with the way that I was using mapM
. An equivalent snippet of code is:
main1 = do
firstFile <- head <$> mapM B.readFile (take 100000 $ repeat "MapM.lhs")
print $ B.length firstFile
I reasoned that mapM
would construct its result lazily, then head
would force evaluation of just the first element of the list. This isn’t the case, as explained here. The function mapM
is basically equivalent to this:
mapM' :: Monad m => (a -> m b) -> [a] -> m [b]
mapM' m [] = return []
mapM' m (x:xs) = do
x' <- m x
xs' <- mapM' m xs
return (x':xs')
So the monadic action m
is evaluated to build up the list elements.
One of the answers on the StackOverflow page says to use a step by step series to only evaluate the bits that are required:
data Stream m a = Nil | Stream a (m (Stream m a))
GHC 7.8.3 comes with Stream defined as:
-- In GHC 7.8.3:
newtype Stream m a b = Stream { runStream :: m (Either b (a, Stream m a b)) }
The idea is that it represents a sequence of monadic actions. A Left
is a final value of type b
, while Right (a, Stream m a b)
represents an intermediate value of type a
along with the remaining stream.
The Monad
instance is fairly straightforward. The return
function turns a plain value into a final value (hence the Left
), and the bind either stops with the final value or produces the new value along with the next stream.
instance Monad m => Monad (Stream m a) where
return a = Stream $ return $ Left a
Stream m >>= k = Stream $ do
r <- m
case r of
Left b -> runStream $ k b
Right (a, str) -> return $ Right (a, str >>= k)
There are also instances for Functor
and Applicative
but we don’t need them here.
A handy function is liftIO
which turns a normal monadic action into a stream:
liftIO :: IO a -> Stream IO b a
liftIO io = Stream $ io >>= return . Left
It just runs the io
action, and pipes it to a Left
and then returns it in a Stream
.
readFileS :: FilePath -> Stream IO b B.ByteString
readFileS f = liftIO $ B.readFile f
To use readFileS
we wrap it with runStream
:
*Main> Left x <- runStream $ readFileS "MapM.lhs"
*Main> print $ B.length x
4243
So we can produce final values, but what about intermediate ones? This is what yield
does:
yield :: Monad m => a -> Stream m a ()
yield a = Stream $ return $ Right $ (a, return ())
At this point we have no idea about the remaining stream, so we return the unit ()
.
For testing the code here we’ll take the definition of collect
from Stream as well. It just walks
through the entire Stream and collects the values, ignoring the final unit value.
collect :: Monad m => Stream m a () -> m [a]
collect str = go str []
where
go str acc = do
r <- runStream str
case r of
Left () -> return (reverse acc)
Right (a, str') -> go str' (a:acc)
Now we can try out yield
using monadic notation:
yield123 :: Stream IO Int ()
yield123 = do
yield 1
yield 2
yield 3
*Main> collect yield123
[1,2,3]
We can mix normal Haskell control structures like if/then/else into the monadic notation:
yieldEvens :: Int -> Stream IO Int ()
yieldEvens n = if n > 10
then return ()
else do yield n
yieldEvens $ n + 2
*Main> collect $ yieldEvens 0
[0,2,4,6,8,10]
We could read some files using our readFileS
function and yield the results:
readAFewFiles :: Stream IO B.ByteString ()
readAFewFiles = do
readFileS "MapM.lhs" >>= yield
readFileS "MapM.lhs" >>= yield
readFileS "MapM.lhs" >>= yield
readFileS "MapM.lhs" >>= yield
readFileS "MapM.lhs" >>= yield
*Main> length <$> collect readAFewFiles
5
We can generalise this to apply a monadic function to a list of arguments, which is basically what mapM
does:
streamMapM :: (String -> IO B.ByteString) -> [String] -> Stream IO B.ByteString ()
streamMapM _ [] = return ()
streamMapM f (a:as) = do
(liftIO $ f a) >>= yield
streamMapM f as
And we can even make an infinite stream:
readForever :: Stream IO B.ByteString ()
readForever = streamMapM B.readFile (repeat "MapM.lhs")
Take from a stream and a definition of head for a stream:
takeStream :: Integer -> Stream IO a () -> IO [a]
takeStream n str = go str [] n
where
go str acc n = do
if n <= 0 then return acc
else do r <- runStream str
case r of
Left () -> return (reverse acc)
Right (a, str') -> go str' (a:acc) (n - 1)
headStream :: Stream IO a () -> IO (Maybe a)
headStream str = do
h <- takeStream 1 str
return $ case h of
[h'] -> Just h'
_ -> Nothing
So we can efficiently take the head of the stream without evaluating the entire thing:
*Main> (fmap B.length) <$> headStream readForever
Just 5917
I should point out that the example of reading a file a bunch of times could be achieved without Stream
just by storing a list of the monadic actions, and then evaluating the one that we want:
listOfActions :: [IO B.ByteString]
listOfActions = repeat $ B.readFile "MapM.lhs"
which can be used as follows:
*Main> B.length <$> (head $ listOfActions)
6455
The difference is that the list is somewhat static, in that we can’t mix control structures into it as we can do with Stream
.
Interestingly, the definition for Stream
looks very similar to the definition for Free
, which I used in an earlier post about free monads:
data Stream m a = Nil | Stream a (m (Stream m a))
data Free f r = MkPure r | MkFree (f (Free f r))
Here’s one way to encode Stream
-like behaviour using free monads. I define two actions, yield and final. The yield action stores an input value of type a
, a monadic function a -> IO b
, and the rest of the structure, which turns out to be conveniently represented as a function b -> k
. Being a function of b
lets the rest of the structure depend on the result at the current node in the structure. The final action just stores the value and monadic action, and is a terminal node in the free monad.
data StreamF a b k = Yield a (a -> IO b) (b -> k)
| Final a (a -> IO b)
For convenience, Command
is a simpler type signature:
type Command a b k = Free (StreamF a b) k
As in my earlier post, we need instances for Functor
and Monad
. They are fairly straightforward:
instance Functor (StreamF a b) where
fmap f (Yield a io k) = Yield a io (f . k)
fmap _ (Final a io) = Final a io
instance (Functor f) => Monad (Free f) where
return :: a -> Free f a
return x = MkPure x
(>>=) :: Free f a -> (a -> Free f b) -> Free f b
(MkFree x) >>= h = MkFree $ fmap (\q -> q >>= h) x
(MkPure r) >>= f = f r
Here are two helpers to make Command
’s monadic usage easier:
-- Lift an IO action to a final Command.
finalF :: a -> (a -> IO b) -> Command a b r
finalF a io = MkFree $ Final a io
-- Lift an IO action to a Command that yields the value
-- and continues.
yieldF :: a -> (a -> IO b) -> Command a b b
yieldF a io = MkFree $ Yield a io (\b -> MkPure b)
To run a Command
we walk its structure recursively and
run the IO actions as needed:
runCommand :: (Show a, Show b, Show r) => Command a b r -> IO ()
runCommand (MkFree (Final a io)) = do
putStrLn $ "Final " ++ show a
x <- io a
putStrLn $ "Produced the value: " ++ show x
runCommand (MkFree (Yield a io next)) = do
b <- io a
putStrLn $ "Yield: computed value: " ++ show b
runCommand (next b)
runCommand (MkPure x) = putStrLn $ "MkPure: " ++ show x
As with Stream
, we can mix control structures with the creation of the free monad:
exampleCommand :: Command FilePath String String
exampleCommand = do
x <- yieldF "hello1.txt" readFile
y <- if x == "hello1\n"
then yieldF "hello2.txt" readFile
else finalF "hello3.txt" readFile
return y
For example:
Yield: computed value: "hello1\n"
Yield: computed value: "hello2\n"
MkPure: "hello2\n"
Taking the head of a Command
is straightforward using the definition of runCommand
:
headCommand :: Command a r r -> IO r
headCommand (MkFree (Final a io )) = io a
headCommand (MkFree (Yield a io _)) = io a
headCommand (MkPure x) = return x
Here it is in action:
*Main> :t headCommand exampleCommand
headCommand exampleCommand :: IO String
*Main> headCommand exampleCommand
"hello1\n"
To finish things off, here are versions of take
and mapM
on Command
:
runOneCommand :: Command t t () -> IO (Either () (t, Command t t ()))
runOneCommand (MkFree (Final a io)) = do
x <- io a
return $ Right (x, MkPure ())
runOneCommand (MkFree (Yield a io next)) = do
b <- io a
return $ Right (b, next b)
runOneCommand (MkPure ()) = Left <$> return ()
takeCommand :: Integer -> Command t t () -> IO [t]
takeCommand n str = go str [] n
where
go str acc n = do
if n <= 0 then return acc
else do r <- runOneCommand str
case r of
Left () -> return $ reverse acc
Right (a, str') -> go str' (a:acc) (n - 1)
commandMapM :: (a -> IO a) -> [a] -> Command a a ()
commandMapM _ [] = MkPure ()
commandMapM f (a:as) = do
yieldF a f
commandMapM f as
It works like the Stream
example:
takeCommandExample = (fmap B.length) <$> (takeCommand 3 $ commandMapM readFileBB (take 100000 $ repeat "MapM.lhs")) >>= print
where
-- Since B.readFile :: String -> B.ByteString
-- we have to write this wrapper so that the input
-- and result types match, as required by the
-- restriction "Command t t ()" in the signature
-- for takeCommand.
readFileBB :: B.ByteString -> IO B.ByteString
readFileBB = B.readFile . (map w2c) . B.unpack
There we go:
*Main> takeCommandExample
[11241,11241,11241]