A short note about using parallel-io to run shell commands in parallel from Haskell. If you want to try out this blog post’s Literate Haskell source, your best bet is to compile it in a sandbox in which the package versions are pinned by the cabal.config file (generated with the cabal freeze command).

This is how to build the sandbox:

git clone https://github.com/carlohamalainen/playground.git
rm -fr .cabal-sandbox cabal.sandbox.config dist # start fresh
cabal sandbox init
cabal install
cabal repl


Also, note the line

  ghc-options:         -threaded -rtsopts -with-rtsopts=-N


in parallel.cabal. The -threaded flag links the threaded runtime, and -with-rtsopts=-N bakes in the -N option so that all cores are used. Without it you would have to run the binary as ./P +RTS -N.

Now, onto the actual blog post. First, a few imports to get us going.

In one of my work projects I often need to call legacy command line tools to process various imaging formats (DICOM, MINC, Nifti, etc). I used to use a plain call to createProcess and then readRestOfHandle to read stdout and stderr, but I discovered that this can deadlock. A better approach is to use process-streaming.

This is the current snippet that I use:

Suppose we have a shell command that takes a while, in this case because it’s sleeping. Pretend that it’s IO bound.

We could run several of these commands in order:
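As a rough sketch of what such a command and a sequential run might look like: the version below uses readProcessWithExitCode from the process package rather than process-streaming, purely to keep the example short. The sleep-then-echo script, the result type and the name main1 are assumptions made for illustration, and a POSIX sh is assumed to be on the PATH.

```haskell
import System.Exit (ExitCode (..))
import System.Process (readProcessWithExitCode)

-- Hypothetical stand-in for a slow, IO-bound tool: sleep for n seconds and
-- then print a short message. Assumes a POSIX `sh` is available.
longShellCommand :: Int -> IO (Either String String)
longShellCommand n = do
    let script = "sleep " ++ show n ++ "; echo slept " ++ show n
    (code, out, err) <- readProcessWithExitCode "sh" ["-c", script] ""
    return $ case code of
        ExitSuccess   -> Right out   -- stdout of the command
        ExitFailure _ -> Left err    -- stderr of the command

-- Run the commands one after another, so the total time is the sum of the
-- individual sleeps.
main1 :: IO ()
main1 = mapM_ (\n -> longShellCommand n >>= print) [3, 2, 1]
```

Run this way, the three commands take about six seconds in total, since each one has to finish before the next starts.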

In Haskell we can think of IO as a data type that describes an IO action, so we can build actions up using ‘pure’ code and then execute them later. To make it a bit more explicit, here is a function for running an IO action:
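A minimal sketch of such a function, assuming the simplest possible definition (it runs the action it is given and passes the result straight back):

```haskell
-- Run the given IO action and return whatever it produced.
-- Deliberately trivial: it only makes the "now run it" step visible.
runIO :: IO a -> IO a
runIO action = do
    result <- action
    return result
```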

We can use it like this:

*Main> let action = print 3 -- pure code, nothing happens yet
*Main> runIO action         -- runs the action
3


And we can rewrite main1 like this:
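One way the rewrite could look, as a sketch only. To keep the block self-contained it uses a hypothetical slowAction built on threadDelay in place of the real shell command, and the names slowAction and main1' are made up for this example.

```haskell
import Control.Concurrent (threadDelay)

-- Stand-in for the slow shell command (an assumption for illustration):
-- wait n tenths of a second, then report.
slowAction :: Int -> IO String
slowAction n = do
    threadDelay (n * 100000)
    return ("waited " ++ show n)

-- Run an IO action and return its result.
runIO :: IO a -> IO a
runIO action = do
    result <- action
    return result

main1' :: IO ()
main1' = do
    -- Pure code: build a list of IO actions. Nothing has run yet.
    let actions = map slowAction [3, 2, 1] :: [IO String]
    -- Now execute them one at a time, in list order.
    results <- mapM runIO actions
    mapM_ putStrLn results
```

The point is the separation: the let line only describes the work, and mapM runIO is where it actually happens.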

As an aside, runIO is equivalent to liftM id (see Control.Monad for info about liftM).

Now, imagine that you had a lot of these shell commands to execute and wanted a pool of, say, 4 workers. The parallel-io package provides withPool which can be used like this:
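A small sketch of withPool with a pool of 4 workers, using parallel_ from Control.Concurrent.ParallelIO.Local. The name main2 is just a label for this example.

```haskell
import Control.Concurrent.ParallelIO.Local (parallel_, withPool)

-- Create a pool of 4 workers, hand it a list of IO actions, and wait for
-- all of them to finish. The pool is shut down when withPool returns.
main2 :: IO ()
main2 = withPool 4 $ \pool ->
    parallel_ pool [putStrLn "Echo", putStrLn " in parallel"]
```

The two lines may be printed in either order, since the actions are free to run at the same time.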

Note that the IO actions (the putStrLn fragments) are provided in a list. A list of IO actions. So we can run our shell commands in parallel like so:
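A sketch of how this might look for shell commands. The longShellCommand here is a hypothetical sleep-then-echo command built on readProcessWithExitCode (not process-streaming), and runInPool and main3 are names invented for the example. It uses parallel rather than parallel_ so that the results are collected.

```haskell
import Control.Concurrent.ParallelIO.Local (parallel, withPool)
import System.Exit (ExitCode (..))
import System.Process (readProcessWithExitCode)

-- Hypothetical slow command: sleep for n seconds, then print a message.
-- Assumes a POSIX `sh` is available.
longShellCommand :: Int -> IO (Either String String)
longShellCommand n = do
    let script = "sleep " ++ show n ++ "; echo slept " ++ show n
    (code, out, err) <- readProcessWithExitCode "sh" ["-c", script] ""
    return $ case code of
        ExitSuccess   -> Right out
        ExitFailure _ -> Left err

-- Map the command over the sleep times to get a list of IO actions, then
-- let a pool of 4 workers run them. `parallel` returns the results in the
-- same order as the input list.
runInPool :: [Int] -> IO [Either String String]
runInPool sleepTimes =
    withPool 4 $ \pool -> parallel pool (map longShellCommand sleepTimes)

main3 :: IO ()
main3 = runInPool [4, 3, 2, 1] >>= mapM_ print
```

When built with -threaded, this should take roughly as long as the longest sleep rather than the sum of all of them.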

If we did this a lot we might define our own version of forM_ that uses withPool:
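One possible definition, as a sketch. The name parForM_ and the argument order (worker count first, then the list, then the function, mirroring forM_) are assumptions.

```haskell
import Control.Concurrent.ParallelIO.Local (parallel_, withPool)

-- Like forM_, but the actions are run by a pool of nrWorkers workers.
-- Results are discarded, just as with forM_.
parForM_ :: Int -> [a] -> (a -> IO b) -> IO ()
parForM_ nrWorkers xs f =
    withPool nrWorkers $ \pool -> parallel_ pool (map f xs)
```

With this in scope, running the shell commands with 4 workers becomes a one-liner along the lines of parForM_ 4 [4, 3, 2, 1] longShellCommand.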

Here is another example of building up some IO actions in pure form and then executing them later. Imagine that instead of a list of Ints for the sleep times, we have some actual sleep times and others that represent an error case. An easy way to model this is using Either, which by convention has the erroneous values in the Left and correct values in the Right.

In main5 we define actions by mapping a function over the sleep times, which are now of type Either String Int. We can’t apply longShellCommand directly because it expects an Int, so we use traverse longShellCommand instead (see Data.Traversable for the definition of traverse).

Next, the Either-of-Either is a bit clunky, but we can mash the two layers together using join. Here we have to use fmap because the list elements have type IO (Either [Char] (Either [Char] String)), not Either [Char] (Either [Char] String) as join expects.
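The traverse and fmap join steps can be sketched with base alone. Here fakeCommand is a hypothetical stand-in for the shell command that does not actually sleep, and the actions are run sequentially with mapM to keep the example free of extra packages. The same list of actions could equally be handed to a worker pool.

```haskell
import Control.Monad (join)

-- Hypothetical stand-in for the shell command: succeeds for short sleeps
-- and reports an error for long ones, without actually sleeping.
fakeCommand :: Int -> IO (Either String String)
fakeCommand n
    | n > 2     = return (Left ("refusing to sleep for " ++ show n))
    | otherwise = return (Right ("slept " ++ show n))

-- traverse lifts fakeCommand over Either: a Left input is passed through
-- and the command never runs, while a Right input runs the command.
nested :: Either String Int -> IO (Either String (Either String String))
nested = traverse fakeCommand

-- fmap reaches inside IO so that join can merge the two Either layers.
flattened :: Either String Int -> IO (Either String String)
flattened = fmap join . nested

-- Build the actions with pure code, then run them in order.
demo :: IO [Either String String]
demo = mapM flattened [Right 1, Left "bad input", Right 3]
-- demo yields
--   [Right "slept 1", Left "bad input", Left "refusing to sleep for 3"]
```

Both kinds of failure, a bad input and a failed command, end up in the same Left, which is what makes the flattened type convenient to work with.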

One topic that I haven’t touched on is dealing with asynchronous exceptions. For this, have a read of Catching all exceptions from Snoyman and also enclosed-exceptions. Also, Chapter 13 of Parallel and Concurrent Programming in Haskell shows how to use the handy async package.