Spectral graph partitioning in Haskell

I was reading some notes from QUT about using spectral graph theory to partition a graph using eigenvalues of unnormalised Laplacian. Their Matlab code looks like this:

% Form W = Gaussian distribution based on distances
W = zeros(100, 100);
D = zeros(100, 100);

sigma = 2;
N = 100;
for i = 1:N
    for j = 1:N
        if (j ~= i)
            % Calculate the distance between two points
            dist = norm([A(i, 1) - A(j, 1); A(i, 2) - A(j, 2)]);
            expp = exp(-dist^2 / (2 * sigma^2));
            adjacency(i, j) = 1;
            % Add the weights to the matrix
            W(i, j) = expp;
            % Add up the row sum as we go
            D(i, i) = D(i, i) + expp;

L = D - W;

Find the eigenpairs of L

[vec, val] = eig(L, 'vector');
% Ensure the eigenvalues and eigenvectors are sorted in ascending order
[val,ind] = sort(val);
vec = vec(:, ind);

% Plot with clusters identified by marker
% v2
hold on;
for i = 1:N
    plot(i, vec(i, 2), ['k' plot_syms{i}]);

Personally, this gives me nightmares from undergrad numerical methods classes. So here’s how to do it in Haskell. Full source (including cabal config) for this post is here.

> module Notes where

To create the W matrix we use buildMatrix:

> buildW :: Double -> Matrix Double -> Matrix Double
> buildW sigma a = buildMatrix n n $ (i,j) -> f sigma a i j
>   where
>     n = rows a
>     f sigma a i j = if j /= i
>                         then expp sigma a i j
>                         else 0.0
>     dist :: Matrix Double -> Int -> Int -> Double
>     dist m i j = norm2 $ fromList [ m!i!0 - m!j!0, m!i!1 - m!j!1 ]
>     expp :: Double -> Matrix Double -> Int -> Int -> Double
>     expp sigma m i j = exp $ (-(dist m i j)**2)/(2*sigma**2)

The D matrix is a diagonal matrix with each element being the sum of a row of W, which is nicely expressible by composing a few functions:

> buildD :: Matrix Double -> Matrix Double
> buildD w = diag
>          . fromList
>          . map (foldVector (+) 0)
>          . toRows
>          $ w
>   where
>     n = rows w

The L matrix is real and symmetric so we use eigSH which provides the eigenvalues in descending order.

> lapEigs :: Double -> Matrix Double -> (Vector Double, Matrix Double)
> lapEigs sigma m = eigSH l
>   where
>     w = buildW sigma m
>     d = buildD w
>     l = d - w

To finish up, the Fiedler eigenvector corresponds to the second smallest eigenvalue:

> fiedler :: Double -> Matrix Double -> (Double, Vector Double)
> fiedler sigma m = (val ! (n-2), vector $ concat $ toLists $ vec ¿ [n-2])
>   where
>     (val, vec) = lapEigs sigma m
>     n = rows m

To plot the eigenvalues and eigenvector we use the Chart library which uses the cairo backend.

> doPlot "eigenvalues.png" "Eigenvalues" "eigenvalue" $ zip [0..] (reverse $ toList val)
> doPlot "fiedler.png"
>        "Second eigenvalue of unnormalised Laplacian"
>        "fiedler eigenvector"
>        (zip [0..] $ toList algConnecEigVec)

ghc-imported-from => ghc-mod

This post has some errors; see ghc-imported-from–ghc-mod-march-2017 for the latest instructions.

I have a pull request to merge ghc-imported-from into ghc-mod. The main benefit of being part of ghc-mod is that I don’t have to duplicate ghc-mod’s infrastructure for handling sandboxes, GHC options, interfaces to other build tools like Stack, and compatibility with more versions of GHC.

The pull request is still under review, so until then you can try it out by cloning the development branches:

git clone -b imported-from https://github.com/DanielG/ghc-mod.git ghc-mod-imported-from
cd ghc-mod-imported-from
cabal update && cabal sandbox init && cabal install
export PATH=`pwd`/.cabal-sandbox/bin:$PATH

Assuming that you use Plugged for managing
Vim/Neovim plugins, also grab my branch of ghcmod-vim:

git clone -b ghcmod-imported-from-cmd git@github.com:carlohamalainen/ghcmod-vim.git $HOME/.vim/plugged/ghcmod-vim

and make sure that the plugin is enabled:

call plug#begin('~/.vim/plugged')
Plug 'eagletmt/ghcmod-vim', {'for': 'haskell'}

and set some handy key mappings:

au FileType  haskell nnoremap            :GhcModType
au FileType  haskell nnoremap            :GhcModInfo
au FileType  haskell nnoremap    :GhcModTypeClear

au FileType lhaskell nnoremap            :GhcModType
au FileType lhaskell nnoremap            :GhcModInfo
au FileType lhaskell nnoremap    :GhcModTypeClear

au FileType haskell  nnoremap   :GhcModOpenDoc
au FileType lhaskell nnoremap   :GhcModOpenDoc

au FileType haskell  nnoremap   :GhcModDocUrl
au FileType lhaskell nnoremap   :GhcModDocUrl

au FileType haskell  vnoremap  : GhcModOpenHaddockVismode
au FileType lhaskell vnoremap  : GhcModOpenHaddockVismode

au FileType haskell  vnoremap  : GhcModEchoUrlVismode
au FileType lhaskell vnoremap  : GhcModEchoUrlVismode

On the command line, use the imported-from command. It tells you the defining module, the exporting module, and the Haddock URL:

$ ghc-mod imported-from Foo.hs 9 34 show
base- Prelude https://hackage.haskell.org/package/base-

From Vim/Neovim, navigate to a symbol and hit F4 which will open the Haddock URL in your browser, or F5 to echo the command-line output.

Reflex FRP gallery editor

When I post a series of photos to a personal blog I find myself editing HTML in Vim and switching back and forth to a browser to see if I have written comments in the right places and ordered the photos correctly. I could use a HTML editor to do this, but why not try FRP with Haskell? 🙂 Apparently I sort of use FRP at work so trying out Reflex wasn’t too much of a leap.

This post is adapted from the todo list and drag ‘n’ drop examples in this Reflex repo.

Let’s start with the types. My basic blob of data is an image (a URL), a comment about the image, and a flag to indicate if the image should appear in the final output:

> data Image = Image
>     { imageFileName :: T.Text   -- ^ e.g. "file:///tmp/cat.jpg"
>     , imageVisible  :: Bool     -- ^ Output in HTML render?
>     , imageRemark   :: T.Text   -- ^ Comment that goes before the image.
>     }
>     deriving (Eq, Ord, Show)

The images have to be rendered in a particular order, so we’ll use a Map

> Map Int a

where the integer keys provide the ordering and a is some type.

To toggle visibility in the final rendering, we flip imageVisible:

> toggleVisibility :: Int -> Map Int Image -> Map Int Image
> toggleVisibility k m = M.adjust f k m
>   where
>     f (Image x b c) = Image x (not b) c

We can set the description for an image:

> setDesc :: (Int, T.Text) -> Map Int Image -> Map Int Image
> setDesc (k, c) m = M.adjust f k m
>   where
>     f (Image x b _) = Image x b c

We can move the kth image up:

> moveUp :: Int -> Map Int Image -> Map Int Image
> moveUp 0 m = m
> moveUp k m
>   = let xs = M.elems m in
>     M.fromList $ zip [0..] $ take (k-1) xs ++ [xs !! k, xs !! (k-1)] ++ drop (k+1) xs
>     -- ^^^ Assumes contiguous keys!

and down:

> moveDown :: Int -> Map Int Image -> Map Int Image
> moveDown k m
>   | k == fst (M.findMax m) = m
>   | otherwise = let xs = M.elems m in
>       M.fromList $ zip [0..] $ take k xs ++ [xs !! (k+1), xs !! k] ++ drop (k+2) xs

It’s not efficient to completely rebuild the map by converting it to a list and back again, but this’ll do for now.

In terms of the user interface there are a few events to consider:

  • user toggles visibility of the kth image;
  • user moves the kth image up;
  • user moves the kth image down; and
  • user changes the comment text for the kth image.

We’ll put these four events into our own type. The first three are of type Event t Int where the Int is the key for the image in question. The last one has type Event t (Int, T.Text) since we need the key and the text that was entered into the textbox. In Reflex, the event type is Event.

> data ImageEvent t = ImageEvent
>     { evToggle  :: Event t Int
>     , evUp      :: Event t Int
>     , evDown    :: Event t Int
>     , evKey     :: Event t (Int, T.Text)
>     }

Next, imageW creates an unnumbered list of images, consisting of a text field indicating if the image will be visible; a text box for writing a comment; buttons to toggle visibility and move the image up and down; and finally the image itself.

> imageW
>     :: forall m t. (MonadWidget t m)
>     => Dynamic t (Map Int Image)
>     -> m (Dynamic t (Map Int (ImageEvent t)))
> imageW xs = elClass "ul" "list" $ listWithKey xs $ k x -> elClass "li" "element" $ do
>     dynText $ fmap (T.pack . show . imageVisible) x
>     el "br" $ return ()
>     let xEvent = imageRemark  uniqDyn x
>     ti 
>     tEvent <- updated  return (zipDynWith (,) (constDyn k) (_textInput_value ti))
>     el "br" $ return ()
>     (visibleEvent, moveUpEvent, moveDownEvent)                                                      visibleEvent  <- (fmap $ const k)  button "visible"
>                                                     moveUpEvent   <- (fmap $ const k)  button "up"
>                                                     moveDownEvent <- (fmap $ const k)  button "down"
>                                                     return (visibleEvent, moveUpEvent, moveDownEvent)
>     elClass "p" "the image" $ elDynAttr "img" (fmap f x) (return ())
>     return $ ImageEvent visibleEvent moveUpEvent moveDownEvent tEvent
>   where
>     f :: Image -> Map T.Text T.Text
>     f i = M.fromList
>             [ ("src",   imageFileName i)
>             , ("width", "500")
>             ]
>     textBoxAttrs :: TextInputConfig t
>     textBoxAttrs = def { _textInputConfig_attributes = constDyn $ M.fromList [("size", "100")] }

To process the dynamic map we use listWithKey:

> listWithKey
>   :: forall t k v m a. (Ord k, MonadWidget t m)
>   => Dynamic t (Map k v)
>   -> (k -> Dynamic t v -> m a)
>   -> m (Dynamic t (Map k a))

Specialised to our usage, the type is:

> listWithKey
>   :: forall t m. (MonadWidget t m)
>   => Dynamic t (Map Int Image)
>   -> (Int -> Dynamic t Image -> m (ImageEvent t))
>   -> m (Dynamic t (Map Int (ImageEvent t)))

It’s like mapping over the elements of the dynamic input:

> listWithKey xs $ k x -> ...

We use elClass to produce the elements on the page. For example the text attribute showing if the image is visible or not can be rendered using dynText:

>     dynText $ fmap (T.pack . show . imageVisible) x

We have an fmap since x :: Dynamic t Image and Dynamic has a Functor instance.

The image list and all the events are wrapped up in imageListW. Here’s the main part:

> imageListW
>     :: forall t m. MonadWidget t m
>     => Dynamic t T.Text
>     -> m ()
> imageListW dynDrop = do
>     let eventDrop = fmap const $ updated $ fmap parseDrop dynDrop :: Event t (MM Image -> MM Image)
>     rec xs              [ eventDrop
>             , switch . current $ toggles
>             , switch . current $ ups
>             , switch . current $ downs
>             , switch . current $ keys
>             ]
>         bs 
>         let toggles :: Dynamic t (Event t (M.Map Int Image -> M.Map Int Image))
>             ups     :: Dynamic t (Event t (M.Map Int Image -> M.Map Int Image))
>             downs   :: Dynamic t (Event t (M.Map Int Image -> M.Map Int Image))
>             keys    :: Dynamic t (Event t (M.Map Int Image -> M.Map Int Image))
>             toggles = (mergeWith (.) . map (fmap $ toggleVisibility) . map evToggle . M.elems)  bs
>             ups     = (mergeWith (.) . map (fmap $ moveUp)           . map evUp     . M.elems)  bs
>             downs   = (mergeWith (.) . map (fmap $ moveDown)         . map evDown   . M.elems)  bs
>             keys    = (mergeWith (.) . map (fmap $ setDesc)          . map evKey    . M.elems)  bs
>     ta                          { _textAreaConfig_setValue   = (T.concat . map rawHTML . M.elems)  updated xs
>                         , _textAreaConfig_attributes = taAttrs
>                         }
>     return ()

Notice that toggles is used before it is defined! This is made possible by using the recursive do extension which provides the ability to do value recursion.

The key bit is the use of mergeWith that combines all of the events.

> mergeWith :: Reflex t => (a -> a -> a) -> [Event t a] -> Event t a

Here, mergeWith (.) will left-fold simultaneous events.

>     rec xs              [ eventDrop
>             , switch . current $ toggles
>             , switch . current $ ups
>             , switch . current $ downs
>             , switch . current $ keys
>             ]

The toggles has type

> Dynamic t (Event t (M.Map Int Image -> M.Map Int Image))

so we use switch and current to get to an Event type:

ghci> :t switch
switch :: Reflex t => Behavior t (Event t a) -> Event t a

ghci> :t current
current :: Reflex t => Dynamic t a -> Behavior t a

ghci> :t switch . current
switch . current :: Reflex t => Dynamic t (Event t a) -> Event t a

This merge is also where we bring in the drag ‘n’ drop event via eventDrop which is how we get a list of images into the dynamic map.

Try it out

To try it out without setting up Reflex, grab gallery_editor.zip, unzip it, and open gallery_editor/gallery.jsexe/index.html in your browser. Drag some images onto the top area of the page using your file manager. Tested on Ubuntu 16.

Or, grab the source from Github.

Monadic Yesod form with ReCAPTCHA

Applicative forms in Yesod are nifty but they don’t let you customise layout, CSS, and so on. I had this form for comments on my blog:

commentFormOLD :: EntryId -> Form Comment
commentFormOLD entryId = renderDivs $ Comment
     pure entryId
     lift (liftIO getCurrentTime)
     areq textField (fieldSettingsLabel MsgCommentName) Nothing
     aopt emailField (fieldSettingsLabel MsgCommentEmail) Nothing
     aopt urlField (fieldSettingsLabel MsgCommentUrl) Nothing
     areq htmlField (fieldSettingsLabel MsgCommentText) Nothing
     pure False <* recaptchaAForm

I wanted to tweak the layout so I had to convert it to a monadic form. The only quirk was that the second part of the return value of recaptchaMForm is of type [FieldView site], not FieldView site. It looks like the first element of the list does the job for rendering the Yesod-ReCAPTCHA widget. So we set

    let recapView0 = recapView DL.!! 0

and then write


^{fvInput recapView0}

in the whamlet.

Here’s the full function. Note the fvId bits where we can specify the width and height. Also, to the form as a parameter to generateFormPost, we must have a parameter of type Html. So we put the EntryId at the front so that we can use Currying.

commentForm :: EntryId -> Html -> MForm Handler (FormResult Comment, Widget)
commentForm entryId extra = do
    (nameRes, nameView)     <- mreq textField  (fieldSettingsLabel MsgCommentName)  Nothing
    (emailRes, emailView)   <- mopt emailField (fieldSettingsLabel MsgCommentEmail) Nothing
    (urlRes, urlView)       <- mopt urlField   (fieldSettingsLabel MsgCommentUrl)   Nothing
    (textRes, textView)     <- mreq htmlField  (fieldSettingsLabel MsgCommentText)  Nothing

    (recapRes, recapView)   <- recaptchaMForm

    let recapView0 = recapView DL.!! 0

    now <- liftIO getCurrentTime

    let c = Comment
               pure entryId
               pure now
               pure False <* recapRes

    let widget = do
                    ##{fvId nameView} {
                        width: 70ch;
                    ##{fvId emailView} {
                        width: 70ch;
                    ##{fvId urlView} {
                        width: 70ch;
                    ##{fvId textView} {
                        width: 120ch;
                        height: 120ch;

Name #

^{fvInput nameView}

Email (not shown) #

^{fvInput emailView}

URL (optional) #

^{fvInput urlView}

Comment #

^{fvInput textView}

^{fvInput recapView0} |] return (c, widget)

The diff starting from line 71 shows the change in Handler/Home.hs: ce76215f72cbcf748bef89bfa3b09a077ceb9ab9#diff-1d7a37a0e3408faaa1abf31093eb8a50L71.

Structured logging

Up until now I have used basic log files of strings for applications that I have developed. But as the applications have become larger and been used by more people, dealing with huge verbose log files becomes a real time-waster.

In a blog post on structured logging, Gregory Szorc wrote about five major use cases of logging:

  1. Recording application errors (e.g. exceptions and stack traces).
  2. Low-level information to aid in debugging or human observation.
  3. Application monitoring (includes metrics and alerting).
  4. Business analytics (use the logged data to help make decisions).
  5. Security auditing.

For concreteness, consider a program that I developed for processes imaging datasets for cloud storage. It handles a pipeline for retrieving imaging datasets from MRI scanners, conversion to a format suitable for archival and further processing, and access control for staff in projects.

The five use cases correspond to these kinds of questions:

  1. Which users are encountering errors? (Some users quietly give up and don’t tell you about errors!) Does a new imaging instrument provide data that is not supported? Produce a report listing users by number of errors, or by instrument, or by …
  2. User X walks up to you and asks why they can’t see their dataset titled “something something something”. Find where their dataset is in the pipeline, and give a concrete estimate for when they will be able to see their data.
  3. How long does each dataset take to process? Where are the bottlenecks?
  4. How many datasets does department X process per week? What about department Y? Or user Z?
  5. When did user X get access to dataset Y?

Answering these questions is possible with logs containing plain strings, but it becomes difficult to run more complicated queries since regexes don’t scale very well.

The idea with structured logging is to log JSON blobs or key/value pairs instead of plain strings, so a structured log might look like this:

2016-04-24-1722 [patient_id=12345] [action=started_processing]

2016-04-24-1723 [patient_id=12345] [action=identified] [user=uqbobsmith] [dataset_id=10000]
                                   [project_id=999] [nr_files=20000] [patient_name="john something doe"]

2016-04-24-1724 [patient_id=12345] [action=processing_subset] [user=uqbobsmith] [dataset_id=10000]
                                   [project_id=999] [subset_id=3434]

2016-04-24-1734 [patient_id=12345] [action=uploaded_subset] [user=uqbobsmith] [dataset_id=10000] [project_id=999]
                                   [subset_id=3434] [remote_id=77a70620-3f22-47b5-8da2-ba2c2100f001] [time=600]

2016-04-24-1730 [patient_id=12345] [action=error] [user=uqbobsmith] [dataset_id=10000] [project_id=999]
                                   [subset_id=3435] [error_string="Did not find header file"]

2016-04-24-1742 [patient_id=12345] [action=finished] [user=uqbobsmith] [dataset_id=10000] [project_id=999] [time=1200]

2016-04-24-1750 [patient_id=12345] [action=acl] [acl_action=add] [user=uqbobsmith] [dataset_id=10000] [project_id=999]

Now the five questions can be framed in terms of queries over the structured data:

  1. Find all entries with “action=error” and report on the field “user=…”.
  2. Find the most recent log entries with user=uqbobsmith; filter by patient_name field; filter errors; or view “time” field to see how long each subset is taking to process.
  3. Plot the time-series of “time=…” values.
  4. Filter on “user=…” and “action=finished”; join with a table linking user IDs to departments; group by department names.
  5. Filter on “acl_action=add” and “user=uqbobsmith”; sort by timestamp.

As a proof of concept I knocked up django-struct-log. Since Postgresql has support for key-value pairs (including queries and indexes!) it is the logical choice for the storage of log items. The django-hstore package extends Django’s ORM model to handle Postgresql key/value pairs. And django-rest-framework provides a REST API for programs to post log entries.

The Django model for log items is in models.py:

class LogItem(models.Model):
    created_at = models.DateTimeField(auto_now_add=True)
    updated_at = models.DateTimeField(auto_now=True)

    # e.g. 'rdiff-backup'
    name = models.CharField(max_length=200)

    # e.g. 'my-server-1'
    host = models.CharField(max_length=200)

    # e.g. 'carlo'
    user = models.CharField(max_length=200)

    # Maybe something nice for a graph or email subject line.
    # e.g. 'daily rdiff-backup of something'
    description = models.TextField()

    # key/value blob
    attributes = HStoreField()

We don’t have to define any views, just a serializer and view-set for the model. This is in urls.py:

class LogItemSerializer(serializers.HyperlinkedModelSerializer):
    class Meta:
        model = LogItem
        fields = ('name', 'host', 'user', 'description', 'attributes', 'created_at', 'updated_at',)

class LogItemViewSet(viewsets.ModelViewSet):
    authentication_classes = (SessionAuthentication, BasicAuthentication, TokenAuthentication)
    permission_classes = (IsAuthenticated,)

    queryset = LogItem.objects.all()
    serializer_class = LogItemSerializer

router = routers.DefaultRouter()
router.register(r'logitems', LogItemViewSet)

Surprisingly, that’s pretty much it.

Using TokenAuthentication makes authorization easy (no username/passwords stored in plain text) and with the REST API we can post a log item using curl:

    -H "Content-Type: application/json"                                 
    -H "Authorization: Token 5a610d074e24692c9084e6c845da39acc0ee0002"  
    -X POST                                                             
    -d '{"name": "rdiff-backup", "host": "my-server-1", "description": "blah", "user": "carlo", "description": "daily rdiff-backup", "attributes": {"time_s": "1230"} }' 

The response should be something like:

 "description":"daily rdiff-backup",

This means that almost anything can send data to the log server – shell scripts, Python scripts, Haskell programs, anything.

Pulling out data for plotting is easy using Django’s ORM model. For example to get all the log items for “server1” with a “time_s” attribute:

data = LogItem.objects.filter(name='server1', attributes__has_key='time_s').all()
x = [z.created_at for z in data]
y = [float(z.attributes['time_s'])/60.0 for z in data] # minutes

Here is a sample script for a nightly report. It uses Matplotlib to draw a graph and the email library to format a HTML email.

import django
import os
import sys

from email.mime.text import MIMEText
from subprocess import Popen, PIPE
from email.MIMEMultipart import MIMEMultipart
from email.MIMEText import MIMEText
from email.MIMEImage import MIMEImage

import matplotlib
matplotlib.use('Agg') # Force matplotlib to not use any Xwindows backend.
import matplotlib.pyplot as plt

import datetime as dt

os.environ.setdefault("DJANGO_SETTINGS_MODULE", "djangostructlog.settings")
import django

from structlog.models import LogItem
import numpy as np

def send_email(subject, body, plot_png):
    # adapted from http://code.activestate.com/recipes/473810/

    strFrom = 'carlo@example.com'
    strTo   = 'carlo@example.com'

    msgRoot = MIMEMultipart('related')
    msgRoot['Subject'] = subject
    msgRoot['From']    = strFrom
    msgRoot['To']      = strTo
    msgRoot.preamble   = 'This is a multi-part message in MIME format.'

    # Encapsulate the plain and HTML versions of the message body in an
    # 'alternative' part, so message agents can decide which they want to display.
    msgAlternative = MIMEMultipart('alternative')

    msgText = MIMEText('This is the alternative plain text message.')

    # We reference the image in the IMG SRC attribute by the ID we give it below
    msgText = MIMEText(body + 'n
', 'html') msgAlternative.attach(msgText) # This example assumes the image is in the current directory fp = open(plot_png, 'rb') msgImage = MIMEImage(fp.read()) fp.close() # Define the image's ID as referenced above msgImage.add_header('Content-ID', '') msgRoot.attach(msgImage) p = Popen(["/usr/sbin/sendmail", "-t", "-oi"], stdin=PIPE) p.communicate(msgRoot.as_string()) def make_plot(n): data = LogItem.objects.filter(name=n, attributes__has_key='time_s').all() x = [z.created_at for z in data] y = [float(z.attributes['time_s'])/60.0 for z in data] for (a, b) in zip(x, y): print n, a, b fig, ax = plt.subplots() plt.plot_date(x, y) delta = dt.timedelta(days=2) ax.set_xlim(min(x).date() - delta, max(x).date() + delta) ax.set_ylim(0, max(y) + 20) ax.xaxis.set_ticks(x) start, end = ax.get_xlim() from matplotlib.dates import YEARLY, WEEKLY, DateFormatter, rrulewrapper, RRuleLocator, drange rule = rrulewrapper(WEEKLY) loc = RRuleLocator(rule) formatter = DateFormatter('%Y-%m-%d') ax.xaxis.set_major_locator(loc) ax.xaxis.set_major_formatter(formatter) from matplotlib.ticker import MultipleLocator, FormatStrFormatter from matplotlib.dates import DayLocator minorLocator = DayLocator() ax.xaxis.set_minor_locator(minorLocator) plt.xlabel('Date') plt.ylabel('Minutes') plt.savefig('foo.png') # FIXME use temp file instead! return 'foo.png' ######################################################################### rdiff_server1_png = make_plot('rdiffbackup-server1') send_email('rdiff-backup report - server1', '

Time to run rdiff-backup for server1

', rdiff_server1_png) rdiff_server2_png = make_plot('rdiffbackup-server2') send_email('rdiff-backup report - server2', '

Time to run rdiff-backup for server2

', rdiff_server2_png) print 'done'

Sample plot:

For exploring the log items you can poke around in the Django admin interface, or use the django-rest-framework’s endpoint:

Pointy edges



This note about zippers follows Backtracking Iterators (Jean-Christophe Filliâtre). The paper has examples in OCaml but they translate to Haskell fairly directly. Literate Haskell source for this post is here: https://github.com/carlohamalainen/playground/tree/master/haskell/zipper. To run this file, first install QuickCheck:

cabal update
cabal install QuickCheck
> module Zipper where
> import Debug.Trace
> import Test.QuickCheck
> import Test.QuickCheck.Gen

A tree datatype

For our examples, we use a simple algebraic datatype, a balanced binary tree with integer labels for the nodes:

> data Tree = Empty | Node Tree Int Tree
>           deriving (Eq, Show)

Here is an example tree:

> tree :: Tree
> tree = Node
>          (Node Empty 1 Empty)
>          2
>          (Node Empty 3 (Node Empty 4 Empty))

We would normally draw this tree like this, with E for Empty:

   1     3
  /    / 
 E   E E   4
         E   E

Think about traversing the tree. At the beginning there is no path – we are at the top of the tree. Otherwise we have gone down the left subtree or the right subtree.

If we went down the left branch at a node, we would have at hand the path that we followed to get to this node, the value at the node (an integer), and the tree on the right subtree that we did not visit.

Start at the top of the tree:

path: Top (haven't gone anywhere)

   1     3
  /    / 
 E   E E   4
         E   E

Now walk down the left branch.

path: went left, have a 2, and the subtree
      to the right of us is
                               E   4
                                 E   E

we are focused on this subtree:

 E   E

Encode this information in a type:

> data Path = Top                      -- ^ No path.
>           | WentLeft  Path Int Tree  -- ^ Followed the left subtree
>           | WentRight Tree Int Path  -- ^ Followed the right subtree
>           deriving (Eq, Show)

A zipper is a tree with a path.

> data Zipper = Zipper Tree Path
>             deriving (Eq, Show)

Working with zippers

The initial zipper is just the tree with no path.

> createZipper :: Tree -> Zipper
> createZipper t = Zipper t Top

Conversely, if we have a zipper and we are at the top, we can get the tree out of it.

> unZipper :: Zipper -> Tree
> unZipper (Zipper t Top) = t
> unZipper (Zipper t p)   = error $ "Can't unZipper here, path is " ++ show p ++ " with tree " ++ show t

Intuitively, we would expect that unZipper . createZipper = id, and we can check this using QuickCheck. First, provide an instance of Arbitrary for our binary trees:

> instance Arbitrary Tree where
>   arbitrary = frequency [ (1, return Empty) -- Empty
>                         , (1, arbNode)      -- Node   
>                         ]
>       where arbNode = do l <- arbitrary   -- 
>                          n <- arbitrary   -- 
>                          r <- arbitrary   -- 
>                          return $ Node l n r

Now the property unZipper . createZipper = id can be written as:

> prop_finish_createZipper t = unZipper (createZipper t) == t

Check it:

*Zipper> quickCheck prop_finish_create
+++ OK, passed 100 tests.

Looks good. Use verboseCheck prop_finish_create to see the values being generated.

Back to the zipper. Walking into the left subtree, as in the example above, involves moving the focus to the left subtree, and noting the node and the right subtree in the path component.

> goDownLeft :: Zipper -> Zipper
> goDownLeft (Zipper Empty        _) = error "Can't go down-left on an empty tree."
> goDownLeft (Zipper (Node l x r) p) = Zipper l (WentLeft p x r)

Going down the right subtree is similar:

> goDownRight :: Zipper -> Zipper
> goDownRight (Zipper Empty        _) = error "Can't go down-right on an empty tree."
> goDownRight (Zipper (Node l x r) p) = Zipper r (WentRight l x p)

Going up is the inverse of goDownLeft and goDownRight.

> goUp :: Zipper -> Zipper
> goUp (Zipper Empty Top)           = Zipper Empty Top
> goUp (Zipper l (WentLeft  p x r)) = Zipper (Node l x r) p
> goUp (Zipper r (WentRight l x p)) = Zipper (Node l x r) p

And we might want to go all the way up:

> unzipZipper :: Zipper -> Tree
> unzipZipper (Zipper t Top) = t
> unzipZipper z              = unzipZipper $ goUp z

Now we’d like to check with QuickCheck that going down an arbitrary path through a tree, then going all the way back up should bring us back to the same tree. So we will have to create random trees, paired with random paths through those trees. A tuple of type (Tree, Zipper) could work, but runs into dramas with overlapping instances since QuickCheck provides an instance for types, namely Arbitrary (a, b).

As a work-around, make a data type that holds a tree and a zipper:

> data TreeAndZipper = TreeAndZipper Tree Zipper
>   deriving (Eq, Show)

Here is the instance of Arbitrary:

> instance Arbitrary TreeAndZipper where
>   arbitrary = do t                   p                   return $ TreeAndZipper t p
>     where
>         arbPath z@(Zipper t p) = frequency [ (1, return z)    -- stop here
>                                            , (1, arbPath' z)  -- continue downwards
>                                            ]
>         arbPath' z@(Zipper Empty _) = return z
>         arbPath' z                  = frequency [ (1, arbPath $ goDownLeft  z)    -- go down left
>                                                 , (1, arbPath $ goDownRight z)    -- go down right
>                                                 , (1, return z)                   -- stop
>                                                 ]

Now with this instance we can encode the test that going down in a tree and then back up brings us back to the same tree.

> prop_zip_unzip :: TreeAndZipper -> Bool
> prop_zip_unzip (TreeAndZipper t z) = t == unzipZipper z

Check it:

*Zipper> quickCheck prop_zip_unzip
+++ OK, passed 100 tests.

Using verboseCheck we can see some of the values. Here is a sample:

(lots of output...)

TreeAndZipper (Node (Node (Node (Node (Node (Node (Node Empty (-7) (Node (Node (Node (Node Empty 88 (Node Empty (-79) Empty)) 82 (Node (Node Empty (-20) Empty) (-15) (Node Empty (-94) Empty))) (-60) Empty) 55 (Node Empty 0 Empty))) 6 (Node Empty (-7) Empty)) (-18) (Node Empty (-80) (Node Empty 60 Empty))) (-35) (Node Empty (-73) Empty)) (-32) (Node (Node (Node (Node (Node Empty (-71) Empty) 30 (Node (Node Empty 0 Empty) (-68) (Node Empty 91 Empty))) 1 (Node Empty (-46) (Node Empty (-41) (Node (Node Empty 93 Empty) 79 (Node (Node Empty 48 (Node (Node Empty 46 Empty) 76 (Node (Node Empty (-57) (Node Empty 90 Empty)) 34 (Node Empty (-11) (Node Empty (-10) Empty))))) 55 (Node Empty 65 (Node (Node (Node (Node Empty 2 (Node Empty 11 (Node Empty 34 Empty))) (-69) Empty) 68 Empty) 49 (Node Empty (-67) (Node (Node Empty 73 (Node Empty 59 (Node (Node Empty (-28) Empty) (-22) Empty))) (-15) Empty))))))))) 39 (Node Empty 40 (Node (Node (Node (Node Empty 88 Empty) 60 Empty) (-87) Empty) 53 Empty))) (-43) (Node Empty (-16) Empty))) 54 (Node Empty 73 Empty)) (-31) Empty) (Zipper (Node (Node (Node (Node (Node (Node Empty (-7) (Node (Node (Node (Node Empty 88 (Node Empty (-79) Empty)) 82 (Node (Node Empty (-20) Empty) (-15) (Node Empty (-94) Empty))) (-60) Empty) 55 (Node Empty 0 Empty))) 6 (Node Empty (-7) Empty)) (-18) (Node Empty (-80) (Node Empty 60 Empty))) (-35) (Node Empty (-73) Empty)) (-32) (Node (Node (Node (Node (Node Empty (-71) Empty) 30 (Node (Node Empty 0 Empty) (-68) (Node Empty 91 Empty))) 1 (Node Empty (-46) (Node Empty (-41) (Node (Node Empty 93 Empty) 79 (Node (Node Empty 48 (Node (Node Empty 46 Empty) 76 (Node (Node Empty (-57) (Node Empty 90 Empty)) 34 (Node Empty (-11) (Node Empty (-10) Empty))))) 55 (Node Empty 65 (Node (Node (Node (Node Empty 2 (Node Empty 11 (Node Empty 34 Empty))) (-69) Empty) 68 Empty) 49 (Node Empty (-67) (Node (Node Empty 73 (Node Empty 59 (Node (Node Empty (-28) Empty) (-22) Empty))) (-15) Empty))))))))) 39 (Node Empty 40 (Node (Node (Node (Node Empty 88 Empty) 60 Empty) (-87) Empty) 53 Empty))) (-43) (Node Empty (-16) Empty))) 54 (Node Empty 73 Empty)) (WentLeft Top (-31) Empty))
TreeAndZipper (Node Empty (-33) Empty) (Zipper (Node Empty (-33) Empty) Top)
TreeAndZipper Empty (Zipper Empty Top)
TreeAndZipper (Node Empty (-95) Empty) (Zipper (Node Empty (-95) Empty) Top)
+++ OK, passed 100 tests.

Traversals with a zipper

A nifty thing about zippers is that we can use them to step through a traversal, controlling the process programatically. If we are walking through a tree, we might be finished, or we have produced a value (an Int) but need to keep going through the zipper:

> data Step = Finished
>           | KeepGoing Int Zipper
>           deriving Show

The step function converts a zipper into this state (step) type:

> step :: Zipper -> Step

If we have an empty tree and no path, we are done.

> step (Zipper Empty Top) = Finished

If we have gone down-left, make note of the node’s value x and the rest of the zipper:

> step (Zipper Empty (WentLeft  p x r)) = KeepGoing x (Zipper r p)

Otherwise, we have a tree and a path, so try to continue by going down-left:

> step (Zipper t p) = step $ goDownLeft (Zipper t p)

In summary:

> step :: Zipper -> Step
> step (Zipper Empty Top)               = Finished
> step (Zipper Empty (WentLeft  p x r)) = KeepGoing x (Zipper r p)
> step (Zipper t p)                     = step $ goDownLeft (Zipper t p)

By repeatedly applying step we get an inorder traversal of the tree:

> inorder :: Tree -> [Int]
> inorder t = runStep (step (Zipper t Top)) []
>   where
>     runStep :: Step -> [Int] -> [Int]
>     runStep Finished                    acc = acc
>     runStep (KeepGoing x (Zipper t' p)) acc = runStep (step (Zipper t' p)) (acc ++ [x])

(As an aside, runStep is tail recursive.)

Using inorder on our example tree:

*Zipper> inorder tree

Here is a plain recursive definition of an inorder traversal:

> inorder' :: Tree -> [Int]
> inorder' Empty = []
> inorder' (Node l x r) = inorder' l ++ [x] ++ inorder' r

We can use this to verify that our fancy zipper inorder traversal is correct:

> prop_inorder :: Tree -> Bool
> prop_inorder t = inorder t == inorder' t

Testing it:

*Zipper> quickCheck prop_inorder
+++ OK, passed 100 tests.

If we want to do something different in the traversal, for example running a monadic action, we can use the same Step datatype and change the definition of runStep:

> inorderM :: Monad m => (Int -> m a) -> Tree -> m ()
> inorderM a t = runStepM a $ step (Zipper t Top)
>   where
>     runStepM :: Monad m => (Int -> m a) -> Step -> m ()
>     runStepM _ Finished                    = return ()
>     runStepM a (KeepGoing x (Zipper t' p)) = (a x) >> (runStepM a $ step (Zipper t' p))

Example usage:

*Zipper> inorderM (x -> putStrLn $ "Node value: " ++ show x) tree
Node value: 1
Node value: 2
Node value: 3
Node value: 4

Mapping over a tree

If we want to apply a function to each value in a tree, a recursive definition might be:

> mapTree :: (Int -> Int) -> Tree -> Tree
> mapTree _ Empty = Empty
> mapTree f (Node l x r) = Node (mapTree f l) (f x) (mapTree f r)
*Zipper> tree
Node (Node Empty 1 Empty) 2 (Node Empty 3 (Node Empty 4 Empty))

*Zipper> mapTree (+1) tree
Node (Node Empty 2 Empty) 3 (Node Empty 4 (Node Empty 5 Empty))

We can check that mapTree id == mapTree:

> prop_maptree :: Tree -> Bool
> prop_maptree t = t == (mapTree id t)
*Zipper> quickCheck prop_maptree
+++ OK, passed 100 tests.

We can also use a zipper to map over the tree by using a different data type to represent the stepping:

> data MapStep = MapFinished
>              | MoreL Int Zipper
>              | More2 Zipper Int Zipper
>              deriving Show
> stepMap :: (Int -> Int) -> Zipper -> MapStep
> stepMap _ (Zipper Empty Top              ) = MapFinished
> stepMap f (Zipper Empty (WentLeft  p x r)) = MoreL (f x) (Zipper r p)
> stepMap f (Zipper (Node l x r) p)          = More2 (Zipper l p) (f x) (Zipper r p)
> mapTree' :: (Int -> Int) -> Tree -> Tree
> mapTree' f t = runStep (stepMap f $ Zipper t Top)
>   where
>     runStep :: MapStep -> Tree
>     runStep MapFinished     = Empty
>     runStep (MoreL x z)     = Node Empty x (runStep $ stepMap f z)
>     runStep (More2 zl x zr) = Node (runStep $ stepMap f zl) x (runStep $ stepMap f zr)

Testing it:

*Zipper> tree
Node (Node Empty 1 Empty) 2 (Node Empty 3 (Node Empty 4 Empty))

*Zipper> mapTree' (+1) tree
Node (Node Empty 2 Empty) 3 (Node Empty 4 (Node Empty 5 Empty))

And testing it using QuickCheck:

> prop_maptree' :: Tree -> Bool
> prop_maptree' t = (mapTree (+1) t) == (mapTree' (+1) t)
*Zipper> quickCheck prop_maptree'
+++ OK, passed 100 tests.

The Main.hs file runs all the tests:

$ ghc --make Main.hs
[1 of 2] Compiling Zipper           ( Zipper.lhs, Zipper.o )
[2 of 2] Compiling Main             ( Main.hs, Main.o )
Linking Main ...

$ ./Main
+++ OK, passed 100 tests.
+++ OK, passed 100 tests.
+++ OK, passed 100 tests.
+++ OK, passed 100 tests.
+++ OK, passed 100 tests.

MINC interfaces in Nipype

About two years ago I wrote volgenmodel-nipype, a port of Andrew Janke’s volgenmodel to the Nipype workflow system. Nipype provides a framework for wrapping legacy command-line tools in a simple to use interface, which also plugs in to a workflow engine that can run jobs on a multicore PC, IPython parallel, SGE/PBS, etc.

Using a workflow that takes care of naming and tracking input/output files is very convenient. To blur an image (using mincblur) one can create a Node with the Blur interface, and then use .connect to send the output of some other node into this node:

blur = pe.Node(interface=Blur(fwhm=step_x*15),
                              name='blur_' + snum_txt)

workflow.connect(norm, 'output_threshold_mask', blur, 'input_file')

When I first developed volgenmodel-nipype I wrote my own Nipype interfaces for quite a few MINC tools. Over the 2015 Xmas holidays I got those interfaces merged into the master branch of Nipype.

I took this opportunity to tidy up volgenmodel-nipype. There are no locally defined MINC interfaces. I added a public dataset, available in a separate repository: https://github.com/carlohamalainen/volgenmodel-fast-example. Previously this data wasn’t publicly available. I also added some Docker scripts to run the whole workflow and compare the result against a known good model output, which I run in a weekly cronjob as a poor-person’s continuous integration test suite.

The mouse brain sample data produces a model that looks like this:

Note to self: profunctors

Note to self about deriving the Profunctor typeclass. Source is here: here.

This is a literate Haskell file, and it can be built using Stack:

git clone https://github.com/carlohamalainen/playground.git
cd playground/haskell/profunctors
stack build

Then use stack ghci instead of cabal repl. The main executable is in a path like ./.stack-work/install/x86_64-linux/lts-3.6/7.10.2/bin/profunctors-exe.

This blog post follows some of the examples from I love profunctors.

First, some extensions and imports:

> {-# LANGUAGE MultiParamTypeClasses #-}
> {-# LANGUAGE FlexibleInstances     #-}
> {-# LANGUAGE InstanceSigs          #-}
> {-# LANGUAGE RankNTypes            #-}
> {-# LANGUAGE ScopedTypeVariables   #-}
> module Profunctors where
> import Control.Applicative
> import Data.Char
> import Data.Functor.Constant
> import Data.Functor.Identity
> import Data.Tuple (swap)
> import qualified Data.Map as M
> main = print "boo"

Motivating example

The basic problem here is to write a function that capitalizes each word in a string. First, write a function that capitalizes a single word:

> capWord :: String -> String
> capWord [] = []
> capWord (h:t) = (toUpper h):(map toLower t)

The straightforward solution (ignoring the loss of extra spaces between words since unwords . words is not an isomorphism) is to use this composition:

> capitalize :: String -> String
> capitalize = unwords . (map capWord) . words

Example output:

*Profunctors> capitalize "hey yo WHAT DID THIS          DO?"
"Hey Yo What Did This Do?"

Why stop here? Let’s generalise the capitalize function by factoring out the words and unwords functions. Call them w and u and make them arguments:

> capitalize1 :: (String -> [String]) -> ([String] -> String) -> String -> String
> capitalize1 w u = u . (map capWord) . w

Now, capitalize ≡ capitalize1 words unwords.

We may as well factor out map capWord as well:

> capitalize2 :: (String -> [String])
>              -> ([String] -> String)
>              -> ([String] -> [String])
>              -> String -> String
> capitalize2 w u f = u . f . w

We have: capitalize ≡ capitalize2 words unwords (map capWord).

Now look at the types – there is no reason to be restricted to String and [String] so use the most general types that make the composition u . f . w work:

     w          f          u
c -------> d -------> b -------> d

so w :: c -> d and similar for f and u. This lets us write

> capitalize3 :: (c -> a)
>             -> (b -> d)
>             -> (a -> b)
>             -> (c -> d)
> capitalize3 w u f = u . f . w

Next, we can generalize the type of f. To help with this step, recall that -> is a functor (there is an instance Functor (->)) so write the last two types in the signature with prefix notation:

> capitalize3' :: (c -> a)
>              -> (b -> d)
>              -> (->) a b
>              -> (->) c d
> capitalize3' w u f = u . f . w

Now we can use a general functor h instead of ->:

> capitalize4 :: (c -> a)
>             -> (b -> d)
>             -> h a b -- was (->) a b
>             -> h c d -- was (->) c d
> capitalize4 w u f = u . f . w

Naturally this won’t work because the type signature has the functor h but the body of capitalize4 is using function composition (the .) as the type error shows:

 | Couldn't match type ‘h’ with ‘(->)’
||   ‘h’ is a rigid type variable bound by
||       the type signature for
||         capitalize3' :: (c -> a) -> (b -> d) -> h a b -> h c d
|| Expected type: h c d
||   Actual type: c -> d

Fortunately for us, we can make a typeclass that captures the behaviour that we want. We have actually arrived at the definition of a profunctor.

> class Profunctor f where
>   dimap :: (c -> a) -> (b -> d) -> f a b -> f c d
> instance Profunctor (->) where
>   dimap :: (c -> a) -> (b -> d) -> (a -> b) -> c -> d
>   dimap h g k = g . k . h

Now we can write the capitalize function using a typeclass constraint on Profunctor which lets us use the dimap function instead of explicit function composition:

> capitalize5 :: String -> String
> capitalize5 s = dimap words unwords (map capWord) s

This is overkill for the capitalization problem, but it shows how structure can come out of simple problems if you keep hacking away.

MVar note

A short note about using MVars in Haskell. Source is here: https://github.com/carlohamalainen/playground/tree/master/haskell/mvar.

Unlike earlier blog posts, this one should be built using Stack. Something like:

git clone https://github.com/carlohamalainen/playground.git
cd playground/haskell/mvar
stack build

Then use stack ghci instead of cabal repl. The main executable is .stack-work/dist/x86_64-linux/Cabal-

> {-# LANGUAGE ScopedTypeVariables #-}
> module Main where
> import Control.Concurrent
> import Control.Monad
> import Control.Concurrent.ParallelIO.Local
> import System.IO

Here is the situation: we have a function that makes a call to some restricted resource, say a network API, and we would like calls to this API from our application to be serialized across multiple threads. For the purposes of this blog post, here is a dummy function that sleeps a bit and returns x + 1. Pretend that it’s calling a magical API on the network somewhere.

> getExpensiveThing :: Int -> IO Int
> getExpensiveThing x = do
>   threadDelay $ 1 * 10^6
>   return $ x + 1

We have a general task that makes use of the expensive resource:

> doThings :: Int -> IO ()
> doThings tid = do
>   x    putStrLn $ "doThings: thread " ++ show tid ++ " got " ++ show x

At the top level we need to run a few doThings in parallel:

> main0 :: IO ()
> main0 = do
>   hSetBuffering stdout LineBuffering -- Otherwise output is garbled.
>   let tasks = map doThings [1..5]
>   withPool 4 $ pool -> parallel_ pool tasks

The problem with main0 is that the calls to getExpensiveThing can happen simultaneously, so we need to use some kind of thread synchronisation primitive. I initially thought that I’d have to use a semaphore, a queue, or something fancy, but an MVar can do the trick.

We only need three operations on MVar:

Use newEmptyMVar to create a new MVar which is initially empty:

> newEmptyMVar :: IO (MVar a)

Use takeMVar to get the contents of the MVar. If the MVar is empty, takeMVar will wait until it is full.

> takeMVar :: MVar a -> IO a

Finally, use putMVar to put a value into an MVar. If the MVar is full, then putMVar will wait until the MVar is empty. If multiple threads are blocked on an MVar, they are woken up in FIFO order.

> putMVar :: MVar a -> a -> IO ()

So what we can do is have getExpensiveThing use takeMVar to block until a worker requires a value. The return value can be passed back via another MVar, which the worker is itself waiting on. The data type MVar is polymorphic in its type parameter, so there is no trouble in having an MVar of an MVar, or an MVar of a tuple containing another MVar, and so on. This is what we’ll use to represent a blocking action with input value of type a and output value of type b:

> data InOut a b = InOut (MVar (a, MVar b))

The outer MVar wraps a tuple, where the first component is the raw input value of type a, and the second component is the MVar in which the result value will be passed back. Here is the new getExpensiveThing:

> getExpensiveThing' :: InOut Int Int -> IO ()
> getExpensiveThing' (InOut io) = forever $ do
>   ((input :: Int), (output :: MVar Int))    threadDelay $ 1 * 10^6
>   putMVar output (input + 1)

The output MVar is contained inside the top level MVar. This way, getExpensiveThing’ has a unique channel back to the calling function. I used ScopedTypeVariables to be able to write the types of input and output inline, but this is just for clarity in this blog post. Also note that getExpensiveThing’ runs forever using forever from Control.Monad.

Here is the updated doThings that uses the MVar to communicate with getExpensiveThing’:

> doThings' :: InOut Int Int -> Int -> IO ()
> doThings' (InOut io) tid = do
>   result    putMVar io (tid, result)    -- Send our input (tid) and the result MVar.
>   x 
>   putStrLn $ "doThings': thread " ++ show tid ++ " got " ++ show x

Finally, main needs a top-level MVar which is the first parameter to doThings’ and a forked thread to run getExpensiveThing’:

> main :: IO ()
> main = do
>   hSetBuffering stdout LineBuffering -- Otherwise output is garbled.
>   topMVar 
>   _ 
>   let tasks = map (doThings' (InOut topMVar)) [1..5]
>   withPool 4 $ pool -> parallel_ pool tasks

Now each evaluation of threadDelay (the sensitive bit of code that represents a call to a resource) happens sequentially although the order is nondeterministic.

$ stack build && .stack-work/dist/x86_64-linux/Cabal-
doThings': thread 1 got 2
doThings': thread 5 got 6
doThings': thread 2 got 3
doThings': thread 4 got 5
doThings': thread 3 got 4

Just for fun, let’s make some helper functions to make calling a special worker via an MVar a bit cleaner. In general, calling a worker requires creating a results MVar, pushing the input and results MVar to the InOut MVar, and finally taking the result.

> callWorker :: InOut a b -> a -> IO b
> callWorker (InOut m) a = do
>     result      putMVar m (a, result)
>     takeMVar result

To save ourselves having to fork a worker, we can write a combinator that takes a worker and an action and runs the action with access to the newly created MVar:

> withWorker :: (InOut a b -> IO ()) -> (InOut a b -> IO c) -> IO c
> withWorker worker action = do
>   m    let io = InOut m
>   _    action io

Now doThings” is a bit shorter, at the expense of not knowing (at a glance) what the io thing is going to do.

> doThings'' :: InOut Int Int -> Int -> IO ()
> doThings'' io tid = do
>   x 
>   putStrLn $ "doThings'': thread " ++ show tid ++ " got " ++ show x

Finally, main’ is largely unchanged except for withWorker at the top level.

> main' :: IO ()
> main' = withWorker getExpensiveThing' $ io -> do
>   hSetBuffering stdout LineBuffering -- Otherwise output is garbled.
>   let tasks = map (doThings'' io) [1..5]
>   withPool 4 $ pool -> parallel_ pool tasks

Running main’:

*Main> main'
doThings'': thread 2 got 3
doThings'': thread 3 got 4
doThings'': thread 4 got 5
doThings'': thread 1 got 2
doThings'': thread 5 got 6

Yesod 1.1 to 1.4 notes

This blog runs on a barebones blogging framework that I knocked together using Yesod 1.1 back in 2013. I recently ported it over to Yesod 1.4. Apart from the few changes that I have detailed below, everything worked straight away. Refactoring code in Haskell is a very different experience compared to fully dynamic languages.

Here are some notes on the changes that I encountered between Yesod 1.1 and 1.4. Perhaps these will be useful for someone.


Previously I used aformM to get the current time in a form:

commentForm :: EntryId -> Form Comment
commentForm entryId = renderDivs $ Comment
     pure entryId
     aformM (liftIO getCurrentTime)
     areq textField (fieldSettingsLabel MsgCommentName) Nothing
     aopt emailField (fieldSettingsLabel MsgCommentEmail) Nothing
     aopt urlField (fieldSettingsLabel MsgCommentUrl) Nothing
     areq htmlField (fieldSettingsLabel MsgCommentText) Nothing
     pure False <* recaptchaAForm

Now, use lift (liftIO getCurrentTime):

commentForm :: EntryId -> Form Comment
commentForm entryId = renderDivs $ Comment
     pure entryId
    --  aformM (liftIO getCurrentTime)
     lift (liftIO getCurrentTime)
     areq textField (fieldSettingsLabel MsgCommentName) Nothing
     aopt emailField (fieldSettingsLabel MsgCommentEmail) Nothing
     aopt urlField (fieldSettingsLabel MsgCommentUrl) Nothing
     areq htmlField (fieldSettingsLabel MsgCommentText) Nothing
     pure False <* recaptchaAForm


Some new names clash with the Prelude, e.g. maximum is not the usual function from the Prelude, but rather something from Data.MinLen that encodes type-level natural numbers.

*Main> :t maximum
maximum :: MonoFoldableOrd mono => MinLen (Succ nat) mono -> Element mono

*Main> :t P.maximum
P.maximum :: Ord a => [a] -> a

No unKey or PersistInt64

Persistent values in the old system looked like this:

Entity {entityKey = Key {unKey = PersistInt64 1},
        entityVal = title: "first post" mashed title: "first-post" year: 2015 month: 8 day: 14 content: "Hi there!" visible: False}

Entity {entityKey = Key {unKey = PersistInt64 2},
        entityVal = title: "second post" mashed title: "second-post" year: 2015 month: 8 day: 14 content: "Hi there! Do de dah!" visible: False}

and we could use PersistInt64 to construct the value, or unKey to deconstruct it.

*Main> :t PersistInt64
PersistInt64 :: GHC.Int.Int64 -> PersistValue

*Main> :t unKey
unKey :: KeyBackend backend entity -> PersistValue

Now values look like:

Entity {entityKey = EntryKey {unEntryKey = SqlBackendKey {unSqlBackendKey = 1}}, entityVal = "first post"}
Entity {entityKey = EntryKey {unEntryKey = SqlBackendKey {unSqlBackendKey = 2}}, entityVal = "second post"}

Old code like

foo :: PersistValue -> GHC.Int.Int64
foo (PersistInt64 i) = i

niceEntryId :: KeyBackend backend entity -> String
niceEntryId x = show $ foo $ unKey x


niceEntryId :: Key Entry -> Text
niceEntryId = DT.pack . show . unSqlBackendKey . unEntryKey

We could also use toPathPiece:

*Main> :t toPathPiece :: Key Entry -> Text
toPathPiece :: Key Entry -> Text :: Key Entry -> Text

If you’re wondering how to find such a thing, look at the output of :info Key in ghci which
includes these lines:

instance PathPiece (Key User) -- Defined at Model.hs:10:1
instance PathPiece (Key Entry) -- Defined at Model.hs:10:1
instance PathPiece (Key Comment) -- Defined at Model.hs:10:1

I believe that the more general option is fromSqlKey:

unKey' :: ToBackendKey SqlBackend record => Key record -> Text
unKey' = DT.pack . show . fromSqlKey

This should work over any SQL backend, unlike the
older code that was tied to the particular implementation (e.g. 64bit

Similarly, old code constructed a Key from an integer:

let entryId = Key $ PersistInt64 (fromIntegral i)

New code uses toSqlKey since the PersistInt64 constructor isn’t available:

let entryId = toSqlKey i :: Key Entry


Original blog framework, written with Yesod https://github.com/carlohamalainen/cli-yesod-blog.

New blog framework, compiles against Yesod 1.4: https://github.com/carlohamalainen/cli-yesod-blog-1.4.