Here is my latest project: https://github.com/carlohamalainen/volgenmodel-nipype. It is a port of the Perl script volgenmodel to Python, using the functionality of Nipype.

A lot of scientific workflow code has a common pattern, something like this: collect some input files, run something to produce intermediate results, and then combine the results into a final result. One way to implement the workflow is to glob the files and set up arrays or dictionaries to keep track of the outputs.

files = glob.glob('/tmp/blah*.dat')

intermediate_result = [None] * len(files)

for (i, f) in enumerate(files):
    intermediate_result[i] = fn1(f, param=0.3)

final_result = fn2(intermediate_result)

The problem with this approach is that it doesn’t scale well nor is it easy to reason about. The equivalent in Nipype is:

import nipype.pipeline.engine as pe
import nipype.interfaces.io as nio

datasource = pe.Node(interface=nio.DataGrabber(sort_filelist=True), name='datasource_dat')
datasource.inputs.base_directory = '/scratch/data'
datasource.inputs.template = 'blah*.dat'

datasink = pe.Node(interface=nio.DataSink(), name="datasink")
datasink.inputs.base_directory = '/scratch/output'

intermediate = pe.MapNode(
                    interface=fn1_interface(param=0.3)
                    name='intermediate_mapnode',
                    iterfield=['input_file'])

final = pe.Node(
            interface=fn2,
            name='final_node')

workflow = pe.Workflow(name="workflow")

# Apply the fn1 interface to each file in the datasource:
workflow.connect(datasource, 'outfiles', intermediate, 'input_file')

# Apply the fn2 interface to the list of outputs from the intermediate map node:
workflow.connect(intermediate, 'output_file', final, 'input_file')

# Save the final output:
workflow.connect(final, 'output_file', datasink, 'final')

This code is much closer to the actual problem that we are trying to solve, and as a bonus we don’t have to take care of arrays of input and output files, which is pure agony and prone to errors.

Nipype lets us run the workflow using a single core like this:

workflow.run()

or we can fire it up using 4 cores using:

workflow.run(plugin='MultiProc', plugin_args={'n_procs' : 4})

Nipype also has plugins for SGE, PBS, HTCondor, LSF, SLURM, and others.

Here is volgenmodel-nipype’s workflow graph (generating this graph is a one-liner with the workflow object). Click the image for the full size version.