Demystifying Nextflow resume
This two-part blog aims to help users understand Nextflow’s powerful caching mechanism. Part one describes how it works, while part two focuses on execution provenance and troubleshooting. You can read part two here.
Task execution caching and checkpointing is an essential feature of any modern workflow manager, and Nextflow provides an automated caching mechanism with every workflow execution. When using the `-resume` flag, successfully completed tasks are skipped and the previously cached results are used in downstream tasks. However, understanding exactly how it works, and debugging when the behavior is not as expected, is a common source of frustration.
The mechanism works by assigning a unique ID to each task. This unique ID is used to create a separate execution directory, called the working directory, where the tasks are executed and the results stored. A task’s unique ID is generated as a 128-bit hash number obtained from a composition of the task’s:
- Input values
- Input files
- Command line string
- Container ID
- Conda environment
- Environment modules
- Any executed scripts in the bin directory
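Concretely, the resulting 128-bit hash determines the task’s work directory path: the first two hex characters name a subdirectory of `work`, and the remaining thirty name the task directory itself (the hash value below is made up for illustration):

```
work/
└── 3b/
    └── 5bcee5d2773d79890e20a0d01ff3cf/
```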
How does resume work?
The `-resume` command line option allows for the continuation of a workflow execution. It can be used in its most basic form with:
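```
nextflow run nextflow-io/hello -resume
```

Here `nextflow-io/hello` is just Nextflow’s demo pipeline; substitute the name of your own pipeline.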
In practice, every execution starts from the beginning. However, when using `-resume`, before launching a task, Nextflow uses the unique ID to check if:
- The working directory exists
- It contains a valid command exit status
- It contains the expected output files
If these conditions are satisfied, the task execution is skipped and the previously computed outputs are used instead. When a task requires recomputation, i.e. when the conditions above are not fulfilled, the downstream tasks are automatically invalidated.
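For example, listing the contents of a task’s work directory shows the metadata files behind these checks (the hashed path is the illustrative one from above, and `output.txt` stands in for whatever outputs your task declares):

```
$ ls -a work/3b/5bcee5d2773d79890e20a0d01ff3cf/
.command.sh  .command.run  .command.out  .command.err  .exitcode  output.txt
```

The `.exitcode` file records the command exit status that Nextflow inspects on resume.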
The working directory
By default, the task work directories are created in the directory from where the pipeline is launched. This is often a scratch storage area that can be cleaned up once the computation is completed. A different location for the execution work directory can be specified using the command line option `-w`. For example:
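```
nextflow run <pipeline name> -w /some/scratch/dir
```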
Note that if you delete or move the pipeline work directory, you will no longer be able to use the resume feature in subsequent runs.
Also note that the pipeline work directory is intended to be used as a temporary scratch area. The final workflow outputs are expected to be stored in a different location specified using the `publishDir` directive.
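As a sketch, a process might publish its final outputs out of the scratch area like this (the process name, tool, and paths are hypothetical):

```
process ALIGN {
    // copy the final outputs from the task work dir to a stable location
    publishDir 'results/alignments', mode: 'copy'

    input:
    path reads

    output:
    path 'aligned.bam'

    script:
    """
    my_aligner --in ${reads} --out aligned.bam
    """
}
```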
How is the hash calculated on input files?
The hash provides a convenient way for Nextflow to determine if a task requires recomputation. For each input file, the hash code is computed with:
- The complete file path
- The file size
- The last modified timestamp
Therefore, even just running `touch` on a file will invalidate the task execution.
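To see this in action, a minimal sketch (the pipeline and file names are made up):

```
nextflow run my-pipeline -resume   # all tasks cached, nothing re-runs
touch data/sample.fastq            # only updates the last modified timestamp
nextflow run my-pipeline -resume   # tasks consuming sample.fastq re-execute
```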
How to ensure resume works as expected?
It is good practice to organize each experiment in its own folder. An experiment’s input parameters can be specified using a Nextflow config file which also makes it simple to track and replicate an experiment over time. Note that you should avoid launching two (or more) Nextflow instances in the same directory concurrently.
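For instance, a per-experiment config might look like this (a minimal sketch; the parameter names are hypothetical):

```
// nextflow.config, kept in the experiment folder
params {
    reads  = 'data/*_{1,2}.fastq.gz'   // hypothetical input parameter
    outdir = 'results'                 // hypothetical output location
}
```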
The `nextflow log` command lists the executions run in the current folder:
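```
$ nextflow log
TIMESTAMP            DURATION  RUN NAME           STATUS  REVISION ID  SESSION ID                            COMMAND
2024-05-01 10:15:03  2m 5s     insane_heisenberg  OK      a1b2c3d      4dc656d2-c410-44c8-bc32-7dd0ea87bebf  nextflow run my-pipeline
```

The timestamps, run name, and IDs above are of course illustrative; yours will differ.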
You can use the `-resume` option with a session ID to recover a specific execution. For example:
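```
nextflow run my-pipeline -resume 4dc656d2-c410-44c8-bc32-7dd0ea87bebf
```

The session ID here is the illustrative one from the log output above; use the ID of the run you want to resume.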
Stay tuned for part two where we will discuss resume in more detail with respect to provenance and troubleshooting techniques!