One more step towards Nextflow modules
The ability to create components, libraries or module files has been among the most requested feature ever over the years.
For this reason, today we are very happy to announce that a preview implementation of the modules feature has been merged on master branch of the project and included in the 19.05.0-edge release.
The implementation of this feature has opened the possibility for many fantastic improvements to Nextflow and its syntax. We are extremely excited as it results in a radical new way of writing Nextflow applications! So much so, that we are referring to these changes as DSL 2.
Enabling DSL 2 syntax
Since this is still a preview technology and, above all, to not break any existing applications, to enable the new syntax you will need to add the following line at the beginning of your workflow script:
Module files
A module file simply consists of one or more process
definitions, written with the usual syntax. The only difference is that the from
and into
clauses in the input:
and output:
definition blocks has to be omitted. For example:
The above snippet defines a process component that can be imported in the main application script using the include
statement shown below.
Also, module files can declare optional parameters using the usual params
idiom, as it can be done in any standard script file.
This approach, which is consistent with the current Nextflow syntax, makes very easy to migrate existing code to the new modules system, reducing it to a mere copy & pasting exercise in most cases.
You can see a complete module file here.
Module inclusion
A module file can be included into a Nextflow script using the include
statement. With this it becomes possible to reference any process defined in the module using the usual syntax for a function invocation, and specifying the expected input channels as they were function arguments.
Notably, each process defines its own namespace in the script scope which allows the access of the process output channel(s) using the .out
attribute. This can be used then as any other Nextflow channel variable in your pipeline script.
The include
statement gives also the possibility to include only a specific process or to include a process with a different name alias.
Smart channel forking
One of the most important changes of the new syntax is that any channel can be read as many times as you need removing the requirement to duplicate them using the into
operator.
For example, in the above snippet, the read_pairs_ch
channel has been used twice, as input both for the FASTQC
and QUANT
processes. Nextflow forks it behind the scene for you.
This makes the writing of workflow scripts much more fluent, readable and ... fun! No more channel names proliferation!
Nextflow pipes!
Finally, maybe our favourite one. The new DSL introduces the |
(pipe) operator which allows for the composition of Nextflow channels, processes and operators together seamlessly in a much more expressive way.
Consider the following example:
In the last line, the fromPath
channel is piped to the `splitFasta` operator whose result is used as input by the align
process. Then the output is finally printed by the view
operator.
This syntax finally realizes the Nextflow vision of empowering developers to write complex data analysis applications with a simple but powerful language that mimics the expressiveness of the Unix pipe model but at the same time makes it possible to handle complex data structures and patterns as is required for highly parallelised and distributed computational workflows.
Conclusion
This wave of improvements brings a radically new experience when it comes to writing Nextflow workflows. We are releasing it as a preview technology to allow users to try, test, provide their feedback and give us the possibility stabilise it.
We are also working to other important enhancements that will be included soon, such as remote modules, sub-workflows composition, simplified file path wrangling and more. Stay tuned!