Charalampos Lazaris, Ph.D.
Nov 13, 2024

Nextflow and Seqera containers: a lifesaver for bioinformatics work on HPC environments

This post has been written by our valued community members.

In the rapidly evolving fields of bioinformatics and data science, it is increasingly important to manage large datasets efficiently and to handle the software dependencies of computational tools. Nextflow is a domain-specific language and runtime system for building and executing computational workflows efficiently and reproducibly across a wide range of environments, from local machines to cloud platforms. In this post, I share how Nextflow and its seamless support for Seqera Containers have been game-changers for my work as a Data Scientist in the Oncology Data Science group at Novartis in Cambridge, MA.

Reproducible workflows on HPC using Nextflow

Most of my computational analyses and workflows run on on-premises High-Performance Computing (HPC) systems. These HPC systems offer several advantages over cloud-based solutions but come with limitations, particularly in a large organizational setting. For example, when new bioinformatics packages are released, they must be made available as "modules" on the HPC. Whether a package is deployed as a module depends on its anticipated usage, and the process often involves considerable delays. This lack of flexibility poses challenges for computational biologists who need to experiment with the latest tools. Moreover, even when modules are available, incompatibilities between tool versions can make scripts non-reproducible and fragile over time.

This is where Nextflow has been invaluable to my work. It is an expressive language that enables rapid task implementation and testing. Nextflow’s active and supportive community, comprehensive documentation, and robust integration of reproducible computational environments, such as Conda environments or containers, have made my work significantly more efficient. With Nextflow, I no longer have to wait for specific tools to be made available as HPC modules. Instead, I can create a Conda environment with the required tools and incorporate it into my Nextflow script with a single line of code or build a container with these tools and use it directly.
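As a minimal sketch of the "single line" mentioned above, the process below attaches a Conda environment to one task through Nextflow's `conda` directive. The tool (`samtools`), its version, the process name, and the `params.bam` parameter are illustrative assumptions, not details from my actual pipelines.

```groovy
// main.nf: illustrative sketch; tool, version, and file names are assumptions.

// Hypothetical default input; override with --bam on the command line.
params.bam = 'data/*.bam'

process SAMTOOLS_FLAGSTAT {
    // The single line that ties a Conda environment to this task.
    conda 'bioconda::samtools=1.20'

    input:
    path bam

    output:
    path "${bam.baseName}.flagstat.txt"

    script:
    """
    samtools flagstat ${bam} > ${bam.baseName}.flagstat.txt
    """
}

workflow {
    // Run one task per BAM file matched by the glob.
    SAMTOOLS_FLAGSTAT(Channel.fromPath(params.bam))
}
```

Recent Nextflow versions only honor the `conda` directive when Conda support is switched on, for example with `conda.enabled = true` in `nextflow.config`. Nextflow then creates and caches the environment on first use, so no HPC module is needed.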

Seqera Containers: Build containers on the fly

A remarkable new feature of Nextflow is its integration with Seqera Containers. Containers offer a higher level of reproducibility than Conda environments, but building and using them can be complex: it involves writing Dockerfiles, converting Docker images to Singularity images for secure use on HPC, and more. For those who are not deeply familiar with containerization technologies, this complexity can be overwhelming.

Seqera Containers simplifies this process by providing a plug-and-play solution. Users can select one or more packages from Conda and PyPI and build either a Docker or a Singularity container for their CPU architecture. If the container has already been built, its URL is available immediately and can be integrated into a Nextflow script with a single line of code. If it has not been built yet, the URL of the future container is still available, though the container itself may take some time to be generated. Thanks to a partnership between Seqera and Amazon Web Services (AWS), containers created with Seqera Containers are hosted for free on AWS for at least five years, and guaranteed high pull rates mean there is no performance trade-off.
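To make the "single line" concrete, here is a hedged sketch of a process that uses a Singularity image through Nextflow's `container` directive. The image URL is a placeholder: the registry path is my assumption about what the service shows, and `<tag>` stands for the version-and-build tag that Seqera Containers displays for your selection, so replace the whole string with the URL you copy from the site. The process name and tool are likewise hypothetical.

```groovy
// main.nf: illustrative sketch; the container URL is a placeholder, not a real build.

process SAMTOOLS_VERSION {
    // The single line that ties a pre-built container to this task.
    // Replace the string with the URL copied from Seqera Containers;
    // <tag> is a stand-in and will not resolve as written.
    container 'oras://community.wave.seqera.io/library/samtools:<tag>'

    output:
    path 'samtools_version.txt'

    script:
    """
    samtools --version > samtools_version.txt
    """
}

workflow {
    SAMTOOLS_VERSION()
}
```

On an HPC system this would typically be paired with `singularity.enabled = true` in `nextflow.config`. For a Docker image, the same directive takes the Docker-style URL (without the `oras://` prefix) together with `docker.enabled = true`.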

Overall, Nextflow and Seqera Containers are an extremely powerful combination, providing immense value for those of us managing large datasets and intricate computational workflows. Looking ahead, I hope to see more package sources and support for more complex container configurations. Additionally, a feature to "push" newly created containers to Docker Hub would be valuable. Many of these capabilities are already available through Wave, an open-source tool that integrates natively with Nextflow, but streamlining them within the Seqera Containers interface would be a significant benefit to the community.

This post was contributed by a Nextflow Ambassador. Ambassadors are passionate individuals who support the Nextflow community. Interested in becoming an ambassador? Read more about the Nextflow Ambassador program on the Nextflow website.