Teaching, Training, and Transforming: Building Capacity with Nextflow in Vermont

This post has been written by our valued community members.

In our work at a busy core facility and with an infectious diseases group at the University of Vermont, we collaborate with many investigators and students. I discovered Nextflow last year, started using it intensively, and haven't looked back. I now use Nextflow and nf-core for most of my projects. Here I want to discuss how I have used them in collaborations.

Developing Custom Workflows

The first level of collaboration is working closely with an investigator on a custom workflow. Together we clarify what is needed, and I write the entire data pipeline. For example, for one investigator we built a pipeline to:

- Perform basic pre-processing of long-read FASTQ viral sequences: trimming barcodes, trimming sequences, QC, etc.
- Run two parallel analyses:
  1. Generate a consensus sequence and use BLAST to determine the specific virus that it came from;
  2. Use BLAST on each individual read to determine the specific virus for the read. We use BLAST against a custom database of viral sequences.
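To make the per-read analysis concrete, here is a minimal Python sketch of how one might reduce BLAST results to a single call per read. It is not the code from our pipeline: it assumes the search was written in BLAST's default tabular format (`-outfmt 6`), and the function name `best_hit_per_read`, the read IDs, and the virus names are made up for illustration.

```python
import csv
import io

# Default column order of BLAST tabular output (-outfmt 6).
BLAST6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def best_hit_per_read(handle):
    """Return {read_id: subject_id}, keeping the highest-bitscore hit per read."""
    best = {}  # read_id -> (bitscore, subject_id)
    reader = csv.DictReader(handle, fieldnames=BLAST6_COLUMNS, delimiter="\t")
    for row in reader:
        score = float(row["bitscore"])
        current = best.get(row["qseqid"])
        # Keep the first hit seen for a read unless a later one scores higher.
        if current is None or score > current[0]:
            best[row["qseqid"]] = (score, row["sseqid"])
    return {read_id: subject for read_id, (_, subject) in best.items()}


# Made-up example: read1 has two candidate hits, read2 has one.
example = io.StringIO(
    "read1\tvirusA\t98.5\t500\t7\t0\t1\t500\t10\t509\t1e-150\t850\n"
    "read1\tvirusB\t91.0\t480\t40\t3\t1\t480\t5\t484\t1e-100\t600\n"
    "read2\tvirusB\t99.0\t450\t4\t0\t1\t450\t20\t469\t1e-140\t800\n"
)
print(best_hit_per_read(example))
```

Choosing the hit with the highest bitscore is only one possible rule; a real pipeline might also filter on percent identity or alignment length before making a call.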

Each run took about 20-30 hours, and over many months we processed many terabytes of sequencing data. After each run we generated a report that summarized the results, compared the two types of analyses (which agreed almost 100% of the time), and included graphs for visualization.
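As an illustration of how the two analyses can be compared, the following Python sketch computes the fraction of per-read calls that match the consensus-level call for one sample. This is an assumed way of measuring agreement, not the exact calculation from our report; the function names and the example data are hypothetical.

```python
from collections import Counter


def majority_call(read_calls):
    """Most frequent virus among the per-read calls."""
    return Counter(read_calls.values()).most_common(1)[0][0]


def read_level_agreement(consensus_call, read_calls):
    """Fraction of per-read calls that match the consensus-level call."""
    if not read_calls:
        raise ValueError("no per-read calls to compare")
    matches = sum(1 for call in read_calls.values() if call == consensus_call)
    return matches / len(read_calls)


# Made-up sample: the consensus sequence was called as virusA.
read_calls = {
    "read1": "virusA",
    "read2": "virusA",
    "read3": "virusB",
    "read4": "virusA",
}
print(majority_call(read_calls))                   # virusA
print(read_level_agreement("virusA", read_calls))  # 0.75
```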

As the results came out, we made minor modifications to the pipeline in collaboration with the investigator, which at one point meant running the pipeline over all the terabytes of data again! Thanks to Nextflow, this could be started with a few keystrokes.

Empowering Collaborators to Run Pipelines Independently

Often, collaboration involves working with one of the students or key staff of the principal investigator. Together we decide on the pipeline, using existing ones where available and expanding them if necessary. The collaborator can then run it on their own data and provide regular feedback. This has led to some wonderful collaborations. For example, in one collaboration we used the UPHL-BioNGS/walkercreek pipeline to subtype viral strains and subsequently ran a custom pipeline to do alignments and create a phylogenetic tree. The collaborator was then able to run the pipeline themselves as new strains were sequenced.

Eventually we might be able to teach workshops and provide training so that students can build essential skills needed to successfully modify an existing pipeline.

Training Sequencing Facility Technicians and Bioinformatics Core Personnel

At the University of Vermont Bioinformatics Core, we work very closely with the Sequencing Facility. They perform the sequencing using various instruments, but further processing is needed afterwards (e.g. demultiplexing, concatenating files, basic QC). We created a pipeline that handles part of this pre-processing (we will expand it in the following months) and put it in Seqera Platform so that the technicians could run it right after the sequencing finished.

To get the technicians started with it, and to introduce other colleagues in the Bioinformatics Core to it, we ran a small workshop. Below is a picture of a recent hands-on Nextflow and Seqera Platform tutorial, where people installed Nextflow and ran their first pipeline. We also introduced the above Seqera Platform pipeline, and the technicians successfully used it for the first time on some recent data. This pipeline is now used routinely by the sequencing facility to pre-process data and to serve the many investigators that use their services. Right now it is very basic, but we plan to expand it within the next few months.

Figure 1. Nextflow/Seqera Platform tutorial on Nov 20, 2024: Introduction to running Nextflow pipelines and the Seqera Platform.

Notes on the Journey

2024 was the “Nextflow year” for me, with the Nextflow Summit and nf-core Hackathon in Boston and Barcelona. It was also the year I contributed my first module to nf-core and developed various pipelines, and I am now on the way to contributing my first subworkflow. I really want to thank the team at Seqera and the many wonderful Nextflow and nf-core community members for all their help and support. Thank you as well to my team at the Bioinformatics Core, the Sequencing Facility and the Translational Global Infectious Disease Research Center for all their support.

This post was contributed by a Nextflow Ambassador. Ambassadors are passionate individuals who support the Nextflow community. Interested in becoming an ambassador? Read more about it here.