Nour Mahfel
Jan 20, 2026

From Complete Beginner to nf-core Contributor

This post has been written by our valued community members.

Introduction

As part of my training in the Scientist Training Programme, I undertook a Master’s project to develop, validate, and implement a long-read bioinformatics pipeline for rare disease diagnostics using Nextflow. Alongside this, I was keen to undertake an elective; the purpose of an elective is to provide experience beyond routine diagnostic service work. I wanted to gain hands-on experience of how robust, reproducible bioinformatics pipelines are designed, reviewed, and maintained in accordance with nf-core best practices.

Hence, as part of my elective, my manager introduced me to Geraldine Van der Auwera, Lead Developer Advocate at Seqera, the creators of Nextflow. I was then connected with my mentor, Friederike Hanssen (aka Rike), Bioinformatics Engineer at Seqera and a member of the nf-core core team. The aim was to develop a robust, standards-compliant workflow that could be implemented in rare disease diagnostics and directly support our department. I was excited, but also overwhelmed, as I had no prior experience writing nf-core workflows and only a conceptual understanding of Nextflow. The gap between aspiration and ability felt wide. What bridged it was patient mentorship and a commitment to doing things the right way from the very beginning. Over months of collaboration, I moved from a complete beginner to a contributor whose pipeline has now been accepted by nf-core. This blog tells that story, covering what I learned about Nextflow, nf-core, testing, documentation, community practice, and why rigor matters for clinical work.

Shifting My Mindset: Thinking in Nextflow Dataflow

The first hurdle was mindset. I had to stop thinking in terms of a script that runs tools and start thinking in terms of a composable dataflow described in Nextflow. Rike encouraged me to separate concerns:

  • Modules would wrap individual tools
  • Subworkflows would compose modules into coherent tasks
  • Main workflow would coordinate everything

Underpinning this modularity was the nf-core meta map, a stable metadata object passed alongside files through channels. Because I was working with long-read data, I focused on a long-read-friendly samplesheet and meta design, capturing a sample identifier, platform and run metadata, and pointers to input files. At first, formalizing that structure felt difficult. Then I saw how it prevented channel confusion, made interfaces self-documenting, and allowed me to swap modules without re-writing downstream processes.
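
To make that concrete, here is a minimal sketch of the idea, assuming an illustrative samplesheet layout; the column names and paths are placeholders, not my pipeline's actual schema:

```nextflow
// Sketch only: a long-read samplesheet such as
//   sample,platform,run_id,reads
//   patient1,ont,run42,/data/patient1.fastq.gz
// becomes a channel of (meta, reads) tuples, so the metadata
// travels alongside the files through every module.
ch_input = Channel
    .fromPath(params.input)          // path to the samplesheet CSV
    .splitCsv(header: true)
    .map { row ->
        def meta = [ id: row.sample, platform: row.platform, run_id: row.run_id ]
        [ meta, file(row.reads) ]
    }
```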

Building Modules: From Local to Upstream

I started by implementing wrappers as local modules within the pipeline repository. Every tool I added followed the same pattern: declare clear inputs and outputs, carry the meta map through faithfully, pin containerized environments rather than relying on developer machines, and expose only the parameters that users genuinely need. Rike taught me to resist the temptation to make everything configurable. Carefully chosen defaults are kinder to users and to future maintainers. Once a local module was stable, Rike encouraged me to upstream it to nf-core/modules. I am glad she did. Contributing modules upstream accelerates review, shares maintenance, and multiplies value across the community. It also raises the bar, with tests, documentation, and linting required, and those investments pay off later when you are chasing a stubborn bug or integrating a new feature.
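
In practice the pattern looks roughly like the sketch below. The tool, container tag, and output names are placeholders rather than one of the pipeline's real modules, but the structure, with a pinned container, the meta map in and out, a versions.yml, and optional flags exposed through ext.args instead of pipeline parameters, follows the nf-core module convention:

```nextflow
process SOMETOOL {
    tag "$meta.id"
    label 'process_medium'

    // Pinned container instead of whatever happens to be installed locally
    container 'quay.io/biocontainers/sometool:1.0.0--h9ee0642_0'

    input:
    tuple val(meta), path(reads)        // meta map carried through faithfully

    output:
    tuple val(meta), path("*.out"), emit: results
    path 'versions.yml'           , emit: versions

    script:
    def args = task.ext.args ?: ''      // extra flags live in ext.args, not params
    """
    sometool $args --input $reads --output ${meta.id}.out

    cat <<-END_VERSIONS > versions.yml
    "${task.process}":
        sometool: \$(sometool --version)
    END_VERSIONS
    """
}
```

Keeping optional flags in ext.args rather than in pipeline parameters is a large part of what makes the defaults kind to users and maintainers.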

Testing Early and Often

I learned to write tests early, and Rike shared concrete examples and resources, including patterns from the nf-core/sarek workflow, which helped me get started. I wrote nf-test tests for each module using deterministic, minimal input data so they remained fast and focused. The aim was not to recreate biological complexity but to confirm that each wrapper produced the promised files with the expected structure. As I composed modules into subworkflows, I added tests to check the wiring, including tuple ordering, channel cardinality, and the presence of expected outputs. I also created a test profile configuration that ran quickly. This layered approach fed into continuous integration on GitHub Actions, where each push and pull request triggered linting, schema validation, module tests, subworkflow tests, and the full pipeline test. The nf-core linting tools enforced naming conventions, resource labels, and schema coherence.
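
As an illustration, a module-level nf-test might look like this sketch; the module name and the test input path are placeholders (real nf-core tests typically point at files in nf-core/test-datasets):

```groovy
nextflow_process {

    name "Test SOMETOOL"
    script "../main.nf"
    process "SOMETOOL"

    test("produces the expected output for a minimal input") {
        when {
            process {
                """
                // Minimal, deterministic input; the path is a placeholder
                input[0] = [ [ id:'test' ], file('tiny_test.fastq.gz') ]
                """
            }
        }
        then {
            assert process.success
            assert snapshot(process.out).match()
        }
    }
}
```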

Leveraging the nf-core Ecosystem

I learned how test datasets can be hosted in the nf-core/test-datasets repository rather than embedded directly in a workflow's own repository. Furthermore, I was encouraged to reference relevant databases hosted on AWS through nf-core where appropriate, which keeps repositories lean while ensuring tests and examples remain reproducible.
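
For example, a test profile can point at raw files in that shared repository instead of committing data; the branch and file path below are placeholders following the usual nf-core/test-datasets layout:

```nextflow
// conf/test.config (sketch): inputs come from nf-core/test-datasets,
// so nothing large is committed to the pipeline repository itself.
params {
    config_profile_name        = 'Test profile'
    config_profile_description = 'Minimal dataset to check pipeline function'

    // Placeholder URL; the branch and path depend on the pipeline
    input = 'https://raw.githubusercontent.com/nf-core/test-datasets/mypipeline/samplesheet/samplesheet_test.csv'
}
```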

Another great piece of the Nextflow ecosystem is the Wave container approach, which makes it easier to combine tools that would otherwise live in separate containers. This was particularly useful when a module needed both bgzip compression and indexing in the same step. Having that flexibility made the work smoother and more reliable. Rike also outlined how tool versioning works in nf-core and how versioned outputs can be produced in a pipeline, which is important for traceability and for maintaining consistent behaviour over time.
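
As a sketch of that pattern, a process can declare what it needs as a conda definition and, with Wave enabled, Nextflow provisions a matching container on the fly. The module below is illustrative; bgzip and tabix both ship in htslib here, and further packages could be listed in the same directive when a step needs tools from different containers:

```nextflow
// In nextflow.config, let Wave build containers from conda definitions:
//   wave.enabled  = true
//   wave.strategy = ['conda']

process BGZIP_AND_INDEX {
    tag "$meta.id"

    // Wave can build one container covering everything this step needs
    conda 'bioconda::htslib=1.19.1'

    input:
    tuple val(meta), path(vcf)

    output:
    tuple val(meta), path("*.vcf.gz"), path("*.vcf.gz.tbi"), emit: indexed

    script:
    """
    bgzip -c --threads $task.cpus $vcf > ${vcf}.gz
    tabix -p vcf ${vcf}.gz
    """
}
```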

Organizing with Subworkflows

From the beginning, Rike encouraged me to start with subworkflows. As the pipeline broadened, subworkflows became the organizing principle, which was essential for keeping the code understandable. Early on I even used subworkflows that wrapped single modules because it made the structure clearer while I was learning. Later, as my understanding grew and the design stabilized, I was able to remove those one-module subworkflows and keep only the compositions that added real value. This evolution kept the repository tidy without losing the clarity that the structure provided at the start.
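
A subworkflow in this style is simply a named composition with declared take and emit channels; the names below are illustrative and reuse the placeholder module from earlier:

```nextflow
include { SOMETOOL  } from '../../modules/local/sometool/main'
include { OTHERTOOL } from '../../modules/local/othertool/main'

// Sketch: compose two modules into one coherent task with a clear interface
workflow PREPARE_READS {
    take:
    ch_reads                // channel: [ meta, reads ]

    main:
    SOMETOOL ( ch_reads )
    OTHERTOOL ( SOMETOOL.out.results )

    emit:
    prepared = OTHERTOOL.out.results                                // channel: [ meta, prepared ]
    versions = SOMETOOL.out.versions.mix( OTHERTOOL.out.versions )  // channel: versions.yml
}
```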

Working in the Open

As the pipeline developed, it became clear that I should propose it early as a candidate nf-core pipeline. Working in the open helped me become a better collaborator. I learned to keep pull requests small and purposeful and to describe the intent behind each change so reviewers could follow the reasoning.

Rike’s guidance helped me to write faster and more confidently and ensured I learned nf-core practices early on. The impact of leaning into nf-core best practices has been tangible.

  • Reviews moved quickly because the layout, tests, and schemas were familiar to the community.
  • Maintenance is lighter because template updates and upstream modules keep us aligned with the wider ecosystem.
  • Onboarding new collaborators is simpler when they can read the schema, skim the documentation, run the test profile, and contribute with confidence.
  • Above all, clinical confidence rises when environments are pinned, provenance is explicit, and behaviour is predictable over time.

Acknowledgements and Next Steps

This pipeline began as a stretch goal for my elective and became a reality through mentorship. I am deeply grateful to Rike for her clarity and for teaching me best practices, to my manager for making the introduction and championing the work, and to the nf-core community for their reviews, tools, and the culture that makes quality the default. Acceptance into nf-core is not an endpoint so much as a foundation. The next steps are to iterate in the open, expand test coverage where it is thin, contribute more modules upstream, refine configuration profiles for different clinical contexts, and evaluate emerging long-read methods as they stabilize. Because the pipeline stands on nf-core foundations, these evolutions can be incremental, testable, and reviewable, which is exactly how scientific software should grow.

Advice for Others

For anyone embarking on a similar journey, my experience offers a simple reassurance. You can begin with no nf-core experience and a challenging clinical goal, and still deliver something that scales to real use and contributes back to the community. The path is not glamorous. It is paved with template syncs, lint errors, test fixtures, schema tweaks, and patient reviews. Yet that path leads somewhere solid. With the right guidance and a commitment to best practices, you do not just build a pipeline. You build the habits and instincts that turn good intentions into reliable software. That is what mentorship unlocked for me, and it is why I believe so strongly in the standards and the community that shaped this work.

This post was contributed by a Nextflow Ambassador. Ambassadors are passionate individuals who support the Nextflow community. Interested in becoming an ambassador? Read more about it here.
New to Nextflow?
Nextflow is the leading open-source workflow orchestrator for scalable, reproducible, and portable scientific data analysis. It simplifies the writing and deployment of complex pipelines on any infrastructure.