In Episode 56 of the Nextflow podcast, Phil Ewels is joined by Ben Sherman and Edmund Miller to explore the topic of pipeline chaining and meta pipelines. Pipeline chaining is often a natural evolution when running many Nextflow pipelines and wanting to automate how they launch and connect to one another.
This episode has been split into two parts. In Part 1, the trio examines all the different current and existing solutions that people have used to automate Nextflow, chain pipelines, and build meta pipelines.
Key Links
Meta Pipelines & Pipeline Chaining:
- nf-core #wg-meta-pipelines Slack channel
- nf-core/sarek - Example of a big, almost-meta pipeline
- nf-cascade - Tool for treating Nextflow pipelines as processes
Automation Tools:
- Graham’s AWS Automation Blog Post
- Seqera Node-RED - Low-code automation platform
- Other automation platforms
- Dagster - Data orchestration platform
- Temporal - Durable execution framework
- Temporal / Seqera demo
- n8n - Workflow automation tool
- Snakemake - Python-based workflow manager
- Seqera Platform
- Seqera Platform API
Summary
Pipeline chaining and meta pipelines represent a natural evolution in workflow automation. This topic has been around for years—dating back to when DSL2 was first announced at the Barcelona Nextflow meeting in 2019. Back then, the nf-core community was optimistic that the new workflow block syntax would unlock the ability to stitch workflows together seamlessly. But seven years later, we’re still working to make this a reality.
Meta Pipelines vs. Pipeline Chaining
Ben Sherman kicks things off by defining the two main paradigms:
Meta pipelines involve describing everything in one big Nextflow pipeline. You import existing pipelines (like nf-core/rnaseq) as sub-workflows in a larger pipeline. This gives you a single unified DAG, full resume-ability, and parallel execution across all processes.
Pipeline chaining uses external tools to orchestrate multiple separate Nextflow pipeline runs. When Pipeline A finishes, certain outputs feed into Pipeline B’s inputs. The benefit? You don’t have to babysit every step—just launch it and let the chain run.
Why Hasn’t This Happened Yet?
Despite the optimism around DSL2, importing pipelines as workflows hasn’t become widespread. Phil points out that many nf-core pipelines still have a named workflow wrapper (like rnaseq) with no obvious purpose: a historical artifact from when everyone thought meta pipelines would be easy.
So what makes it difficult?
Overlapping config scopes: Every nf-core pipeline uses the same command-line flags (like params.input). When you import three pipelines, you get namespace collisions and scoping nightmares.
Config replication: Pulling in just a workflow means you also need tons of configuration—all the ext.args settings, command-line flags, and more—which must be manually copied and scoped.
We discuss the tooling gap: there’s no easy way to install a core workflow from a pipeline without going through a monorepo like nf-core/modules. The nf-core CLI can install modules and sub-workflows, but those have to be stored centrally. What if you could do nf-core pipeline install rnaseq and pull the workflow directly from the pipeline repo?
Current Solutions: nf-cascade
One creative solution is nf-cascade, developed by Mahesh Binzer-Panchal. This tool treats a Nextflow pipeline as a command-line tool you run in a process. You create a process called nextflow-run and supply inputs like the pipeline name and parameters. You can even specify pipeline chains in a YAML file.
It’s “Nextflow in Nextflow” - a quick-and-dirty approach that works. But there are rough edges: nested work directories, separate resume histories, and potential naming conflicts. Still, it’s a pragmatic way to make things work when you’re willing to treat inner pipeline runs as black boxes.
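The core trick, treating `nextflow run` as just another command-line invocation, can be sketched in a few lines. Here is a rough Python equivalent of that wrapping step (the function and parameter names are illustrative, not nf-cascade’s actual API):

```python
import subprocess


def build_nextflow_cmd(pipeline, params, revision=None, profile=None):
    """Construct a `nextflow run` invocation for a given pipeline.

    `params` is a dict of pipeline parameters, passed as --key value flags.
    """
    cmd = ["nextflow", "run", pipeline]
    if revision:
        cmd += ["-r", revision]
    if profile:
        cmd += ["-profile", profile]
    for key, value in params.items():
        cmd += [f"--{key}", str(value)]
    return cmd


def run_pipeline(pipeline, params, **kwargs):
    """Launch the pipeline and block until it finishes (raises on failure)."""
    subprocess.run(build_nextflow_cmd(pipeline, params, **kwargs), check=True)
```

For example, `build_nextflow_cmd("nf-core/rnaseq", {"input": "samplesheet.csv"}, revision="3.14.0", profile="docker")` yields the full command list; wrapping `run_pipeline` in a process script is then all the “Nextflow in Nextflow” a chain needs, with the caveats about nested work directories noted above.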
Nextflow in Snakemake
Edmund Miller shares a dark secret: before DSL2, he was a Snakemake user. And when wrapping up his dissertation last year, he used Snakemake to orchestrate his Nextflow workflows.
Snakemake has a built-in wrapper syntax for running Nextflow, and Edmund found it clean and easy to understand. He ran fetchngs → nascent → differential abundance pipelines, with Snakemake handling the expansion of wildcards for different genomes and aligners. The result? A concise, declarative pipeline-of-pipelines that leverages Snakemake’s strengths for high-level orchestration.
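Snakemake’s `expand()` helper is essentially a Cartesian product over wildcard values. As a rough illustration of what that wildcard expansion did for the genomes-and-aligners case, here is a minimal Python sketch (the path template and value lists are invented for the example):

```python
from itertools import product


def expand(template, **wildcards):
    """Fill a path template for every combination of wildcard values,
    mimicking Snakemake's expand() helper."""
    keys = list(wildcards)
    return [
        template.format(**dict(zip(keys, combo)))
        for combo in product(*(wildcards[k] for k in keys))
    ]


# One nascent run per genome/aligner combination:
targets = expand(
    "results/{genome}/{aligner}/nascent.done",
    genome=["GRCh38", "GRCm39"],
    aligner=["bwa", "bowtie2"],
)
# -> 4 target paths, one per combination
```

Declaring those targets as the requested outputs is what lets Snakemake figure out how many inner Nextflow runs to launch, without the author enumerating them by hand.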
Snakemake also has an include feature that lets you pull code from GitHub URLs—including specific tags for reproducibility. This kind of flexibility is something Edmund has been advocating for in Nextflow for years (he has a long-standing GitHub issue about URL-based includes).
Event-Driven Bioinformatics
Edmund introduces the concept of event-driven bioinformatics, coined by Ken Brewer. Imagine a sample arrives and hits your S3 bucket. That event triggers a Lambda function, which generates a sample sheet, uploads it to S3, and launches a Nextflow workflow on Seqera Platform—all automatically.
No more waiting for a human to manually kick off pipelines on Monday morning after sequencing finishes on Friday evening. The automation eliminates those manual delays.
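As a sketch of how small that glue step can be, here is a hedged Python version of the samplesheet-building piece. The bucket layout, file-naming convention, and samplesheet columns are assumptions for illustration, not the actual Lambda from the episode:

```python
import csv
import io


def samplesheet_from_s3_event(event):
    """Build an nf-core-style samplesheet (sample,fastq_1,fastq_2) from the
    object keys in an S3 put event.

    Assumes paired files are named <sample>_R1.fastq.gz / <sample>_R2.fastq.gz.
    """
    pairs = {}
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        name = key.rsplit("/", 1)[-1]
        sample, _, read = name.partition("_R")
        slot = "fastq_1" if read.startswith("1") else "fastq_2"
        pairs.setdefault(sample, {})[slot] = f"s3://{bucket}/{key}"

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["sample", "fastq_1", "fastq_2"])
    for sample in sorted(pairs):
        writer.writerow(
            [sample, pairs[sample].get("fastq_1", ""), pairs[sample].get("fastq_2", "")]
        )
    return out.getvalue()
```

A real handler would then upload this CSV back to S3 and call the Seqera Platform API to launch the workflow; those two calls are exactly the glue code discussed next.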
But Edmund warns that AWS Lambda functions can be brittle. They require careful setup, testing, and maintenance. When edge cases arise, the “glue code” breaks, and you have to patch it repeatedly. It’s not as clean as a Nextflow pipeline you can git-deploy and forget about.
Node-RED + Seqera
Phil introduces his recent project: a Node-RED plugin for Seqera Platform. Node-RED is a low-code automation platform with a drag-and-drop interface where you connect “nodes” (blobs) representing different operations.
Phil built custom nodes for launching pipelines, monitoring runs, and responding to events. The beauty of Node-RED is its vast plugin ecosystem—there are nodes for Jira, AWS, webhooks, Home Assistant, and more. You can trigger a Nextflow pipeline when a Jira ticket is created, or flash your lights green when a pipeline succeeds.
The demo shows an RNA-seq → differential abundance chain: launch RNA-seq, monitor for completion, then launch differential abundance. All the pipeline configuration lives in Platform, and Node-RED provides the orchestration layer via API calls.
But even here, there’s glue code. When RNA-seq finishes, you can’t just pass outputs to differential abundance—you need a JavaScript node to construct file paths, guess output locations, and format the launch configuration. This glue logic is unavoidable and represents one of the main sticking points in pipeline chaining.
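In Python terms, that glue node boils down to something like the function below. Every path in it is a guess about nf-core/rnaseq’s conventional output layout, which is precisely the fragile coupling being described:

```python
def diffabundance_params(rnaseq_outdir, contrasts_uri):
    """Guess launch parameters for a differential abundance run from an
    upstream nf-core/rnaseq run's output directory.

    The file locations below are assumptions about rnaseq's conventional
    output layout; if the upstream pipeline changes its structure, this
    glue code silently breaks.
    """
    base = rnaseq_outdir.rstrip("/")
    return {
        "input": f"{base}/pipeline_info/samplesheet.valid.csv",  # assumed location
        "matrix": f"{base}/star_salmon/salmon.merged.gene_counts.tsv",  # assumed location
        "contrasts": contrasts_uri,
    }
```

There is no contract that guarantees these paths, which is why this step has to be hand-written and hand-maintained for every pipeline pair in the chain.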
Dagster and Temporal
Edmund walks through two more data engineering tools:
Dagster excels at transforming data frames and SQL workflows. While you could write bioinformatics pipelines in Dagster, you’d regret it—Nextflow’s shell-native approach is far cleaner for CLI-based tools. But for pipeline chaining? Dagster fits well, because every pipeline takes a sample sheet (a data frame) and outputs a sample sheet.
Temporal is a durable execution framework popular in the microservices world. It’s built for dealing with flaky third-party APIs and emphasizes retries, timeouts, and resilience. Ken Brewer introduced Edmund to it, and it works for Nextflow automation—but it’s not particularly “bioinformatics friendly.”
Both tools can chain Nextflow pipelines launched via Seqera Platform. They bridge the gap between the file-centric bioinformatics world and the tabular data engineering world.
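Under the hood, both end up making the same kind of call: a POST to Seqera Platform’s `/workflow/launch` endpoint. A minimal sketch of building that request body, assuming the Platform API’s launch object fields (`pipeline`, `computeEnvId`, `paramsText`); treat this as a small subset of the full schema, not a definitive client:

```python
import json


def build_launch_request(pipeline, params, compute_env_id, revision=None):
    """Build the JSON body for Seqera Platform's POST /workflow/launch.

    `paramsText` carries the pipeline parameters as a JSON string, so the
    orchestrator (Dagster, Temporal, a Lambda, ...) only has to pass a dict.
    """
    launch = {
        "pipeline": pipeline,
        "computeEnvId": compute_env_id,
        "paramsText": json.dumps(params),
    }
    if revision:
        launch["revision"] = revision
    return {"launch": launch}
```

The orchestrator then sends this payload with a bearer token and polls the run status endpoint until completion before launching the next link in the chain.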
What Makes Nextflow Special
Ben reflects on why Nextflow has succeeded where many Python-based workflow managers (Airflow, Dagster, Prefect, Luigi, Flyte) haven’t in bioinformatics.
It’s not just the data flow logic—channels, operators, etc. It’s the process abstraction. Nextflow wraps CLI tools in a declarative way, with inputs, outputs, containers, and isolated execution. You take a tool that requires environment setup and file staging, and Nextflow makes it truly executable on its own.
Bioinformatics is fundamentally built on polyglot CLI tools and file formats (BAMs, VCFs, FASTAs). These aren’t going away. As long as pipelines compose CLI tools, Nextflow will be the best way to do it.
Edmund agrees: files matter in bioinformatics. Even as data engineering shifts to Parquet files and Iceberg tables on S3, Nextflow’s ability to stage data sources flexibly means it can thrive in those environments too.
Check out Podcast Episode 57 for Part 2 of this discussion, where we dive into how we’re improving Nextflow for the future of meta-pipelines and pipeline chaining.
Full transcript
Welcome
Phil Ewels: Hello and welcome to The Nextflow Podcast. You are listening to Episode 56, coming out in March 2026. And today’s episode is all about pipeline chaining and meta pipelines.
This isn’t a particularly new topic, it’s been around for years and it’s basically a natural step when you’re running a lot of Nextflow pipelines to want to automate how those pipelines are launched, how they connect into one another and so on.
There’s always been a lot of interest in this. In the October 2025 hackathon in Barcelona, we actually had a get-together outside where we pulled lots of people together from the community to discuss different things that people were doing and ways we could move forward, which was really productive. Actually, in the nf-core Slack, there’s a channel called #wg-meta-pipelines, so if you’re interested, you can go and join that.
To help me explore this topic, I’ve got two regular guests of the Nextflow Podcast, Ben Sherman and Edmund Miller, both very knowledgeable in the area. And, when we recorded this episode, we actually talked for so long, and got so excited about the topic that we really did too much for one episode. And so I’ve actually, in editing, I’ve split this into two episodes.
So this first one, we basically talk about all the different current existing solutions that people have used, to automate Nextflow, to chain pipelines, and to build meta pipelines.
And then in part two of this, we’ll talk more about the future, how we’re changing the Nextflow language, how we expect to build meta pipelines and do better pipeline chaining in the future.
That discussion was really interesting, enough that I felt it deserved its own podcast episode. So do tune in to the next episode, Episode 57 to hear more about this.
So without further ado, let’s dig in.
Introduction to meta pipelines and pipeline chaining
Phil Ewels: So Ben, I keep mentioning meta pipelines and pipeline chaining. Can you give us a bit of a description of what these two things are and how they differ, and some background to what this space is?
Ben Sherman: So, I mean, you start with a Nextflow pipeline and, you know, usually people build pipelines for particular things, but there also ends up being more to the story, where you have your pipeline and then you have things that happen before your pipeline and happen after. And so oftentimes in your workflow, you may have a pipeline of pipelines, and maybe one day it would be nice to just have one workflow language that you can use to describe, you know, literally everything you do.
But in the meantime, we’re trying to find ways, practical ways to, compose different pipelines together. And there’s two main ways that people usually think about that. One is “meta pipelines” where you would just try to describe everything in one big Nextflow pipeline, for instance. And so you take existing pipelines like say nf-core/rnaseq, and maybe you import that as a sub workflow in a larger pipeline.
And then the other way people think about it is pipeline chaining, where instead of putting everything into one big Nextflow pipeline, you use some other tool, that could be Seqera Platform. Or there are other, tools we’re gonna talk about today, some, some kind of tool that allows you to chain things together.
It usually takes the form of, you know, run Pipeline A and then when Pipeline A is done, take certain outputs of that and feed them into the inputs of some Pipeline B. Basically so that when you wanna run those chain of pipelines, you don’t have to babysit every single step. You can just launch it and then the whole thing runs.
And so these are the two paradigms, they sort of have pros and cons. You know, there might be reasons where you might want to take one approach versus another.
For example, if you want to take an existing pipeline like RNA-Seq and maybe just add one little pre-processing step. Maybe for that it would be easier to just write a meta pipeline, right? And just add your one little thing.
But if you’ve got, a much more complicated pipeline, we need to come up with more words for what we’re talking about. But if you’ve got like RNA-Seq and then sarek, and then you’re merging the outputs and combining them and putting into some third analysis, maybe that you might wanna do, as a pipeline chain. So there’s, there’s different ways to think about that.
I would say both of these things are possible today, but they’re not pleasant. And so we’re gonna end up talking about a lot of the different, pragmatic ways that we found to make things work. And then maybe towards the end, we, we will get more into how we can make this process more pleasant. But that’s, that’s at least an overview of those two.
What makes importing pipelines difficult?
Phil Ewels: I mean what, when I said in the intro that this has been talked about forever, and I remember actually in Barcelona in the Nextflow meeting in 2019, I think it was, that Paolo first announced DSL2. That was the first time this concept of the workflow block in, in Nextflow had ever been described.
I remember talking to Harshil and Maxime and the other nf-core folks who were there: this is amazing, this is gonna unlock meta pipelines. We’ll be able to stitch all these workflows together. So we’ve been talking about this for, what, seven years.
And if you have a look in the nf-core pipelines and the nf-core template, you’ll see that there’s the entry level workflow, which is a wrapper. And then the main part of the pipeline is in the named workflow called rnaseq, or, named after the pipeline. And it’s for no obvious reason.
And it’s purely historical. It’s because back in 2020 we were really optimistic we thought that we’d be doing this a lot. And so we wanted to have a named workflow that contained the whole pipeline, which other people would be able to import, and use, use as building blocks, you know.
But, that’s never really happened. Like. what, why hasn’t it happened? What makes this difficult?
Ben Sherman: I mean, I, I looked into this a couple years ago and I, I sort of found exactly what you said that a lot of these nf-core pipelines are very nearly set up to make them amenable to meta pipelines.
You know, you have this core workflow, which just takes the input channels and emits the output channels, and then that is wrapped by what you might call the entry workflow that, that handles at least the params parsing. You know, you, we still have all the publishDir stuff going on within the modules, but, and we’ll, we’ll talk more about that later, but on a basic level, a lot of these pipelines are set up where you could, in theory, import that core workflow into a larger pipeline.
CLI tooling to import pipelines
Ben Sherman: It seems like, one of the big gaps is the tooling. So, you know, you think about the way that the nf-core CLI works, there’s already a lot of functionality for installing modules, installing sub workflows, but those things all have to be stored in a mono repo, essentially, you know, the nf-core/modules GitHub repo, for instance.
And so, say, the core workflow of RNA-Seq, you know, what would it look like to import that? Well, you could imagine a CLI, you know, like “nf-core pipeline install rnaseq”. It installs the RNA-Seq workflow directly into your repo. But then it would have to either pull that directly from that pipeline repo or the RNA-Seq developers would have to sort of give up ownership of their workflow and like move it into, nf-core/modules. Right. And you’re starting to push the limit of like what all actually belongs in this nf-core modules repo.
So I think one thing that would make this a lot easier is if there was some kind of tooling, whether it’s part of nf-core or even Nextflow natively, some way to include or install that core workflow from a pipeline, without having to go through a mono repo as the middleman. Having a little more flexibility in installing pipelines.
And, you know, Sarek is a good example of, you know, it is essentially a meta pipeline, but all of those sub workflows in Sarek, which could be pipelines on their own, they’re all baked into the Sarek repo and they’re maintained as part of Sarek.
And so they, it’s a bit more difficult to use them as, standalone things. I think the way you do it, I’m not an expert Sarek user, but basically everything goes through like this top level Sarek entry point, right? And then you tell it which parts you wanna run. Which is a little bit different than saying, I have this sub workflow in my pipeline, and you can just go in and run that sub workflow directly. You don’t have to go through like Sarek as an example.
So there’s a couple different ways to skin this cat, but I think that’s one of the main obstacles right now is, is the tooling for installing these pipelines as workflows.
Overlapping config scopes
Phil Ewels: I’m gonna push back on that a little bit, mainly because I’m one of the main authors of nf-core/tools. We did try this, we tried this back in like ‘21.
As I remember it, some of the big problems were to do with parameters because every nf-core pipeline tries to use the same command line flags, right? That’s part of nf-core. It’s consistency. so if you have three pipelines and you’re trying to import them all, they all have params.input. And that’s baked in. Pretty far in, to the workflow.
And so that was one of the big things was clashes and scoping, of config. Because a lot of nf-core pipelines have very overlapping config, very overlapping params.
And also you had to replicate just tons of config. Because if you pull in just a workflow, there’s a lot more in the pipeline beyond that. So there was a whole load of config which was required. Like, you’ve got all the ext.args stuff, all the different command line flags. So you end up manually having to copy loads and loads of the code and then, again, scope it.
Not to say it’s impossible and you are right. Probably with the right level of tooling, we might be able to do it. And that is something we could put more effort into and, and hopefully will in the future.
Ben Sherman: That’s a good point as well, is that there is a lot of config that’s mixed in with a pipeline code, and at least my sense is that we need to find a way to decouple those things until we get to a point where you could just pull out a pipeline and not worry so much about what config is wrapped around it, because maybe you can trust that whatever meta pipeline it’s being imported into, that meta pipeline will have all the config that it needs set up correctly. You know, that’s just one example of how to think about that.
Subworkflows or pipelines?
Phil Ewels: Yeah, I think the, you mentioned the sub workflows as well, which anyone not familiar with nf-core, we have the modules repo and you can import or include single modules, which are like one process, but then we also have sub workflows, which might be QC and trimming or, different chunks of pipeline.
And I think this has a bit been our stop gap. Of sharing partial pipelines.
But it has quite a lot of limitations ‘cause you can’t run that by itself. You can’t just say, I want a QC and trimming pipeline, run it. You have to go and actually build a pipeline and include it, which is quite a lot of faff.
Edmund: Yeah, I kind of think it’s like a, it’s a software philosophical question. And I like to think of it like the Unix philosophy of like one tool that does something really well is kind of what attracted me to nf-core originally. And that’s where we have like modules and it does one tool really well for that one sub command.
And then you had like RNA-Seq is just RNA-Seq. And that’s where like, and I feel like we’re just dragging Sarek through the mud on this, but like that’s where it’s like it doesn’t just do one thing really well. It does a lot of things. It’s a great application and it combines all the things in a creative way.
But that’s where the like fetchngs, RNA-Seq, differential abundance is very clear to people and where you get this philosophical like, okay, I can see the clear separation of these pipelines. Could I meta pipeline chain them? Yes, in like one big pipeline like Sarek. But it’s also nice to have them individually and just break it up code wise for like humans to work on and conceptualize
Phil Ewels: I don’t think Maxime will argue with us. He’s been advocating for breaking up Sarek as well, so
Edmund: somebody. He’s just begging for help.
Phil Ewels: Exactly. We just need the, the mechanisms to do it.
As long as we can keep Sarek as it is, but then also have the other pipelines.
Pipeline chaining
Phil Ewels: So this is mostly meta pipelines we’re talking about, right? Like including or importing other pipelines into one big one. And, you get some nice stuff if you do that. You have all the stuff that Nextflow gives you basically, right?
If you have processes from a different pipeline, Nextflow knows about the whole DAG. So it can be running all those processes in parallel and, and doing proper parallelization.
And also if stuff stops, you have full resume-ability. ‘cause again, Nextflow understands the full DAG and all the processes.
That’s in contrast to chaining pipelines, right? Where we’re just running Nextflow multiple times. Ben, can you tell us a little bit about what’s been done on the daisy chaining side of things?
Ben Sherman: Well, I think it’s important to say that it’s different, but then it’s also kind of the same. It’s, it’s sort of a matter of how you draw lines around things. At the end of the day, you are still running one big pipeline, but some of the details of how that works can vary.
nf-cascade
Ben Sherman: One of the tools we wanted to call out today was something called nf-cascade, which was developed by friend of nf-core Mahesh, who just loves to explore problems like these on his downtime. So I appreciate the effort he’s put into this.
And again, I’m not actually an expert on this tool, so I’m just trying to describe it based on, you know, my own study of it, but basically treating a Nextflow pipeline as a tool that you run in a process.
So you have a process called “nextflow-run”. And in it, the process script is just nextflow-run, and then you would supply as a process input, things like the pipeline name, whatever parameters you wanna run, things of that sort. And then you could imagine composing that into, you know, a larger Nextflow pipeline.
Now, I think the way that Mahesh sets it up is that you actually specify, like a YAML file where you just list out the pipelines you wanna run. And I think there is some capability for like linking inputs to outputs and things like that. But again, don’t take my word for that.
And so this is, another way to do it. Instead of, you know, dealing with all the issues of like, oh, separating the core workflow from all the pipeline config and all that. You just say, let’s just take the whole pipeline and just wrap it up into one little tool. And then, you know, you end up doing Nextflow in Nextflow basically. Where you have this like, top level pipeline, and then the processes in that pipeline are themselves just calling other Nextflow pipelines. This is sort of quick and dirty way to make it work.
‘Cause at the end of the day, Nextflow is itself just a command line tool, right? You can wrap it in a process just like you would any other tool like FastQC or Salmon or, or whatever.
There just are some rough edges around it because, you know, now you’ve got this outer Nextflow pipeline that has its own resume history and work directory. And then every inner nextflow-run also has its own resume history and work directory and all of that. And all that stuff doesn’t necessarily, you know, play well, you know, it might, it, it can play together if as long as you set it up correctly. You have to make sure you don’t have like weird naming conflicts. And, there’s some trickiness I think around caching and resume, and error recovery, things like that.
So I think it works best when you’re basically willing to treat those inner Nextflow pipeline runs as just like an individual tool run where you don’t care about the fact that maybe it’s running, you know, sub tools beneath that. And so, it’s, I think it’s certainly something it’s worth taking inspiration from.
Phil Ewels: There’s a blog post written by Mahesh about this topic, which, I’ll put in the show notes, where he’s written a bit more about this. But honestly it fascinates me that this approach actually works at all. I think it’s, it’s really impressive.
Nextflow in Snakemake
Edmund: Well, if people didn’t know, my dark secret is that before Nextflow, before DSL2, I was a, Snakemake user. So I think this might be a nice segue into that.
As I was wrapping up my dissertation last year, I actually used Snakemake to wrap my Nextflow workflows I’ve got actually the code pulled up here to show some of that.
So this is kind of the same illustration that we were just talking about, where we were going from fetchngs, RNA-Seq to Differential Abundance. In this case, I’m actually doing Nascent in the middle.
So I’m actually pulling files here with get_fastq, but this is just wrapping the fetchngs pipeline itself. And so this is kind of doing the same thing as nf-cascade. Just, I think it’s really clean.
Phil Ewels: Yeah, so it’s the same thing here. You’re running basically the head job of Snakemake, and then the Snakemake rules, the process equivalent, are pulling and running Nextflow.
Edmund: Exactly. So like I wrote a little small pipeline to make a homer unique map. And if you don’t know what that is, like good for you, you don’t want to know what it is. But it’s just a very custom little, reference genome thing.
But then the real kicker was I had to run a different Nascent pipeline for each aligner, each genome. And that’s where the real magic kind of comes in here, of running that and expanding each of those. And, that’s kinda the beauty of the Snakemake syntax of like, I don’t know how many I wanna run and, then give me these results.
And so these were the ones that I was expecting to compare downstream and where you can see that the different like genome, wildcards across the board on those.
So it ends up being kind of a nice like pipelining for pipelines.
Phil Ewels: Snakemake has this wrapper syntax, right? It’s got built-in integration for running Nextflow.
Edmund: Exactly, there’s a little section right there and so it just spins up like a Nextflow environment and then launches these. So you can actually have a bunch of different pipelines running, at the same time from those. But I was just launching ‘em on an HPC, so.
Phil Ewels: So what is it that’s in this code here, which made you do this with Snakemake? Instead of doing something like Cascade and doing the same concept in Nextflow.
Edmund: I think it’s just really clean and easy for me to grok and look at. Versus the Cascade stuff, I’m like, I could put this in Nextflow, but I’d be doing a lot of working around to get it into Nextflow to make Nextflow run Nextflow.
Like I could have just done a Makefile is what I actually started with. But I reached for like Snakemake because I was like, well I’m familiar with this. And once you get a little bit more complicated of like, I needed a genome and an aligner difference in each of these. And that’s where like, okay, you start kind of losing yourself in the Makefile versus Snakemake handles that a little bit better.
Phil Ewels: So this is the kind of thing we need to aim for then, with meta pipelines in Nextflow then, do you think? Is this the kind of syntax you’d like to see in Nextflow?
Edmund: Yeah, I, I think I would. I I would like to see that. I think Snakemake also has a really great, and like I’m just gonna talk about Snakemake for a while here,
Phil Ewels: This is the Nextflow Podcast, not the Snakemake Podcast!
Edmund: Yeah, yeah, exactly. Now we’ve rebranded it. They have a great, like, include functionality for the GitHub code, and you can just pull in random snippets, so you can include a different Snakemake pipeline.
Phil Ewels: From a URL, instead of having to include the files natively?
Edmund: Exactly. Exactly. So they have wrappers as a concept.
You can do CWL, but then this includes, so you can just include another snake file, which is basically an NF file. And again, you can do that entirely in Nextflow. The difference is then you can say like, okay, this is a, the other workflow’s snake file.
And so this kind of comes in, in like the Sarek use case or like, what I like to give as a more reasonable use case, usually when I’m talking about this, is like metagenomics where people are like, okay, I want ChIP-seq and RNA-Seq both in the same thing. And we’re like, okay, you can pull RNA-Seq and ChIP-seq pipelines from nf-core. But it’s the combining of them and making that reproducible as a whole that people want. I think.
Phil Ewels: Yeah.
Edmund: Yeah, that’s kinda the overall segue into the Snakemake podcast here.
Trying to find the, yeah, here you go. So you can just include from a different code hosting provider and say like, this is the repo that I want. Here’s the path, here’s the snake file. And even a tag on that. So that kinda gives you reproducibility at that level and makes it really simple to like, okay, I want Sarek version 3.2.1, and then I want, fetchngs before that. And this is the version of it.
Phil Ewels: That’s something, I think you’ve got a GitHub issue on the Nextflow repo for this. It’s years old, right, Edmund? About includes, and being able to do includes with URLs.
Edmund: Yeah, exactly. And I think like, and it’s funny that this keeps coming up. That was what Rike kind of posed at the core team retreat. But she said like, as a side comment to me, like, what if we just did modules, like instead of doing pipelines with nf-core, because that seems what everyone loves about nf-core, using that across the board, is the modules.
Phil Ewels: I mean, you touched on your, your past life of using Nextflow in Snakemake. I mean, I remember years and years ago doing this with just bash scripts, right? Chaining pipelines, just being like, nextflow-run, nextflow-run, nextflow-run. And you mentioned like make files. You work a lot with different groups using Nextflow. What other approaches do you see?
Edmund: I’ve seen a, I’ve seen a lot, Phil, I’ve seen a lot. I think previously, when I was at Element, we used Python to call different pieces of it. And this is kind of where Platform starts to come in and be really relevant for people. And kinda the purpose of that is then you have an API that you can call to say like, I need to launch this Nextflow pipeline. I don’t need to reinvent the wheel about thinking about how to launch that correctly in a cloud environment. And so you can just call that API endpoint and then it goes off and, and does that. So a lot of Python, a lot of different, a lot of other things that we can also go into.
Automating Nextflow runs
Phil Ewels: Taking a step back slightly from the specifics of chaining and, and meta pipelines. An overlapping topic with both of these, I think is, is the topic of automation. Especially with pipeline chaining, I mean, the, the simplest way to do this is just you run manually one pipeline and then you run the next one after the first one’s finished, which I, I imagine many of us have spent plenty of time doing as well. I know I have, you know, run fetchngs and when it’s finished, you, you then come in the next day and run the next pipeline.
One of the things that springs to mind for me is that Graham at Seqera, who’s like the Kubernetes ninja who knows more about AWS than most mere mortals, has a blog post, quite an old one from like 2022, I think, which I’ve been rereading a bit more recently, about automation on AWS using all the massive AWS infrastructure: Lambda and the like.
How, how many people are doing that kind of thing, do you think, and when you think more about the automation side, like GitHub actions and stuff, like what kind of patterns do we see coming up?
Edmund: Yeah, I think it’s kind of like the, the evolution of the bioinformatician over time. You know, like the evolution of man.
I remember when I first started, I got an Excel file from my PI that had all of the like bowtie align commands that he ran for each individual sample. Then okay, you get a workflow manager, you know, Nextflow runs each of those in processes. Then we put that in a pipeline.
And then now it’s like, okay, now we’re like going to the next phase of chaining the pipelines. And then it’s like, okay, how do we automatically do this chaining and kick this off when this sample type comes in?
So yeah, I think a lot about Graham’s blog post. That was a lot of inspiration for me early on in the automation game.
Event-driven bioinformatics
Edmund: It just lets you run that on an event is more kind of the idea, and this is, I can’t coin that, Ken Brewer came up with the concept of event-driven bioinformatics. So that’s where you have a sample come in, it hits your S3 bucket, and then that triggers an event. And that event then kicks off your Lambda function that says, okay, generate me a sample sheet, and then pull this back down and then upload that to a different S3 bucket. And then that has all of your FastQ files. Okay, now launch this workflow in Platform.
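As a sketch of that event-driven flow, here is roughly what the sample sheet step of such a Lambda handler might look like. The event shape follows S3's ObjectCreated notification format, but the R1/R2 pairing convention and the nf-core style columns are illustrative assumptions, not a real pipeline's contract:

```python
import csv
import io

def samplesheet_from_s3_event(event: dict) -> str:
    """Turn an S3 ObjectCreated event into an nf-core style sample sheet.

    Pairs up *_R1/*_R2 FastQ keys from the event records. In a real
    Lambda you would list the prefix with boto3 and upload the sheet
    back to S3; here we only build the CSV text.
    """
    uris = [
        f"s3://{r['s3']['bucket']['name']}/{r['s3']['object']['key']}"
        for r in event["Records"]
    ]
    fastqs = sorted(u for u in uris if u.endswith(".fastq.gz"))
    rows: dict = {}
    for uri in fastqs:
        name = uri.rsplit("/", 1)[-1]
        sample = name.split("_R")[0]  # naive sample-naming assumption
        slot = "fastq_1" if "_R1" in name else "fastq_2"
        rows.setdefault(sample, {"sample": sample})[slot] = uri
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["sample", "fastq_1", "fastq_2"])
    writer.writeheader()
    for row in rows.values():
        writer.writerow({"fastq_1": "", "fastq_2": "", **row})
    return out.getvalue()

event = {"Records": [
    {"s3": {"bucket": {"name": "seq-data"}, "object": {"key": "run1/sampleA_R1.fastq.gz"}}},
    {"s3": {"bucket": {"name": "seq-data"}, "object": {"key": "run1/sampleA_R2.fastq.gz"}}},
]}
print(samplesheet_from_s3_event(event))
```

In a real deployment you would then call the Platform API to launch the workflow; this sketch only shows the pure glue logic, which is the part that tends to break.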
And that’s kind of the overall like vision of the automation. Where like you may have done one or two of those steps manually, or you had like a script that you kicked off and ran that yourself. When someone pinged you on Slack and was like, Hey, this sample came in, can you run this?
And then you get rid of that extra ping and the waiting on the human to respond to it, because you’re just cutting out the manual labor there.
Phil Ewels: Your sequencing finishes on a Friday evening, but you don’t kick off the pipeline until Monday morning, just because you have to sit there and click.
That human-in-the-loop kind of delay, even if it’s not very much work for that person, just having to actually be there and respond, being able to eliminate that is a big win.
Edmund: I think the real issue with that is that AWS Lambda functions are kind of difficult to deploy for like the mere mortal. I remember doing it back in 2022, pre-AI, and it’s like, yeah, maybe nowadays it would be easier, but you have to like build it, test it, redo it. And the glue, I see the glue as really brittle.
You know, if you’ve ever done like an epoxy or something, it hardens, and then it’s like, ah, it hit this edge case and that broke, and then you have to go back and fix the glue, and then you have to fix the glue again. And so that’s kinda the issue with Lambda functions: it’s not like a clean Nextflow git-deployed operation where you’re like, okay, I just get this to work, I push it up and I forget about it.
Phil Ewels: AWS is brilliant, but it’s not particularly known for having an easy-to-use interface. I’m looking at Graham’s blog post now. It’s immediately diving into these really complex config files, and it’s a bit of a beast. It’s high throughput and, and solid and everything, but, but not easy to get going with.
Node-RED + Seqera
Phil Ewels: On that note, I’m gonna do a plug for one of my projects, which I did at the end of last year, off the back of these requests coming in.
Some of the really common ones are, you mentioned it Edmund already, like, you know, files dropping into S3, and integrations with other systems.
And I’ve been thinking for a while about how no two people have the same requirements for automation. Everyone has their own systems that they need to integrate with. Everyone has their own custom setup and stuff.
So it’s quite a difficult thing to generalize, but, but one of the things I really like is the way that Platform is able to wrap this. You set up all your configuration for pipelines once, and you can do that via the UI and everything, and you kind of store it in Platform. Platform acts as a control plane, encapsulating all the running: how to run each pipeline, how to run each unit. And then you can just ping it with the API.
And so, something I found, I actually got to know it because of using Home Assistant on my little Raspberry Pi to control how bright my lights are behind me and all this kind of stuff. And that has Node-RED built in.
And I ended up building this, which is, which is an open source plugin for Node-RED.
And Node-RED is a low-code interface where you can drag and drop these different kind of blobs onto a canvas and connect them. And so I just made some new blobs, some new nodes, for, for doing different things within Platform with your Nextflow pipelines.
The easiest way to show this actually is I can run it in Studios. The installation’s really, really simple, there are videos to lead you through it as well. But you can just copy this Docker image, which I made: if you go into “Add Studio”, you can just say use custom container, and you just drop that in and it works. It takes a couple of minutes to load up. I’m gonna connect to this one.
But, but this is a really simple interface. And you, you can see, like, even if you’ve never seen Node-RED in your life before, you look at this screen and have a pretty good idea about what’s going on, right? You’ve got these objects and they’re connected by these wires. And, if you go to import, it comes with a whole bunch of examples of common things. So like, you know, we talked about RNA-seq and differential abundance, so let’s have a look at that. I can just import that example.
We’re kicking off an initial pipeline, launching RNA-seq here, and then just monitoring it to see if it finishes. And then these different outputs of the different scenarios. So that’s if it succeeds and that if it fails. So if it succeeds, then we go down here and then we launch differential abundance, and then monitor differential abundance.
So this is pipeline chaining, right? We’re just, we’re just kind of encapsulating all the config. We talked about how that’s difficult to manage, and all the config lives in Platform now. Then we’re just calling Platform from this automation layer, which is in Node-RED.
Sorry, I’ve been talking a bit by myself there. I had fun building this.
Edmund: No, No, it’s, it’s beautiful. I, I was so excited when you came up with the idea and like pushing a button and launching a pipeline and then the flag raising in your backyard, you know, that was, it’s a beautiful, a beautiful pitch.
I, it always makes me think of Galaxy though, whenever you pull it up. We’ve, we’ve just circled back around, it’s the other end of the bell curve of, of like just use nodes connected to nodes.
Phil Ewels: Yeah, exactly. You could build a whole workflow like this, but, no. Let’s not do that.
Edmund: It’s easier when you take it up to that bigger level, like RNA-seq, differential abundance, and you’re like, okay, this is like, you know, nine nodes. And maybe, like, we’ll get to a place where it’s too complicated and we’re doing too many things again. But each of those, you know, has hundreds of processes bundled together.
Whereas when you’re looking at that Galaxy view, you’re like, okay, this is overwhelming. Where’s the, the bigger box for me to conceptualize as a human?
Phil Ewels: Yeah. Exactly. I mean, if you look at the DAG, the Mermaid plot that’s generated for RNA-seq, it’s a hairball. Yeah. You know, it’s so zoomed out, you can’t see anything. So having that as a single, single node here is nice.
Node-RED flexibility
Phil Ewels: But I’ve been quite pleased with this. Like what you can’t really tell here is, is the flexibility, which I think is the key selling point for Node-RED.
Node-RED has been around for years and years and years and it has a huge open source community. And there are so many of these nodes. This is just vanilla, the ones that come by default, but you can see. And I can drop that in. There we go. And then drop that in. And now, now I have a webhook which I can use to launch my pipeline: just hit that URL from anywhere and it’ll work.
You know, if I go down here then, you can read files and set up different listeners and do different events based on different things changing.
And this is where the demo you were talking about comes in, Edmund, ‘cause I have this running on my Home Assistant. I can pull in, you know, when a light switch gets pressed, I could launch a pipeline. Or when the temperature drops below minus 10, that could, you know, do something stupid. Then in one of the last videos in the series, you know, when the pipeline succeeds, all my lights go green and start flashing and confetti flies on my, my computer monitor and stuff.
So you can, you can really, you can do anything. The sky’s the limit. I posted on Slack and someone was saying that they had a workflow where they wanted to automatically launch Nextflow when a Jira ticket was created. And I was like, that’s probably a plugin for Node-RED. A quick search of the modules and, yep, sure enough there’s a whole bunch of Node-RED modules for Jira. If you wanna do that, it kind of makes me feel a bit nervous. But you know, you can just do anything, and tie AWS and the cloud in. It’s a lot of fun. Alright.
Ben Sherman: Well, it’s nice just to have a, a playground for this kind of thing. Because, you know, it’s like we’ve said that it feels like everybody is doing this automation layer, you know, everybody’s doing it their own way and it feels like something that should be very straightforward. Right? It’s, it’s just automation. It’s the same kind of thing that you do when you wire things into a Nextflow pipeline, right? But for some reason, when you get to this level, suddenly it gets like really nebulous.
And so even if, maybe we go beyond this one day, it’s nice to just have the fact that Node-RED has a plugin for just about everything that so many people have contributed to it. It’s very good for prototyping and just for showing that it’s possible.
And then from here we can think about like, okay, what are the main happy paths that we wanna enable? And then, you know, maybe we can build something more specialized off of that. But until then, it’s very nice to have this.
Phil Ewels: Yeah, and I think the other thing is these things don’t have to be mutually exclusive, right? If we are able to come up with a really nice way to have meta pipelines, then that’s a good thing, right? Then this chunk in the middle just becomes a single block, and like, like we say, the granularity just drops a level, but we can still have all the integration into the ecosystem.
Glue code
Phil Ewels: Something you can see in this demo actually, is that you still need quite a lot of this logic in here, and I was a little bit surprised about how much logic was needed.
When the RNA-seq pipeline finishes, you can’t just immediately pass those outputs on to differential abundance. You have to have this little JavaScript node here, which does some stuff. It says, like, you know, where is the output directory, and basically guesses what the paths of the results will be, based on that. Then it constructs all the launch configuration for the next pipeline.
And we started talking about this kind of code as glue code. And I think that, that the more you start to dig into this, the more you realize that this glue code is basically always needed. And it is one of the main sticking points, I think.
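For illustration, the sort of glue logic that JavaScript node performs might look like this in Python. The directory layout and parameter names below are assumptions about the two pipelines' interfaces, not their real contract, which is exactly why this kind of code ends up brittle:

```python
def chain_params(rnaseq_outdir: str, contrasts_file: str) -> dict:
    """Guess where nf-core/rnaseq left its outputs and build a launch
    config for differentialabundance.

    The paths below are assumptions about the upstream publishDir
    layout; if the upstream pipeline changes its output structure,
    this glue silently points at the wrong files.
    """
    return {
        "input": f"{rnaseq_outdir}/samplesheet.csv",  # assumed location
        "matrix": f"{rnaseq_outdir}/star_salmon/salmon.merged.gene_counts.tsv",
        "contrasts": contrasts_file,
        "outdir": f"{rnaseq_outdir.rstrip('/')}_diffabundance",
    }

params = chain_params("s3://bucket/results-rnaseq", "s3://bucket/contrasts.csv")
print(params["outdir"])
```

A human would just look at the results directory and see where the count matrix landed; the automation has to guess, which is the sticking point Phil describes.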
Edmund: I think you’ve probably seen that best Phil, with like nf-core and I, I think you started on like CircleCI, right? You know, and you start out with like this one little YAML and it’s doing this one thing and like how hard could it be to automate all of this?
And then you end up where it’s like, oh, well, I have this extra logic. And then now you look at the modules’ YAML files, and we have actions within the repo, actions outside of the repo that we maintain.
And suddenly, like, you have to cover all the edge cases. And so it can start out really simple, but it eventually grows more complex over time.
Phil Ewels: Yeah, and, and matching the interface between pipelines for pipeline chaining as well, which is this particular example. Just, just having things named the same and, and sort of knowing where files will be.
When you are a human doing it, it’s pretty obvious ‘cause you’re looking at the directory and you can see where everything is.
But when you’re, when you’re setting this up for automation, it needs to be bombproof, and it needs to be so predictable. And ideally scriptable.
Other automation frameworks
Phil Ewels: Node-RED is the one that I’ve been playing with the most. It’s just old, like I am. I knew about it before I started, started playing around with it. So it was kind of an, an obvious choice for me.
But I mean, Node-RED is not the only tool like this. Right. I mean, Edmund, I know you’ve been playing with some others which are doing similar stuff. Can you, can you walk us through some of those?
Edmund: Yeah, definitely. I don’t know, I, I just kind of have an interest in workflow managers, you know, with Snakemake and Nextflow and those, and then that kind of naturally spread to different data engineering pieces.
So n8n is another common example, one that a lot of people have been using for AI workflows and kind of speccing out this logic for agents that they may be writing.
One that I’ve watched for a long time is Dagster. It’s kind of “Airflow, but it doesn’t hurt”, is how I like to position it in the overall market. It’s just Python and it’s good at doing SQL and ETL workloads and all that stuff, and just kind of connecting the pieces that you need.
Another one, which Ken Brewer actually showed me when we did a talk for SLAS, was Temporal. And it’s actually really interesting ‘cause it’s got Go, Java, Python, has different options, with durable execution across the board on those. It’s, it’s really great at dealing with flaky services, is how Ken likes to position it.
Bioinformatics pipelines vs. ETL workflows
Phil Ewels: When I’ve heard these tools’ names before, there are usually people asking me how Nextflow is different to Dagster, and why they should use Nextflow instead of Dagster for writing their bioinformatics pipeline. How do they compare, in that sense?
Edmund: Yeah, exactly. That’s a fantastic, like, concept to kind of clear up. I think a lot of people reach for, you know, I think Luigi was popular back in the day for bioinformatics pipelines. And that was written, I think that was Spotify, right? And it doesn’t end up working out, because bioinformatics is heavily shell based and you have a lot of CLI commands, and you can do that in Python, in R, in whatever, but it’s just not native.
And that’s where Nextflow really gets that right. It is just shell native, hooking shell scripts together in like little black boxes.
So you could write bioinformatics pipelines in Dagster. I think you’re gonna regret it overall when you get into more complicated things. You know, I, I think it’s just, it’s, it’s again the complexity thing: you can write a toy example in a lot of these. You can write nextflow-io/rnaseq-nf in plain Python and get away with it. It’s just, when you start getting to Sarek size, rnaseq size, are you gonna, like, crumble underneath the complexity of that or not?
Phil Ewels: So it’s mostly about how you are scripting the workflow logic. That’s the thing that you think is most different.
Edmund: Yeah, exactly. It’s kind of, it’s the business logic piece of it, and like, again, humans conceptualizing that and being able to, like, say, okay, this is a DSL, and like, I don’t need to reinvent how I’m writing this Docker container every single time, or specifying containers or, like, conda files.
Phil Ewels: So the bespoke language that we’ve created for Nextflow, because it’s targeted at exactly the use case that we’re interested in, basically it’s cleaner and it’s easier to work with.
Edmund: Yeah, exactly. That’s the whole concept of a DSL.
Tangent: What makes Nextflow special
Ben Sherman: Something that I find very interesting about Nextflow is that I feel like a lot of people, when they think of Nextflow, they think of, like, the dataflow stuff. Like it’s got all these weird, like, workflow logic and operators and channels and all this stuff.
But I think the, the thing that really, distinguishes it and is probably the main reason why it was successful is really more the process, and the fact that you can sort of take a CLI tool, wrap it in this process. You know, it’s all declarative. You have the inputs and the outputs, the container.
You actually, you take a CLI tool, which, you know, in a sense is, is executable. But you have to get the whole environment set up and stage all the right input files. And a Nextflow process actually wraps that and makes it actually executable on its own. And so it was designed to meet computational scientists where they were, and where they continued to be.
Whereas a lot of these Python based workflow managers, I mean Dagster, Prefect, Airflow, Luigi, Flyte, you know, a lot of them work in the same basic way. They’re really geared more towards manipulating data frames. Maybe doing SQL transformations, but fundamentally, all of your data is living in data frames and it’s all living within, you know, the Python runtime, which is just not how the bioinformatics world is set up.
And it’s funny because I’ve seen people in the bioinformatics world try to, to do the more, like, data frame, DuckDB kind of approach. I remember Edmund and I were at the airport one time ‘cause we happened to be taking the same flight back to Dallas. And I asked him, like, do you think these bioinformatics CLI tools are all just gonna become like Python packages and, you know, just replace all the BAMs and VCFs and all these file formats? Could they just become data frames?
And I mean, Edmund, you, you said basically that wouldn’t happen, that these CLI tools are just baked into the cake at this point. They aren’t gonna change. Which is good for Nextflow, because it means that, you know, our, our paradigm may have some longevity. Basically, as long as pipelines are being built by composing different CLI tools or polyglot scripts, you know, Nextflow is gonna be sort of the best way to do that, because of the infrastructure that it provides for wrapping tools into processes, right.
Phil Ewels: And, and dealing with all the file staging stuff like you say, which is a huge headache actually. I take that for granted with Nextflow. But actually that isolation of tasks and avoiding file name collisions, that’s a lot of, lot of heavy lifting it’s doing for free.
Edmund: I see you’ve never used Snakemake then, Phil. Yeah, I think that’s the beauty of it. Yeah, thanks for bringing that back, back to my memory, Ben, of like, yeah, it’s the files that actually matter. Whereas in, like, classic data engineering, where, I don’t know, you’re just doing numbers for a bank or, like, trying to catch fraud, it’s more properly a database.
And like when you look at all these bioinformatics files, like you look at a BED file or a GTF and it’s like, okay, that’s a database. You’re just putting it in a file, you know, and not putting it in a CSV, or like even a FASTA file is like, you know, it could be a database, it could be two, two columns in a database, but like we put it in a file and like it is what it is. And until you start putting those in databases for whatever reason.
But the funny thing even is seeing like the data engineering world kind of like shift to Iceberg, where it’s like, okay, now we have files on S3, and so they’ve shifted to files and where that’s the database because it’s cheaper to store. Ultimately, you don’t need a hot database to go and pull this like the data out that you’re not referencing often. And so, I don’t know, maybe bioinformatics was right all along.
Ben Sherman: Well, and even if the, the data story evolves into being more about, you know, Parquet files sitting on S3, and using more, like, proper database techniques, in that way, Nextflow can still thrive in that environment. Because at the end of the day, as long as Nextflow knows how to stage some kind of data source as a file in a directory for the tool to use as input, then it doesn’t really matter whether that source is, you know, a FASTA file or some kind of Parquet file or some other database thing.
Nextflow can provide that translation and, just give you a lot of flexibility. You know, you don’t have to fit everything into, you know, the Python runtime. You can have a bit more, you know, variety if that’s what you need.
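That staging step is easy to picture with a toy example: take records that might have come from Parquet or a database, and materialize them as a file for a shell tool to read. This is a hand-rolled sketch of the translation Ben describes, not anything Nextflow itself exposes:

```python
import tempfile
from pathlib import Path

def stage_as_fasta(records: dict) -> Path:
    """Materialize in-memory records (mapping name -> sequence, as they
    might come out of Parquet or a database query) into a FASTA file
    that a CLI tool can read as a plain file input."""
    path = Path(tempfile.mkdtemp()) / "input.fasta"
    with path.open("w") as fh:
        for name, seq in records.items():
            fh.write(f">{name}\n{seq}\n")
    return path

staged = stage_as_fasta({"seq1": "ACGT", "seq2": "TTGA"})
print(staged.read_text())
```

Once the data exists as a file in a working directory, the downstream shell tool neither knows nor cares whether it originally lived in a data lake or on disk.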
So we’re going a bit off topic now, but I feel that there is, there is a connection here between, you know, data wrangling and pipeline chaining. I do kind of see it as all one big problem.
Phil Ewels: I was gonna try and pull this back on track, but I was, I was enjoying the tangent too much, so I, I thought we should see where it goes.
Anyway, where were we, Edmund? You were gonna, you were gonna show us Temporal, right?
Dagster automation demo
Edmund: I’ve got a Dagster demo first and then we can run through Temporal as well.
So this is just kind of another concept and while like point and click stuff was great, now we have AI agents to write all the code for us. So suddenly these complicated systems where, you know, like you’d have to learn essentially another Nextflow to like conceptualize some of these things, are just readily accessible for AI agents to write the code and then you can visualize it in this beautiful visualization and kind of graphically think about these.
So this is just a quick example of like the lineage on this, of going between like the actual just normal like input and then going into running Sarek mouse removal, and then doing HLA typing and then doing mouse removal for the RNA-Seq, and then going into different pieces of RNA-Seq pipelines on those, and then pulling in a complete analysis at the end in a Nextflow pipeline.
So each individual node on this graph is a Nextflow workflow, launched on Platform.
Phil Ewels: Where, where’s the glue code here? Where’s the sample sheet reformatting and stuff.
Edmund: So exactly, that’s kind of the, the PDX here. And that’s kind of the whole concept of Dagster’s assets for each of these. And I can show you a little bit more, probably better, on the Temporal piece. Like, this one we’ll just launch it, and then the different, like, resources. And there is, you know, a lot of different jargon between all of this and Seqera and connecting to that.
Phil Ewels: That’s a good point. I, I didn’t say that with Node-RED, but Node-RED also saves its workflows as JSON, and you can just dump that into Claude. And it does a pretty good job of writing new, new workflows for you. It’s a very good way to jumpstart your workflow.
Edmund: Yeah. That’s just kinda the overall, like, concept of these. And I, I just see it as the glue code kind of becoming less brittle, like I was talking about earlier, where it’s like, ah, you know, the epoxy’s breaking. And now it’s, it’s pretty easy to, like, fix this and kind of make it more self-healing of sorts.
Ben Sherman: You know, with our discussion about Dagster versus Nextflow, we were saying how Dagster is very well suited for if you’re just, you know, transforming data frames, right?
And so when you’re at that level of just pipeline chaining. Well, that’s exactly what you’re doing. ‘cause every pipeline is taking a sample sheet and spitting out a sample sheet. Another word for a sample sheet is a data frame. And so that actually does, it fits quite well there.
Phil Ewels: We need to come up with some other kind of glue, which is very elastic, to fit your analogy Edmund. What’s the opposite of epoxy?
Edmund: Well, if it takes me much longer, you’ll come up with something here, I’m sure. This is kinda the issue with some of these: like, yeah, you need a worker and you need a head node as well. And while it’s great and it’s fantastic once you get that all up and running, making sure that your various, like, API connections don’t die out in the middle, it’s a lot more to maintain, and an entire, like, platform to run on those.
Phil Ewels: It’s kind of like, and you mentioned Airflow and stuff, these systems aren’t like Nextflow, where you can just kick off a job and it runs. There’s a whole, a whole system you have to set up.
Edmund: Exactly. So I’ll show you Temporal next. This is a demo that Ken and I spun up.
Temporal automation demo
Edmund: Okay, so this is Temporal. This is just kind of the overall dashboard. You can have different namespaces in here, but this is the real, like, interesting piece: actually launching a workflow from these.
And so this, this is again, going back to our like sequencer example on this of like, oh, hey, this sequencer file got added in here.
And so in this, the different pieces of the workflow get triggered. Let me see if I can reset this. Yeah, start workflow, like this one. This is where you gotta, like, learn a whole new UI for each of these, is kind of the, the issue, and whether it is, you know, bioinformatics friendly and bioinformatics native on each of these.
But again, so we’re just fetching the metadata, uploading a dataset to Seqera Platform, triggering the workflow, and then now we’re just watching the workflow and waiting for it to finish and then we could kick off something else downstream of that.
So yeah, that’s kind of the overall Temporal. We’ll make sure to link the, the original talk from that. And this is also in our Hello Automation repository, as well on GitHub.
Ben Sherman: Yeah, Temporal is interesting. I, I haven’t, I haven’t used a lot of it myself, but I’ve, I’ve studied it, you know, just as Edmund has, in the course of trying to understand how every industry vertical has reinvented their own version of workflow managers. Right.
You know, Nextflow is very influenced by the bioinformatics world. Dagster is very influenced by, like, the ML and data engineering world. And Temporal seems to be very influenced by, I guess you could say, like, the microservices / backend services world.
Where the emphasis is very much on, I’ve, I’ve got some website and I need to interact with all these different third party APIs, and half of them go down half the time. So I need to come up with a strategy for each one of them of like, how many times am I willing to try it, like retry it, you know, how long am I willing to wait?
Whereas, you know, retry in Nextflow is just this one feature among many, in Temporal that seems to be, like, the whole thing: doing retries and things like that.
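That retry-first emphasis is easy to sketch. A few lines of Python capture the basic idea; what Temporal layers on top of this toy version is persisting workflow state, so the retries survive worker crashes and restarts:

```python
import time

def with_retries(fn, attempts: int = 5, base_delay: float = 0.01):
    """Call a flaky function, backing off exponentially between tries.
    A toy version of the retry policies that durable execution
    frameworks like Temporal treat as a core primitive."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the failure
            time.sleep(base_delay * 2 ** attempt)

# A stand-in for a third-party API that fails on its first two calls.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("service down")
    return "ok"

print(with_retries(flaky))  # succeeds on the third try
```

In Temporal every activity carries a policy like this by default, plus persisted progress; in Nextflow the equivalent is the per-process `errorStrategy 'retry'` setting.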
And so it’s, again, it’s, it’s probably possible to set up, pipeline chaining in that. But to Edmund’s point, it may not be as bioinformatics friendly. It, it may feel very weird, and that’s because of that sort of difference in emphasis that they have.
And durable execution. You know, Temporal is sort of the most famous example of this, of durable execution frameworks. I guess you could say they’re all just workflow managers, but there’s a whole bunch of them just like it. And there are ones that get even weirder than that. So.
Phil Ewels: And I, I think one of the things that’s quite interesting, I mean, Edmund, your examples and also my Node-RED example, is that we’ve, we’ve not done that with Nextflow. We’ve done that with Platform and, and there’s a reason for that.
I think the reason I did it that way is that dealing with, Nextflow, which is very bioinformatic centric, is quite difficult. But when you encapsulate that domain specificity within Seqera, then you end up in the normal kind of tech world where you just have some API calls.
Then that’s much easier to bring into these other systems, which are more designed for, for working with those kinds of events, event-driven and API-driven mechanisms.
So it’s actually quite a nice way to kind of bridge that gap, and, and, and kind of compose these different systems.
Edmund: It’s bridging, like, the file world of bioinformatics to the tabular world of other data engineering, and, like, reporting that out to a dashboard or something.
Wrap up
Phil Ewels: Okay, I’m gonna do a bit of a hard cut at that point in the podcast episode. Apologies for that, I wasn’t planning for this to be two episodes when we recorded it.
But we’ve hopefully done a pretty good overview at this point of all the different approaches which people do today, to chain pipelines, to automate pipelines, to build workflows and meta workflows.
So hopefully that’s whetted your appetite for part two. In part two we say, okay, none of these solutions are really perfect. But what, what are we doing about it? How are we fixing this in Nextflow? And what can we look forward to in the future? How do we shape this to, to work in a way we want it to work?
That discussion was really interesting, enough that I felt it deserved its own podcast episode. So do tune in to the next episode, Episode 57 to hear more about this.
And I’ll see you there again, with Ben and Edmund. So thanks very much for sticking with us and I’ll catch you in the next episode.
