Seqera

Nextflow strict syntax

Introducing Nextflow Strict Syntax: Cleaner Code, Better Errors

Nextflow is evolving! In episode 51 of the Nextflow podcast, Phil Ewels and Ben Sherman discussed the upcoming Nextflow strict syntax - a significant step toward improving developer experience with clearer error messages and a more consistent language framework.

Key links

“Everything that I’m about to talk about is already documented. So all you have to do is go to the Nextflow Docs. There’s a new page called Updating Nextflow Syntax.” - Ben Sherman

Here are some of the links that were mentioned during the podcast:

Docs: Updating Nextflow syntax
Docs: Standard library

What’s changing and why?

The Nextflow language has historically been a Groovy DSL (Domain Specific Language), meaning Nextflow scripts are Groovy scripts with added functionality. While this approach enabled rapid development of Nextflow itself, it sometimes resulted in vague error messages and unexpected behavior when users made mistakes.

With the strict syntax, Nextflow is asserting itself as its own language with well-defined rules and syntax that is optimized for working with data flows and bioinformatics pipelines.

“We are going to fully define what syntax is allowed in Nextflow. When there’s a mistake, we’re going to provide errors that make the most sense to a Nextflow user, not just a generic Java or Groovy developer.” - Ben Sherman

Key syntax changes

Here are the most significant changes coming with strict syntax:

1. No statements mixed with script declarations

All actual code (variable declarations, print statements, process/workflow calls, etc.) must be moved into a process or workflow block - they can no longer live at the top level intermixed with process and workflow definitions. This reduces confusion about where code should be placed and ensures parameter validation only runs when intended.

In DSL1, mixing statements with declarations was the standard approach, with channel logic scattered throughout the top level. While DSL2 moved much of this logic into workflow definitions, top-level code was still permitted. With strict syntax, all actual executable code must be contained within process or workflow blocks, typically the entry workflow. This creates clearer boundaries, allowing scripts to be used either as entry points or as modules without unintended code execution. This change also improves the behavior of commands like nextflow inspect, which no longer needs to process parameter validation code when examining pipeline containers.
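As a minimal sketch of this change (the process name `COUNT_LINES` and the `input` parameter are hypothetical, chosen only for illustration), a parameter check that used to sit at the top level can move into the entry workflow like this:

```nextflow
// Before (lenient DSL2), statements sat at the top level next to declarations:
//
//   if( !params.input ) {
//       error "Please provide --input"
//   }
//
// After (strict syntax): only declarations remain at the top level.

// Parameter declaration (hypothetical parameter name).
params.input = null

// Hypothetical process, used only to give the workflow something to call.
process COUNT_LINES {
    input:
    path sample

    output:
    stdout

    script:
    """
    wc -l < ${sample}
    """
}

// Entry workflow: validation and other statements now live here, so they
// run only when this script is used as the entry point.
workflow {
    if( !params.input ) {
        error "Please provide --input"
    }
    println("Input: ${params.input}")

    COUNT_LINES(channel.fromPath(params.input))
    COUNT_LINES.out.view()
}
```

Because the check sits inside the entry workflow, including this script from another script should not trigger it.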

2. Custom classes are no longer allowed

Custom class declarations are not supported in the strict syntax. As a workaround, you’ll need to move any custom classes to the lib directory. The team plans to introduce a cleaner class/record syntax in the future that will be more Nextflow-idiomatic than the current Groovy approach.

This change primarily affects power users who currently use custom classes as a kind of record structure (for example, as an alternative to meta maps). The removal is intended to be temporary: the Nextflow team plans to introduce a Nextflow-specific class/record syntax that is simpler and clearer than the current Groovy implementation and needs no additional annotations or boilerplate. Moving classes to the lib directory (or into a plugin) serves as an interim solution until the new syntax is available, which the team hopes will be later this year.

3. No for loops, while loops, or switch/case

These imperative programming constructs are being removed in favor of more declarative approaches using operators and the standard library. This encourages code that better aligns with Nextflow’s dataflow paradigm.

The removal of traditional loops steers users toward Nextflow’s more declarative programming model using operators like map and flatMap which inherently implement loops but in a dataflow-oriented way. For collection operations, methods like collect and each from the standard library (recently expanded and documented) provide alternatives to traditional loops. While the while loop might return in some form for condition-based iteration, the focus is on encouraging idiomatic Nextflow patterns that better align with its parallel execution model. The team is open to helping users translate difficult patterns and may consider adding appropriate language features if needed.
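The alternatives mentioned above might look roughly like the following. This is an illustrative sketch, not an official migration recipe; the function name `describe` and the sample values are made up for the example.

```nextflow
// Instead of a switch/case, chain if / else if (hypothetical helper function).
def describe(n) {
    if( n == 0 ) {
        return 'zero'
    } else if( n < 0 ) {
        return 'negative'
    } else {
        return 'positive'
    }
}

workflow {
    def numbers = [-1, 0, 2]

    // Instead of: for( n in numbers ) { println(describe(n)) }
    numbers.each { n -> println(describe(n)) }

    // Instead of appending to a list inside a loop, use collect.
    def doubled = numbers.collect { n -> n * 2 }
    println(doubled)

    // On channels, operators such as map apply a closure to every item.
    channel.of(1, 2, 3)
        .map { n -> n * 2 }
        .view()
}
```

The closures name their parameter explicitly (`n ->`) rather than relying on the implicit `it`, which keeps the intent easy to read.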

4. Type annotations will be changing

The strict syntax will eventually introduce Nextflow-specific type annotations. If you’re among the users who have been using Groovy-style type annotations, the language server will currently tolerate them while the team works on the new syntax.

Though technically not allowed in strict syntax, the language server makes a special exception for Groovy-style type annotations, recognizing their usefulness for runtime type checking. This is a transitional approach - the team plans to introduce Nextflow-specific type annotations with syntax optimized for the language. Users are encouraged to keep their existing type annotations for now, as the team will support both styles during the transition period and eventually provide tools to automatically migrate to the new syntax.

5. No more `addParams` in includes

The addParams feature, which allowed overriding parameter defaults when importing modules, is being removed as it creates global variables that make code difficult to reason about. This pattern, originally envisioned as a way to control module behavior in early DSL2, has proven problematic as it introduces hidden side effects - when reading module code, there’s no indication that parameters might be modified by importing scripts. The recommended approach is to use explicit workflow inputs instead, treating inputs as you would in any other programming language, which improves code readability and maintainability.
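One way to read "explicit workflow inputs" is sketched below. The names `SAY`, `GREET`, and `greeting` are hypothetical; the point is only that the value the module depends on appears in its `take:` section instead of arriving through a parameter overridden at include time.

```nextflow
// Before (lenient DSL2), the including script overrode a module parameter:
//
//   include { GREET } from './modules/greet' addParams(greeting: 'Hola')
//
// and the module read params.greeting internally, with no visible sign
// that the caller could change it.

// After: the value is passed in explicitly (all names are hypothetical).
process SAY {
    input:
    val greeting
    val name

    output:
    stdout

    script:
    """
    echo '${greeting}, ${name}!'
    """
}

workflow GREET {
    take:
    names       // channel of names
    greeting    // plain value supplied by the caller

    main:
    SAY(greeting, names)

    emit:
    greetings = SAY.out
}

workflow {
    GREET(channel.of('Ben', 'Phil'), 'Hola')
    GREET.out.greetings.view()
}
```

Anyone reading `GREET` can now see from its inputs alone what it depends on.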

6. Process script section requires the `"script:"` label

The script: label in processes is now required unless the script is the only thing in your process. This enables better error checking by providing clear boundaries between process sections.

With strict syntax, the script: label becomes mandatory in any process that also has inputs, outputs, or directives. This seemingly minor change significantly improves error detection capability. Previously, when encountering syntax errors in process sections, Nextflow couldn’t always determine if the problematic code was an incorrectly formatted directive or the beginning of an unlabeled script section. By requiring explicit labels, the parser can now provide more precise error messages. This change is easily fixed and may be addressed by auto-formatting tools in the future.
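A short sketch of the rule, using two hypothetical processes: the first has input and output sections, so its script needs the label; the second contains nothing but a script, so the label may still be omitted.

```nextflow
// Other sections are present, so the script: label is required.
process SAY_HELLO {
    input:
    val name

    output:
    stdout

    script:
    """
    echo 'Hello, ${name}!'
    """
}

// The script is the only thing in the process, so the label is optional.
process SAY_HI {
    """
    echo 'Hi!'
    """
}

workflow {
    SAY_HELLO(channel.of('world'))
    SAY_HELLO.out.view()
    SAY_HI()
}
```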

7. Config syntax is more strictly defined

Configuration is limited to config blocks, settings and includes, with arbitrary code no longer permitted at the top level (though expressions are still allowed as values). This simplifies error checking while maintaining necessary dynamism for configuration files.

While Nextflow configuration files previously allowed arbitrary Groovy code (loops, classes, functions, etc.), strict syntax limits top-level elements to config blocks, settings, and includes. However, significant dynamism is preserved where it matters most - in the values assigned to settings. These values can still be Groovy expressions, dynamic strings, or even self-executing closures. This approach preserves the flexibility needed for dynamic configuration while providing a more predictable structure that enables better error checking for common mistakes like missing equal signs or typos in option names.
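The fragment below sketches what this could look like in a `nextflow.config` file. The specific values are illustrative assumptions; the commented-out function stands in for the kind of top-level code that is no longer accepted.

```nextflow
// Settings are allowed at the top level.
params.outdir = 'results'

// Config blocks are allowed, and values can still be dynamic.
process {
    cpus = 2
    // Closure value, evaluated for each task.
    memory = { 4.GB * task.attempt }
}

// A dynamic string is still a valid value.
report.file = "${params.outdir}/report.html"

// Not allowed under the strict syntax: arbitrary top-level code such as a
// function definition (shown commented out).
//
//   def defaultMemory() {
//       return 4.GB
//   }
```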

Timeline and Adoption

The strict syntax will be introduced as an opt-in feature in Nextflow v25.04.0. The tentative timeline:

v25.04.0: Strict syntax available as opt-in
v25.10.0: May become the default (with the option to opt out)
v26.04.0: May become mandatory (earliest timeline)

A slower timeline would double these periods if needed, based on community feedback.

How can you prepare?

Use the VS Code extension with the language server - it already uses the strict syntax parser
Check the documentation - See the “Preparing for strict syntax” page for detailed examples
Move problematic code to lib/ if needed - For code that’s difficult to migrate

Benefits of adopting strict syntax

The strict syntax enables:

Clear, relevant error messages
Better code readability and consistency
Future improvements like robust type checking
Auto-formatting capabilities
Enhanced IDE tooling

By tightening the language definition now, the Nextflow team is laying groundwork for features that will make pipeline development faster, more intuitive, and less error-prone.

The Nextflow team is actively soliciting community feedback as they roll out these changes. If you encounter issues or have suggestions, reach out on the Nextflow Slack or community forum.

Full transcript

Podcast Ep51: Strict Syntax

Welcome and introduction

Phil Ewels: Hello and welcome to the Nextflow podcast. You are joining us on episode 51, and today we’ll be talking all about the upcoming Nextflow strict syntax.

My name is Phil Ewels. I’m product manager at Seqera, and I’m joined today by Ben Sherman. Ben is a senior software engineer working on Nextflow at Seqera. And today we’re gonna talk about some of the new features coming to Nextflow language.

Ben, it’s great to have you back.

Ben Sherman: Hey, Phil. Good to be here.

Phil Ewels: What are we gonna talk about today?

Ben Sherman: Yeah. So you and I have talked for a long time now, on multiple podcasts, about how we’re gonna fix all these poor error messages and just improve the developer experience. And now we’ve made the first step towards that with the language server.

And coming soon we’re gonna be moving that same sort of error checking into Nextflow itself, so that you get both.

And people have noticed that the language server is quite a bit more strict than Nextflow was before. It’s mostly things around the edges; depending on how you write your code, it can be more or less annoying to deal with.

So I wanted to take some time to try and explain how I’m thinking about the language now that we have this ability to be a lot more strict: why we’re being more strict in certain areas, and what the benefit to you is gonna be.

Language server background

Phil Ewels: Brilliant. So if we just back up for a second, for anyone listening to the podcast who’s not familiar with language servers or VS Code, or who hasn’t listened to those earlier episodes: can you give us a quick background on how Nextflow’s written, its history with Groovy, and what this overarching work around the language server is?

Ben Sherman: So the Nextflow language, we’ve always called it a Groovy DSL. That basically means that when you write a Nextflow script, you’re writing a Groovy script, but there is some extra functionality that gets added in the Groovy interpreter to support things like process definitions and workflow definitions.

But fundamentally, you’re still just writing a groovy script with some extensions. And while that was easy for us to develop and spin up really quickly, it’s not so great for the developer. It’s fine when you write your code correctly. But then the moment that you make a mistake the error messages tend to be vague or they happen way later and they’re not related to the actual problem. And there’s all sorts of weirdness that comes along with that.

And so what we did with the language server was to actually replace all of that and say no, no, now Nextflow is its own language.

Now it still may use Groovy under the hood, but we’re trying to make it clear now that we are going to fully define what syntax is allowed in Nextflow. We’re going to build out the tooling, like the parser to parse the Nextflow code. And when there’s an error or a mistake, we’re going to provide errors that make the most sense to a Nextflow user, not just a generic Java or Groovy developer.

If you’ve used the language server in VS Code, if you’ve downloaded the latest Nextflow extension, or if you’ve figured out how to use the language server with Neovim or other editors like Emacs, you may have noticed that the language server is a lot more strict than when you just do “nextflow run main.nf”.

And that’s why: it’s because the language server is using a Nextflow specific parser to parse all of your code and understand it. For now that’s just a thing within the language server. But the plan is to move it into Nextflow so that eventually when you do run “nextflow run main.nf” you will get all those same errors.

Maybe initially just as an opt-in thing, and then eventually as the default, and then eventually it’ll be required, which is why we wanna make sure that we’re clear on what are the little changes that you might have to make, right.

Phil Ewels: So when you say strict, what we’re talking about here in VS Code is that you get a little wiggly red line under some of your code, saying there’s an error here.

And something we’ve heard quite a bit in the community since the new VS Code extension was released is like, “Hey, VS Code over here is saying I’ve got an error, but it runs fine. How do I turn these errors off? Very annoying.” And we’re like, no, don’t turn them off.

Ben Sherman: Before, it was people complaining about bad error messages, or there not being the right error messages, and now there are too many errors. So we’re trying to find that sweet spot.

Phil Ewels: Exactly. This is a good thing. The errors are a good thing. They’re helping you.

Syntax sugar is just empty calories

Ben Sherman: I’d like to sort of lay out sort of a framework for how we’re thinking about syntax.

One of the things that a lot of programming language designers have learned over the past couple of decades is that when you create a language, it’s very easy to add all types of syntax sugar, right?

In case you don’t know, what I mean by syntax sugar is shorthands and shortcuts: short ways to write code. For example, being able to omit semicolons, parentheses, and commas in places where maybe you don’t need them. And that can allow you to have much more concise syntax in many cases.

But what a lot of programming languages have discovered is that has drawbacks as well.

Whenever there’s 2, 3, 5, 10 different ways to express the same idea in your code, for one thing that makes it harder to learn, and it makes your code harder to read. Because even though, okay, it’s nice to have this little shortcut here. Now you also have to learn that shortcut.

Even if you don’t use it in your own code, somebody else might. And if you’re looking up an nf-core module, or using somebody else’s Nextflow code, now you have to know all the different shortcuts that people might possibly use, right? So that’s more of a burden on your brain when you’re reading code.

And I think what a lot of languages have learned is that we actually spend way more time reading code than we do writing code. So it’s a lot more important that we make sure that code is as easy to read as possible, even if that means that sometimes you have to type out a few more characters. You only spend a very small part of your time doing that.

And so that’s what a lot of these changes are focused on: removing different ways to express the same thing, removing confusion, and increasing the level of clarity and consistency in people’s Nextflow code, so that it’s easier to read and understand quickly.

Phil Ewels: It reminds me a little bit of Perl. I started my bioinformatics career writing Perl, and it was a language written by a professor in language, I think spoken language. It was designed as a programming language to be easy to read for someone who knows the English language.

It introduced a lot of concepts that are used in scripting languages, but it’s famous for having a hundred different ways to do the same thing. Which sounds great, but actually when you go and read anyone else’s Perl code, it just looks like an absolute mess. That’s true even if you go back and look at your own Perl code that you wrote a couple of years ago, because you go through different patterns of writing and solving problems.

Bespoke language syntax

Ben Sherman: Another principle, I would say, applies as we evolve the language and add new syntax. The nice thing about having a Nextflow-specific parser is that we can actually add new syntax that has nothing to do with Groovy. It may not be valid Groovy syntax. We might compile it to Groovy under the hood, sure. But now that we have this freedom to do whatever syntax we want, we can think about what kind of syntax we want to appeal to.

And so now I can look at, what do all these different programming languages do? What does Python do? What does Rust do? What does JavaScript, TypeScript, do? Languages that the most amount of people are familiar with?

And I can try to incorporate those ideas into Nextflow. I don’t wanna get too much into like upcoming features in this episode. Maybe in the future you’ll see that, but that’s just another guiding principle I wanted to point out is making code as familiar as possible so that someone who’s coming from a typical Python environment, or R, or whatever language, is more likely to understand off the bat what some code is doing.

And at the same time, that has to be balanced against the existing system, right? So there may be certain changes that would be nice to do, like maybe changing some keywords, that would make the language better. But if a change doesn’t provide that much value, and if it would require people to unnecessarily change all these little things throughout their code, then we might not do it.

So there is a balance between improving the language and making it as easy as possible for people to migrate.

To DSL3 or not 2 DSL

Ben Sherman: And so I think the examples that we’re gonna go through here represent the movement from what I’ll call lenient DSL2 to strict DSL2. That will be somewhat painful for some people, depending on how you write your code.

But moving forward, once you’ve made that shift into the strict syntax, any future improvements, we’re gonna try to make them as smooth as possible. And now that we have this Nextflow parser we can do that very easily.

I can introduce a new feature while supporting some previous syntax, and gradually phase it out, so that you have plenty of time to move things over.

But admittedly, this strict syntax was a bit of a jolt. There just wasn’t much we could do to prevent that.

Phil Ewels: So there might be some people in the audience who have been using Nextflow for a few years and remember the last time we had some major syntax updates for Nextflow, which was this migration from what was called DSL1 to DSL2, which is when Nextflow gained its ability to be modular.

That came with a suite of changes to the Nextflow syntax. And there are still some nf-core pipelines going through a transition from DSL1 to DSL2. What, like five years later.

Are we talking about DSL3 here? What is this?

Ben Sherman: No. I don’t really think of it as DSL3. Certainly the strictening, the strictification, doesn’t even come close.

The change from DSL1 to DSL2 was huge. And it was not just a matter of syntax, like you said, it was about new behaviors. It was enabling modules, moving your code from being all in one script to being split into different modules, moving all the workflow logic from being scattered all over the place to being in an actual workflow block. That’s huge.

And there’s just no way to get around the massive amount of transition. I don’t know that we’ll ever have any change on that level. We may have something that comes close. You never really know. But what’s happening with the strict syntax is a removal of syntax variants, basically, as a way to think about it.

So that’s why I say that the pain involved here really depends on how you write your Nextflow code. If you happen to write your Nextflow code in the way that we’ve laid out as what we wanna support in the strict syntax, maybe you lucked out, or maybe your pipelines just aren’t super complicated. You’ll probably be fine. There’ll be very few changes.

If you’re one of those Groovy mavericks who does custom classes and all sorts of crazy code in your Nextflow pipeline, your code will still work. There will be a way to make your code work, but you might have to make a few more changes. That’s just how it’ll be. But yeah, certainly nothing compared to what was required for DSL2.

Nextflow versioning

Phil Ewels: That’s good. And you’re also talking a little bit about incremental updates and incremental changes to syntax over time. DSL2 came as basically a big drop. There was parallel support for a little while, but it was pretty much one big release.

How do you see this working with Nextflow versioning?

Ben Sherman: Yeah. I wanna try and get away from the DSL1, DSL2, DSL3 paradigm. Because like you said, that does force you to have these huge changes. And in the meantime, there isn’t really a way to differentiate between minor changes within the DSL version.

For example, in DSL2, we’ve added new language features over the years. Think of something like the workflow output definition, or the process arity option for inputs and outputs.

There’s no DSL 2.1 or 2.2 to demarcate those version differences, right? We just say it’s all DSL2. But if you want to use the workflow outputs, you have to use a specific Nextflow version or greater, right?

And so what I’d like to do moving forward is basically tie the Nextflow language version to the Nextflow runtime version, specifically the stable versions. We put out two stable Nextflow versions a year, so this last year would’ve been 24.04 and 24.10. I think that’s a good model for us to go forward with: every new stable release will constitute a new Nextflow language version, and that will come with new syntax or syntax removals and deprecations.

One of the things that you’ll see in the language server, or in the VS Code extension, is that when we release Nextflow 25.04, there’ll be an option in VS Code called Nextflow Target version. And at that time there’ll be two options, 24.10 and 25.04.

If you want to use new language features in 25.04, the language server’s gonna complain. It’s not gonna recognize it unless you go in there and you update your version to 25.04. Basically tell VS Code: “I’m writing Nextflow code for Nextflow 25.04”, right?

Yeah. So that’s the idea, is we wanna move towards more of a rolling release model.

Something that I really want to start getting across to people is that, upgrading to the next stable version isn’t always a trivial thing, right? I encourage people to take time to do it, but make sure that your pipelines actually work. Make sure that the syntax still works and everything.

And this is partly on us. One thing that we wanna start doing is having clearer release notes for each stable version, spelling out all the new syntax changes, rather than you having to go through the docs and look at some specific feature to find a note saying, oh, this syntax was changed in X version. We wanna start having clearer migration guides for that.

But the end result of that is that, basically, every six months you get a new Nextflow version. That new version will come with new config settings and runtime updates and all that good stuff. But it will also come with some set of syntax changes, syntax removals, whatever that may be.

And targeting your Nextflow code to a particular stable version, or a range of stable versions if you can manage it, is I think how we wanna do it moving forward. That’s gonna make it as easy as possible for us to introduce new language features in a piecemeal way rather than these giant changes. And it will make it as easy as possible for users to adopt those changes gradually and at their own pace.

Phil Ewels: I was thinking, we also have the minimum Nextflow version, which is a configuration option. So Nextflow should throw a warning or, depending on how you specify it, could also error if you try and run with too old a version of Nextflow. People can hard code that in their pipelines for a bit of safety.

The other thing to mention, for anyone who’s not aware of this, is that it’s very easy to specify a specific version of Nextflow as well. So it is good practice in your production environments to always very specifically pin the version of Nextflow you want to use with the NXF_VER environment variable.

So you can switch between different versions of Nextflow, just one command to the next, super easily.

Ben Sherman: Right.

Diving into specifics

Phil Ewels: I could feel the audience starting to get a bit worried at this point. What’s coming around the corner?

Ben Sherman: Yeah, so today I don’t want to talk about new features. New features will be coming down the line. All the changes I’m talking about today are just minor syntax changes: removing alternative ways to express the same sort of idea, and trying to tighten down the space of allowed syntax.

First thing I’ll note is that basically everything that I’m about to talk about is already documented. So all you have to do is go to the Nextflow Docs. There’s a new page called Updating Nextflow Syntax. So if nothing else, just go read through the page and see what are all the changes.

For every syntax change, I’ve tried to give a clear code example of how you can rewrite it. And in most cases it’s quite trivial what you have to do.

But here for this podcast, I just wanted to call out what I think some of the most salient changes would be.

Mixing statements with script declarations

Ben Sherman: The first one is this idea, I call it mixing statements with script declarations.

So things like processes, workflow definitions, and includes are all what I would call script declarations. They are statements that go in the top level of your code. So when you’re looking at a Nextflow script, there’s basically a very small set of things that should be at that top level.

And then there’s this idea of statements, like actual code: things like declaring a variable, printing something, calling a process or workflow, calling an operator, if statements.

And those things traditionally have been allowed at the top level, so you could have them mixed with your processes and workflows. Actually in DSL1, you had to do it that way, right? Because you had processes and then your channel logic was just mixed in at the same sort of top level code.

And then in DSL2, we moved all that channel logic into workflow definitions. But you are still allowed to have code at the top level. And now with the strict syntax, we’re saying, these things cannot be mixed. Any kind of statement, like if statements, print statements, all that kind of stuff, actual code has to be moved into a process block or a workflow block.

In most cases, it probably goes into your entry workflow, that workflow that doesn’t have a name. Because the most common thing that I see is that people will do parameter validation, input validation, in the top level before they go into the actual workflow. Well, just move all of that into the workflow.

And the reason for this is, it’s removing the multiple ways to express something. If this is code that is part of your entry workflow, then just put it in your entry workflow, right?

So I think it removes some confusion around where to put this code. Do I put it in the entry workflow, or does it go somewhere else? What’s the difference? And now it’s very clear what you have to do.

Suppose you had a script that has an entry workflow but also has some processes and subworkflows, and you wanted to include that script in a larger script. You probably don’t want that input parameter validation to be run in that case, right?

You only want that code to be run when you actually use that script as your entry point. And so that’s why you’d wanna move it into your workflow block.

Python has a similar idea where you can have Python scripts that define a bunch of functions and then also have some kind of a main function.

And then you’ll see a little `if __name__ == "__main__"`, execute this code, right? And that allows you to use a Python module either as a main script or as an import, right? And so this is much the same idea.

Phil Ewels: And there’s also something that I’ve found confusing over time, ’cause Nextflow is inherently parallelised and not everything runs in a serial way. So I’ve found it confusing looking at code where I think the developer’s been thinking of it as if it was running in serial, but actually chunks are running all over the place in a different order.

And I think this syntax where everything’s in workflows is much more obvious. It’s more specifically written.

And also, you can call specific workflows with the -entry command line option. So then if you’re calling different workflows, you know exactly which code is gonna be executed, rather than just arbitrary code which is sat at the top of your file.

Better inspect command

Ben Sherman: Also, people may have heard of the inspect command, which we introduced a couple of versions ago. The inspect command is used to get a preview of all of your process containers without having to run through your entire pipeline.

But one of the problems that I noticed in the first version of the inspect command was that, with a typical nf-core pipeline, you still had to provide a lot of inputs that you wouldn’t expect in order to use it.

So you still had to provide the input file and other parameters. Otherwise the nf-schema validation would fail. And this is because a lot of that validation code was in the top level of the script. And so you couldn’t get around running that code.

Now since then, I think a lot of the nf-core pipelines have moved that code into the entry workflow. And we have also updated the inspect command, so that it will load your script, but it won’t run the entry workflow.

It’ll be as if it imports all of the scripts as modules. And then it just looks at the imports to figure out which processes it needs to look at for the containers.

And so the benefit of that is that it doesn’t take as long, ’cause it doesn’t have to run through the entry workflow. And also all that parameter validation code isn’t run, so you don’t have to specify things like the input param. You can just say “nextflow inspect nf-core/rnaseq”, and I think that will actually just work. And so that’s another benefit for people to consider.

Phil Ewels: Another thing is that in the past, when you ran “nextflow inspect”, it executed the logic that was going to run. So if you chose to use one aligner, it would give you the containers for that aligner, but it wouldn’t give you the other ones.

Whereas now, correct me if I’m wrong, I think when you run “nextflow inspect”, it gives you every container in the pipeline, irrespective of any pipeline logic.

Custom Classes

Ben Sherman: So the next change I wanted to bring up is what I call custom classes. I don’t know how common it is. Certainly among power users, I’d say it’s more common. People like to declare Java or Groovy classes in their Nextflow scripts, sort of like a poor man’s record type: instead of having the meta map, having an actual class with specific fields and types, so that it’s just easier to check your code and know what fields are available to you.

And so we removed class declarations from the parser. Of course, it was sort of an accident that they were ever supported. A lot of the syntax that we’re removing was part of the Groovy language, and so was allowed in Nextflow by default, but wasn’t necessarily intended as part of the Nextflow language.

And people are creative. They find all these neat little tricks they can do in Groovy and it works for them. And that’s fine. This is one of the things that we have to tighten down a bit.

So with class declarations basically you’ll have to move them into the lib directory for now.

But this is one case where I wanna point out we do want to add new syntax to support something like class declarations, but we believe we can do it in a much nicer and cleaner way than what you’d have to do with Groovy. Because with Groovy, you have to declare the class and give it all these fields, which is fine, it’s simple enough, but then you have to add all these annotations to make it work.

And so we don’t wanna deal with all that. It should be a very simple syntax, just like record, name, and then the fields. You shouldn’t have to add all this extra Groovy boilerplate. But in order to do that, there’s this sort of transition period where we have to take out the classes, and then later on, hopefully this year, add in an alternative syntax. Then that becomes the way forward.

So I’ll talk more about the lib directory in a second, but basically the TLDR: just move that code into the lib directory or into a plugin or something.
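As a rough sketch of the workaround described here, a class that used to be declared inside a Nextflow script could sit in a Groovy file under the lib directory instead. The file name, the class name `Sample`, and its fields are hypothetical, and `@Canonical` is just one of Groovy’s standard annotations, chosen for illustration; a real pipeline may need different or additional annotations.

```groovy
// lib/Sample.groovy (hypothetical file name)
import groovy.transform.Canonical

// A record-like class with named, typed fields, used instead of a free-form meta map.
// @Canonical generates a positional constructor, equals/hashCode and toString.
@Canonical
class Sample {
    String id
    boolean singleEnd
}
```

A script in the same pipeline should then be able to create values with something like `new Sample('s1', true)` and read `sample.id` or `sample.singleEnd` by name.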

Phil Ewels: Was it Fulcrum who gave a talk about that at the Boston Summit?

Ben Sherman: Yeah, about a year ago, I believe that was Jason Fan. And that was a great showcase that he gave of like how to use custom classes in Nextflow. And that’s the exact kind of use case that we wanna support. We just wanna do it in sort of a Nextflow idiomatic way, right? So that’s the plan.

I’m really looking forward to having that functionality because that’s gonna enable so much stuff when you can start having record types and proper checking and all of that.

For loops, while and switch

Ben Sherman: Next thing I wanna mention is another sort of Java and Groovy syntax: things like for loops, while loops, and switch case.

So these things are allowed in Groovy and Java. The thing about loops is that there are a lot of places in Nextflow where you might think you need to use a loop, but in actuality there is some other way to do it. Probably the most common example would be something like a map operator or flatMap operator. Those operators are essentially implementing a loop, right? But you’re not writing it as a loop. You’re writing it in a more declarative way as a series of transformations.

And so part of why I’m getting rid of it is to keep people from falling into that hole of writing a loop when they shouldn’t be doing it.

The while loop is an interesting case. I could see us adding the while loop back because it might be useful to do things like iterating based on some condition rather than iterating over a collection, which is what something like map or flatMap does. But again, we’ll just have to see what kind of use cases come up.

I really wanna make sure that there’s a clear use case for using some kind of syntax. If there’s already some other way to do it, if you can do it with an operator and that’s the idiomatic way to do it, then I wanna encourage that.

Switch case: I think I would like to have some kind of pattern matching syntax, like what switch case does. But again, it’s a question of can we do it in a better way? One that’s not optimized for whatever Groovy and Java were doing, but optimized for what makes sense in Nextflow.

Phil Ewels: I feel like for people who do much reviewing in the nf-core world, it’s quite a common thing to find yourself saying to people, this shouldn’t be a for loop. This should be a map operator. So I think it will actually help in that sense of reviewing code because people won’t be able to get that far. But I think, like you say, quite often people think they understand what it’s doing, but because of the way that Nextflow has this kind of push model and works a bit differently to other languages, it can cause these edge cases and weird kind of errors and bugs.

Ben Sherman: And it’s just more declarative. Java was designed in many ways to appeal to C++ programmers, C++ being based on C, which is a very sort of thin layer over assembly. We’re basically just carrying over these syntax patterns that have been around for decades and decades.

Now there are plenty of things that were figured out 60 years ago that still work just great today. But I think loops are an example where we have a much better paradigm in the form of operators.
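To illustrate the point about operators standing in for loops, here is a minimal sketch. The file names and values are made up for the example; the point is only that each transformation is declared once and applied to every item in the channel.

```groovy
workflow {
    // Instead of looping over file names and building up a result,
    // declare the transformation that each item goes through.
    channel.of('a.fastq', 'b.fastq', 'c.fastq')
        .map { name -> name.replace('.fastq', '.bam') }
        .view()

    // flatMap applies the closure to each item and emits
    // every element of the returned list separately.
    channel.of([1, 2], [3])
        .flatMap { items -> items }
        .view()
}
```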

Standard library

Ben Sherman: Another callout I’ll make to the docs: there’s a page in the docs called Standard Library.

That page was recently expanded to include docs for things like lists, maps, strings. I encourage you guys to go look at some of the methods there because there are a lot of really nice methods you can use for operating on lists where instead of iterating over a list, you might use the collect method, or the each method or something like that. And it gives you a very similar declarative syntax, the same kind of syntax you would use with an operator.

Check those out. And hopefully you should be able to find a way to convert any old for loops or while loops that way.
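As a small sketch of what that conversion can look like, the example below uses the `collect` and `each` list methods in place of a for loop. The function name `squares` and the sample values are hypothetical.

```groovy
// Hypothetical helper: square every number in a list without a for loop.
def squares(numbers) {
    // collect applies the closure to each element and returns a new list
    return numbers.collect { n -> n * n }
}

workflow {
    println(squares([1, 2, 3]))

    // each runs the closure once per element, for its side effect
    ['alpha', 'beta'].each { name -> println("sample: ${name}") }
}
```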

But if you do have some pattern that you don’t know how to translate you’re more than welcome to ping me on slack. I always love those little puzzles. I can either help you find a solution or I can add something to the language to make it work, right? We’ve got all the options around the table.

Type annotations

Ben Sherman: Next thing I wanna bring up is type annotations. So again, this is something where, you know, Groovy and Java have type annotations, so by default you can use them in Nextflow. And they are genuinely quite useful even in the current syntax. Even though there’s no real static type checking, there is a sort of runtime type checking, right?

So you can write a function in Nextflow and you can add parameters to it, and you can add types to those parameters. Say this parameter is a Boolean and this parameter is a map. At runtime, when Nextflow tries to call that function, if the values that you give to it happen to be the wrong type or something, it will actually give you an error. In most cases. In some cases it will do weird type casting, which is hopefully something we can get away from.

With the strict syntax, the Groovy-style type annotations technically are “not allowed”. I’m putting that in quotes, because the language server has a little wink, nudge, nudge.

It’ll make a little deal with you and basically not report warnings for those because they’re just so useful. Because basically our plan is that we do want to introduce type annotations into the language. But again, just like with record classes or with switch case, we wanna do it with a syntax that makes the most sense for Nextflow.

And so it’s probably not gonna be exactly the same as what Groovy allowed. And so, I don’t want people to just go now and remove all their type annotations and then just add ‘em all back six months later.

So for now I’m saying, just leave them, we won’t complain about them. And then when we do add our preferred type annotation syntax, we will support both for a time and we’ll provide a stupid-easy way to migrate it. Probably with the auto formatter.
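For reference, a Groovy-style annotated function of the kind discussed here might look like the sketch below. The function name `label` and its parameters are hypothetical, and this is the existing Groovy notation, not the future Nextflow type syntax that is mentioned.

```groovy
// Hypothetical helper with Groovy-style type annotations on its parameters.
def label(String sampleId, Boolean paired) {
    return paired ? "${sampleId}_PE" : "${sampleId}_SE"
}

workflow {
    println(label('s1', true))

    // A call such as label(42, 'yes') would be expected to fail at runtime,
    // because neither argument matches the declared parameter types.
}
```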

Phil Ewels: Scripts to update and fix format automatically are always a developer’s best friend, so I’m happy to hear that.

`addParams`

Ben Sherman: Next point I wanted to mention. I’ll admit I think I underestimated how significant an issue this would be, ’cause I didn’t think people were still using this pattern anymore: the addParams clause on an include.

So when you include a module, you can append to that include statement. There’s an addParams and there’s a params clause. You can say addParams, and then you can specify, some parameter bindings. Then basically it will, import those parameter bindings into the module as if you had defined them there.

And so it gives you some flexibility of overriding param defaults. Maybe you define some params in the module and then when you import it you wanna overwrite those or whatever.

I think this dates from when we were initially developing DSL2, or really when Paolo was, this was before my time, while DSL2 was still experimental. I believe this was one of the ways that people thought that we would pass params into modules and control module behavior.

And it ended up being a mess. It’s hard to reason about. And I think at least in nf-core, everybody converged more around having subworkflows and passing inputs into the subworkflow. And that’s basically the pattern that I want the strict syntax to focus on as well.

If you wanna pass a value into a module, well then just pass it as an input, right? When you call the process or call the workflow, just like you would expect to do with a regular function or with any other programming language. Make an input an input, right? Don’t provide all of these extra side channels for providing inputs.

That might give you more flexibility when you’re writing code and designing code. But A, that can cause you to write code in a way that’s harder to understand. And B, even if you do use it in a sort of a reasonable way, that’s one more syntax variant that people have to learn.

When I’m looking at a module now, and I see it’s defining these params, I don’t really know what those parameters are gonna be, ’cause I don’t know how the importing module is gonna use it. Like I said, I didn’t expect this would be an issue. But I have seen a couple of the pipelines, even some very prominent pipelines, that use it. And so I apologize, we just foisted this on you guys, we probably could have done a better job of providing guidance.

I have since updated the docs. So if you go to the updating syntax page that we mentioned, there is a section now on the include addParams and how to modify it. But the basic idea is just replace the addParams with workflow inputs.
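A minimal sketch of that replacement is shown below, kept in one file for brevity. The workflow name `ALIGN`, the module path in the comment, and the `aligner` value are all hypothetical; the docs page mentioned above remains the reference for the actual migration.

```groovy
// Before (not accepted under the strict syntax):
//   include { ALIGN } from './modules/align' addParams(aligner: 'bwa')

// After: the value becomes an ordinary workflow input.
workflow ALIGN {
    take:
    reads      // channel of read file names
    aligner    // value that previously arrived through addParams

    main:
    labelled = reads.map { r -> "${aligner}: ${r}" }

    emit:
    labelled
}

workflow {
    // The caller passes the value explicitly, like a function argument.
    ALIGN(channel.of('a.fastq', 'b.fastq'), 'bwa')
    ALIGN.out.labelled.view()
}
```

With this shape, anyone reading `ALIGN` can see everything it depends on in its `take:` section, without tracing how each importing script set its params.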

Phil Ewels: It always threw me ’cause it’s global variables in a way, and you’re looking at the module code and you’ve got no idea that this is gonna happen. And then if you import the same module in different places, it can be done in different ways. And you have to really manually track through all your workflow logic to see that pattern.

Ben Sherman: Yeah, the params, I think, is one of several examples of a syntax that in DSL1 was fine, right? Like when all your code’s in one script, it’s fine to just have a global variable called params that you can throw things into and pull things out of as you please, right?

But once you move into DSL2 and you’ve got modules everywhere, that paradigm I think just didn’t really carry over the right way.

Some of the ideas we have cooking now for how to do params properly, quote unquote is really, I think a fulfillment of what would’ve been the right way to do params in DSL2. Rethinking that.

There’s maybe a little bit of transition pain, but hopefully on the other side of that, your code becomes a lot easier to read and reason about.

Process script section

Ben Sherman: The last point I’ll make on this laundry list of syntax changes is a very minor thing: the process script section.

So when you have a script in your process body, you have that “script:” label. In what I call lenient DSL2, in many cases that script label can be omitted. With the strict syntax, it basically always has to be there unless the script is literally the only thing in your process. So basically the moment that you have inputs, outputs, directives, you need to have that script label.

I only mention it because I think it was just a common pattern for that script label to be omitted. So you’ll probably see this error come up a lot. It’s an easy thing to fix. I just wanted to call it out.
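As a small illustration of the rule just described, the process below has an input and an output, so the `script:` label has to be written out. The process name and the greeting are hypothetical.

```groovy
process SAY_HELLO {
    input:
    val name

    output:
    stdout

    // Required under the strict syntax as soon as the process
    // has inputs, outputs, or directives.
    script:
    """
    echo "Hello, ${name}"
    """
}

workflow {
    SAY_HELLO(channel.of('world')).view()
}
```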

The main reason I wanted to do that is because that actually makes it easier for the Nextflow parser to detect errors. Because if you looked into some of the lenient DSL2 code, you would actually see that when Nextflow was reading the inputs and outputs and trying to determine if it was valid syntax, it was a bit hamstrung.

If I encounter an output statement and I think it’s the wrong syntax or something, is it really an incorrect output, or is it just the first statement in the script section? Because the script label isn’t required to be there, I, as the Nextflow parser, have no idea which one it might be. And so I can’t report an error because I could be wrong.

And so this is just an example of where requiring the user to be just a little bit more explicit allows us to be a lot more confident with error checking. We can provide a deeper level of validation. So hopefully that’s a simple enough thing for people.

Phil Ewels: Can we fix that with the auto formatter, or is that a manual change?

Ben Sherman: I think that’s something in the future that we could do with auto formatting. I haven’t implemented any auto-fix rules with the formatter yet. I just need to implement one auto-fix rule and figure out all the machinery required to make that work.

Moving into `lib/`

Ben Sherman: A general point I wanna make about dealing with scripts in particular, and I’ve hinted at this with things like custom classes and loops: if you find that you have some code that you just can’t figure out how to make work with the strict syntax, or maybe you just don’t want to yet, maybe it’s just not a high priority for you right now, which is fair, another option that you have is to move that code into the lib directory.

You can read about it in the docs. Basically, at the top level of your pipeline, you can have a directory called lib. You can put Groovy scripts in there. You can also put Java JARs in there. But anyway, you can have arbitrary Groovy code in there.

All you have to do is take the problematic code that you have, wrap it in some kind of helper function, and then move that helper function into the Groovy code in the lib directory.

And then in your Nextflow code, all you have to do is just call that helper function. And you can basically apply that for any problematic code that you have. And then the other option you have is you can also move that code into a plugin. Plugins are very analogous to the lib directory, but the advantage is, of course, that you can put that code in a plugin once, and then you can reuse it across all of your pipelines, right? Whereas with the lib directory, you have to copy that code for every single pipeline.
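A sketch of such a helper is shown below, assuming a hypothetical file `lib/Utils.groovy` with a hypothetical static method. The while loop lives in plain Groovy code, where the strict syntax does not apply.

```groovy
// lib/Utils.groovy (hypothetical file and class name)
class Utils {
    // Smallest power of two that is >= n, found by condition-based iteration.
    // Assumes n is at most 2^30, so the doubling cannot overflow an int.
    static int nextPowerOfTwo(int n) {
        int p = 1
        while (p < n) {
            p *= 2
        }
        return p
    }
}
```

The Nextflow script would then only contain the call, for example `Utils.nextPowerOfTwo(10)`, with no loop syntax of its own.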

Phil Ewels: Did you say that you can have functions at the top level of the workflow script still? So could you move that custom validation code? You could just indent that, stick that in a function, and then call that function within the workflow.

Ben Sherman: Sure you could do that in either case, and that might be a good practice just to clean up your code, right?

Phil Ewels: So then you don’t even have to move the code out of the file. It’s a very small syntax change.

Ben Sherman: Sure. I was mainly talking about stuff like custom classes or if you’re using a while loop and you can’t figure out how to rewrite it, moving it into a function in the Nextflow script isn’t gonna be enough.

Phil Ewels: Because the syntax is still invalid.

Ben Sherman: So at that point, yeah, you have to actually move that function into the lib directory or a plugin.

And we’ve got some nice improvements. I know people have been skittish in the past about writing a plugin. I think just pay attention to our Boston Summit coming up. I think you’ll feel a lot more excited about writing plugins maybe after that. That’s all I’ll say for now.

Configuration syntax

Ben Sherman: Moving on to configuration syntax. This is probably one of the bigger sources of pain that I’ve seen from users responding to the strict syntax, because the strict config syntax is considerably more strict. Not because we’ve really removed lots of functionality, but just because the Nextflow config was just way more powerful than it probably should have been. And we’re finally tamping that down.

But basically a Nextflow config in the past, just like a Nextflow script, could be an arbitrary Groovy script. You could put loops and classes and functions and if statements and try catch and all kinds of stuff in there.

Phil Ewels: One of my favorites is the UPPMAX configuration. We actually ran system commands within the config to find out which system we were running on. We did hostname and then, like, just shelling out to the shell. And we shouldn’t really be able to do this in a config file, but it works.

Ben Sherman: Ironically you probably will still be able to do that. You’ll just have to jump through some more hoops to do it. But yeah, so we basically just feel like Nextflow configuration should just be configuration. Like it shouldn’t be super complicated.

I think one of the reasons why we have a custom config language rather than just using JSON or YAML or TOML is because it is actually useful to have some level of dynamism that you get from having Groovy expressions.

And so the basic paradigm of the strict config syntax is that you can define config settings. You have config blocks, config includes, all the basics there.

But when you assign a config setting to a value, that value can be whatever you want, right? It could be a number, string, Boolean, what you might typically see in a YAML file, but it could also be a Groovy expression. It could be a dynamic string. It could even be, and this is the crazy thing, the hoop I was talking about, a closure.

So you can have a closure which returns a value, right? Think of what you use for a map operator, where you have a closure. In the config, you can have a closure and then you can call that closure immediately. So you can define the closure, and then you can just add left and right parentheses. So create it, invoke it right on the spot. You might have to wrap that whole thing in parentheses to make it work.

So you actually still can do that. When you have the includeConfig, when you provide that string for the config path, that string can also be any expression. So you could make that a closure that calls itself. And then within that closure you could run those system commands you were talking about, to say which environment am I in? And then based on that, point to a different config file. And you might have a different config file for each one of your environments.

Honestly, a lot of the dynamism is still there. It’s just restricted. It’s in a very purely declarative way, right? You can’t have arbitrary code at the top level, but when you’re in the land of assigning a value, the value can be whatever you want. And it’s basically, open season in there.
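A rough sketch of what this could look like in a config file follows. The setting name `params.scratch_dir`, the environment variable `CLUSTER_NAME`, and the `conf/<name>.config` layout are all assumptions made for the example, and, as noted above, whether the wrapping parentheses are needed is not certain.

```groovy
// nextflow.config (sketch only)

// A setting assigned from a closure that is created and invoked on the spot.
params.scratch_dir = ({ -> System.getenv('TMPDIR') ?: '/tmp' })()

// An include whose path is an expression evaluated when the config is loaded;
// one hypothetical config file per environment under conf/.
includeConfig "conf/${System.getenv('CLUSTER_NAME') ?: 'local'}.config"
```

The top level still consists only of an assignment and an include; all of the dynamic logic sits on the value side.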

So I feel pretty good that people will, they’ll find ways and workarounds within the stricter syntax.

Phil Ewels: We’ve had quite a lot of conversations in the past about whether the Nextflow config should just be YAML. And I think as we’ve gone through this process of simplifying the config language and finding all these things that we’re saying you’re not allowed to do, now you see the variety of things that people want to do in their config scripts.

And that’s the explanation for why we haven’t gone all the way to just trimming it down to pure YAML. That would force too many people to remove functionality, which would be impossible to do otherwise.

Ben Sherman: Yeah, we’ve really seen the value of all these crazy things that people do. It is genuinely useful to be able to have a lot of this dynamism in the config file. And so I’ve tried to, again, find a proper balance: how do we get as much of the useful dynamism as possible while still having a simple syntax?

And so the advantage of requiring the config syntax to be very strictly just config blocks, config assignments, config includes, and not having arbitrary code at that level, is that now, again, we can do error checking much more confidently.

We can verify you didn’t, like, forget the equals sign. Or we can verify that the includes all make sense where you put them. And that all the options that you’re using are actually valid options that exist and not just a typo or something, right?

So we’re able to solve a lot of the problems that people have had when they make some syntax mistake in the config.

Relieving pressure from the config

Ben Sherman: Another aspect of making the config easier to use and read and understand is just like trying to relieve some of the pressure off of it. I feel like people have been using the config for a lot of different things that start to stretch the boundaries of the purpose of configuration.

One prong of this effort is to simplify the syntax as much as possible and improve the error checking. Another prong is finding ways to relieve that pressure. One example that’s already been out for a while is the resourceLimits directive. So we added that because nf-core pipelines were defining this whole function to apply resource limits to things like CPUs and memory. So we just turned that into a native Nextflow feature, right? So that you don’t have to be writing arbitrary code in the config. It’s nice that you were able to do that, fill a gap there. But that highlighted for us something that we just needed to be in Nextflow.
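For illustration, the directive can be set once in the config, replacing a hand-written limiting function. The specific limits below are made-up values.

```groovy
// nextflow.config
process {
    // Cap what any single task may request; the values are illustrative.
    resourceLimits = [cpus: 16, memory: 64.GB, time: 24.h]
}
```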

Another example is publishDir, which nf-core typically has this giant modules config with all of the publishDir statements in one place. And the publishDir directives themselves are like super verbose. You’re having to repeat a lot of the same settings. Simplifying the config syntax doesn’t really help with that. If anything, it just makes it more difficult, right? But what we’re doing there is, that’s why we’re replacing that with the workflow output definition. Because now all of that stuff is being defined in the script, which is where it belongs, right? If you have all these config settings where you’re having to have closures all over the place, that’s a good indicator that those settings probably need to be defined in the script somewhere.

’Cause what you’re saying when you have a closure is that I don’t have the context yet to evaluate this setting, but I will later. So when that time comes, evaluate it then. We could just move all those settings into the script so that it’s evaluated at the proper time. We just needed to find a way of how to model it correctly, right?

So once we finalize the workflow output definition, and people start moving over to it, you’ll start to see a lot of that kind of config go away because you just don’t need to put it in the config anymore. And then it just becomes a lot simpler.

Phil Ewels: I think of those, when you see a lot of closures like that, it’s a bad code smell, it’s like there’s something fundamentally wrong here, so yeah. A bit of code hygiene by bringing in the workflow output definition.

Timelines

Phil Ewels: So how soon can we expect to see some of these things rolling out?

Ben Sherman: I’m still mulling this over. I can see two extremes in my head. Basically the idea is that in 25.04, the upcoming stable release, we’re gonna introduce the strict syntax as an opt-in feature, right? So you’ll have to enable some environment variable to use it.

And then we’ll do the same process as we did for DSL2. At some point, the strict syntax will become the default, but still opt out. And then at some point it’ll become the only one. How quickly or slowly we do that, I think depends on how well the community receives it.

I think the fastest possible timeline is where we make it opt-in in 25.04, and then at 25.10 it becomes default, and then a year from now, 26.04, it becomes mandatory. That’s pretty quick. But considering that this is a smaller change than going to DSL2, I could see us doing that if there aren’t major snags that we encounter.

The slower alternative is basically taking twice as long. So a year from now, it becomes opt out, and then a year after that, in 27.04, it becomes mandatory. We just have to see how easily people are able to migrate, and how much we’re able to implement via auto-fix, maybe.

The gain for the pain

Phil Ewels: What kind of features do people get? What’s the payback for going through this process of updating their code? We’ve mentioned the auto formatter. Does that work with the previous syntax?

Ben Sherman: So the auto formatter requires the strict syntax right now. What I’ve been mulling over during this episode is maybe having some sort of auto-fix functionality so that it can pick up some of those old patterns and bring them into the strict syntax. That should be possible in some cases. Like I said, something like the script label should be pretty straightforward.

But the problem is that there’s always edge cases that you have to think about and we just have to make sure that we can do it in a robust way. We’ll see how well we can do that.

Aside from just the better error checking, which I hope is a reward in and of itself. We’ve got a whole slew of language improvements. Things that you and I have been talking about on this podcast for years now that we want to implement. And in many cases, those will only be possible to do in the strict syntax, in part because of, like I said, the benefit of tightening down the syntax.

It’s fewer syntax variants you have to learn. It’s also fewer variants that the compiler has to think about. Things like moving statements into your entry workflow, that kind of stuff makes it easier for us to add new features to the compiler and not worry about all kinds of weird edge cases.

So things like I mentioned: type annotations, record types, workflow outputs, maybe workflow inputs. All those kinds of things are gonna be up for grabs. What a lot of it is leading to really is what I would call static type checking, where it’s not only checking things like variable names and things of that sort, but actually checking every value as you pass values from function to function, from operator to operator, that all those values are the correct type. Kinda like what you might get from Pydantic in Python.

Because if you go and look at a page like the Nextflow gotchas, it’s a nice repository of weird Nextflow errors that people run into. Probably 90% of those errors come down to type checking. Meaning that if we can have type checking in the language, a lot of those errors go away or they get bubbled up much quicker and in a much easier to understand sort of way.

Once you have type checking, you really get close to the point of saying: if your code compiles, it’ll run correctly.

And my favorite part of it is that we’ll be able to do all of that without having all of the complexity and verbosity of a language like Java or C++, where you have to have so many extra annotations. You really have to be like a professional software engineer to write good code in those languages. But here, because of what we’re doing with Nextflow and the domain specific nature of it, I think we can have a really nice solution that’s both easy to write and easy to read and has all the robustness of a proper programming language.

Next steps

Phil Ewels: So for people listening today, what’s the next step for them? What do they have to do? Where should they go?

Ben Sherman: So step one, I would say, is just use the language server. You don’t have to use VS Code. Like I said, people have figured out how to do it on Neovim and Emacs, and any editor that supports the language server protocol can use it. If you have some new editor, we would love for people to contribute new integrations there. Just use the language server and if you fix all the errors that it complains about, you’ll be good to go. You’ll be set.

Aside from that, go to the Nextflow docs. Go to that page on updating Nextflow syntax, and just give it a scan. I try to give very clear examples of this is what it looks like, change it to this and it should work. And same thing. If you can go through that page and say, I’m not doing any of these anymore, then you’ll be good to go.

Conclusion

Phil Ewels: Alright, Ben, thank you very much. This has been educational as always. Honestly I think we’re both a bit scared about the reaction that people are gonna have when suddenly they see all these new errors. But honestly, I think it’s gonna be a real step in the right direction, right?

Ben Sherman: I really hope so. I’m active on Slack. I listen to people whenever they bring their issues, so please don’t hesitate to post on Slack or the community forum if you have some problem. I try to incorporate as much community feedback as I can into syntax changes and new language features.

So if you have an issue, let’s just have a conversation and we’ll try to find something that works.

Phil Ewels: All right, Ben. Thanks very much. We’ll see you soon at the Boston Summit, I guess in May.

Ben Sherman: Yep. I’ll be there. If anybody wants to come chat.

Phil Ewels: Yeah, coming up soon now, and as always there’s the Nextflow talk to look forward to as well, to see if any goodies are revealed.

Ben Sherman: We might have a few.

Phil Ewels: We’ll have you back on the podcast again soon and we’ll go into some of these new features that we’ve touched on and talk about ‘em in a bit more detail.

Ben Sherman: All right. Talk to you soon, Phil.

Phil Ewels: Thanks very much Ben, and thanks everyone for listening.
