Evan Floden
Jul 01, 2020

Tower Forge for AWS Batch

AWS Batch is a fantastic service which provides virtually unlimited computing capacity and manages all the complexity of provisioning VMs and storage for the execution of containerized tasks.

Nextflow users love it, and we have seen rapid and growing adoption across the community. At the same time, users repeatedly run into trouble when getting started. Setting up the Batch environment to meet Nextflow's requirements can be tricky, even for experienced users.

Despite the numerous excellent tutorials and blog posts published in recent months, this continues to be a pain point for many users.

As the creators of Nextflow, we have found this frustrating, since the idea behind the project is to empower pipeline developers by allowing them to deploy and scale their applications while hiding the unnecessary complexity of the target execution platform.

Today at Seqera we are very proud to announce another major step towards streamlining the execution of Nextflow pipelines with the introduction of Tower Forge for AWS Batch.

This new feature of Nextflow Tower greatly simplifies the setup of the AWS Batch environment. Users only need to specify the most basic information: how many CPUs, which region, and which storage they wish to use for their pipelines.

Given this information, Tower Forge sets up the complete AWS compute environment along with all the related permissions and resources required for the optimal deployment of Nextflow pipelines in the cloud.

This also means users no longer need to create a custom AMI or use complex CloudFormation templates to configure their AWS Batch queues and compute environment 🎉!

As of today, Tower Forge supports setting up the AWS Batch configuration in any AWS region using your default VPC network configuration, including automatic provisioning of an FSx for Lustre file system for your Nextflow pipelines.

In the coming weeks, we’ll roll out support for GPU-enabled environments, specific instance types, custom VPCs, subnets and security groups.

Pipeline actions!

One thing that sets Nextflow apart from other workflow managers is its built-in support for the Git source code management system. The idea behind this is to make complex pipeline projects manageable. A pipeline may be composed of many assets (source code scripts, deployment settings, dependency descriptors such as Conda or Docker files, etc.), and as a single project these can be precisely tracked and deployed by specifying a Git tag or commit ID.

This, along with support for containerization, is key to enabling replicable pipeline executions and provides the ability to continuously test and validate your pipeline as the code evolves over time.

Now we extend this idea with Pipeline Actions, a new feature for the automated deployment of pipelines based on events.

The Pipeline Actions feature allows you to associate a Nextflow pipeline project hosted in a GitHub repository with a Tower Launch deployment environment, so that every time a code change is pushed to the Git repository, Nextflow Tower launches the pipeline execution.

If this sounds similar to GitHub Actions or Travis, well it is! :D

However, such systems were designed to manage short-lived software builds (minutes) with low compute resource requirements. Pipeline Actions for Tower, by contrast, allows you to scale as you need thanks to the seamless integration with AWS Batch and Google Cloud Life Sciences. No more messy YAML deployment files!

Tower launch hook

There are, however, situations where users may wish to trigger pipeline executions independently of any change to the pipeline code. When creating a Pipeline Action, users can also select Tower launch hook as the event source. Tower then creates a custom endpoint URL which can be used to trigger the execution of your pipeline programmatically from any script or even from other web services!

Along with the request, you can provide any custom parameter defined in your pipeline script:

curl -H "Content-Type: application/json" \
     -H "Authorization: Bearer <YOUR-API-TOKEN>" \
     https://staging-tower.xyz/api/actions/qL5zWk35kzyp0Ms29I6wq/launch \
     --data '{"params":{"foo":"Hello world"}}'
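
The same kind of request can also be issued from a script. Below is a minimal Python sketch using only the standard library. The function names `build_launch_request` and `send_launch_request` are hypothetical helpers, not part of any Tower client, and the sketch assumes the endpoint accepts a POST with the JSON body shown in the curl command. The `authorization` argument is passed through verbatim, so use exactly the header value that works for your Tower installation.

```python
import json
import urllib.request


def build_launch_request(endpoint_url, authorization, params):
    """Build (but do not send) a POST request for a Tower launch hook.

    endpoint_url  -- the custom endpoint URL shown by Tower for the action
    authorization -- full Authorization header value, passed through as-is
    params        -- dict of pipeline parameters, sent under the "params" key
    """
    body = json.dumps({"params": params}).encode("utf-8")
    return urllib.request.Request(
        endpoint_url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": authorization,
        },
        method="POST",
    )


def send_launch_request(request, timeout=30):
    """Send the request and return (HTTP status, response text).

    The response format is not described here, so the body is returned
    undecoded beyond UTF-8 text. HTTP errors raise urllib.error.HTTPError.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, response.read().decode("utf-8")
```

To trigger a run, build the request with your own endpoint URL, header value and parameters, for example `build_launch_request(url, auth, {"foo": "Hello world"})`, then pass the result to `send_launch_request`. Keeping the build step separate from the send step makes it easy to inspect or log the payload before anything is launched.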

Conclusion

We are very excited about today's announcements, as they mark a significant step towards more reliable, replicable and scalable data analysis pipelines and establish Nextflow Tower as the fastest and foremost way to deploy Nextflow pipelines in the cloud.

If your organization is interested in deploying Tower in your cloud or on-prem environment, please reach out to us at info@seqera.io, and we would be happy to discuss your requirements.