Paolo Di Tommaso
Nov 19, 2024

Enhancing security in data pipelines with Nextflow and Wave

Security is a critical topic when deploying data pipelines in regulated environments. The latest versions of Nextflow and Wave introduce two powerful new functionalities that greatly simplify the handling of containers and the management of security when deploying containerized workloads at scale.

Support for container mirroring

Nextflow 24.10.0 (and later) offers built-in support for Wave container mirroring. This feature enables you to copy containers used by your pipelines to a container registry of your choice ahead of the pipeline execution, so that containers are pulled from the registry you have provided rather than the original registry.

Container mirroring is particularly useful for creating an on-demand cache of container images co-located within the same cloud region as your pipeline execution, optimizing network transfer, and reducing container launch time.

This capability can also help synchronize container registries across different cloud regions and vendors to comply with the requirements of security and regulatory authorities.

To enable this capability, add the following snippet to your Nextflow pipeline configuration:

wave.enabled = true
wave.mirror = true
wave.build.repository = '<YOUR REGISTRY>'
tower.accessToken = '<YOUR ACCESS TOKEN>'

In the above snippet, replace <YOUR REGISTRY> with a container registry of your choice, for example quay.io (no prefix or suffix is needed). The container is copied to the specified registry with the same name, tag, and checksum. For example, if the source container is docker.io/biocontainers/bwa:0.7.13--1 and the build repository setting is foo.com, the resulting container name is foo.com/biocontainers/bwa:0.7.13--1.
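
The naming rule in the example above can be sketched in a few lines of Python. This is only an illustration of the mapping described here, not Wave's actual implementation; the function name mirrored_name is hypothetical, and the sketch assumes the source reference always starts with an explicit registry host.

```python
def mirrored_name(source: str, build_repository: str) -> str:
    """Return the container name expected after mirroring.

    Hypothetical helper: swaps the registry host of `source` for
    `build_repository`, keeping the repository path and tag unchanged.
    Assumes `source` starts with an explicit registry host ("docker.io/...").
    """
    _host, _, remainder = source.partition("/")
    if not remainder:
        raise ValueError(f"expected '<registry>/<name>[:tag]', got {source!r}")
    return f"{build_repository.rstrip('/')}/{remainder}"


# The example from the text: docker.io is replaced by foo.com.
print(mirrored_name("docker.io/biocontainers/bwa:0.7.13--1", "foo.com"))
# prints: foo.com/biocontainers/bwa:0.7.13--1
```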

Enhanced container security scanning

Deploying data pipelines in regulated environments requires constant monitoring of the security of your applications and related components. Given the multitude of containers utilized in modern analysis pipelines, this task may not be straightforward.

The latest version of Nextflow (24.10.0) introduces a new capability for scanning containers for security vulnerabilities. This feature, powered by Wave, transparently and automatically scans any container used in your data workflow. When this feature is enabled, Nextflow scans each container in your pipeline on demand, before it is used in a task, and reports an execution error if any vulnerability is found. This prevents affected containers from entering your data processing lifecycle in a completely automated manner, guarding against the use of unsafe software components.

To enable this capability, add the following settings to your Nextflow configuration file:

wave.enabled = true
wave.scan.mode = 'required'
tower.accessToken = '<YOUR ACCESS TOKEN>'

With these settings, Nextflow will only permit the use of containers that are free from security vulnerabilities. You can also define the acceptable levels of vulnerabilities using wave.scan.allowedLevels. For example:

wave.scan.allowedLevels = 'low,medium'
💡 Hint: The above setting allows the use of containers with low- and medium-severity vulnerabilities. Accepted values are low, medium, high, and critical.
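
To make the effect of wave.scan.allowedLevels concrete, here is a minimal Python sketch of the check as described above. It is an assumption about the behavior, not the Wave or Nextflow source code: the helper names parse_allowed_levels and container_is_allowed are hypothetical, and the sketch assumes a container is accepted only when every reported vulnerability has a severity in the allowed list.

```python
# Severity values accepted by the setting, as listed in the hint above.
ACCEPTED_LEVELS = {"low", "medium", "high", "critical"}


def parse_allowed_levels(setting: str) -> set[str]:
    """Parse a comma-separated value such as 'low,medium' into a set."""
    levels = {part.strip().lower() for part in setting.split(",") if part.strip()}
    unknown = levels - ACCEPTED_LEVELS
    if unknown:
        raise ValueError(f"unknown severity level(s): {sorted(unknown)}")
    return levels


def container_is_allowed(found_severities: list[str], allowed_setting: str) -> bool:
    """Hypothetical check: True only if every found severity is allowed.

    A container with no reported vulnerabilities always passes.
    """
    allowed = parse_allowed_levels(allowed_setting)
    return all(severity.lower() in allowed for severity in found_severities)


print(container_is_allowed(["low", "medium"], "low,medium"))  # prints: True
print(container_is_allowed(["low", "high"], "low,medium"))    # prints: False
```

With no allowed levels configured, the same check reduces to the default behavior described earlier: any reported vulnerability causes the container to be rejected.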

Note that security scans automatically expire after one week. If a container is accessed again seven or more days after its last scan, the scan is re-executed. This keeps your container security up to date without the need for complex and expensive procedures or automation.
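
The one-week expiry rule can be pictured with a short Python sketch. This is purely illustrative and does not reflect how Wave stores or tracks scan results; the constant SCAN_TTL and the function needs_rescan are hypothetical names introduced for this example.

```python
from datetime import datetime, timedelta, timezone

# Assumed lifetime of a scan result, taken from the "one week" stated above.
SCAN_TTL = timedelta(days=7)


def needs_rescan(last_scan: datetime, now: datetime) -> bool:
    """Return True when seven or more days have passed since `last_scan`."""
    return now - last_scan >= SCAN_TTL


scanned_at = datetime(2024, 11, 19, 12, 0, tzinfo=timezone.utc)
print(needs_rescan(scanned_at, scanned_at + timedelta(days=3)))  # prints: False
print(needs_rescan(scanned_at, scanned_at + timedelta(days=7)))  # prints: True
```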

Conclusion

Wave provides a single service to handle the full lifecycle of containers in your data pipelines transparently and automatically. The latest version of Nextflow enhances this by providing built-in support for both container mirroring and security scanning. Together, Wave and Nextflow greatly simplify the security management of containers in regulated environments.

Visit the Nextflow documentation to learn more about how to enable container mirroring and security scanning in your pipelines.