Marco De La Pierre
Apr 18, 2024

Singularity Reloaded

Containers are essential components in reproducible scientific workflows. They enable applications to be easily packaged and distributed along with dependencies, making them portable across operating systems, runtimes, and clouds.

While Docker is the most popular container runtime and file format, Singularity (and now Apptainer) have emerged as preferred solutions in HPC settings.1 For HPC users, Singularity provides several advantages:

  • Containers run under a Linux user’s UID, avoiding security concerns and simplifying file system access in multi-user environments.
  • Singularity Image Format (SIF) containers are stored as individual files, making them portable across cluster nodes, easy to manage, and fast to load.
  • Containers work seamlessly with workload managers such as Slurm or Spectrum LSF, running under the workload manager’s control rather than as a child of the Docker daemon.

This article explains how Nextflow and Wave are evolving to meet the needs of HPC users, supporting new capabilities in both Singularity and Apptainer. Read on to learn more!

The Open Container Initiative

While Singularity offers advantages for HPC users, Docker is the de facto standard. Singularity and Docker have different approaches to storing container images on disk. While Singularity uses its native SIF format, Docker stores images as layers along with metadata in a Docker image cache where each image comprises many physical files.

As early as 2015, it was clear that multiple file formats were going to be a problem. The Open Container Project (OCP), now known as the Open Container Initiative (OCI), was formed with the backing of major players, including Amazon, IBM, Google, and Microsoft. The OCI aims to provide standardization and ensure interoperability among container solutions by promoting standard image formats and runtimes. The OCI embraced Docker’s file format, and Docker contributed its runtime (runc) as the reference OCI runtime.

Today, Singularity supports Docker’s layered image format in addition to its own native SIF format. Typically, when Singularity users pull a container from a Docker registry, Singularity pulls the slices (or layers) that make up the Docker image and automatically converts them into a single SIF image. This approach works, but it is slow, and compatibility issues sometimes arise.
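The manual equivalent of this pull-and-convert step can be sketched with the Singularity CLI. The output file name below is an arbitrary choice for illustration, and the guard around the commands is only there so the snippet exits cleanly on machines where Singularity is not installed.

```shell
#!/usr/bin/env bash
# Sketch: pull an OCI image from Docker Hub and convert it to a single SIF file.
image_uri="docker://ubuntu:latest"
sif_file="ubuntu_latest.sif"   # arbitrary output name chosen for this example

if command -v singularity >/dev/null 2>&1; then
    # Downloads the image layers, then assembles them into one SIF file.
    singularity pull "$sif_file" "$image_uri"
    # Run a command inside the converted image.
    singularity exec "$sif_file" cat /etc/os-release
else
    echo "singularity not found; skipping pull of $image_uri" >&2
fi
```

The conversion in the `pull` step is where most of the time goes, which is the cost the rest of this article aims to reduce.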

Given that HPC users increasingly need to run both Singularity and Docker/OCI containers, Singularity has been executing a roadmap to bridge the gap between HPC and OCI, evolving the native SIF format and runtime to support SIF-encapsulated OCI images.

Singularity vs. Apptainer

There is often confusion between Singularity and Apptainer, so it is worth providing a brief explanation. When Sylabs forked the Singularity project from the HPCng repository in May of 2021, they chose not to rename their fork. As a result, the name “Singularity” described both the original open-source project and Sylabs’ new version underpinning their commercial offerings.

To avoid confusion, members of the original Singularity project moved their project to the Linux Foundation in November 2021 and renamed it “Apptainer.” As a result of these moves, Singularity has diverged. SingularityCE and SingularityPro are maintained by Sylabs, and open-source Apptainer is available from apptainer.org, with commercial support also available.

Nextflow and Seqera fully support both Singularity dialects, treating Singularity and Apptainer as distinct offerings reflecting their unique and evolving features.

Nextflow support for Singularity and Apptainer

Nextflow can pull containers in different formats from multiple sources, including Singularity Hub, Singularity Library, or Docker/OCI-compatible registries such as Docker Hub, Quay.io, or Amazon ECR.2 In HPC environments, Nextflow users can also point to existing SIF format images that reside on a shared file system.

For Nextflow users in HPC environments, a common usage pattern has been to have Nextflow download and convert OCI/Docker images to SIF format on the fly. For this to work, scratch storage needs to be available on the cluster node running the Nextflow head job to facilitate downloading the container’s OCI blob layers and assembling the SIF file. The resulting SIF file is then stored on a shared file system accessible to other cluster nodes. While this works, there are problems with this approach:

  • Having the Nextflow head node responsible for downloading and converting multiple images presents a bottleneck that affects performance.
  • In production environments, pointing SINGULARITY_TMPDIR to fast local storage is a standard practice for speeding the generation of SIF format images, but this adds configuration complexity in clustered environments.
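As a rough illustration of the second point, the temporary build directory can be set in the environment of the job that runs the Nextflow head process. This is a minimal sketch: `LOCAL_SCRATCH` is a hypothetical site-specific variable, and `/tmp` is only a stand-in for whatever fast node-local storage your cluster provides.

```shell
#!/usr/bin/env bash
# Sketch: point Singularity's temporary build space at node-local scratch.
# LOCAL_SCRATCH is a hypothetical variable; /tmp is a fallback for illustration.
scratch_root="${LOCAL_SCRATCH:-/tmp}"

export SINGULARITY_TMPDIR="$scratch_root/${USER:-nobody}/singularity-tmp"
mkdir -p "$SINGULARITY_TMPDIR"

echo "SIF build scratch: $SINGULARITY_TMPDIR"
```

Because the right path differs from node to node and site to site, this setting usually ends up in per-cluster job scripts or profiles, which is the configuration overhead described above.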

A better approach using Nextflow ociAutoPull

As of version 23.12.0-edge, Nextflow provides a new ociAutoPull option for both Singularity and Apptainer that delegates the conversion of OCI-compliant images to Singularity format to the container runtime itself.3

This approach has several advantages over the previous approach:

  • The pull and conversion phase of generating SIF files from OCI images is managed by the container runtime instead of by Nextflow.
  • The pull and conversion happen on compute nodes instead of the node running the head job, thus freeing up the head node and enabling conversions to execute in parallel.
  • Images are cached on the compute nodes with the OCI layers intact. Assuming images are cached on a shared file system, when two containers share the same base images, only one copy needs to be retained. This avoids the need for unnecessary downloads and processing.4

The example below illustrates how this works in practice:

singularity.enabled = true
singularity.ociAutoPull = true
process.container = 'ubuntu:latest'

$ nextflow run hello -c <above-config-file>

If you are using Apptainer, replace the scope singularity with apptainer in the Nextflow config example above.
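One way to keep the two variants in sync is to generate the Apptainer config from the Singularity one. The sketch below writes the three settings to a file and renames the scope with `sed`; both file names are arbitrary choices for this example.

```shell
#!/usr/bin/env bash
# Sketch: write the Singularity config, then derive the Apptainer variant.
cat > singularity-oci.config <<'EOF'
singularity.enabled = true
singularity.ociAutoPull = true
process.container = 'ubuntu:latest'
EOF

# Rename the scope at the start of each line: singularity.* -> apptainer.*
sed 's/^singularity\./apptainer./' singularity-oci.config > apptainer-oci.config

cat apptainer-oci.config

# Then launch with the config that matches the installed runtime, for example:
#   nextflow run hello -c apptainer-oci.config
```

Only lines that begin with the `singularity.` scope are rewritten, so the `process.container` setting is carried over unchanged.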

Running OCI format containers

Apptainer now supports multiple image formats including Singularity SIF files, SquashFS files, and Docker/OCI containers hosted on an OCI registry. As of SingularityCE 4.0, Sylabs introduced a new SIF image format that directly encapsulates OCI containers. They also introduced a new OCI mode enabled by the --oci command line switch or by adding the oci mode directive to the singularity.conf file.

When OCI mode is enabled, Singularity uses a new low-level runtime to achieve OCI compatibility.5 This is a major step forward, allowing Singularity to execute OCI-compliant container images directly, solving previous compatibility issues. For Singularity users, this new runtime and direct support for OCI container images make it much more efficient to run OCI containers.
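Outside of Nextflow, OCI mode can be tried directly from the command line. The following is a minimal sketch that assumes SingularityCE 4.0 or later with a supported low-level OCI runtime installed; the output file name is an arbitrary choice, and the guard simply skips the commands when Singularity is not present.

```shell
#!/usr/bin/env bash
# Sketch: pull an image as an OCI-encapsulating SIF and run it in OCI mode.
image_uri="docker://ubuntu:latest"
oci_sif="ubuntu_latest.oci.sif"   # arbitrary output name chosen for this example

if command -v singularity >/dev/null 2>&1; then
    # --oci keeps the OCI image structure inside the SIF file.
    singularity pull --oci "$oci_sif" "$image_uri"
    # Execute a command through the OCI-mode runtime.
    singularity exec --oci "$oci_sif" cat /etc/os-release
else
    echo "singularity not found; skipping OCI-mode example" >&2
fi

# To make OCI mode the default for all users, an administrator would instead
# set the oci mode directive in singularity.conf (check the exact syntax
# against the comments in your installed file).
```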

In Nextflow, this functionality can be enabled as follows:

singularity.enabled = true
singularity.ociMode = true
process.container = 'ubuntu:latest'

$ nextflow run hello -c <above-config-file>

Wave support for Singularity

In addition to the features above, Nextflow also provides improved support for using Singularity with Wave containers.

Wave is a container provisioning service that, among other things, allows for the on-demand assembly of containers based on the dependencies of the jobs in your data analysis workflows.

Nextflow, along with Wave, allows you to build Singularity native images by using the Conda packages declared in your Nextflow configuration file. Singularity container images are stored in an OCI-compliant registry and pulled on demand by your pipeline.

To enable this capability, you will need to add the following settings to your nextflow.config. In our example, these settings were stored in wave-singularity.config.

singularity.enabled = true
singularity.autoMounts = true
singularity.ociAutoPull = true

wave.enabled = true
wave.freeze = true
wave.build.repository = 'docker.io/<user>/wavebuild'
wave.build.cacheRepository = 'docker.io/<user>/wave-cache'

tower.accessToken = '<my-access-token>'
tower.workspaceId = '<my-workspace-id>'

wave.strategy = ['conda']
conda.channels = 'seqera,conda-forge,bioconda,defaults'

You can test this configuration using the command below. In this example, Nextflow invokes Wave to build Singularity containers on the fly and freezes them to a repository using credentials stored in the Seqera Platform.

Nextflow requires that the accessToken and workspaceId for the Seqera workspace containing the registry credentials be supplied in the nextflow.config file (above) so that the containers can be persisted in the user’s preferred registry.

The personal authorization token (tower.accessToken) required to access the Seqera API can be generated in the user menu under Your Tokens from within the Seqera web interface. See the Seqera documentation for instructions on how to create a Docker Hub personal access token (PAT) and store it as a credential in your organization workspace.

$ nextflow run rnaseq-nf -c ./wave-singularity.config

 N E X T F L O W   ~  version 24.02.0-edge

 ┃ Launching `https://github.com/nextflow-io/rnaseq-nf` [serene_montalcini] DSL2 - revision: 8253a586cc [master]

 R N A S E Q - N F   P I P E L I N E
 ===================================
 transcriptome: /home/ubuntu/.nextflow/assets/nextflow-io/rnaseq-nf/data/ggal/ggal_1_48850000_49020000.Ggal71.500bpflank.fa
 reads        : /home/ubuntu/.nextflow/assets/nextflow-io/rnaseq-nf/data/ggal/ggal_gut_{1,2}.fq
 outdir       : results

executor >  local (4)
[1f/af2ca7] RNA…ggal_1_48850000_49020000) | 1 of 1 ✔
[d0/afbc55] RNA…STQC (FASTQC on ggal_gut) | 1 of 1 ✔
[b0/f9587a] RNASEQ:QUANT (ggal_gut)       | 1 of 1 ✔
[f0/093b45] MULTIQC                       | 1 of 1 ✔

Done! Open the following report in your browser --> results/multiqc_report.htm

You can use the nextflow inspect command to view the path to the containers built and pushed to the repo by Wave as follows:

$ nextflow inspect rnaseq-nf -c ./wave-singularity.config
{
    "processes": [
        {
            "name": "RNASEQ:INDEX",
            "container": "docker://docker.io/<user>/wavebuild:salmon-1.10.2--fdce05f6d77af751"
        },
        {
            "name": "RNASEQ:QUANT",
            "container": "docker://docker.io/<user>/wavebuild:salmon-1.10.2--fdce05f6d77af751"
        },
        {
            "name": "MULTIQC",
            "container": "docker://docker.io/<user>/wavebuild:multiqc-1.17--d85209f21556c472"
        },
        {
            "name": "RNASEQ:FASTQC",
            "container": "docker://docker.io/<user>/wavebuild:fastqc-0.12.1--f44601bdd08701ed"
        }
    ]
}
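Since several processes can share one image, it can be handy to reduce this output to a unique list of container URIs. The sketch below does that with standard text tools. The sample file reproduces the output above, and `inspect.json` is an arbitrary file name. In practice you would redirect the real `nextflow inspect` output into it, assuming the command prints only JSON to standard output.

```shell
#!/usr/bin/env bash
# Sketch: list the unique container URIs reported by `nextflow inspect`.
# Sample data copied from the output above (the <user> placeholder is kept as-is).
cat > inspect.json <<'EOF'
{
    "processes": [
        { "name": "RNASEQ:INDEX",  "container": "docker://docker.io/<user>/wavebuild:salmon-1.10.2--fdce05f6d77af751" },
        { "name": "RNASEQ:QUANT",  "container": "docker://docker.io/<user>/wavebuild:salmon-1.10.2--fdce05f6d77af751" },
        { "name": "MULTIQC",       "container": "docker://docker.io/<user>/wavebuild:multiqc-1.17--d85209f21556c472" },
        { "name": "RNASEQ:FASTQC", "container": "docker://docker.io/<user>/wavebuild:fastqc-0.12.1--f44601bdd08701ed" }
    ]
}
EOF

# Extract each "container" value, strip the key and quotes, then de-duplicate.
unique_images=$(grep -o '"container": *"[^"]*"' inspect.json \
    | sed 's/.*"\([^"]*\)"$/\1/' \
    | sort -u)

echo "$unique_images"
```

With the sample data, the four processes reduce to three distinct images, because RNASEQ:INDEX and RNASEQ:QUANT use the same Salmon container.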

Singularity containers built by Wave can be stored locally on your HPC cluster or served from your preferred registry at runtime, providing tremendous flexibility.

Conclusion

Nextflow continues to improve pipeline portability and reproducibility across clusters and cloud computing environments by providing the widest support for container runtimes and cutting-edge functionality for Singularity users.

Today, Nextflow supports Apptainer, Singularity, Charliecloud, Docker, Podman, Sarus, and Shifter with rich support for native Singularity and OCI container formats. Nextflow can run both container formats served from multiple sources, including Singularity Hub, Singularity Library, or any Docker/OCI-compliant registry.

1 Singularity was renamed to Apptainer in November 2021 when the project was officially moved to the Linux Foundation. See the community announcement.

2 Singularity Hub was retired in April of 2021 but the images are still accessible.

3 This functionality requires Apptainer or Singularity version 3.11 or later.

4 By default, Singularity caches files in ~/.singularity/cache/. The cache directory can be pointed to a shared file system visible to all cluster nodes by setting the environment variable SINGULARITY_CACHEDIR.

5 OCI mode requires both Singularity and the OCI runc runtime to be installed.