Fusion file system

A distributed, lightweight file system for cloud-native data pipelines

Data management simplified

Supercharge your cloud file system performance

Cloud object stores such as AWS S3 are scalable and cost-effective, but they don’t present a POSIX interface. This means containerized applications must copy data to and from S3 for every task — a slow and inefficient process.

Fusion file system is a virtual, lightweight, distributed file system that bridges the gap between pipelines and cloud-native storage. Fusion enables seamless file system I/O to cloud object stores, resulting in simpler pipeline logic and up to 2.2x the pipeline throughput of working against Amazon S3 directly.

Data management simplified
  • Simplify container and pipeline development and maintenance
  • Avoid the need to pre-install cloud tools in containers or cloud instances
  • Eliminate the need for expensive and complex shared file systems
  • Reduce redundant file I/O and maximize resource use efficiency
  • Accelerate task and overall pipeline execution
  • Boost productivity, reduce time to results


Transparent, automated installation

Traditionally, pipeline developers needed to bundle utilities in containers to copy data in and out of S3 storage.

With Fusion file system, there is nothing to install or manage. The Fusion thin client is automatically installed and configured for your pipeline, enabling containerized applications to read from and write to S3 buckets, Google Cloud Storage, and other object stores as if they were local storage.

Dramatically reduce data movement

When pipelines run with object storage, tasks typically read data from a bucket, copy it to local block storage for processing, and copy results back to the object store.

The result is significant overhead for every task. Fusion file system enables direct file access to object storage, eliminating unnecessary I/O and dramatically reducing data movement and overall runtime.
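As a rough illustration of the difference, the Python sketch below simulates both access patterns on a local directory that stands in for a bucket. The function names, the directory layout, and the byte counter are hypothetical and exist only for this example. It does not show how Fusion is implemented; it only contrasts the stage-in/stage-out pattern with reading and writing in place.

```python
import shutil
import tempfile
from pathlib import Path


def process(text: str) -> str:
    """Stand-in for real task work: upper-case the input."""
    return text.upper()


def run_with_staging(bucket: Path, scratch: Path) -> int:
    """Copy input to scratch, process it there, copy the result back.

    Returns the number of bytes moved purely for staging.
    """
    staged_in = scratch / "input.txt"
    shutil.copy(bucket / "input.txt", staged_in)           # stage in
    staged_out = scratch / "output.txt"
    staged_out.write_text(process(staged_in.read_text()))
    shutil.copy(staged_out, bucket / "output_staged.txt")  # stage out
    return staged_in.stat().st_size + staged_out.stat().st_size


def run_direct(bucket: Path) -> int:
    """Read and write in place, as a task could on a POSIX-style mount."""
    result = process((bucket / "input.txt").read_text())
    (bucket / "output_direct.txt").write_text(result)
    return 0  # no extra staging copies


def demo() -> tuple[int, int, bool]:
    """Run both patterns on a temporary directory (hypothetical 'bucket')."""
    with tempfile.TemporaryDirectory() as tmp:
        bucket = Path(tmp) / "bucket"
        scratch = Path(tmp) / "scratch"
        bucket.mkdir()
        scratch.mkdir()
        (bucket / "input.txt").write_text("acgt" * 256)  # 1024 bytes
        staged = run_with_staging(bucket, scratch)
        direct = run_direct(bucket)
        same = ((bucket / "output_staged.txt").read_text()
                == (bucket / "output_direct.txt").read_text())
        return staged, direct, same
```

Calling `demo()` returns the staging bytes for each pattern and whether both produced the same result. In this toy setup the staged variant moves the input and the output once each on top of the actual work, while the direct variant moves nothing extra.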

No shared file system required

To share data among pipeline tasks, organizations often turn to shared file systems such as Amazon EFS, Amazon FSx for Lustre, or NFS.

Fusion file system avoids the need to deploy, manage, and mount shared file systems on every cloud instance by providing the same functionality over cloud object stores such as S3 — significantly reducing cost and complexity.

Seamless access to cloud object storage

While some open-source projects provide a POSIX interface over object storage, they require developers to install and configure additional software and package it in containers or VMs.

Unlike third-party solutions, Fusion is optimized for Nextflow and handles these tasks automatically, delivering fast, seamless access to cloud object storage.

Maximize pipeline performance and efficiency

Copying data to and from object storage adds latency for every task, lengthening the time containers and cloud instances are deployed. This translates into longer runtimes and significantly higher costs for pipelines with thousands of tasks.

Fusion file system eliminates these bottlenecks and delays, reducing execution time and cloud spending and using compute instances more efficiently.

Breakthrough performance

Boost performance and efficiency with Fusion file system

Fusion file system is a simple, scalable file system solution that works optimally in cloud-native compute environments.

Fusion 2.0 improves on direct object storage access by avoiding intermediate data copies and using the high-performance Fusion driver. Recent benchmarks conducted by Seqera Labs show that Fusion can:

  • Improve pipeline throughput by up to 2.2x compared to Amazon S3 [1];
  • Deliver performance similar to Amazon FSx for Lustre, without the cost and complexity.

[Graph: relative performance]

Improve operational efficiency

Reduce storage costs in the cloud

When running pipelines in the cloud, users are concerned about two things: the one-time costs associated with each pipeline run and long-term data storage costs.

Fusion FS addresses both cost components. The cost per pipeline run is reduced by avoiding the need for per-instance block storage and by reducing the time that instances are provisioned. Long-term costs are addressed by enabling data to reside in cost-efficient object storage.

An analysis conducted by Seqera Labs found that Fusion file system can reduce combined pipeline and storage costs by up to 76% compared to Amazon FSx for Lustre [2].
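The two cost components described above can be expressed as a small model. The Python sketch below is illustrative only: the function names and parameters are hypothetical, and it is not the method used in the Seqera Labs analysis. Any rates or durations passed to it are the caller's own assumptions.

```python
def run_cost(instance_hours: float, hourly_rate: float,
             block_storage_cost: float = 0.0) -> float:
    """One-time cost of a pipeline run, in USD.

    block_storage_cost covers per-instance block storage, if any is used.
    """
    return instance_hours * hourly_rate + block_storage_cost


def retention_cost(stored_gb: float, rate_per_gb_month: float,
                   months: float = 1.0) -> float:
    """Cost of keeping results for a retention period, in USD."""
    return stored_gb * rate_per_gb_month * months


def relative_saving(baseline: float, candidate: float) -> float:
    """Fraction of the baseline cost saved by the candidate."""
    if baseline <= 0:
        raise ValueError("baseline cost must be positive")
    return (baseline - candidate) / baseline
```

For example, with a made-up baseline total of 200 USD and a candidate total of 50 USD, `relative_saving(200.0, 50.0)` returns `0.75`. A shorter run lowers `instance_hours`, dropping block storage sets `block_storage_cost` to zero, and a cheaper storage tier lowers `rate_per_gb_month`.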

[Graph: relative monthly cost]

The Fusion architecture

How Fusion FS works

Fusion file system eliminates the need to stage data to local block storage for each pipeline task. Tasks instead access object storage directly through a POSIX interface, which removes the block storage requirement and the associated copy steps, resulting in faster pipeline execution and lower costs.

[Diagram: before Fusion and with Fusion]

Get in touch

Ready to turbocharge your pipeline file I/O with Fusion?

Fusion file system works with Nextflow and Wave and presently supports AWS Batch, Google Cloud Batch, and Kubernetes, with support for Azure Blob Storage coming soon. Fusion also supports most HPC workload managers.

Download the latest benchmarks in the whitepaper "Breakthrough performance and cost-efficiency with the new Fusion file system".

[1] Relative performance/efficiency gain running the nf-core/rnaseq pipeline, based on total CPU hours consumed by storage type. FSx for Lustre and Fusion 2.0 tests were both run with “scratch=false”. Fusion results were obtained using NVMe instances. Comparison is between Amazon S3 and Fusion 2.0.

[2] Total pipeline costs by storage method, based on a single pipeline run (nf-core/rnaseq) with 30 days of data retention. Comparison is between Amazon FSx for Lustre and Fusion 2.0. FSx for Lustre and Fusion 2.0 tests were run with “scratch=false”. Fusion results were based on NVMe instances. All costs in USD.